Hive

Hadoop + Hive + MySQL

Hive is a data warehouse built on top of Hadoop.

Setting up the Hadoop environment itself is omitted here.

Hive stores its metadata in an RDBMS, most commonly MySQL or Derby. By default, Hive keeps the metadata in an embedded Derby database, which allows only a single session to connect at a time and is therefore only good for simple tests, not for production. To support multiple concurrent sessions you need a standalone metadata database; we use MySQL, for which Hive has good built-in support.

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables and provides a simple SQL-like query capability, translating SQL statements into MapReduce jobs for execution. Its main advantage is the low learning curve: simple MapReduce statistics can be expressed in SQL-like statements without developing a dedicated MapReduce application, which makes it well suited to statistical analysis in a data warehouse.
Hive is a data warehouse infrastructure built on top of Hadoop. It provides a set of tools for extract-transform-load (ETL) work and a mechanism for storing, querying, and analyzing large-scale data kept in Hadoop. Hive defines a simple SQL-like query language called HQL, which lets users familiar with SQL query the data; the same language also allows MapReduce developers to plug in custom mappers and reducers for complex analysis that the built-in ones cannot handle.
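As a concrete illustration of this file-to-table mapping (the table name, columns, and HDFS path below are hypothetical), a tab-delimited file already sitting in HDFS can be exposed as a table and queried; the SELECT is compiled into a MapReduce job:

-- hypothetical: expose an existing tab-delimited HDFS directory as a table
CREATE EXTERNAL TABLE access_log (ip STRING, url STRING, hits INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/data/access_log';

-- this SQL-like query is translated into a MapReduce job
SELECT url, SUM(hits) FROM access_log GROUP BY url;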

Download the Hive package:
http://apache.01link.hk/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
(Do not use 3.1.1 here; at the time of writing, configuring that version hits a bug I could not resolve, where beeline fails to connect.)

hadoop@muhe221:~/soft$ tar -zxvf apache-hive-2.3.4-bin.tar.gz
hadoop@muhe221:~/soft$ mv apache-hive-2.3.4-bin hive-2.3.4

$ vim ~/.bashrc
export HIVE_HOME=~/soft/hive-2.3.4
export PATH=$HIVE_HOME/bin:$PATH
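
Reload the shell configuration and confirm the hive command is on the PATH (hive --version prints the release):

$ source ~/.bashrc
$ hive --version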

hadoop@muhe221:~/soft/hive-2.3.4$ cp mysql-connector-java-5.1.47-bin.jar ~/soft/hive-2.3.4/lib # copy the MySQL JDBC driver

Use a reasonably recent driver version here; older versions may cause problems.

hadoop@muhe221:~/soft$ hadoop fs -mkdir -p /user/hive/warehouse
hadoop@muhe221:~/soft$ hadoop fs -mkdir -p /tmp/hive
hadoop@muhe221:~/soft$ hadoop fs -chmod 777 /user/hive/warehouse
hadoop@muhe221:~/soft$ hadoop fs -chmod 777 /tmp/hive
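
A quick sanity check that the directories exist with the expected permissions:

hadoop@muhe221:~/soft$ hadoop fs -ls /user/hive
hadoop@muhe221:~/soft$ hadoop fs -ls -d /tmp/hive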

 hadoop@muhe221:~/soft/hive-2.3.4/conf$ cp hive-default.xml.template hive-site.xml
 hadoop@muhe221:~/soft/hive-2.3.4/conf$ cp hive-env.sh.template hive-env.sh

User-defined configuration file: $HIVE_CONF_DIR/hive-site.xml
Default configuration file: $HIVE_CONF_DIR/hive-default.xml
User-defined settings override the defaults. Hive also reads in the Hadoop configuration, because Hive starts as a Hadoop client. A sketch of hive-env.sh follows.
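
hive-env.sh was copied from the template above but its contents are not shown; a minimal sketch for this setup (the paths match the installation used in this post, adjust to yours):

# hive-env.sh
export HADOOP_HOME=/home/hadoop/hadoop-3.2.0
export HIVE_CONF_DIR=/home/hadoop/soft/hive-2.3.4/conf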

Point the metastore database settings in hive-site.xml at MySQL:

<!-- MySQL driver -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<!-- connection URL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<!-- username -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>Username to use against metastore database</description>
</property>
<!-- password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>

If you run into SSL connection warnings, disable SSL in the connection URL, i.e. <value>jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value> (&amp; is simply the XML escape for &).

Install the MySQL database:
$ sudo apt-get install mysql-server
Set the root account's password to 123456. (We will not use root for database connections; we add a separate hadoop account instead.)
Add the hadoop account with password 123456:

mysql> GRANT ALL ON *.* TO 'hadoop'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges;  # this step must not be skipped
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host,authentication_string  from mysql.user;
+------------------+-----------+-------------------------------------------+
| user             | host      | authentication_string                     |
+------------------+-----------+-------------------------------------------+
| root             | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| mysql.session    | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| mysql.sys        | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| debian-sys-maint | localhost | *964281208ED229548F354FE59D976F66302CD419 |
| hadoop           | %         | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+------------------+-----------+-------------------------------------------+
5 rows in set (0.00 sec)

mysql> show grants for 'hadoop'@'%';
+---------------------------------------------------------------+
| Grants for hadoop@%                                            |
+---------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'%' WITH GRANT OPTION  |
+---------------------------------------------------------------+
1 row in set (0.00 sec)

To allow remote clients to reach MySQL, make the following change:

$ sudo vi /etc/mysql/my.cnf
#bind-address           = 127.0.0.1
bind-address            = 0.0.0.0
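
Restart MySQL for the change to take effect, then verify that the hadoop account can connect over the network (a quick check; substitute your server's address):

$ sudo service mysql restart
$ mysql -h 10.121.63.240 -P 3306 -u hadoop -p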

Initialize the Hive metastore schema:

hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hadoop
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
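
As an optional verification, schematool can also report the metastore schema version against the same hive-site.xml settings:

hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -dbType mysql -info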

 

The hive database has been created automatically:

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

Also update the temporary directories in hive-site.xml. The template values use ${system:java.io.tmpdir}-style placeholders that Hive cannot expand at startup (typically failing with a "Relative path in absolute URI" error), so point them at a real local directory:

  <property>
    <name>hive.exec.local.scratchdir</name>
    <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
    <description>Local scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.downloaded.resources.dir</name>
    <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
    <description>Temporary local directory for added resources in the remote file system.</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
  </property>
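
This local scratch directory may not be created automatically; creating it up front avoids permission surprises:

hadoop@muhe221:~/soft/hive-2.3.4$ mkdir -p /home/hadoop/soft/hive-2.3.4/javaiotmpdir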

Connecting with the hive CLI:

hadoop@muhe221:~/soft/hive-2.3.4$ hive
SLF4J: Class path contains multiple SLF4J bindings.
......
Logging initialized using configuration in jar:file:/home/hadoop/soft/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive Session ID = eaee3aa6-4d94-4c14-8cfb-11e547e55a64
hive> show databases;
OK
default
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.017 seconds
hive> create table page_view(viewTime INT, userid BIGINT, page_url STRING);
OK
Time taken: 2.23 seconds
hive> show tables;
OK
page_view
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> INSERT INTO TABLE PAGE_VIEW VALUES (34, 45, 'www.baidu.com');
......
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.85 sec   HDFS Read: 16889 HDFS Write: 292 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 850 msec
OK
Time taken: 94.974 seconds
hive> select * from page_view;
OK
34      45      www.baidu.com
Time taken: 0.373 seconds, Fetched: 1 row(s)
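
Note that each single-row INSERT launches a full MapReduce job (almost 95 seconds above), so bulk loading from a file is the usual pattern. A sketch, where the table name, file path, and contents are hypothetical and fields are tab-separated:

hive> CREATE TABLE page_view2 (viewTime INT, userid BIGINT, page_url STRING)
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
hive> LOAD DATA LOCAL INPATH '/home/hadoop/page_views.txt' INTO TABLE page_view2;
hive> SELECT * FROM page_view2;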

Hiveserver2 & beeline
The plain hive CLI shown above is deprecated; it has been superseded by HiveServer2 & Beeline, which can connect both locally and remotely.
This requires configuring hadoop-3.2.0/etc/hadoop/core-site.xml; the "hadoop" inside the property names below is the connecting user name:

    <property>
        <name>hadoop.proxyuser.hadoop.hosts</name>
        <value>*</value>
    </property>
    <property>
        <name>hadoop.proxyuser.hadoop.groups</name>
        <value>*</value>
    </property>

Restart Hadoop so the proxyuser change takes effect; a typical sequence, assuming the standard Hadoop sbin scripts are on the PATH:
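
hadoop@muhe221:~$ stop-yarn.sh && stop-dfs.sh
hadoop@muhe221:~$ start-dfs.sh && start-yarn.sh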
Then start the hiveserver2 service:

hadoop@muhe221:~/soft/hive-2.3.4/bin$ ./hiveserver2
2019-03-05 11:05:53: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]

Connect with beeline:

hadoop@muhe221:~$ beeline -u jdbc:hive2://10.121.63.240:10000 -n hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://10.121.63.240:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
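
At the beeline prompt, queries work the same way as in the hive CLI, and !quit closes the session:

0: jdbc:hive2://10.121.63.240:10000> show databases;
0: jdbc:hive2://10.121.63.240:10000> select * from page_view;
0: jdbc:hive2://10.121.63.240:10000> !quit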

 

Error messages
hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -initSchema -dbType mysql
Starting metastore schema initialization to 2.3.4
Initialization script hive-schema-2.3.4.mysql.sql 
Error: Table 'CTLGS' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
The earlier initialization already created the hive database; to re-initialize, drop the existing hive database first:
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)
mysql> drop database hive;
Query OK, 74 rows affected (12.72 sec)
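
After dropping the database, re-run the initialization:

hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -initSchema -dbType mysql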

 
