Hive
Hadoop + Hive + MySQL
Hive is a data warehouse tool built on top of Hadoop.
Setting up the Hadoop environment itself is omitted here.
Hive stores its metadata in an RDBMS, commonly MySQL or Derby. By default, Hive metadata lives in an embedded Derby database, which allows only a single session at a time and is suitable only for simple testing. That is unusable in a real production environment: to support multiple user sessions you need a standalone metastore database. MySQL is the usual choice, and Hive has good built-in support for it.
Hive maps structured data files to database tables and provides a simple SQL query capability: SQL statements are translated into MapReduce jobs for execution. Its advantage is a low learning curve; simple MapReduce statistics can be produced quickly with SQL-like statements, without developing dedicated MapReduce applications, which makes it well suited to the statistical analysis side of a data warehouse.
Hive is a data warehouse infrastructure built on Hadoop. It provides a set of tools for extract-transform-load (ETL) work, and a mechanism to store, query, and analyze large-scale data stored in Hadoop. Hive defines a simple SQL-like query language called HQL that lets users familiar with SQL query the data. The same language also lets developers familiar with MapReduce plug in custom mappers and reducers for complex analysis that the built-in mappers and reducers cannot handle.
Download the Hive package
http://apache.01link.hk/hive/hive-2.3.4/apache-hive-2.3.4-bin.tar.gz
Note: do not use version 3.1.1 here; when configuring 3.1.1 I hit a bug I could not resolve, where beeline cannot connect.
hadoop@muhe221:~/soft$ tar -zxvf apache-hive-2.3.4-bin.tar.gz
hadoop@muhe221:~/soft$ mv apache-hive-2.3.4-bin hive-2.3.4
$ vim ~/.bashrc
export HIVE_HOME=~/soft/hive-2.3.4
export PATH=$HIVE_HOME/bin:$PATH
hadoop@muhe221:~/soft/hive-2.3.4$ cp mysql-connector-java-5.1.47-bin.jar ~/soft/hive-2.3.4/lib # copy the MySQL JDBC driver
Use a recent driver version here; older drivers can cause problems.
hadoop@muhe221:~/soft$ hadoop fs -mkdir -p /user/hive/warehouse
hadoop@muhe221:~/soft$ hadoop fs -mkdir -p /tmp/hive
hadoop@muhe221:~/soft$ hadoop fs -chmod 777 /user/hive/warehouse
hadoop@muhe221:~/soft$ hadoop fs -chmod 777 /tmp/hive
hadoop@muhe221:~/soft/hive-2.3.4/conf$ cp hive-default.xml.template hive-site.xml
hadoop@muhe221:~/soft/hive-2.3.4/conf$ cp hive-env.sh.template hive-env.sh
User-defined configuration file: $HIVE_CONF_DIR/hive-site.xml
Default configuration file: $HIVE_CONF_DIR/hive-default.xml
User-defined settings override the defaults. Hive also reads in the Hadoop configuration, since Hive starts as a Hadoop client.
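Because of this precedence rule, hive-site.xml only needs to carry the properties you actually change; everything you leave out keeps its hive-default.xml value. A minimal sketch, written to a scratch path for illustration:

```shell
# Illustrative: a minimal hive-site.xml carrying a single override;
# every property not listed here falls back to its hive-default.xml value.
cat > /tmp/hive-site-demo.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
</configuration>
EOF
grep -c '<property>' /tmp/hive-site-demo.xml  # a single override is enough
```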
Point the metastore database settings in hive-site.xml at MySQL:
<!-- MySQL driver -->
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<!-- connection URL -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
</property>
<!-- username -->
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hadoop</value>
  <description>Username to use against metastore database</description>
</property>
<!-- password -->
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
  <description>password to use against metastore database</description>
</property>
If you hit an SSL connection warning, disable SSL by appending useSSL=false to the connection URL, as in the value above. Note that inside the XML file the & separator must be written as the entity &amp;.
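The escaping can also be applied mechanically; a quick sketch with sed, using the connection URL from this guide:

```shell
# Escape '&' as '&amp;' so the JDBC URL is legal inside an XML <value> element
url='jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true&useSSL=false'
printf '%s\n' "$url" | sed 's/&/\&amp;/g'
# -> jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false
```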
Install the MySQL database
$sudo apt-get install mysql-server
Set the root account's password to 123456. (We will not connect to the database as root; instead we add a separate hadoop account.)
Add the hadoop account with password 123456:
mysql> GRANT ALL ON *.* TO 'hadoop'@'%' IDENTIFIED BY '123456' WITH GRANT OPTION;
Query OK, 0 rows affected, 1 warning (0.00 sec)
mysql> flush privileges; # this step must not be skipped
Query OK, 0 rows affected (0.00 sec)
mysql> select user,host,authentication_string from mysql.user;
+------------------+-----------+-------------------------------------------+
| user | host | authentication_string |
+------------------+-----------+-------------------------------------------+
| root | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
| mysql.session | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| mysql.sys | localhost | *THISISNOTAVALIDPASSWORDTHATCANBEUSEDHERE |
| debian-sys-maint | localhost | *964281208ED229548F354FE59D976F66302CD419 |
| hadoop | % | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+------------------+-----------+-------------------------------------------+
5 rows in set (0.00 sec)
mysql> show grants for 'hadoop'@'%';
+-------------------------------------------------------------+
| Grants for hadoop@% |
+-------------------------------------------------------------+
| GRANT ALL PRIVILEGES ON *.* TO 'hadoop'@'%' WITH GRANT OPTION |
+-------------------------------------------------------------+
1 row in set (0.00 sec)
To let remote users access MySQL, make the following change:
sudo vim /etc/mysql/my.cnf
#bind-address = 127.0.0.1
bind-address = 0.0.0.0
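The same edit can be scripted; a sketch run against a sample copy of the file (the path is illustrative, the real file needs sudo):

```shell
# Demonstrate the bind-address change on a sample copy of my.cnf
printf 'bind-address = 127.0.0.1\n' > /tmp/my.cnf.demo
sed -i 's/^bind-address.*/bind-address = 0.0.0.0/' /tmp/my.cnf.demo
grep '^bind-address' /tmp/my.cnf.demo
# -> bind-address = 0.0.0.0
```

Restart the MySQL service afterwards so the new bind address takes effect.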
Initialize the Hive metastore:
hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -initSchema -dbType mysql
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL: jdbc:mysql://10.121.63.240:3306/hive?createDatabaseIfNotExist=true
Metastore Connection Driver : com.mysql.jdbc.Driver
Metastore connection User: hadoop
Starting metastore schema initialization to 2.3.0
Initialization script hive-schema-2.3.0.mysql.sql
Initialization script completed
schemaTool completed
The hive database has been created automatically:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
Point the temporary directories in hive-site.xml at writable local paths:
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <value>/home/hadoop/soft/hive-2.3.4/javaiotmpdir</value>
</property>
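If Hive complains at startup that one of these local paths does not exist or is not writable, create it by hand first. A sketch, with /tmp standing in for the real javaiotmpdir path:

```shell
# Stand-in for /home/hadoop/soft/hive-2.3.4/javaiotmpdir
dir=/tmp/javaiotmpdir-demo
mkdir -p "$dir"
[ -w "$dir" ] && echo writable
```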
Connecting with the hive CLI
hadoop@muhe221:~/soft/hive-2.3.4$ hive
SLF4J: Class path contains multiple SLF4J bindings.
......
Logging initialized using configuration in jar:file:/home/hadoop/soft/hive-2.3.4/lib/hive-common-2.3.4.jar!/hive-log4j2.properties Async: true
Hive Session ID = eaee3aa6-4d94-4c14-8cfb-11e547e55a64
hive> show databases;
OK
default
Time taken: 0.048 seconds, Fetched: 1 row(s)
hive> use default;
OK
Time taken: 0.017 seconds
hive> create table page_view(viewTime INT, userid BIGINT, page_url STRING);
OK
Time taken: 2.23 seconds
hive> show tables;
OK
page_view
Time taken: 0.031 seconds, Fetched: 1 row(s)
hive> INSERT INTO TABLE PAGE_VIEW VALUES (34, 45, 'www.baidu.com');
......
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.85 sec HDFS Read: 16889 HDFS Write: 292 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 850 msec
OK
Time taken: 94.974 seconds
hive> select * from page_view;
OK
34 45 www.baidu.com
Time taken: 0.373 seconds, Fetched: 1 row(s)
Hiveserver2 & beeline
The plain hive CLI connection shown above is now deprecated; it has been replaced by HiveServer2 with beeline (which can connect either locally or remotely).
You need to configure hadoop-3.2.0/etc/hadoop/core-site.xml; in the property names below, hadoop is the user name:
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
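A hand-edited core-site.xml with a typo will break every Hadoop daemon on restart, so it is worth confirming the file still parses as XML before restarting. A sketch using a sample copy and Python's stdlib parser:

```shell
# Validate that the proxyuser snippet parses as well-formed XML
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>hadoop.proxyuser.hadoop.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.hadoop.groups</name>
    <value>*</value>
  </property>
</configuration>
EOF
python3 -c "import xml.dom.minidom; xml.dom.minidom.parse('/tmp/core-site-demo.xml'); print('well-formed')"
# -> well-formed
```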
Restart Hadoop.
Start the hiveserver2 service:
hadoop@muhe221:~/soft/hive-2.3.4/bin$ ./hiveserver2
2019-03-05 11:05:53: Starting HiveServer2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connect with beeline:
hadoop@muhe221:~$ beeline -u jdbc:hive2://10.121.63.240:10000 -n hadoop
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/soft/hive-2.3.4/lib/log4j-slf4j-impl-2.6.2.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Connecting to jdbc:hive2://10.121.63.240:10000
Connected to: Apache Hive (version 2.3.4)
Driver: Hive JDBC (version 2.3.4)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.4 by Apache Hive
Error messages
hadoop@muhe221:~/soft/hive-2.3.4/bin$ schematool -initSchema -dbType mysql
Starting metastore schema initialization to 2.3.4
Initialization script hive-schema-2.3.4.mysql.sql
Error: Table 'CTLGS' already exists (state=42S01,code=1050)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
Underlying cause: java.io.IOException : Schema script failed, errorcode 2
The earlier initialization already created the hive database; to re-initialize, you must drop the existing hive database first:
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| hive |
| mysql |
| performance_schema |
| sys |
+--------------------+
5 rows in set (0.00 sec)
mysql> drop database hive;
Query OK, 74 rows affected (12.72 sec)
