Integrating Hive with HBase for Data Analysis

(1) Hive Overview

http://ke.dajiangtai.com/content/6912/1.png

(2) Hive's Place in the Hadoop Ecosystem

http://ke.dajiangtai.com/content/6912/2.png

(3) Hive Architecture

http://ke.dajiangtai.com/content/6912/3.png

http://ke.dajiangtai.com/content/6912/4.png

(4) Advantages and Use Cases of Hive

http://ke.dajiangtai.com/content/6912/5.png

(5) Downloading and Installing Hive

1. Downloading Hive

Hive is distributed both as an Apache release and as a Cloudera (CDH) release.

Here we use the CDH release hive-0.13.1-cdh5.3.0.tar.gz (the tarball actually used in the steps below), uploaded to the bigdata-pro03.kfk.com node.
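If you still need to fetch the tarball, a wget against the Cloudera archive is one option (the exact URL is an assumption based on the usual CDH archive layout; adjust it to a current mirror if needed):

wget http://archive.cloudera.com/cdh5/cdh/5/hive-0.13.1-cdh5.3.0.tar.gz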

2. Extract and install Hive

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin  hbase-0.98.6-cdh5.3.0        jdk1.8.0_60         kafka_2.11-0.9.0.0   zookeeper-3.4.5-cdh5.10.0

hadoop-2.6.0     hive-0.13.1-cdh5.3.0.tar.gz  kafka_2.11-0.8.2.1  producer.properties

[kfk@bigdata-pro03 modules]$ tar -zxf hive-0.13.1-cdh5.3.0.tar.gz

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin  hbase-0.98.6-cdh5.3.0  hive-0.13.1-cdh5.3.0.tar.gz  kafka_2.11-0.8.2.1  producer.properties

hadoop-2.6.0     hive-0.13.1-cdh5.3.0   jdk1.8.0_60                  kafka_2.11-0.9.0.0  zookeeper-3.4.5-cdh5.10.0

[kfk@bigdata-pro03 modules]$ mv hive-0.13.1-cdh5.3.0.tar.gz /opt/softwares/

[kfk@bigdata-pro03 modules]$ ls

flume-1.7.0-bin  hbase-0.98.6-cdh5.3.0  jdk1.8.0_60         kafka_2.11-0.9.0.0   zookeeper-3.4.5-cdh5.10.0

hadoop-2.6.0     hive-0.13.1-cdh5.3.0   kafka_2.11-0.8.2.1  producer.properties

[kfk@bigdata-pro03 modules]$ cd hive-0.13.1-cdh5.3.0/

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ ll

total 312

drwxr-xr-x 3 kfk kfk   4096 Dec 17  2014 bin

drwxr-xr-x 2 kfk kfk   4096 Dec 17  2014 conf

drwxr-xr-x 3 kfk kfk   4096 Dec 17  2014 data

drwxr-xr-x 6 kfk kfk   4096 Dec 17  2014 docs

drwxr-xr-x 4 kfk kfk   4096 Dec 17  2014 examples

drwxr-xr-x 7 kfk kfk   4096 Dec 17  2014 hcatalog

drwxr-xr-x 4 kfk kfk   4096 Dec 17  2014 lib

-rw-r--r-- 1 kfk kfk  23828 Dec 17  2014 LICENSE

-rw-r--r-- 1 kfk kfk    277 Dec 17  2014 NOTICE

-rw-r--r-- 1 kfk kfk   3838 Dec 17  2014 README.txt

-rw-r--r-- 1 kfk kfk 253839 Dec 17  2014 RELEASE_NOTES.txt

drwxr-xr-x 3 kfk kfk   4096 Dec 17  2014 scripts

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ cd scripts/

[kfk@bigdata-pro03 scripts]$ ls

metastore

[kfk@bigdata-pro03 scripts]$ cd metastore/

[kfk@bigdata-pro03 metastore]$ ls

upgrade

[kfk@bigdata-pro03 metastore]$ cd upgrade/

[kfk@bigdata-pro03 upgrade]$ ls

derby  mssql  mysql  oracle  postgres         // the five metastore databases Hive supports by default

 

3. Edit the hive-log4j.properties configuration file

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ cd conf/

[kfk@bigdata-pro03 conf]$ ll

total 120

-rw-r--r-- 1 kfk kfk 109185 Dec 17  2014 hive-default.xml.template

-rw-r--r-- 1 kfk kfk   2378 Dec 17  2014 hive-env.sh.template

-rw-r--r-- 1 kfk kfk   2662 Dec 17  2014 hive-exec-log4j.properties.template

-rw-r--r-- 1 kfk kfk   3505 Dec 17  2014 hive-log4j.properties.template

[kfk@bigdata-pro03 conf]$ mv hive-log4j.properties.template hive-log4j.properties

[kfk@bigdata-pro03 conf]$ vi hive-log4j.properties

# the log directory must be created in advance

hive.log.dir=/opt/modules/hive-0.13.1-cdh5.3.0/logs
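Hive does not create this directory automatically, so make it before the first start:

mkdir -p /opt/modules/hive-0.13.1-cdh5.3.0/logs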

 

4. Edit the hive-env.sh configuration file

[kfk@bigdata-pro03 conf]$ mv hive-env.sh.template hive-env.sh

[kfk@bigdata-pro03 conf]$ vi hive-env.sh

export HADOOP_HOME=/opt/modules/hadoop-2.6.0

export HIVE_CONF_DIR=/opt/modules/hive-0.13.1-cdh5.3.0/conf

  

5. Start HDFS first (the startup steps were covered in earlier posts and are not repeated here), then create the directories Hive requires (as specified in the official documentation).



The /tmp directory already exists, so it does not need to be created.

bin/hdfs dfs -mkdir -p /user/hive/warehouse
bin/hdfs dfs -chmod g+w /user/hive/warehouse
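The official getting-started guide also makes /tmp group-writable; if your cluster has not already done this, the analogous command would be:

bin/hdfs dfs -chmod g+w /tmp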

 


6. Start Hive

[kfk@bigdata-pro03 zookeeper-3.4.5-cdh5.10.0]$ cd ../hive-0.13.1-cdh5.3.0/

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ bin/hive

Logging initialized using configuration in file:/opt/modules/hive-0.13.1-cdh5.3.0/conf/hive-log4j.properties

hive> show databases;

OK

default

Time taken: 0.872 seconds, Fetched: 1 row(s)

 

Hive started successfully!

(6) Integrating Hive with MySQL

1. Create a hive-site.xml file in the /opt/modules/hive-0.13.1-cdh5.3.0/conf directory and configure MySQL as the metastore database.

vi hive-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://bigdata-pro01.kfk.com/metastore?createDatabaseIfNotExist=true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>root</value>
    </property>
</configuration>

 

2. Configure MySQL user access

1) View user information

[kfk@bigdata-pro01 hadoop-2.6.0]$ mysql -uroot -proot

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| mysql              |

| test               |

+--------------------+

3 rows in set (0.00 sec)



mysql> use mysql;

Reading table information for completion of table and column names

You can turn off this feature to get a quicker startup with -A

Database changed

mysql> show tables;

+---------------------------+

| Tables_in_mysql           |

+---------------------------+

| columns_priv              |

| db                        |

| event                     |

| func                      |

| general_log               |

| help_category             |

| help_keyword              |

| help_relation             |

| help_topic                |

| host                      |

| ndb_binlog_index          |

| plugin                    |

| proc                      |

| procs_priv                |

| servers                   |

| slow_log                  |

| tables_priv               |

| time_zone                 |

| time_zone_leap_second     |

| time_zone_name            |

| time_zone_transition      |

| time_zone_transition_type |

| user                      |

+---------------------------+

23 rows in set (0.00 sec)

mysql> select User,Host,Password from user;

+------+-----------------------+-------------------------------------------+

| User | Host                  | Password                                  |

+------+-----------------------+-------------------------------------------+

| root | localhost             | *81F5E21E35407D884A6CD4A731AEBFB6AF209E1B |

| root | bigdata-pro01.kfk.com |                                           |

| root | 127.0.0.1             |                                           |

|      | localhost             |                                           |

|      | bigdata-pro01.kfk.com |                                           |

+------+-----------------------+-------------------------------------------+

5 rows in set (0.00 sec)

 

2) Update user information

update user set Host='%' where User = 'root' and Host='localhost';

 

3) Delete the redundant user entries

delete from user where user='root' and host='127.0.0.1';

delete from user where user='root' and host='bigdata-pro01.kfk.com';

delete from user where host='localhost';

delete from user where host='bigdata-pro01.kfk.com';

select User,Host,Password from user;

 

4) Flush the privileges

flush privileges;

 

3. Upload the MySQL driver jar (mysql-connector-java-5.1.27.jar) into Hive's lib directory.
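Assuming the jar was uploaded to /opt/softwares (the upload path is an assumption; adjust it to wherever you placed the file), copying it into place looks like:

cp /opt/softwares/mysql-connector-java-5.1.27.jar /opt/modules/hive-0.13.1-cdh5.3.0/lib/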

4. Ensure the third node can log in to the other cluster nodes via passwordless SSH.

Once this is done, the other nodes in the cluster can also enter the Hive environment directly.
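If key-based login is not yet in place, a typical sketch for the kfk user (using this cluster's hostnames) is:

# generate a key pair on bigdata-pro03 if one does not already exist
ssh-keygen -t rsa
# push the public key to the other nodes
ssh-copy-id kfk@bigdata-pro01.kfk.com
ssh-copy-id kfk@bigdata-pro02.kfk.com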

(7) Starting and Testing the Hive Service

1. Start the HDFS and YARN services

2. Start Hive

./hive

 

3. Create a table through the Hive service

CREATE TABLE test(id INT,name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' ;
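To confirm the table was created, a quick check from the Hive prompt:

hive> show tables;
hive> desc test;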

 

4. Create a data file

[kfk@bigdata-pro03 lib]$ cd /opt/datas/

[kfk@bigdata-pro03 datas]$ ls

weblog-flume.log  weblog.log  weblog-shell.sh  weblogs.log

[kfk@bigdata-pro03 datas]$ vi test.txt

0001      mary      (fields separated by a Tab character)

0002      jacky

0003      monica

0004      tom

 

5. Load the data into the Hive table

hive> load data local inpath '/opt/datas/test.txt' into table test;

Copying data from file:/opt/datas/test.txt

Copying file: file:/opt/datas/test.txt

Loading data to table default.test

Table default.test stats: [numFiles=1, numRows=0, totalSize=43, rawDataSize=0]

OK

Time taken: 2.568 seconds

hive> select * from test;

OK

1     mary

2     jacky

3     monica

4     tom

Time taken: 0.4 seconds, Fetched: 4 row(s)

 

Note that the ids come back as 1 through 4 rather than 0001 through 0004: the id column was declared INT, so the leading zeros are dropped when the strings are parsed.

Now look at the MySQL side on node 1:

mysql> show databases;

+--------------------+

| Database           |

+--------------------+

| information_schema |

| metastore          |

| mysql              |

| test               |

+--------------------+

4 rows in set (0.00 sec)

mysql> use metastore;

mysql> select * from TBLS;

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

| TBL_ID | CREATE_TIME | DB_ID | LAST_ACCESS_TIME | OWNER | RETENTION | SD_ID | TBL_NAME | TBL_TYPE      | VIEW_EXPANDED_TEXT | VIEW_ORIGINAL_TEXT |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

|      2 |  1553240538 |     1 |                0 | kfk   |         0 |     2 | test     | MANAGED_TABLE | NULL               | NULL               |

+--------+-------------+-------+------------------+-------+-----------+-------+----------+---------------+--------------------+--------------------+

1 row in set (0.00 sec)

 

Next, make one more configuration change.

Copy the following two properties from hive-default.xml.template into hive-site.xml and change their values to true:

<property>

  <name>hive.cli.print.header</name>

  <value>true</value>

</property>

<property>

  <name>hive.cli.print.current.db</name>

  <value>true</value>

</property>

 

Then exit and restart Hive; the CLI will now show the current database name in the prompt (e.g. hive (default)>) and print column headers in query results.


As you can see, the database and table created on the Hive side have been synchronized to MySQL, which confirms that the MySQL and Hive integration is working.

(8) Integrating Hive with HBase

1. Configure the ZooKeeper quorum in hive-site.xml; Hive uses this parameter to connect to the HBase cluster.

<property>
    <name>hbase.zookeeper.quorum</name>
    <value>bigdata-pro01.kfk.com,bigdata-pro02.kfk.com,bigdata-pro03.kfk.com</value>
</property>

 

2. Make nine HBase jars available in hive/lib (linked here with symlinks).

Run the following commands from the [kfk@bigdata-pro03 lib]$ prompt:

export HBASE_HOME=/opt/modules/hbase-0.98.6-cdh5.3.0
export HIVE_HOME=/opt/modules/hive-0.13.1-cdh5.3.0
ln -s $HBASE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-client-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-protocol-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-it-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/htrace-core-2.04.jar $HIVE_HOME/lib/htrace-core-2.04.jar
ln -s $HBASE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop2-compat-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-hadoop-compat-0.98.6-cdh5.3.0.jar
ln -s $HBASE_HOME/lib/high-scale-lib-1.1.1.jar $HIVE_HOME/lib/high-scale-lib-1.1.1.jar
ln -s $HBASE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar $HIVE_HOME/lib/hbase-common-0.98.6-cdh5.3.0.jar

 

After running these commands, use ll to check that the linked jars are valid before moving on.

A broken (dangling) symlink indicates a problem; in that case, go into HBase's lib directory and copy the affected jar over with cp instead.
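One way to confirm that each link resolves, as an illustrative check (using the HBASE_HOME/HIVE_HOME variables exported above):

# "broken symbolic link" in the output pinpoints a bad link
file $HIVE_HOME/lib/hbase-server-0.98.6-cdh5.3.0.jar
# or list the links and verify each target path exists
ls -l $HIVE_HOME/lib | grep hbase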

3. Create a Hive external table integrated with HBase

hive (default)> create external table weblogs(

               id string,

               datatime string,

               userid string,

               searchname string,

               retorder string,

               cliorder string,

               cliurl string

               )

               STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'

               WITH SERDEPROPERTIES("hbase.columns.mapping" = ":key,info:datatime,info:userid,info:searchname,info:retorder,info:cliorder,info:cliurl") TBLPROPERTIES("hbase.table.name" = "weblogs");

OK

Time taken: 3.661 seconds

 

However, an error was reported!



The log shows that the root cause was a missing DNS configuration, so addresses could not be resolved correctly. For the fix, see the post: Configuring a DNS Service with BIND (CentOS 6.5).

# view the HBase data records

After fixing the error, drop the old table, recreate it, and query again:
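A sketch of that cleanup (re-run the CREATE EXTERNAL TABLE statement from step 3 afterwards); note that dropping an external table removes only the Hive metadata, not the underlying HBase data:

hive> drop table weblogs;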

select * from weblogs;

 


Let's also run a MapReduce job to see how many rows the table contains:
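In Hive 0.13 a row count is executed as a MapReduce job, so the query is all it takes; a minimal sketch of the kind of count meant here:

hive> select count(*) from weblogs;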



4. Using Beeline and HiveServer2

1) Start HiveServer2

bin/hiveserver2
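hiveserver2 runs in the foreground and holds the shell; to keep it running while you open Beeline from another session, one common option is:

nohup bin/hiveserver2 > hiveserver2.log 2>&1 &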

 

2) Start Beeline

# connect to the HiveServer2 service

[kfk@bigdata-pro03 hive-0.13.1-cdh5.3.0]$ bin/beeline

Beeline version 0.13.1-cdh5.3.0 by Apache Hive

beeline> !connect jdbc:hive2://bigdata-pro03.kfk.com:10000

scan complete in 15ms

Connecting to jdbc:hive2://bigdata-pro03.kfk.com:10000

Enter username for jdbc:hive2://bigdata-pro03.kfk.com:10000: kfk

Enter password for jdbc:hive2://bigdata-pro03.kfk.com:10000: ***

Connected to: Apache Hive (version 0.13.1-cdh5.3.0)

Driver: Hive JDBC (version 0.13.1-cdh5.3.0)

Transaction isolation: TRANSACTION_REPEATABLE_READ

0: jdbc:hive2://bigdata-pro03.kfk.com:10000>

 

# list the tables

0: jdbc:hive2://bigdata-pro03.kfk.com:10000> show tables;

+-----------+--+

| tab_name  |

+-----------+--+

| test      |

| weblogs   |

+-----------+--+

2 rows selected (1.44 seconds)

 

 

# view the first 10 rows

0: jdbc:hive2://bigdata-pro03.kfk.com:10000> select * from weblogs limit 10;

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

|                weblogs.id                 | weblogs.datatime  |    weblogs.userid    | weblogs.searchname  | weblogs.retorder  | weblogs.cliorder  |                                  weblogs.cliurl                                  |

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

| 001436217275865958600:00:011541058199433  | 00:00:01          | 0014362172758659586  | [明星合成]              | 64                | 21                | link.44box.com/                                                                  |

| 001436217275865958600:00:011541058212288  | 00:00:01          | 0014362172758659586  | [明星合成]              | 64                | 21                | link.44box.com/                                                                  |

| 001442567877373568400:00:201541058289917  | 00:00:20          | 0014425678773735684  | [www.hntz.net]      | 2                 | 1                 | www.gaywang.com/                                                                 |

| 001442567877373568400:00:201541058302167  | 00:00:20          | 0014425678773735684  | [www.hntz.net]      | 2                 | 1                 | www.gaywang.com/                                                                 |

| 001982399550216229500:00:081541058229644  | 00:00:08          | 0019823995502162295  | [抗震救灾晚会]            | 8                 | 17                | news.xinhuanet.com/newscenter/2008-05/31/content_8287444.htm                     |

| 001982399550216229500:00:081541058242323  | 00:00:08          | 0019823995502162295  | [抗震救灾晚会]            | 8                 | 17                | news.xinhuanet.com/newscenter/2008-05/31/content_8287444.htm                     |

| 001982399550216229500:00:341541058380339  | 00:00:34          | 0019823995502162295  | [抗震救灾晚会]            | 10                | 18                | news.qq.com/a/20080531/000369.htm                                                |

| 001982399550216229500:00:341541058395143  | 00:00:34          | 0019823995502162295  | [抗震救灾晚会]            | 10                | 18                | news.qq.com/a/20080531/000369.htm                                                |

| 00697069831361674100:00:091541058229692   | 00:00:09          | 006970698313616741   | [高成义+厦门]            | 4                 | 2                 | www.chinatally.com.cn/backend/infoview.jsp?infoID=1148&infoClass=.002.&comID=17  |

| 00697069831361674100:00:091541058242369   | 00:00:09          | 006970698313616741   | [高成义+厦门]            | 4                 | 2                 | www.chinatally.com.cn/backend/infoview.jsp?infoID=1148&infoClass=.002.&comID=17  |

+-------------------------------------------+-------------------+----------------------+---------------------+-------------------+-------------------+----------------------------------------------------------------------------------+--+

10 rows selected (2.816 seconds)

 


That wraps up the main content of this section. These posts document my own learning process, and I hope they can offer some guidance. If you found this useful, please leave a like; if not, I ask your forbearance, and please point out any mistakes you find. Follow the blog to be the first to see updates, and thanks! Reposting is also welcome, but the original URL must be credited prominently in the repost; the author reserves the right of final interpretation!
