Hadoop (8) -- Hive and Data Mining
1. Getting started with Hive
2. Unpack Hive into /usr/local and rename (mv) the extracted directory to hive.
Set the HADOOP_HOME and HIVE_HOME environment variables, and add their bin directories to PATH.
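The environment setup in step 2 can be sketched as the following lines in ~/.bashrc or /etc/profile (the install paths are assumptions based on the directories used above; adjust if yours differ):

```shell
# Assumed install locations from the steps above; adjust if yours differ.
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
```

After sourcing the file, `hive` and `hadoop` should resolve from any directory.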
3.
- cd /usr/local/hive/conf
- cp hive-default.xml.template hive-site.xml
- Set hive.metastore.schema.verification to false.
- Create the directory /usr/local/hive/tmp and replace every occurrence of ${system:java.io.tmpdir} with that path.
- Replace ${system:user.name} with root.
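The two placeholder replacements can be done with sed. The substitution is shown here on one sample hive-site.xml value so the pattern is clear; in practice you would run the same two expressions in-place (`sed -i ... hive-site.xml`):

```shell
# Demonstrate the step-3 substitutions on a sample hive-site.xml value;
# in practice run the same expressions with: sed -i -e ... -e ... hive-site.xml
sample='<value>${system:java.io.tmpdir}/${system:user.name}</value>'
echo "$sample" | sed -e 's#${system:java.io.tmpdir}#/usr/local/hive/tmp#g' \
                     -e 's#${system:user.name}#root#g'
# → <value>/usr/local/hive/tmp/root</value>
```

The `#` delimiter avoids having to escape the slashes in the replacement path.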
4. schematool -initSchema -dbType derby
This creates a metastore_db database under the current directory.
Note: the next time you run hive, do it from the same directory — by default Hive looks for the metastore in the current directory.
If you run into problems, delete metastore_db and rerun the command.
In real production environments, MySQL is commonly used as the metastore database instead of Derby.
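For reference, a MySQL-backed metastore is configured with the following properties in hive-site.xml. The host, database name, and credentials below are placeholders, and the MySQL JDBC driver jar must be placed in $HIVE_HOME/lib:

```xml
<!-- Placeholder host, database, and credentials; adjust to your setup. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>
```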
5. Start hive.
6. Run hadoop fs -ls /tmp/hive and observe the directories that were created.
7.
1. show databases;
2. use default;
3. create table doc(line string);
4. show tables;
5. desc doc;
6. select * from doc;
7. drop table doc;
8. Observe hadoop fs -ls /user.
9. Start YARN.
10.
1. load data inpath '/wcinput' overwrite into table doc;
2. select * from doc;
3. select split(line, ' ') from doc;
4. select explode(split(line, ' ')) from doc;
5. select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word;
6. select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
7. create table word_counts as select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
8. select * from word_counts;
9. dfs -ls /user/hive/...
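What the split/explode/group-by chain above computes is the classic word count. As a point of reference, the same result for a local file can be sketched with coreutils (the sample input here is made up):

```shell
# Local word count equivalent to queries 3-6 above (sample input is made up).
printf 'hello world\nhello hive\n' |
  tr ' ' '\n' |            # split each line into words  ~ explode(split(line, ' '))
  sort | uniq -c |         # count per word              ~ count(1) ... group by word
  awk '{print $2, $1}'     # word first, like the (word, count) result columns
# → hello 2 / hive 1 / world 1 (one pair per line)
```

Each HiveQL step maps to one pipeline stage, which makes it easy to see why explode is needed: split alone returns one array per row, not one row per word.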
11. Use the Sogou search log for an experiment.
12. Upload the log file to HDFS and start hive.
13. create table sougou (qtime string, qid string, qword string, url string) row format delimited fields terminated by ',';
14. load data inpath '/sougou.dic' into table sougou;
15. select count(*) from sougou;
16. create table sougou_results as select keyword, count(1) as count from (select qword as keyword from sougou) t group by keyword order by count desc;
17. select * from sougou_results limit 10;
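As with the word count, the step-16/17 query can be sanity-checked locally on comma-delimited rows in the (qtime, qid, qword, url) layout assumed by step 13; the sample rows below are made up:

```shell
# Top keywords from comma-delimited sougou-style rows (sample rows are made up).
printf '00:01,u1,hive,http://a\n00:02,u2,hadoop,http://b\n00:03,u3,hive,http://c\n' |
  cut -d, -f3 |       # qword column (3rd field)  ~ select qword as keyword
  sort | uniq -c |    # count per keyword         ~ count(1) ... group by keyword
  sort -rn |          # ~ order by count desc
  head -10            # ~ limit 10
```

The most frequent keyword appears on the first line with its count.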
