Hadoop (8) -- Hive and Data Mining
1. Getting started with Hive
2. Unpack Hive into /usr/local and rename (mv) the extracted directory to hive.
Set the HADOOP_HOME and HIVE_HOME environment variables, and add their bin directories to PATH.
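The environment setup in step 2 can be sketched as the following lines in ~/.bashrc or /etc/profile (the install paths are assumptions based on the directories used above; adjust if yours differ):

```shell
# Assumed install locations from the steps above; adjust if yours differ.
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
```

After sourcing the file, `hive` and `hadoop` should resolve from any directory.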
3.
- cd /usr/local/hive/conf
- cp hive-default.xml.template hive-site.xml
- Set hive.metastore.schema.verification to false.
- Create the directory /usr/local/hive/tmp and replace every occurrence of ${system:java.io.tmpdir} with that path.
- Replace ${system:user.name} with root.
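The two placeholder replacements can be done with sed. The substitution is shown here on one sample hive-site.xml value so the pattern is clear; in practice you would run the same two expressions in-place (`sed -i ... hive-site.xml`):

```shell
# Demonstrate the step-3 substitutions on a sample hive-site.xml value;
# in practice run the same expressions with: sed -i -e ... -e ... hive-site.xml
sample='<value>${system:java.io.tmpdir}/${system:user.name}</value>'
echo "$sample" | sed -e 's#${system:java.io.tmpdir}#/usr/local/hive/tmp#g' \
                     -e 's#${system:user.name}#root#g'
# → <value>/usr/local/hive/tmp/root</value>
```

The `#` delimiter avoids having to escape the slashes in the replacement path.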
4. schematool -initSchema -dbType derby
This creates a metastore_db database under the current directory.
Note: the next time you run hive, do it from the same directory — by default Hive looks for the metastore in the current directory.
If you run into problems, delete metastore_db and rerun the command.
In real production environments, MySQL is commonly used as the metastore database instead of Derby.
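For reference, a MySQL-backed metastore is configured with the following properties in hive-site.xml. The host, database name, and credentials below are placeholders, and the MySQL JDBC driver jar must be placed in $HIVE_HOME/lib:

```xml
<!-- Placeholder host, database, and credentials; adjust to your setup. -->
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>
```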
5. Start hive.
6. Run hadoop fs -ls /tmp/hive and observe the directories that were created.
7.
1. show databases;
2. use default;
3. create table doc(line string);
4. show tables;
5. desc doc;
6. select * from doc;
7. drop table doc;
8. Observe hadoop fs -ls /user.
9. Start YARN.
10.
1. load data inpath '/wcinput' overwrite into table doc;
2. select * from doc;
3. select split(line, ' ') from doc;
4. select explode(split(line, ' ')) from doc;
5. select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word;
6. select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
7. create table word_counts as select word, count(1) as count from (select explode(split(line, ' ')) as word from doc) w group by word order by word;
8. select * from word_counts;
9. dfs -ls /user/hive/...
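What the split/explode/group-by chain above computes is the classic word count. As a point of reference, the same result for a local file can be sketched with coreutils (the sample input here is made up):

```shell
# Local word count equivalent to queries 3-6 above (sample input is made up).
printf 'hello world\nhello hive\n' |
  tr ' ' '\n' |            # split each line into words  ~ explode(split(line, ' '))
  sort | uniq -c |         # count per word              ~ count(1) ... group by word
  awk '{print $2, $1}'     # word first, like the (word, count) result columns
# → hello 2 / hive 1 / world 1 (one pair per line)
```

Each HiveQL step maps to one pipeline stage, which makes it easy to see why explode is needed: split alone returns one array per row, not one row per word.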
11. Use the Sogou search log for an experiment.
12. Upload the log file to HDFS and start hive.
13. create table sougou (qtime string, qid string, qword string, url string) row format delimited fields terminated by ',';
14. load data inpath '/sougou.dic' into table sougou;
15. select count(*) from sougou;
16. create table sougou_results as select keyword, count(1) as count from (select qword as keyword from sougou) t group by keyword order by count desc;
17. select * from sougou_results limit 10;
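As with the word count, the step-16/17 query can be sanity-checked locally on comma-delimited rows in the (qtime, qid, qword, url) layout assumed by step 13; the sample rows below are made up:

```shell
# Top keywords from comma-delimited sougou-style rows (sample rows are made up).
printf '00:01,u1,hive,http://a\n00:02,u2,hadoop,http://b\n00:03,u3,hive,http://c\n' |
  cut -d, -f3 |       # qword column (3rd field)  ~ select qword as keyword
  sort | uniq -c |    # count per keyword         ~ count(1) ... group by keyword
  sort -rn |          # ~ order by count desc
  head -10            # ~ limit 10
```

The most frequent keyword appears on the first line with its count.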
