Hive Study Notes
Installation (omitted; see the link below):
http://dblab.xmu.edu.cn/blog/2630-2/
1. Hive Basics
1.1 What is inside
The core of the Hive binary distribution consists of three parts:
$HIVE_HOME/lib  : JAR libraries
$HIVE_HOME/bin  : the various executables
$HIVE_HOME/conf : configuration files
Other components include the metastore service (metadata service), the Thrift service (remote access), and HWI (the Hive Web Interface).
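These components are started through the same bin/hive launcher. A quick sketch (the ports mentioned are the usual defaults and an assumption about this setup):

## Start the metastore service (Thrift, default port 9083)
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --service metastore &
## Start HiveServer2 for remote JDBC/Thrift access (default port 10000)
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --service hiveserver2 &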
1.2 Configuration files
Normally two files need to be configured: hive-site.xml and hive-default.xml.
Step 1: a template for hive-default.xml ships in the $HIVE_HOME/conf directory; simply rename it:
cd $HIVE_HOME/conf
mv hive-default.xml.template hive-default.xml
Step 2: create hive-site.xml with vim if it does not already exist:
vim hive-site.xml
Edit the file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    <description>Metastore database connection URL. A local MySQL database is used here; other options are possible, see "Three ways to configure the Hive metastore"</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver for the metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Metastore database user name</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1qaz2wsx3edc</value>
    <description>Metastore database password</description>
  </property>
  <property>
    <name>hive.metastore.thrift.bind.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.metadata.validate</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
    <description>The metastore database is not local; MySQL is used here instead of the bundled Derby database</description>
  </property>
</configuration>
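With a MySQL-backed metastore like the one above, the metastore schema usually has to be initialized once before Hive is first started. A minimal sketch, assuming the MySQL connector JAR has already been copied into $HIVE_HOME/lib:

## Create the metastore tables in the MySQL database named in hive-site.xml
hadoop@master:/usr/local/hive-3.1.2$ bin/schematool -dbType mysql -initSchema
## Check the schema version afterwards
hadoop@master:/usr/local/hive-3.1.2$ bin/schematool -dbType mysql -info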
1.3 CLI commands
Running the following command prints a brief list of the available options:
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli fixacidkeyindex help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool strictmanagedmigration tokentool version
Parameters parsed:
--auxpath : Auxiliary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
The CLI startup screen:
Using the CLI options:
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help --service cli
Hive Session ID = 89c8b7b1-4068-4ce0-a6fb-1e19be34fd0d
usage: hive
-d,--define <key=value> Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the console)
1.3.1 Setting variables with CLI options
Method 1: define the variable on the command line
hadoop@master:/usr/local/hive-3.1.2$ hive --define foo=bar

hive> set hivevar:foo;
hivevar:foo=bar

Method 2: define the variable from within the CLI
hive> set hivevar:foo=bar2;
hive> set foo;
foo=bar2
Create a table that references the variable:
hive> create table toss1(i int, ${hivevar:foo} string);
OK
Time taken: 1.998 seconds

hive> desc toss1;
OK
i                       int
bar2                    string
Time taken: 0.508 seconds, Fetched: 2 row(s)
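Variables passed with --hivevar (or --define) can be substituted in one-shot queries as well; a small sketch reusing the toss1 table created above:

hadoop@master:/usr/local/hive-3.1.2$ hive --hivevar col=bar2 -e 'select ${hivevar:col} from toss1'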
1.3.2 Executing statements from the command line with -e; the -S flag enables silent mode:
hadoop@master:/usr/local/hive-3.1.2$ hive -e "select * from tablename limit 3"

OK
name1 10
name2 20
name2 30
Time taken: 4.663 seconds

hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3"

name1 10
name2 20
name2 30
Silent mode with -S, redirecting the result to a file:
hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3" > /tmp/myquery

cat /tmp/myquery

hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/usr/local/hive-3.1.2
Other command modes:

## Run a query file
hadoop@master:/usr/local/hive-3.1.2$ hive -f /usr/local/queries.sql

## Load a data file via hive -e
hadoop@master:/usr/local/hive-3.1.2$ hive -e "LOAD DATA LOCAL INPATH '/usr/local' INTO TABLE src"

## Run a shell command from inside the CLI
hive> ! echo "what up!";
## Run a dfs command
hive> dfs -ls /;
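A query file can also be executed from inside the CLI with the source command (the path below is the same example file used with -f above):

hive> source /usr/local/queries.sql;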
The command history is kept in the $HOME/.hivehistory file.
2. Data Definition
In Hive, a database is essentially just a catalog or namespace of tables.
hive> desc database finance;
OK
finance        hdfs://master:9000/user/hive/warehouse/finance.db    hadoop    USER
Time taken: 0.053 seconds, Fetched: 1 row(s)
hdfs://master:9000 identifies the corresponding server name and port.
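For reference, a database like finance can be created and switched to as follows (a minimal sketch; the COMMENT text is an assumption, and without a LOCATION clause the database gets a .db directory under hive.metastore.warehouse.dir):

hive> create database if not exists finance
    > comment 'finance department tables';
hive> use finance;
hive> show databases;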
Adding the CASCADE keyword to DROP DATABASE makes Hive drop the tables in the database first:
hive> drop database if exists hive cascade;
View table information:
hive> desc extended employees;
OK
name            string
salary          float
subordinates    array<string>
deductions      map<string,float>
address         struct<street:string,city:string,state:string,zip:int>

Detailed Table Information
Table(tableName:employees, dbName:finance, owner:hadoop, createTime:1605448633, lastAccessTime:0, retention:0, sd:StorageDescriptor(
cols:[
FieldSchema(name:name, type:string, comment:null),
FieldSchema(name:salary, type:float, comment:null),
FieldSchema(name:subordinates, type:array<string>, comment:null),
FieldSchema(name:deductions, type:map<string,float>, comment:null),
FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)],
location:hdfs://master:9000/user/hive/warehouse/finance.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}, numFiles=0, transient_lastDdlTime=1605448633, bucketing_version=2, comment=descritiption of the table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.149 seconds, Fetched: 7 row(s)
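The output above is consistent with a table created roughly like this (a sketch reconstructed from the DESC output; only the column list and the comment text, typo included, come from the output):

hive> create table employees
    > (
    > name string,
    > salary float,
    > subordinates array<string>,
    > deductions map<string,float>,
    > address struct<street:string,city:string,state:string,zip:int>
    > )
    > comment 'descritiption of the table';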
Create a partitioned table:
hive> create table student
    > (
    > name string,
    > class string,
    > score float
    > )
    > partitioned by (type string);
OK
Time taken: 0.101 seconds
Insert data:
hive> insert into student select 'zhangsan','chinese',99,'zh';
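This form relies on dynamic partitioning, with the last selected value ('zh') becoming the type partition. Depending on the configuration, the settings below may need to be enabled first, or the partition can be named explicitly (a sketch; the values in the second insert are made up):

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

## Alternatively, name the target partition explicitly:
hive> insert into student partition (type='zh') values ('lisi', 'math', 88);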
View partitions:
hive> show partitions student;
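Partitions can also be added or dropped by hand with ALTER TABLE (a sketch; the 'en' partition value is made up):

hive> alter table student add if not exists partition (type='en');
hive> alter table student drop if exists partition (type='en');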
References:
Hive 3.1.2 Installation Guide: http://dblab.xmu.edu.cn/blog/2440-2/
Three ways to configure the Hive metastore: https://blog.csdn.net/reesun/article/details/8556078
