Hive笔记

Hive学习笔记

安装部分(略):

http://dblab.xmu.edu.cn/blog/2630-2/

一、Hive基础

1.1 内部是什么

HIVE二进制分支版本核心包括3个部分,

$HIVE_HOME/lib  :JAR库

$HIVE_HOME/bin :执行的各种可执行文件

$HIVE_HOME/conf :配置文件

其他包括:metastoreservice(元数据服务)、Thrift服务(远程访问)、HWI(网页界面)

1.2 文件配置

通常情况下,会有二个文件需要配置,一个是hive-site.xml、另外一个是hive-default.xml。

step1:hive-default.xml文件在$HIVE_HOME/conf目录下会有典型的模板,只需重命名即可:

1 cd $HIVE_HOME2 
2 mv hive-default.xml.template hive-default.xml

step2:使用VIM创建hive-site.xml文件,如没有:

1 vim hive-site.xml

编辑文件

 1 <?xml version="1.0" encoding="UTF-8" standalone="no"?>
 2 <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
 3 
 4 <configuration>
 5   <property>
 6     <name>javax.jdo.option.ConnectionURL</name>
 7     <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false</value>
 8     <description>元数据配置库,这里用的是本地MYSQL库,可以选择其他配置方式,参见“Hive metastore三种配置方式” </description>
 9   </property>
10   <property>
11     <name>javax.jdo.option.ConnectionDriverName</name>
12     <value>com.mysql.jdbc.Driver</value>
13     <description>元数据驱动</description>
14   </property>
15   <property>
16     <name>javax.jdo.option.ConnectionUserName</name>
17     <value>hive</value>
18    <description>元数据库的用户名</description>
19   </property>
20   <property>
21     <name>javax.jdo.option.ConnectionPassword</name>
22     <value>1qaz2wsx3edc</value>
23     <description>元数据库的密码</description>
24   </property>
25   <property>
26     <name>hive.metastore.thrift.bind.host</name>
27     <value>localhost</value>
28   </property>
29   <property>
30     <name>hive.metastore.schema.verification</name>
31     <value>false</value>
32   </property>
33   <property>
34     <name>datanucleus.metadata.validate</name>
35     <value>false</value>
36   </property>
37   <property>
38     <name>datanucleus.schema.autoCreateAll</name>
39     <value>true</value>
40   <property>
41     <name>datanucleus.schema.autoCreateAll</name>
42     <value>true</value>
43   </property>
44   <property>
45     <name>hive.metastore.local</name>
46     <value>false</value>
47     <description>元数据库地址,非本地,这里用了MYSQL而非自带的Derby库
48     </description>
49   </property>
50 
51 </configuration>

1.3 CLI命令

如用户执行如下命令,可以查看简明说明的选项列表;

hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help

  Usage ./hive <parameters> --service serviceName <service parameters>
  Service List: beeline cleardanglingscratchdir cli fixacidkeyindex help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool strictmanagedmigration tokentool version
  Parameters parsed:
  --auxpath : Auxiliary jars
  --config : Hive configuration directory
  --service : Starts specific service/component. cli is default
  Parameters used:
  HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
  HIVE_OPT : Hive options
  For help on a particular service:
  ./hive --service serviceName --help
  Debug help: ./hive --debug --help

命令行执行界面:

使用CLI选项:

1 hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help --service cli

  Hive Session ID = 89c8b7b1-4068-4ce0-a6fb-1e19be34fd0d
  usage: hive
  -d,--define <key=value> Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
  --database <databasename> Specify the database to use
  -e <quoted-query-string> SQL from command line
  -f <filename> SQL from files
  -H,--help Print help information
  --hiveconf <property=value> Use value for given property
  --hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B
  -i <filename> Initialization SQL file
  -S,--silent Silent mode in interactive shell
  -v,--verbose Verbose mode (echo executed SQL to the console)

1.3.1 使用CLI选项,设置变量:

方法一:使用命令定义变量
hadoop@master:/usr/local/hive-3.1.2$ hive --define foo=bar hive> set hivevar:foo; hivevar:foo=bar 方法二:使用CLI方式定义变量
hive
> set hivevar:foo=bar2; hive> set foo; foo=bar2

创建表

1 hive> create table toss1(i int,${hivevar:foo} string);
2 OK
3 Time taken: 1.998 seconds
4 
5 hive> desc toss1;
6 OK
7 i                       int                                         
8 bar2                    string                                      
9 Time taken: 0.508 seconds, Fetched: 2 row(s)

1.3.2 使用CLI执行命令,可以使用带参的-S开启静默模式:

 1 hadoop@master:/usr/local/hive-3.1.2$ hive -e "select * from tablename limit 3“
 2 
 3 OK
 4 name1 10
 5 name2 20
 6 name2 30
 7 Time taken: 4.663 seconds
 8 
 9 hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3“
10 
11 name1 10
12 name2 20
13 name2 30

带参的-S开启静默模式:

1 hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3“ > /tmp/myquery
2 
3 cat /tmp/myquery
4 
5 hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "set" | grep warehouse
6 hive.metastore.warehouse.dir=/usr/local/hive-3.1.2

 其他命令模式:

执行文件

##执行查询文件
hadoop@master:/usr/local/hive-3.1.2$ hive -f /usr/local/queries.sql

##hive下导入数据文件
hadoop@master:/usr/local/hive-3.1.2$ hive -e "LOAD DATA LOCAL INPATH" '/usr/local' into table src ;

##执行SHELL命令
hive > ! echo "what up !" ;

##执行dfs命令
hive > dfs -ls / ;

查看历史执行命令可用 $HOME/.hivehistory文件中

二、数据定义

Hive中数据库的概念本质上仅仅是表的一个目录或命名空间。

1 hive> desc database finance;
2 OK
3 finance        hdfs://master:9000/user/hive/warehouse/finance.db    hadoop    USER    
4 Time taken: 0.053 seconds, Fetched: 1 row(s)

hdfs://master:9000指定对应的服务器名和端口号

当DROP DATABASE时加上关键字CASCADE,这样可以使HIVE自行先删除数据库中的表:

1 hive> drop database if exists hive cascade;

查看表信息:

 1 hive> desc extended employees;
 2 OK
 3 name                    string                                      
 4 salary                  float                                       
 5 subordinates            array<string>                               
 6 deductions              map<string,float>                           
 7 address                 struct<street:string,city:string,state:string,zip:int>                        
 8           
 9 Detailed Table Information    
  Table(tableName:employees, dbName:finance, owner:hadoop, createTime:1605448633, lastAccessTime:0, retention:0, sd:StorageDescriptor(
  cols:[
  FieldSchema(name:name, type:string, comment:null),
  FieldSchema(name:salary, type:float, comment:null),
  FieldSchema(name:subordinates, type:array<string>, comment:null),
  FieldSchema(name:deductions, type:map<string,float>, comment:null),
  FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)],
  location:hdfs://master:9000/user/hive/warehouse/finance.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}, numFiles=0, transient_lastDdlTime=1605448633, bucketing_version=2, comment=descritiption of the table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER) 10 Time taken: 0.149 seconds, Fetched: 7 row(s)

创建分区表

1 hive> create table student
2     > (
3     > name string,
4     > class string,
5     > score float
6     > )
7     > partitioned by (type string) ;
8 OK
9 Time taken: 0.101 seconds

插入数据:

1 hive> insert into student select 'zhangsan','chinese',99,'zh';

查看分区

1 hive> show partitoins student;

 

引用材料:

Hive3.1.2安装指南 : http://dblab.xmu.edu.cn/blog/2440-2/

Hive metastore三种配置方式 :https://blog.csdn.net/reesun/article/details/8556078

posted @ 2020-11-13 21:55  lenomail  阅读(191)  评论(0)    收藏  举报