Hive Study Notes
Installation (omitted; see the link below):
http://dblab.xmu.edu.cn/blog/2630-2/
1. Hive Basics
1.1 What is inside
The core of the Hive binary distribution consists of three parts:
$HIVE_HOME/lib  : JAR libraries
$HIVE_HOME/bin  : the various executables
$HIVE_HOME/conf : configuration files
Other components include the metastore service (metadata service), the Thrift service (remote access), and HWI (the Hive Web Interface).
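These components are started through the same bin/hive launcher. A quick sketch (the ports mentioned are the usual defaults and an assumption about this setup):

## Start the metastore service (Thrift, default port 9083)
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --service metastore &
## Start HiveServer2 for remote JDBC/Thrift access (default port 10000)
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --service hiveserver2 &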
1.2 Configuration files
Normally two files need to be configured: hive-site.xml and hive-default.xml.
Step 1: a template for hive-default.xml ships in the $HIVE_HOME/conf directory; simply rename it:
cd $HIVE_HOME/conf
mv hive-default.xml.template hive-default.xml
Step 2: create hive-site.xml with vim if it does not already exist:
vim hive-site.xml
Edit the file:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=UTF-8&amp;useSSL=false</value>
    <description>Metastore database connection URL. A local MySQL database is used here; other options are possible, see "Three ways to configure the Hive metastore"</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>JDBC driver for the metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>Metastore database user name</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>1qaz2wsx3edc</value>
    <description>Metastore database password</description>
  </property>
  <property>
    <name>hive.metastore.thrift.bind.host</name>
    <value>localhost</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.metadata.validate</name>
    <value>false</value>
  </property>
  <property>
    <name>datanucleus.schema.autoCreateAll</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>false</value>
    <description>The metastore database is not local; MySQL is used here instead of the bundled Derby database</description>
  </property>
</configuration>
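With a MySQL-backed metastore like the one above, the metastore schema usually has to be initialized once before Hive is first started. A minimal sketch, assuming the MySQL connector JAR has already been copied into $HIVE_HOME/lib:

## Create the metastore tables in the MySQL database named in hive-site.xml
hadoop@master:/usr/local/hive-3.1.2$ bin/schematool -dbType mysql -initSchema
## Check the schema version afterwards
hadoop@master:/usr/local/hive-3.1.2$ bin/schematool -dbType mysql -info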
1.3 CLI commands
Running the following command prints a brief list of the available options:
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help
Usage ./hive <parameters> --service serviceName <service parameters>
Service List: beeline cleardanglingscratchdir cli fixacidkeyindex help hiveburninclient hiveserver2 hplsql jar lineage llapdump llap llapstatus metastore metatool orcfiledump rcfilecat schemaTool strictmanagedmigration tokentool version
Parameters parsed:
--auxpath : Auxiliary jars
--config : Hive configuration directory
--service : Starts specific service/component. cli is default
Parameters used:
HADOOP_HOME or HADOOP_PREFIX : Hadoop install directory
HIVE_OPT : Hive options
For help on a particular service:
./hive --service serviceName --help
Debug help: ./hive --debug --help
The CLI startup screen:
Using the CLI options:
hadoop@master:/usr/local/hive-3.1.2$ bin/hive --help --service cli
Hive Session ID = 89c8b7b1-4068-4ce0-a6fb-1e19be34fd0d
usage: hive
-d,--define <key=value> Variable substitution to apply to Hive commands. e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from command line
-f <filename> SQL from files
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the console)
1.3.1 Setting variables with CLI options
Method 1: define the variable on the command line
hadoop@master:/usr/local/hive-3.1.2$ hive --define foo=bar

hive> set hivevar:foo;
hivevar:foo=bar

Method 2: define the variable from within the CLI
hive> set hivevar:foo=bar2;
hive> set foo;
foo=bar2
Create a table that references the variable:
hive> create table toss1(i int, ${hivevar:foo} string);
OK
Time taken: 1.998 seconds

hive> desc toss1;
OK
i                       int
bar2                    string
Time taken: 0.508 seconds, Fetched: 2 row(s)
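Variables passed with --hivevar (or --define) can be substituted in one-shot queries as well; a small sketch reusing the toss1 table created above:

hadoop@master:/usr/local/hive-3.1.2$ hive --hivevar col=bar2 -e 'select ${hivevar:col} from toss1'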
1.3.2 Executing statements from the command line with -e; the -S flag enables silent mode:
hadoop@master:/usr/local/hive-3.1.2$ hive -e "select * from tablename limit 3"

OK
name1 10
name2 20
name2 30
Time taken: 4.663 seconds

hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3"

name1 10
name2 20
name2 30
Silent mode with -S, redirecting the result to a file:
hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "select * from tablename limit 3" > /tmp/myquery

cat /tmp/myquery

hadoop@master:/usr/local/hive-3.1.2$ hive -S -e "set" | grep warehouse
hive.metastore.warehouse.dir=/usr/local/hive-3.1.2
Other command modes:

## Run a query file
hadoop@master:/usr/local/hive-3.1.2$ hive -f /usr/local/queries.sql

## Load a data file via hive -e
hadoop@master:/usr/local/hive-3.1.2$ hive -e "LOAD DATA LOCAL INPATH '/usr/local' INTO TABLE src"

## Run a shell command from inside the CLI
hive> ! echo "what up!";
## Run a dfs command
hive> dfs -ls /;
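A query file can also be executed from inside the CLI with the source command (the path below is the same example file used with -f above):

hive> source /usr/local/queries.sql;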
The command history is kept in the $HOME/.hivehistory file.
2. Data Definition
In Hive, a database is essentially just a catalog or namespace of tables.
hive> desc database finance;
OK
finance        hdfs://master:9000/user/hive/warehouse/finance.db    hadoop    USER
Time taken: 0.053 seconds, Fetched: 1 row(s)
hdfs://master:9000 identifies the corresponding server name and port.
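For reference, a database like finance can be created and switched to as follows (a minimal sketch; the COMMENT text is an assumption, and without a LOCATION clause the database gets a .db directory under hive.metastore.warehouse.dir):

hive> create database if not exists finance
    > comment 'finance department tables';
hive> use finance;
hive> show databases;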
Adding the CASCADE keyword to DROP DATABASE makes Hive drop the tables in the database first:
hive> drop database if exists hive cascade;
View table information:
hive> desc extended employees;
OK
name            string
salary          float
subordinates    array<string>
deductions      map<string,float>
address         struct<street:string,city:string,state:string,zip:int>

Detailed Table Information
Table(tableName:employees, dbName:finance, owner:hadoop, createTime:1605448633, lastAccessTime:0, retention:0, sd:StorageDescriptor(
cols:[
FieldSchema(name:name, type:string, comment:null),
FieldSchema(name:salary, type:float, comment:null),
FieldSchema(name:subordinates, type:array<string>, comment:null),
FieldSchema(name:deductions, type:map<string,float>, comment:null),
FieldSchema(name:address, type:struct<street:string,city:string,state:string,zip:int>, comment:null)],
location:hdfs://master:9000/user/hive/warehouse/finance.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=0, numRows=0, rawDataSize=0, COLUMN_STATS_ACCURATE={\"BASIC_STATS\":\"true\",\"COLUMN_STATS\":{\"address\":\"true\",\"deductions\":\"true\",\"name\":\"true\",\"salary\":\"true\",\"subordinates\":\"true\"}}, numFiles=0, transient_lastDdlTime=1605448633, bucketing_version=2, comment=descritiption of the table}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.149 seconds, Fetched: 7 row(s)
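The output above is consistent with a table created roughly like this (a sketch reconstructed from the DESC output; only the column list and the comment text, typo included, come from the output):

hive> create table employees
    > (
    > name string,
    > salary float,
    > subordinates array<string>,
    > deductions map<string,float>,
    > address struct<street:string,city:string,state:string,zip:int>
    > )
    > comment 'descritiption of the table';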
Create a partitioned table:
hive> create table student
    > (
    > name string,
    > class string,
    > score float
    > )
    > partitioned by (type string);
OK
Time taken: 0.101 seconds
Insert data:
hive> insert into student select 'zhangsan','chinese',99,'zh';
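This form relies on dynamic partitioning, with the last selected value ('zh') becoming the type partition. Depending on the configuration, the settings below may need to be enabled first, or the partition can be named explicitly (a sketch; the values in the second insert are made up):

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;

## Alternatively, name the target partition explicitly:
hive> insert into student partition (type='zh') values ('lisi', 'math', 88);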
View partitions:
hive> show partitions student;
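Partitions can also be added or dropped by hand with ALTER TABLE (a sketch; the 'en' partition value is made up):

hive> alter table student add if not exists partition (type='en');
hive> alter table student drop if exists partition (type='en');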
References:
Hive 3.1.2 Installation Guide: http://dblab.xmu.edu.cn/blog/2440-2/
Three ways to configure the Hive metastore: https://blog.csdn.net/reesun/article/details/8556078
