Hive学习笔记——常用语法

1.查看表的列表

show tables

2.创建表

多个字段的时候需要指定用什么来分隔

create table test(id int,name string)row format delimited fields terminated by '\t';
create table test(id int,name string)row format delimited fields terminated by ',';

3.插入数据

insert into table test values (1,'row1'),(2,'row2');

也可以select任意一张空表来insert

insert into table default.example_table (select 1L,'xiaodou',array(1L,2L,3L) from default.test limit 1);

如果遇到 Unable to create temp file for insert values Expression of type TOK_FUNCTION not supported in insert/values

insert into table test partition(ds="2019-08-20") select 1L,2,1S,1.111,"test",2Y,true,array(1,2,3),array(1Y,1Y,1Y),map('name','Xiao','age','20'),map(10L,false,20L,true),"lin","tong";

4.加载数据

也可以使用Hadoop fs -put命令直接上传文件,注意文件中的分隔符需要和创建表的时候指定分隔符保持一致

load data local inpath 'XXXX' into table XXXX;

如果是加载分区的数据的话

load data local inpath 'xxx/role_id=1' into table XXX PARTITION(role_id=1);

5.查询数据

select * from XXXX;

6.删除表

drop table XXXX;

7.hive优化

参考：hive调优

8.analyze table

参考

https://www.jianshu.com/p/7a2bd40a6632

9.统计hive表大小行数

http://wimperio.tk/2019/02/21/Hive-Table-Statistics/

10.查看hive表分区的location

describe formatted xxx.xxx partition (date="2021-07-17",hour="06")

11.查看hive表分区信息

 describe formatted xxx.xxx partition (pdate='2017-01-23');

12.hive表加字段

ALTER TABLE xxx.xxx ADD COLUMNS (xxx_count1 bigint);

13.hive表修改表结构

对于hive1和hive2，如果是parquet表修改字段类型产生冲突的时候，可能会报 Unable to alter table. The following columns have types incompatible with the existing columns in their respective positions

对于struct类型字段增加字段，也是一样的，所以只能删掉重建，而且在parquet表中struct类型中字段的顺序是没有要求的

hive3的话可能会有不同

alter table xxx.xxx replace columns(column_2 string);

14.hive表修改字段

对于struct类型字段增加字段，也是一样的，所以只能删掉重建，而且在parquet表中struct类型中字段的顺序是没有要求的

hive3的话可能会有不同

alter table xxx.xxx change aaaa bbbb string;

15.hive表修改location

alter table xxx.xxx set location 'hdfs://heracles/user/video-mvc/hive/warehouse/t_m_cc'

16.hive表修改分区location

ALTER TABLE `xxx.xxxx` PARTITION (pdate='2021-09-22',phour='00') SET LOCATION "s3a://xxxx/xxxx/2021-09-22/00";

17.调整字段位置

ALTER TABLE xxx.xxxx  CHANGE `xxxaaa` `xxxaaa` array<string>  after `xxxbbb`

18.HQL with语法

with a as (select * from test)　
select * from a;

19.HQL 解析json

select get_json_object(context, '$.aaa'),count(*) as num  from .xxx where pdate = "2021-09-29" group by get_json_object(context, '$.aaa') order by num

20.hive设置严格模式

设置hive.mapred.mode的值为strict

参考：踩过数据仓库hive的坑：hive设置严格模式

21.HQL COALESCE函数

COALESCE是一个函数， (expression_1, expression_2, ...,expression_n)依次参考各参数表达式，遇到非null值即停止并返回该值。如果所有的表达式都是空值，最终将返回一个空值。

22.修改hive tblproperties

alter table xx.xx set tblproperties('property_name'='new_value');

23.删除hive tblproperties

alter table xx.xx UNSET TBLPROPERTIES ('property_name');

24.hive的用户认证

http://lxw1234.com/archives/2016/01/600.htm

24.调整map和reduce任务的数量　　

set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set mapred.map.tasks=10;

参考：【Hive任务优化】—— Map、Reduce数量调整

25.hive时间戳转换

select from_unixtime(unix_timestamp(str, "yyyy-MM-dd'T'HH:mm:ss"), 'yyyy-MM-dd HH:mm:ss') as xx_time from xx.xx;

26.hive直接写数据到HDFS或者S3路径下

INSERT OVERWRITE DIRECTORY 's3://xx/xx/xx' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
SELECT * FROM xx.xx where pdate="2022-08-06";

官方文档

https://sparkbyexamples.com/apache-hive/export-hive-table-into-csv-file-with-header/

如果要控制数据只有一个文件的话，可以设置reduce任务的数量为1，然后添加 distribute by rand() 强行触发reduce任务

set mapred.reduce.tasks=1;

INSERT OVERWRITE DIRECTORY 's3://xx/xx/xx' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' 
SELECT user_id FROM xx.xx where pdate="2022-08-06" 
distribute by rand();

参考：hive控制文件生成个数

27.如果hive的array数组判断不等于NULL不生效

先运行如下命令

set hive.cbo.enable=false;

posted @ 2015-07-23 21:29 tonglin0325 阅读(675) 评论(0) 收藏举报

刷新页面返回顶部

tonglin0325.github.io