HIVE(2)-基础用法

hive是一个应用性很强的工具，没有高深的东西，就是熟练与活用。而越是活用的技能，个人觉得基础越要清晰与精准，于是把基础的东西MARK一遍。其次，通常情况下，HIIVE与mysql同时存在的，写了些异同和常见错误。以供参考。

1.基本语法


select A from B where C	hive用limit sql 用top	`select user_name from user_info where city='beijing' and sex='female' limit 10 --查看前10个用户`
	查看分区表 dt='2019-04-09'	`select user_name,piece, pay_amount from user_trade where dt='2019-04-09' --查看分区表 and goods_category='food';`
Group by	分类汇总 count();sum();avg();max();min();	`select goods_category, count(distinct user_name) as user_num, sum(pay_amount) as total_amount from user_trade where dt between '2019-01-01' and '2019-04-30' group by goods_category;`

Order by	ASC:升序 12345 ，默认值 DESC:降序多个字段 ORDER BY A ASC,BDESC	`select user_name, sum(pay_amount) as total_amount from user_trade where dt between '2019-01-01' and '2019-04-30' group by total_amount DESC limit 5;`
执行顺序	from->where->group by->having->select->order by

2.常用函数


时间戳转化为日期		`select pay_time,from_unix_time(pay_time,'yyyy-mm-dd hh:mm:ss') from user_trade where dt='2019-04-01';`
计算日期间隔	datediff(string enddate,string startdate) 天数=结束日期-开始日期增加date_add(string startdate.int days) 减少date_sub（string startdate.int days）	`select user_name, datediff('2019-04-01',to_date(firstactivetime) from user_info<br/>limit 10;`
条件函数	case when age<20 第二句写age>=18 也会从20以后查询的。	`select case when age<20 then'20岁以下' when age>=20 and age<30 then '20-30岁' when age>=30 and age<40 then '30-40岁' else '40岁以上' end , count(distinct user_id) user_num from user_info group by case when age<20 then'20岁以下' when age>=20 and age<30 then '20-30岁' when age>=30 and age<40 then '30-40岁' else '40岁以上' end ;`
	if	`select sex, if(level>5,'高','低')，-- 0 '高' 1'低'<br/> count(distinct user_id) user_num from user_info group by sex, if(level>5,'高','低');`

字符串函数	截取字符串 substr（string A，int start,int len）	`select substr(fristtime,1,7) as month, count() user_name from user_info group by substr(fristtime,1,7);`
	如果不指定长度，则一直截取到最后。
	查询字符 string类型 get_json_object(extra1,'$.phonebrand')	`select get_json_object(extra1,'$.phonebrand') as phone_brand, count(distinct user_id) user_num from user_info group by get_json_object(extra1,'$.phonebrand');`
	查询字符 map<string,string> extra2['phonebrand']	`select extra2['phonebrand']as phone_brand, count(distinct user_id) user_num from user_info group by extra2['phonebrand'];`

note:	不允许avg(count(*))
	datediff(max(pay_time),min(pay_time))

3.联合查询

		HIVE	sql
1.inner join	1.必须重命名 2.on后面是唯一键值 3.inner可不写，效果一样	`select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id;`	select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id;
2表查询	1.左2017，右2018 2.先条件筛选，再连接	`select a.user_name a.user_name=b.user_name from (select distinct user_name --distinct 去重 from user_trade -- where year(dt)=2019) a -- 分片 join (select distinct user_name from user_trade where year(dt)=2019) b on a.user_name=b.user_name;`
3表查询	写法1 ：先join 再选择（表小的时候可以）	`select distinct a.user_name from trade_2017 a join trade_2018 b on a.user_name=b.user_name join trade_2019 c on b.user_name=c.user_name ;`
	写法2：先去重再join	`select a.user_name from (select distinct user_name --distinct 去重 from trade_2017) a join (select distinct user_name --distinct 去重 from trade_2018) b on a.user_name=b.user_name; join (select distinct user_name --distinct 去重 from trade_2019) c on b.user_name=c.user_name;`

2，left join	左表为全集,返回能匹配上的右表2的匹配结果，没有匹配上显示NULL
	HIVE不支持在in里加子查询比如：在user_list_1 中但不在user_list_2中的用户	`select a.user_id, a.user_name from user_list_1 a LEFT JOIN user_list_2 b on a.user_id=b_user_id where b_user_id is null;`	`select a.user_name from user_list_1 a where a.user_name not in (select user_name from user_list_2 );`
	3表查询	`select a.user_name from (select distinct user_name from trade_2017) a join (select distinct user_name from trade_2018) b on a.user_name=b.user_name left join (select distinct user_name from trade_2019) c on b.user_name=c.user_name where c.user_name is null;`

3.full join	两个表合并在一起 coalesce() user_list1,user_list2的所有用户	`select coalesce(a.user_name,b.user_name)from user_list_1 a FULL JOIN user_list_2 b on a.user_id=b.user_id;`
4.union all	1.字段名必须一致 2.字段顺序必须一致 3.没有连接条件	`select user_id,user_name from user_list_1 UNION ALL select user_id,user_name from user_list_3`

5.union
6.union all , left join, full join 区别	union all（追加） left join(加在右边) full join（全排列）
7.union all，union区别	union all不去重不排序 union 去重排序

posted @ 2020-05-17 21:56 jasmineTang 阅读(119) 评论(0) 收藏举报

刷新页面返回顶部