hive是一个应用性很强的工具,没有高深的东西,就是熟练与活用。而越是活用的技能,个人觉得基础越要清晰与精准,于是把基础的东西MARK一遍。其次,通常情况下,HIIVE与mysql同时存在的,写了些异同和常见错误。以供参考。
1.基本语法
| | | |
|---|
| select A from B where C | hive用limit sql 用top | select user_name
from user_info
where city='beijing' and sex='female' limit 10 --查看前10个用户 |
| | 查看分区表 dt='2019-04-09' | select user_name,piece, pay_amount
from user_trade
where dt='2019-04-09' --查看分区表
and goods_category='food'; |
| Group by | 分类汇总 count();sum();avg();max();min(); | select goods_category,
count(distinct user_name) as user_num,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by goods_category; |
| | | |
| Order by | ASC:升序 12345 ,默认值 DESC:降序 多个字段 ORDER BY A ASC,BDESC | select user_name,
sum(pay_amount) as total_amount
from user_trade
where dt between '2019-01-01' and '2019-04-30'
group by total_amount DESC limit 5; |
| 执行顺序 | from->where->group by->having->select->order by | |
2.常用函数
| | | |
|---|
| 时间戳转化为日期 | | select pay_time,from_unix_time(pay_time,'yyyy-mm-dd hh:mm:ss')
from user_trade
where dt='2019-04-01'; |
| 计算日期间隔 | datediff(string enddate,string startdate) 天数=结束日期-开始日期 增加date_add(string startdate.int days) 减少date_sub(string startdate.int days) | select user_name,
datediff('2019-04-01',to_date(firstactivetime)
from user_info<br/>limit 10; |
| 条件函数 | case when age<20 第二句写age>=18 也会从20以后查询的。 | select case when age<20 then'20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end ,
count(distinct user_id) user_num
from user_info
group by case when age<20 then'20岁以下'
when age>=20 and age<30 then '20-30岁'
when age>=30 and age<40 then '30-40岁'
else '40岁以上' end ; |
| | if | select sex,
if(level>5,'高','低'),-- 0 '高' 1'低'<br/>
count(distinct user_id) user_num
from user_info
group by sex,
if(level>5,'高','低'); |
| | | |
| 字符串函数 | 截取字符串 substr(string A,int start,int len) | select substr(fristtime,1,7) as month,
count() user_name
from user_info
group by substr(fristtime,1,7); |
| | 如果不指定长度,则一直截取到最后。 | |
| | 查询字符 string类型 get_json_object(extra1,'$.phonebrand') | select get_json_object(extra1,'$.phonebrand') as phone_brand,
count(distinct user_id) user_num
from user_info
group by get_json_object(extra1,'$.phonebrand'); |
| | 查询字符 map<string,string> extra2['phonebrand'] | select extra2['phonebrand']as phone_brand,
count(distinct user_id) user_num
from user_info
group by extra2['phonebrand']; |
| | | |
| note: | 不允许avg(count(*)) | |
| | datediff(max(pay_time),min(pay_time)) | |
3.联合查询
| | | HIVE | sql |
|---|
| 1.inner join | 1.必须重命名 2.on后面是唯一键值 3.inner可不写,效果一样 | select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id; | select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id; |
| 2表查询 | 1.左2017,右2018 2.先条件筛选,再连接 | select a.user_name
a.user_name=b.user_name from (select distinct user_name --distinct 去重
from user_trade -- where year(dt)=2019) a -- 分片
join (select distinct user_name from user_trade where year(dt)=2019) b on a.user_name=b.user_name; | |
| 3表查询 | 写法1 : 先join 再选择(表小的时候可以) | select distinct a.user_name from trade_2017 a join trade_2018 b on a.user_name=b.user_name join trade_2019 c on b.user_name=c.user_name ; | |
| | 写法2: 先去重再join | select a.user_name from (select distinct user_name --distinct 去重
from trade_2017) a join (select distinct user_name --distinct 去重
from trade_2018) b on a.user_name=b.user_name;
join (select distinct user_name --distinct 去重
from trade_2019) c on b.user_name=c.user_name; | |
| | | | |
2,left join | 左表为全集,返回能匹配上的右表2的匹配结果,没有匹配上 显示NULL | | |
| | HIVE不支持在in里加子查询 比如:在user_list_1 中但不在user_list_2中的用户 | select a.user_id,
a.user_name from user_list_1 a
LEFT JOIN user_list_2 b on a.user_id=b_user_id
where b_user_id is null; | select a.user_name
from user_list_1 a
where a.user_name not in (select user_name from user_list_2 ); |
| | 3表查询 | select a.user_name
from
(select distinct user_name from trade_2017) a
join
(select distinct user_name from trade_2018) b on a.user_name=b.user_name
left join
(select distinct user_name from trade_2019) c on b.user_name=c.user_name
where c.user_name is null;
| |
| | | | |
| 3.full join | 两个表合并在一起 coalesce() user_list1,user_list2的所有用户 | select coalesce(a.user_name,b.user_name)from user_list_1 a FULL JOIN user_list_2 b on a.user_id=b.user_id; | |
| 4.union all | 1.字段名必须一致 2.字段顺序必须一致 3.没有连接条件 | select user_id,user_name from user_list_1 UNION ALL select user_id,user_name from user_list_3 | |
| | | | |
| 5.union | | | |
| 6.union all , left join, full join 区别 | union all(追加) left join(加在右边) full join(全排列) | | |
| 7.union all,union区别 | union all不去重不排序 union 去重排序 | | |