HIVE(2)-基础用法

   hive是一个应用性很强的工具,没有高深的东西,就是熟练与活用。而越是活用的技能,个人觉得基础越要清晰与精准,于是把基础的东西MARK一遍。其次,通常情况下,HIIVE与mysql同时存在的,写了些异同和常见错误。以供参考。
 

1.基本语法

   
select A from B where Chive用limit
sql 用top
 select user_name 
from user_info 
where city='beijing' and sex='female' limit 10    --查看前10个用户

 

 查看分区表 dt='2019-04-09'
select user_name,piece, pay_amount 
from user_trade 
where dt='2019-04-09' --查看分区表

and goods_category='food';

 

Group by分类汇总 count();sum();avg();max();min();
select goods_category, 
count(distinct user_name) as user_num, 
sum(pay_amount) as total_amount 
from user_trade 
where dt between '2019-01-01' and '2019-04-30'
group by goods_category;

 

   
Order byASC:升序 12345 ,默认值 DESC:降序 多个字段 ORDER BY A ASC,BDESC
select user_name,
 sum(pay_amount) as total_amount
 from user_trade
  where dt between '2019-01-01' and '2019-04-30'
  group by total_amount DESC limit 5;

 

执行顺序from->where->group by->having->select->order by 

2.常用函数

   
时间戳转化为日期 
select pay_time,from_unix_time(pay_time,'yyyy-mm-dd hh:mm:ss')
from user_trade
where dt='2019-04-01';

 

计算日期间隔

datediff(string enddate,string startdate) 天数=结束日期-开始日期
增加date_add(string startdate.int days)

减少date_sub(string startdate.int days)

select user_name,
 datediff('2019-04-01',to_date(firstactivetime)
from user_info<br/>limit 10;
条件函数case when
age<20
第二句写age>=18 也会从20以后查询的。
select case when age<20 then'20岁以下'
              when age>=20 and age<30 then '20-30岁'
              when age>=30 and age<40 then '30-40岁'
	              else '40岁以上' end ,
  count(distinct user_id) user_num
from user_info
group by case when age<20 then'20岁以下'
	              when age>=20 and age<30 then '20-30岁'
	              when age>=30 and age<40 then '30-40岁'
	              else '40岁以上' end ;

 

 if
select sex,
       if(level>5,'高','低'),-- 0 '高' 1'低'<br/>        
           count(distinct user_id) user_num
       from user_info
       group by sex,
       if(level>5,'高','低');

 

   
字符串函数

截取字符串

substr(string A,int start,int len)

select substr(fristtime,1,7) as month,
      count() user_name
      from user_info
      group by substr(fristtime,1,7);

 

 如果不指定长度,则一直截取到最后。 
 查询字符 string类型
get_json_object(extra1,'$.phonebrand')
select get_json_object(extra1,'$.phonebrand') as phone_brand,
       count(distinct user_id) user_num
       from user_info
       group by get_json_object(extra1,'$.phonebrand');

 

 查询字符 map<string,string>
extra2['phonebrand']
select extra2['phonebrand']as phone_brand,
       count(distinct user_id) user_num
 from user_info
 group by extra2['phonebrand'];

 

   
note:不允许avg(count(*)) 
 datediff(max(pay_time),min(pay_time)) 

 

3.联合查询

  HIVEsql

1.inner join

1.必须重命名
2.on后面是唯一键值
3.inner可不写,效果一样
select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id;

 

select * from user_list_1 a join user_list_2 b on a.user_id=b.user_id;
2表查询1.左2017,右2018 2.先条件筛选,再连接
select a.user_name 
a.user_name=b.user_name from (select distinct user_name --distinct 去重 
from user_trade -- where year(dt)=2019) a -- 分片 
join (select distinct user_name  from user_trade where year(dt)=2019) b on a.user_name=b.user_name;

 

 
3表查询写法1 :
先join 再选择(表小的时候可以)
select distinct a.user_name from trade_2017 a join trade_2018 b on a.user_name=b.user_name join trade_2019 c on b.user_name=c.user_name ;

 

 
 

写法2:

先去重再join

select a.user_name from (select distinct user_name --distinct 去重 
from trade_2017) a join (select distinct user_name --distinct 去重 
from trade_2018) b on a.user_name=b.user_name; 
join (select distinct user_name                    --distinct 去重 
from trade_2019) c on b.user_name=c.user_name;

 

 
    

2,left join
左表为全集,返回能匹配上的右表2的匹配结果,没有匹配上 显示NULL  
 HIVE不支持在in里加子查询 比如:在user_list_1 中但不在user_list_2中的用户
select a.user_id,      
 a.user_name  from  user_list_1 a 
 LEFT JOIN user_list_2 b on a.user_id=b_user_id
where b_user_id is null;

 

select  a.user_name
from user_list_1 a  
where a.user_name not in (select user_name from user_list_2 );

 

 3表查询
select  a.user_name
from 
     (select distinct user_name from trade_2017) a     
join 
      (select distinct user_name from trade_2018) b on a.user_name=b.user_name 
left join 
      (select distinct user_name from trade_2019) c on b.user_name=c.user_name              
where c.user_name is null;

 

 
    
3.full join 两个表合并在一起 coalesce() user_list1,user_list2的所有用户
select coalesce(a.user_name,b.user_name)from user_list_1 a FULL JOIN user_list_2 b on a.user_id=b.user_id;

 

 
4.union all

1.字段名必须一致

2.字段顺序必须一致

3.没有连接条件

select user_id,user_name from user_list_1 UNION ALL select user_id,user_name from user_list_3

 

 
    
5.union   

6.union all ,

left join,

full join 区别

union all(追加) left join(加在右边) full join(全排列)  
7.union all,union区别union all不去重不排序 union 去重排序  

 

posted @ 2020-05-17 21:56  jasmineTang  阅读(119)  评论(0)    收藏  举报