Union and join operations in Hive

Table creation statements:

create table tb_in_base
(
  id bigint,
  devid bigint,
  devname string
) partitioned by (job_time bigint) row format delimited fields terminated by ',';

create table tb_in_up
(
  id bigint,
  devid bigint,
  devname string
) partitioned by (job_time bigint) row format delimited fields terminated by ',';
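To reproduce the scenarios, both tables need a few rows. The following is a minimal sketch, not the author's data: the row values and the partition value 20181029 are hypothetical, and the INSERT ... VALUES form assumes Hive 0.14 or later.

```sql
-- Hypothetical sample rows (not from the original post).
-- INSERT ... VALUES is assumed to be available (Hive 0.14 or later).
insert into table tb_in_base partition (job_time = 20181029)
values (1, 101, 'dev_a'), (2, 102, 'dev_b');

-- One row overlaps with tb_in_base so that duplicate handling can be observed.
insert into table tb_in_up partition (job_time = 20181029)
values (2, 102, 'dev_b'), (3, 103, 'dev_c');
```

On older versions, the same data could be placed in a comma-delimited file and loaded with load data, which matches the row format declared in the tables.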

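Scenario 1: single-table subquery without a table alias

Statement: select * from (select id,devid,job_time from tb_in_base) ;

Execution:

Hive reports that a source must be specified for the subquery.

With a table alias added:

Statement: select * from (select id,devid,job_time from tb_in_base) a;

Execution:

With the alias in place, the data from the subquery is output normally.

Result analysis: in Hive, a subquery in the from clause must be given a table alias.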
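An alternative way to name the subquery is a common table expression. This is a sketch that assumes Hive 0.13 or later, where the with clause is available; the name a is simply the alias used in the statement above.

```sql
-- Name the subquery up front instead of aliasing it inline.
-- Assumes Hive 0.13 or later (common table expressions).
with a as (
  select id, devid, job_time
  from tb_in_base
)
select * from a;
```

Either form gives the outer query a name through which it can refer to the subquery's columns.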

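Scenario 2: the outer query selects one column fewer than the subquery

Statement: select id,devid from (select id,devid,job_time from tb_in_base) a;

Result analysis: only the columns named in the outer query are output.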

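Scenario 3: union all of two tables

Statement:

select a.id,a.devid from (select a.id,a.devid,a.job_time from tb_in_base a union all select b.id,b.devid,b.job_time from tb_in_up b) a;

Result analysis: when both branches of the union all select the same column names, the requested columns are output normally, and the result contains the rows of both tables (duplicates are kept).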
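To make it visible which table each row came from, a literal column can be added to each branch. The sketch below is illustrative only: the src column and the alias u are hypothetical names, and a distinct outer alias is used so it is not confused with the inner alias a.

```sql
-- Tag each branch with its origin; both branches must expose the same columns.
select u.src, u.id, u.devid
from (
  select 'base' as src, id, devid, job_time from tb_in_base
  union all
  select 'up'   as src, id, devid, job_time from tb_in_up
) u;
```

A row that exists in both tables appears twice, once per source, because union all does not deduplicate.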

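Scenario 4: union of two tables

Statement:

select a.id,a.devid from (select a.id,a.devid,a.job_time from tb_in_base a union select b.id,b.devid,b.job_time from tb_in_up b) a;

Result analysis: the Hive version tested does not support union. Hive versions before 1.2.0 accept only union all; union with duplicate elimination was added in Hive 1.2.0.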
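On a Hive version that rejects union, the same result can be approximated by deduplicating the output of union all. This is a sketch of that workaround, with u as a hypothetical alias; the second statement shows the direct form and assumes Hive 1.2.0 or later.

```sql
-- Workaround for older Hive: union all, then remove duplicates in the outer query.
select distinct u.id, u.devid, u.job_time
from (
  select id, devid, job_time from tb_in_base
  union all
  select id, devid, job_time from tb_in_up
) u;

-- Direct form, assuming Hive 1.2.0 or later.
select id, devid, job_time from tb_in_base
union
select id, devid, job_time from tb_in_up;
```

The deduplication applies to the full selected row, so two rows count as duplicates only if every selected column matches.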

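Scenario 5: count and sum over id in the outer query

Statement:

select count(a.id),sum(a.id) from (select a.id,a.devid,a.job_time from tb_in_base a union all select b.id,b.devid,b.job_time from tb_in_up b) a;

Result analysis: aggregate functions such as count and sum can be applied in the outer query over the union all of the two tables.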
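The outer aggregation can also be grouped. The following sketch groups the combined rows by the partition column; the aliases u, id_count, and id_sum are hypothetical names chosen for illustration.

```sql
-- Aggregate over the combined rows, one result row per job_time.
select u.job_time,
       count(u.id) as id_count,
       sum(u.id)   as id_sum
from (
  select id, devid, job_time from tb_in_base
  union all
  select id, devid, job_time from tb_in_up
) u
group by u.job_time;
```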

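Scenario 6: using aggregate functions such as count, sum, and max inside a union all

Result analysis: in this test, aggregate functions such as count, sum, and max could not be used inside the branches of a union all, whereas aggregates over a single table work as usual.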
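If a Hive version rejects aggregates written directly in the union all branches, one possible workaround is to compute each aggregate in its own aliased subquery and union the already aggregated results. This is a sketch under that assumption and has not been checked against the author's Hive version; all aliases (b, p, t, src, id_count, id_sum, id_max) are hypothetical.

```sql
-- Each branch selects plain columns from a subquery that did the aggregation.
select t.src, t.id_count, t.id_sum, t.id_max
from (
  select b.src, b.id_count, b.id_sum, b.id_max
  from (
    select 'base' as src, count(id) as id_count, sum(id) as id_sum, max(id) as id_max
    from tb_in_base
  ) b
  union all
  select p.src, p.id_count, p.id_sum, p.id_max
  from (
    select 'up' as src, count(id) as id_count, sum(id) as id_sum, max(id) as id_max
    from tb_in_up
  ) p
) t;
```

Explicit column aliases keep the schemas of the two branches identical, which union all requires.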

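Scenario 7: whether max, count, sum, and similar functions can be used with left join

Statement:

select max(a.id),min(b.id),sum(a.job_time),count(a.id) from tb_in_base a left join tb_in_up b on (a.id=b.id);

Result analysis: aggregate functions such as max and count can be used with left join.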
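With a left join, which side a column comes from affects the aggregate, because unmatched rows carry NULL in the right-hand columns and count skips NULL values. The sketch below illustrates this; the output aliases are hypothetical.

```sql
-- count(a.id) counts every row of the join result (a.id is never NULL from the join itself);
-- count(b.id) counts only rows where a match was found in tb_in_up.
select count(a.id) as joined_rows,
       count(b.id) as matched_rows,
       max(a.id)   as max_base_id,
       min(b.id)   as min_up_id
from tb_in_base a
left join tb_in_up b on (a.id = b.id);
```

With a plain join instead of left join, unmatched rows of tb_in_base are dropped, so the two counts would coincide whenever b.id is not NULL in the data.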

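Summary

1. A subquery acts as a table source: the from clause needs either a real table name or, for a subquery, a table alias.

2. The Hive version tested does not support union, only union all (union with duplicate elimination was added in Hive 1.2.0).

3. When union all is used in a subquery, aggregate functions such as count and sum could not be used inside the branches of that subquery.

4. Aggregate functions such as count and sum can be applied in the outer query over the union all of two tables.

5. When both branches of a union all select the same column names, the requested columns are output normally, and the result contains the rows of both tables.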

posted @ 2018-10-29 15:12 白开水加糖
