hive优化1-列转行

数据需求说明:

table1的数据是 客户id,基金差额,保险差额,理财差额,。。。

create table youhua1(
    -> custid int
    -> ,jijin_change float
    -> ,baoxian_change float
    -> ,cunkuan_change float)
    -> ;

insert into youhua1 values(1,1.1,1.2,1.3),(2,2.67,2.34,2.1),(3,3.1,3.4,3.78);

select * from youhua1;
+--------+--------------+----------------+----------------+
| custid | jijin_change | baoxian_change | cunkuan_change |
+--------+--------------+----------------+----------------+
|      1 |          1.1 |            1.2 |            1.3 |
|      2 |         2.67 |           2.34 |            2.1 |
|      3 |          3.1 |            3.4 |           3.78 |
+--------+--------------+----------------+----------------+

原查询思路:

select 
     custid 
    ,"基金" as prod_name
    ,jijin_change as change
from youhua1

union all
select 
     custid 
    ,"保险" as prod_name
    ,baoxian_change change
from youhua1

union all
select 
     custid 
    ,"存款" as prod_name
    ,cunkuan_change as change
from youhua1;

目标数据:客户1,基金,差额(基金)
                  客户1,保险,差额(保险)

                  ...

+--------+-----------+------+
| custid | prod_name |change|
+--------+-----------+------+
|      1 | 基金      |  1.1 |
|      2 | 基金      | 2.67 |
|      3 | 基金      |  3.1 |
|      1 | 保险      |  1.2 |
|      2 | 保险      | 2.34 |
|      3 | 保险      |  3.4 |
|      1 | 存款      |  1.3 |
|      2 | 存款      |  2.1 |
|      3 | 存款      | 3.78 |
+--------+-----------+------+

优化方案:

当数据量很大时,这种方式会造成多次读表。

可以采用from前置的方式;

也可以先通过concat函数拼接jijin_change、baoxian_change、cunkuan_change三个字段成一列,然后再用列转行函数lateral view explode函数处理。

行转列:一个字段的多行数据合进一个列,通常可用collect_set+concat_ws;列转行:一个字段的一列数据拆到多个行,参考:https://blog.csdn.net/weixin_42003671/article/details/88132666

select custid,concat_ws(',',jijin_change,baoxian_change,cunkuan_change) from youhua1;  #这是mysql处理,hive中concat_ws里类型要转string
+--------+-----------------------------------------------------------+
| custid | concat_ws(',',jijin_change,baoxian_change,cunkuan_change) |
+--------+-----------------------------------------------------------+
|      1 | 1.1,1.2,1.3                                               |
|      2 | 2.67,2.34,2.1                                             |
|      3 | 3.1,3.4,3.78                                              |
+--------+-----------------------------------------------------------+

 以下为hive处理:

select 
     a.custid
    ,change 
from (
     select
           custid
          ,concat_ws(',',cast(jijin_change as string),cast(baoxian_change as string),cast(cunkuan_change as string)) as all_change
     from youhua1
    ) as a
lateral view explode(split(a.all_change,',')) table_tmp as change;
1    1.1
1    1.2
1    1.3
2    2.67
2    2.34
2    2.1
3    3.1
3    3.4
3    3.78

发现没有加上基金、保险、存款的行标识:

可以在concat_ws之前先将基金、保险、存款的行标识和值打在一起,concat_ws("+","基金"+"jijin_change"),...

在explode时指定split(a.all_change, ','),最后select custid,split(change,"+")[0] as prod_name,split(change,"+")[1] as change

报bug,用“+”没有办法拆开,会报错,把+替换成'\+'、'/+'都不行,改换为‘-’

select 
     a.custid
    ,split(change,'-')[0] as prod_name
    ,split(change,'-')[1] as change    
from (
     select
           custid
          ,concat_ws(',',concat_ws('-','基金',cast(jijin_change as string)),concat_ws('-','保险',cast(baoxian_change as string)),concat_ws('-','存款',cast(cunkuan_change as string))) as all_change
     from youhua1
    ) as a
lateral view explode(split(a.all_change,',')) table_tmp as change;
1    基金    1.1
1    保险    1.2
1    存款    1.3
2    基金    2.67
2    保险    2.34
2    存款    2.1
3    基金    3.1
3    保险    3.4
3    存款    3.78

 “+”号的方式:

concat_ws用“\+”,split用\\+“”

select 
     a.custid
    ,split(change,'\\+')[0] as prod_name
    ,split(change,'\\+')[1] as change    
from (
     select
           custid
          ,concat_ws(',',concat_ws('\+','基金',cast(jijin_change as string)),concat_ws('\+','保险',cast(baoxian_change as string)),concat_ws('\+','存款',cast(cunkuan_change as string))) as all_change
     from youhua1
    ) as a
lateral view explode(split(a.all_change,',')) table_tmp as change;

 

posted @ 2020-12-16 23:51  foolangirl  阅读(775)  评论(0编辑  收藏  举报