Impala & Hive 使用复杂数据类型

1. 环境

CDH 5.16.1

2. Hive 使用复杂数据类型

2.1 数据格式

1       zhangsan:man    football,basketball
2       lisi:female     sing,dance

2.2 Hive 建表

create table studentInfo(
    id int,
    info map<string,string>  comment 'map<姓名,性别>',
    favorite array<string> comment 'array[football,basketball]'
)
row format delimited fields terminated by '\t'    --列分隔符
collection items terminated by ','   --array中各个item之间的分隔符
map keys terminated 
by ':'        --map中key和value之间的分隔符
lines terminated by '\n';       --行分隔符

2.3 导入数据

load data local inpath '/opt/module/jobs/student.txt' into table studentInfo;

2.3 执行查询

select *  from studentInfo;

+-----------------+---------------------+----------------------------+--+
| studentinfo.id  |  studentinfo.info   |    studentinfo.favorite    |
+-----------------+---------------------+----------------------------+--+
| 1               | {"zhangsan":"man"}  | ["football","basketball"]  |
| 2               | {"lisi":"female"}   | ["sing","dance"]           |
+-----------------+---------------------+----------------------------+--+




-- 对于map查询,map[key]
--对于array查询,array[index]
select id, info['zhangsan'],favorite[1] from studentInfo;

+-----+-------+-------------+--+
| id  |  sex  |  favorite   |
+-----+-------+-------------+--+
| 1   | man   | basketball  |
| 2   | NULL  | dance       |
+-----+-------+-------------+--+

3. Impala 使用复杂类型

注意:Impala 只用parquet格式存储时,才能使用复杂数据类型

3.1 Hive中建表(parquet格式,导入数据

create table student_parquet(
    id int,
    info map<string,string>  comment 'map<姓名,性别>',
    favorite array<string> comment 'array[football,basketball]'
)
stored as parquet

insert overwrite table student_parquet select id,info,favorite from studentInfo;

3.2 刷新impala元数据

refresh default.student_parquet;

3.3 执行查询

select 
    id ,favorite_array.item,info_map.key,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array;

+----+------------+----------+--------+
| id | item       | key      | value  |
+----+------------+----------+--------+
| 1  | football   | zhangsan | man    |
| 1  | basketball | zhangsan | man    |
| 2  | sing       | lisi     | female |
| 2  | dance      | lisi     | female |
+----+------------+----------+--------+




select 
    id ,favorite_array.item
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.POS = 0;

+----+----------+
| id | item     |
+----+----------+
| 1  | football |
| 2  | sing     |
+----+----------+




select 
    id ,favorite_array.item,info_map.value
from student_parquet,
    student_parquet.info as info_map,
    student_parquet.favorite as favorite_array
where favorite_array.item = 'sing'
and info_map.key = 'lisi';

+----+------+--------+
| id | item | value  |
+----+------+--------+
| 2  | sing | female |
+----+------+--------+

总结:

  1. array 类型视为 一张表, 其列名为 item

  2. map类型有两个列, 一个是key, 一个是value

参考:

  1. https://blog.csdn.net/rav009/article/details/86750850
  2. https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/impala_complex_types.html
posted @ 2020-02-12 01:01  大数据小码农  阅读(2030)  评论(0编辑  收藏  举报