Hive SQL 中的分号、下划线等tips整理

转载自:https://ixyzero.com/blog/archives/5741.html,感谢原作者的总结分享。

=Start=

缘由:

简单整理一下近期在进行 Hive SQL 查询时碰到的一些问题和经验,方便后面有需要的时候参考。

正文:

参考解答:

1. Hive SQL 如何对英文分号(;)做split切分?

> select split(f1,'\073')[0] from tab_name;

-- 使用 '\073' 来代替 ';' 这种写法就OK,否则会报语法错误。

-- 验证代码片段
with t1 as (
SELECT
'1\0733' as v1
,'1 3' as v2
,'123' as v3
)

SELECT
v1,split(v1,'\073') as result1
,v2,split(v2,'\073| ') as result2
,v3,split(v3,'\073| ') as result3
FROM
t1
,split(lower(mac_address),'\073| ') as mac_addr_array --对mac地址字符串使用空格和分号作为分隔符进行切分

,array_contains(mac_addr_array,user_mac) as it_asset -- 判断特定的mac地址是否在mac地址数组中

2. Hive SQL的like如何匹配下划线?

SQL中如何使用like匹配下划线的内容?因为下划线在like中代表任意单个字符,如果需要匹配下划线字符,需要对下划线进行转义。

WHERE mycolumn LIKE '%\_%' ESCAPE '\'
WHERE mycolumn LIKE '%\_%'

WHERE mycolumn LIKE '%#_%' ESCAPE '#'

方法一:
使用 escape 对下划线进行转义(默认是反划线)

方法二:
使用 instr/locate/rlike 等函数进行判断,不使用 like 来判断
-- 验证代码片段

with t1 as (
SELECT
'1_3' as underscore_var
,'123' as no_underscore_var
)

SELECT
underscore_var, if(underscore_var like '%\_%',1,0) as has1
,no_underscore_var, if(no_underscore_var like '%\_%',1,0) as has2
,if(no_underscore_var like '%_%',1,0) as wrong_judge_method
FROM
t1

3. Hive中 布尔类型(boolean) 的变量如何做比较?

-- 简单来说就是可以用常规的等号(=),也可以不添加判等符号
Correct syntax is:
where boolean_col = True
where boolean_col
where NOT boolean_col

To check for NULL use:
where boolean_col is NULL

4. Hive中如何向array/map等复杂类型字段中添加空值?

-- 暂未实际测试

Prepare:
CREATE TABLE IF NOT EXISTS tmp.test_table (
 col1 STRING,
 col2 MAP<STRING, STRING>,
 col3 ARRAY<BIGINT>
)
PARTITIONED BY ( ds STRING )
;

Step1: Add brickhouse jar and create temporary functions to cast array and map.

> ADD JAR s3a://airbnb-datainfra-dependencies-internal-only/teams/datainfra/projects/hive_aux_jars/brickhouse-0.7.1.jar;
> CREATE TEMPORARY FUNCTION CAST_ARRAY AS 'brickhouse.udf.collect.CastArrayUDF';
> CREATE TEMPORARY FUNCTION CAST_MAP AS 'brickhouse.udf.collect.CastMapUDF';

Step 2: Insert the data:

> INSERT OVERWRITE TABLE tmp.test_table PARTITION(ds='2020–01–01')
> SELECT 
 'random_text' col1
 , CAST_MAP(map('', ''), 'map<string,string>') col2
 , CAST_ARRAY(array(), 'bigint') col3

Step 3: Check the results:
> SELECT * FROM tmp.test_table;
参考链接:

split string that includes semicolons in Hive
https://stackoverflow.com/questions/17212511/split-string-that-includes-semicolons-in-hive

Why does using an Underscore character in a LIKE filter give me all the results?
https://stackoverflow.com/questions/19588455/why-does-using-an-underscore-character-in-a-like-filter-give-me-all-the-results

【Hive】like + 转义字符
https://blog.csdn.net/qq_34105362/article/details/80415526

HIVE Boolean Filter
https://stackoverflow.com/questions/68313045/hive-boolean-filter

Insert NULL into Hive complex columns like ARRAY, MAP etc.
https://medium.com/@rajnishkumargarg/insert-null-into-hive-complex-columns-like-array-map-etc-a76e320d3e7e
https://github.com/jeromebanks/brickhouse/blob/master/src/main/java/brickhouse/udf/collect/CastMapUDF.java

=END=

 
posted @ 2025-02-14 09:10  sinodragon21  阅读(91)  评论(0)    收藏  举报