clickhouse常见的三种空值问题以及解决方案

1.建表时的空值问题
如果我们建表时,不特殊说明空值,比如:

CREATE TABLE test.table1(
  id String,
  name String
) ENGINE = MergeTree PARTITION BY id ORDER BY id SETTINGS index_granularity = 8192


这种情况下,如果将包含空值的数据,读入到表中时,会报错。

DB::Exception: Expression returns value NULL, that is out of range of type String, at: null)
因此,为了防止这种情况的发生,我们一般会这样建表:

注意:这里的主键是不可以包含空值的,如果把主键也加Nullable会报错

CREATE TABLE test.table1(
  id String,
  name Nullable(String)
) ENGINE = MergeTree PARTITION BY id ORDER BY id SETTINGS index_granularity = 8192

 

2.查询时的空值问题
上面说了建表的问题,接下来要实例一下,当我们表已经建好,且表数据已经有了,一列数据既包含null,又包含''这类空值,这个时候,如果不注意语法,会报错,如果包含这两类数据,不能使用coalesce,如下:

SELECT COALESCE
    ( paymentterm, 0 ) AS paymentterm_a,
    count( DISTINCT orderid ) AS ornum 
FROM
    ckdb.test 
WHERE
    d = '2020-05-08' 
GROUP BY
    paymentterm_a

报错如下:错误原因是paymentterm是string类型,不可以改成int类型

Code: 386, e.displayText() = DB::Exception: There is no supertype for types String, UInt8 because some of them are String/FixedString and some of them are not (version 19.17.6.36 (official build))
这里有一个小的知识点:

group by后面的名称,可以写select中的逻辑,也可以写as为的别名,下面使用case when改写上面的内容:

--方式一
select case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end as paymentterm,
       count(distinct orderid) as ornum
  from ckdb.test
 where d = '2020-05-08'
 group by paymentterm
 
--方式二
select case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end as paymentterm,
       count(distinct orderid) as ornum
  from ckdb.test
 where d = '2020-05-08'
 group by case when paymentterm is null or paymentterm = '' then 'null' else paymentterm end
 
--方式三
select coalesce(paymentterm,'null') as paymentterm,
       count(distinct orderid) as ornum
  from ckdb.test
 where d = '2020-05-08'
 group by coalesce(paymentterm,'null')
 

 

3.关联中的空值问题
如下场景,需要使用a表关联b表,把a和b都有的id剔除,在hive中我们一般这样实现:

select a.*
from a
left join b
on a.id = b.id
where b.id is null

不过这种方式,在CK中是有问题的,未连接的行使用默认值填充的。String类型就填充空字符串,数值类型就填充 0,要借用其他方式解决

1)使用coalesce来完成

select a.*
from a
left join b
on a.id = b.id
where coalesce(b.id,0) = 0

2)使用 settings join_use_nulls 来完成

修改参数,在 SQL 最后加入 settings join_use_nulls = 1

select * from st_center.test_join_1  as t1
all left join st_center.test_join_2  as t2
on t1.id = t2.id
settings join_use_nulls = 1

 注意:关于jdbc相关调用方式,导致settings无法应用,可以设置users.xml,

<join_use_nulls>1</join_use_nulls>

设置完成后,相关系统表内容也会更新,后续使用不用再单独指定。(可指定profiles组,注意使用账号所属profiles组)

放置位置参考:

 

SELECT *
FROM system.settings
WHERE name = 'join_use_nulls'

Query id: b5f9d42e-6e02-47ed-b261-fb04562ba18b

┌─name───────────┬─value─┬─changed─┬─description─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─min──┬─max──┬─readonly─┬─type─┐
│ join_use_nulls │ 11Use NULLs for non-joined rows of outer JOINs for types that can be inside Nullable. If false, use default value of corresponding columns data type. │ ???? │ ???? │        0 │ Bool │
└────────────────┴───────┴─────────┴─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴──────┴──────┴──────────┴──────┘

1 rows in set. Elapsed: 0.003 sec.

 

posted @ 2022-02-16 15:00  渐逝的星光  阅读(5861)  评论(0编辑  收藏  举报