9.ClickHouse系列之数据一致性保证
对于ReplacingMergeTree引擎,我们之前讲过在后台合并的时间不定,合并前查询会存在数据不一致问题。
CREATE TABLE study.house (
id String,
city String,
region String,
name String,
price Float32,
publish_date DateTime
) ENGINE=ReplacingMergeTree(publish_date) PARTITION BY toYYYYMMDD(publish_date) PRIMARY KEY(id) ORDER BY (id, city, region, name)
SETTINGS index_granularity=8192
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 60000, toDateTime('2022-05-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 60000, toDateTime('2022-05-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 61000, toDateTime('2022-06-01'));
INSERT INTO study.house VALUES (1, '上海', '静安', '场中小区', 61000, toDateTime('2022-06-01'));
查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00
为保证数据一致性,有以下解决方案
1. 手动执行分区合并
optimize table study.house final;
查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
1,上海,静安,场中小区,60000,2022-05-01 00:00:00
可以看到,虽然id, city, region, name为order by的唯一键,但是price价格不一致,仍然会作为两个值进行保留
2. 通过Group By去重
SELECT id, city, region, name, argMax(price, publish_date), max(publish_date) FROM study.house GROUP BY id, city, region, name;
查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
3. 通过final查询
在查询语句后增加FINAL修饰符,这样查询过程中将会执行Merge的特殊逻辑。FINAL在新版本中,支持多线程执行,可以通过max_threads设置
EXPLAIN SELECT id, city, region, name, price, publish_date FROM study.house FINAL SETTINGS max_threads=2;
查询结果如下:
1,上海,静安,场中小区,61000,2022-06-01 00:00:00
浙公网安备 33010602011771号