clickHouse的SummingMergeTree引擎以及AggregatingMergeTree引擎使用介绍

一、SummingMergeTree

1.SummingMergeTree介绍


求和引擎继承自 MergeTree。区别在于,当合并 SummingMergeTree 表的数据片段时,ClickHouse 会把所有具有相同主键的行合并为一行,该行包含了被合并的行中具有数值数据类型的列的汇总值。如果主键的组合方式使得单个键值对应于大量的行,则可以显著的减少存储空间并加快数据查询的速度。

2.建表语句

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = SummingMergeTree([columns])
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[SETTINGS name=value, ...]

 

columns - 包含了将要被汇总的列的列名的元组。可选参数。
所选的列必须是数值类型,并且不可位于主键中。

3.使用示例

--本地表
create table test.summing_table_test1 
(
v1 Int32,
v2 Int32,
name String,
total_date DateTime
) ENGINE = SummingMergeTree((v1,v2))
order by (name)
partition by toDate(total_date)
SETTINGS index_granularity = 8192;

--写入测试数据:

insert into test.summing_table_test1
values (1,2,'a',now()),(2,2,'a',now()-1*60*60),(3,4,'b',now());

--强制合并
optimize table test.summing_table_test1  FINAL;
--查询数据:

SELECT *
FROM test.summing_table_test1

Query id: 2da82c96-2a90-496a-83fe-8a6528ba336c

┌─v1─┬─v2─┬─name─┬──────────total_date─┐
│ 34 │ a │ 2021-10-13 11:41:12 │
│ 34 │ b │ 2021-10-13 11:41:12 │
└────┴────┴──────┴─────────────────────┘

 

二、AggregatingMergeTree

1.AggregatingMergeTree 介绍
该表引擎继承自MergeTree,可以使用 AggregatingMergeTree 表来做增量数据统计聚合。如果要按一组规则来合并减少行数,则使用 AggregatingMergeTree 是合适的。AggregatingMergeTree是通过预先定义的聚合函数计算数据并通过二进制的格式存入表内。

是SummingMergeTree的加强版,SummingMergeTree能做的是对非主键列进行sum聚合,而AggregatingMergeTree则可以指定各种聚合函数

2.建表语句

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

 

3.使用示例

1)计算汇总员工工资

--建表:

CREATE TABLE emp_aggregatingmergeTree
(
emp_id UInt16 COMMENT '员工id',
name String COMMENT '员工姓名',
work_place String COMMENT '工作地点',
age UInt8 COMMENT '员工年龄',
depart String COMMENT '部门',
salary AggregateFunction(sum, Decimal32(2)) COMMENT '工资'
) ENGINE = AggregatingMergeTree() ORDER BY (emp_id, name) PRIMARY KEY emp_id PARTITION BY work_place;


ORDER BY (emp_id,name) -- 注意排序key是两个字段
PRIMARY KEY emp_id -- 主键是一个字段

--对于AggregateFunction类型的列字段,在进行数据的写入和查询时与其他的表引擎有很大区别,在写入数据时,需要调用-State函数;而在查询数据时,则需要调用相应的-Merge函数。对于上面的建表语句而言,需要使用sumState函数进行数据插入

-- 插入数据,
-- 注意:需要使用INSERT…SELECT语句进行数据插入
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','上海',25,'信息部',sumState(toDecimal32(10000,2));
INSERT INTO TABLE emp_aggregatingmergeTree SELECT 1,'tom','上海',25,'信息部',sumState(toDecimal32(20000,2));
-- 查询数据
SELECT emp_id,name,sumMerge(salary) FROM emp_aggregatingmergeTree GROUP BY emp_id,name;
-- 结果输出
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│ 1 │ tom │ 30000.00 │
└────────┴──────┴──────────────────┘

--AggregatingMergeTree通常作为物化视图的表引擎,与普通MergeTree搭配使用。物化视图是作为其他数据表上层的一种查询视图。

 

-- 创建一个MereTree引擎的明细表
-- 用于存储全量的明细数据
-- 对外提供实时查询
CREATE TABLE emp_mergetree_base
(
emp_id UInt16 COMMENT '员工id',
name String COMMENT '员工姓名',
work_place String COMMENT '工作地点',
age UInt8 COMMENT '员工年龄',
depart String COMMENT '部门',
salary Decimal32(2) COMMENT '工资'
) ENGINE = MergeTree() ORDER BY (emp_id, name) PARTITION BY work_place;

-- 创建一张物化视图
-- 使用AggregatingMergeTree表引擎
CREATE MATERIALIZED VIEW view_emp_agg ENGINE = AggregatingMergeTree() PARTITION BY emp_id ORDER BY (emp_id, name) AS
SELECT emp_id, name, sumState(salary) AS salary
FROM emp_mergetree_base
GROUP BY emp_id, name;

-- 向基础明细表emp_mergetree_base插入数据
INSERT INTO emp_mergetree_base VALUES (1,'tom','上海',25,'技术部',20000),(1,'tom','上海',26,'人事部',10000);

-- 查询物化视图
SELECT emp_id,name,sumMerge(salary) FROM view_emp_agg GROUP BY emp_id,name;
-- 结果
┌─emp_id─┬─name─┬─sumMerge(salary)─┐
│ 1 │ tom │ 50000.00 │
└────────┴──────┴──────────────────┘

 2)展示每一个节点cpu 利用率的当前值

使用argMaxState 聚合列
create materialized view cpu_last_point_idle_mv 
engine = AggregatingMergeTree()
partition by tuple()
order by tags_id
populate
as select
argMaxState(create_date,created_at) as created_data,
maxState(create_at) as max_created_max,
argMaxState(time,created_at) as time,
tags_id,
argMaxState(usage_idle,created_at) as usage_idle
from cpu 
group by tags_id

argMax(a,b) 函数返回 b 最大值时 a的值

State 为聚合函数的后缀,聚合函数加此后缀不直接返回结果,返回聚合函数的中间结果,该中间结果可在AggregatingMergeTree 引擎中使用

使用Merge函数后缀得到聚合结果
create view cpu_last_point_idle_v as
select 
argMaxMerge(created_date) as created_date,
maxMerge(max_created_at) as created_at,
argMaxMerge(time) as time,
tags_id,
argMaxMerge(usage_idle) as usage_idle
from cpu_last_point_idle_mv
group by tags_id
查询结果视图
select 
tags_id,
100 - usage_idle usage
from cpu_last_point_idle_v
order by usage desc,tags_id asc
limit 10

3)创建一个跟踪tb_test_MergeTree_basic表的物化视图

create materialized view tb_test_AggregatingMergeTree_view ENGINE = AggregatingMergeTree() PARTITION BY (brandId,shopId) ORDER BY (brandId,shopId) as select brandId,shopId,sumState(saleMoney) saleMoney,sumState(saleQty) saleQty,countState(1) saleNum,uniqState(vipId)  vipNum from tb_test_MergeTree_basic group by brandId,shopId
b64d9704419c :) create materialized view tb_test_AggregatingMergeTree_view ENGINE = AggregatingMergeTree() PARTITION BY (brandId,shopId) ORDER BY (brandId,shopId) as select brandId,shopId,sumState(saleMoney) saleMoney,sumState(saleQty) saleQty,countState(1) saleNum,uniqState(vipId)  vipNum from tb_test_MergeTree_basic group by brandId,shopId
 
CREATE MATERIALIZED VIEW tb_test_AggregatingMergeTree_view
ENGINE = AggregatingMergeTree()
PARTITION BY (brandId, shopId)
ORDER BY (brandId, shopId) AS
SELECT 
    brandId, 
    shopId, 
    sumState(saleMoney) AS saleMoney, 
    sumState(saleQty) AS saleQty, 
    countState(1) AS saleNum, 
    uniqState(vipId) AS vipNum
FROM tb_test_MergeTree_basic
GROUP BY 
    brandId, 
    shopId
 
Ok.
 
0 rows in set. Elapsed: 0.012 sec. 
 
b64d9704419c :) 

show table  可见比普通表多了“.inner.”前缀

 

 

目录名称也比普通表多了一些类似乱码的字符

 建视图前已经存在的数据不能跟踪

tb_test_MergeTree_basic 表原来已经在创建物化视图的时候已经有数据了

b64d9704419c :) select * from tb_test_MergeTree_basic 
 
SELECT *
FROM tb_test_MergeTree_basic
 
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960022020-10-07200.54010002 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960022020-10-05200.51010001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960012020-10-07200.53010003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960022020-10-04200.54010001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960012020-10-01200.51010001 │
│     42960012020-10-02200.52010002 │
│     42960012020-10-03200.53010003 │
│     42960012020-10-04200.51010001 │
│     42960012020-10-05200.52010001 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘
┌─brandId─┬─shopId─┬───saleDate─┬─saleMoney─┬─saleQty─┬─vipId─┐
│     42960012020-10-06200.53010003 │
└─────────┴────────┴────────────┴───────────┴─────────┴───────┘

执行一次optimize table tb_test_AggregatingMergeTree_view

再查tb_test_AggregatingMergeTree_view视图

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
Ok.
 
0 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :)

可见没有跟踪建表之前的已经存在的数据

可以跟踪建视图后再插入的数据

1)插入2条数据

insert into tb_test_MergeTree_basic values (429,6001,'2020-10-08 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-08 14:15:23',200.50,40,10002)

2)查看

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296001 │ i@        │         │         │ ³Gw     │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002 │ i@        │ (       │         │ $a6㞠  │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
2 rows in set. Elapsed: 0.008 sec. 
 
b64d9704419c :)

3)聚合结果

b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum)  vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sumMerge(saleMoney) AS saleMoney, 
    sumMerge(saleQty) AS saleQty, 
    countMerge(saleNum) AS saleNum, 
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002200.54011 │
│     4296001200.53011 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
2 rows in set. Elapsed: 0.005 sec. 
 
b64d9704419c :)

4)继续插入新的数据

insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10002)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,20,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-09 14:15:23',200.50,30,10003)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,40,10001)
insert into tb_test_MergeTree_basic values (429,6002,'2020-10-09 14:15:23',200.50,10,10001)
insert into tb_test_MergeTree_basic values (429,6001,'2020-10-10 14:15:23',200.50,10,10001)

 5)查看

可见分区没有合并

b64d9704419c :) select * from tb_test_AggregatingMergeTree_view
 
SELECT *
FROM tb_test_AggregatingMergeTree_view
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296001 │ i@        │         │         │ l 
                                                      򞞠 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296001 │ i@        │         │         │ ³Gw     │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002 │ i@        │ 
        │         │ l 
                     򞞠 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296001 │ i@        │ 
        │         │ l 
                     򞞠 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002 │ i@        │ (       │         │ $a6㞠  │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum───┐
│     4296001 │ T@        │ d       │         │ l 
                                                      󷥄³Gw │
└─────────┴────────┴───────────┴─────────┴─────────┴──────────┘
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002 │ i@        │ (       │         │ l 
                                                      򞞠 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
7 rows in set. Elapsed: 0.004 sec. 
 
b64d9704419c :)

6)观察自动跟踪

b64d9704419c :) select brandId,shopId,sumMerge(saleMoney) saleMoney,sumMerge(saleQty) saleQty,countMerge(saleNum) saleNum,uniqMerge(vipNum)  vipNum from tb_test_AggregatingMergeTree_view group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sumMerge(saleMoney) AS saleMoney, 
    sumMerge(saleQty) AS saleQty, 
    countMerge(saleNum) AS saleNum, 
    uniqMerge(vipNum) AS vipNum
FROM tb_test_AggregatingMergeTree_view
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─saleMoney─┬─saleQty─┬─saleNum─┬─vipNum─┐
│     4296002601.59032 │
│     4296001160416083 │
└─────────┴────────┴───────────┴─────────┴─────────┴────────┘
 
2 rows in set. Elapsed: 0.010 sec. 
 
b64d9704419c :) 

7)可见确实已经自动跟踪聚合了

但是只是创建视图后插入的数据才能跟踪,验证SQL 如下

select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate>='2020-10-08' group by brandId,shopId

执行结果

b64d9704419c :) select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate>='2020-10-08' group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sum(saleMoney), 
    sum(saleQty), 
    count(1), 
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate >= '2020-10-08'
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     4296002601.59032 │
│     4296001160416083 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘
 
2 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :) 

8)创建视图前原来已经存在的数据是不能被跟踪的

下面的这部分值不能被跟踪

select brandId,shopId,sum(saleMoney),sum(saleQty),count(1),uniq(vipId) from tb_test_MergeTree_basic  where saleDate<'2020-10-08' group by brandId,shopId
 
SELECT 
    brandId, 
    shopId, 
    sum(saleMoney), 
    sum(saleQty), 
    count(1), 
    uniq(vipId)
FROM tb_test_MergeTree_basic
WHERE saleDate < '2020-10-08'
GROUP BY 
    brandId, 
    shopId
 
┌─brandId─┬─shopId─┬─sum(saleMoney)─┬─sum(saleQty)─┬─count(1)─┬─uniq(vipId)─┐
│     4296002601.59032 │
│     42960011403.515073 │
└─────────┴────────┴────────────────┴──────────────┴──────────┴─────────────┘
 
2 rows in set. Elapsed: 0.003 sec. 
 
b64d9704419c :) 

 

4.总结

1)使用ORDER BY排序键作为聚合数据的依据
2)使用AggregateFunction字段类型定义聚合函数的类型以及聚合字段
3)只有在合并分区的时候才会触发聚合计算的逻辑
4)聚合只会发生在同分区内,不同分区的数据不会发生聚合
5)在进行数据计算时,因为同分区的数据已经基于ORDER BY排序,所以能够找到相邻且具有相同聚合key的数据
6)在聚合数据时,同一分区内,相同聚合key的多行数据会合并成一行,对于那些非主键、非AggregateFunction类型字段,则会取第一行数据
7)AggregateFunction类型字段使用二进制存储,在写入数据时,需要调用state函数;在读数据时,需要调用merge函数,*表示定义时使用的聚合函数
8)AggregateMergeTree通常作为物化视图的引擎,与普通的MergeTree搭配使用

5.注意

      可以使用AggregatingMergeTree表来做增量数据统计聚合,包括物化视图的数据聚合

1)AggregatingMergeTree表不能跟踪basic表,在执行inser select 之后查的数据无法进行聚合,只能inser select 之前的数据聚合
2)AggregatingMergeTree物化视图可以跟踪basic表,但是在视图创建前已经存在的数据不能被跟踪,只能跟踪聚合视图创建后新插入的数据

posted @ 2022-02-16 17:06  渐逝的星光  阅读(1838)  评论(4编辑  收藏  举报