GaussDB-最佳实践

反馈基数估计

数据库启动，开启参数enable_adaptive_cost，其他参数保持默认设置。
```
set enable_adaptive_cost=on;
```
打开计时功能。
```
\timing on
```

执行由于错误的基数估计导致次优计划的语句，如含有多种连接路径和连接类型的查询SQL，观察执行计划在多次执行中是否迭代出更优计划。以下面语句举例：

EXPLAIN (ANALYZE, ADAPTCOST) SELECT MIN(mc.note) AS production_note,
MIN(t.title) AS movie_title,
MIN(t.production_year) AS movie_year
FROM company_type AS ct,
info_type AS it,
movie_companies AS mc,
movie_info_idx AS mi_idx,
title AS t
WHERE ct.kind = 'production companies'
AND it.info = 'bottom 10 rank'
AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
AND t.production_year BETWEEN 2003 AND 2010
AND ct.id = mc.company_type_id
AND t.id = mc.movie_id
AND t.id = mi_idx.movie_id
AND mc.movie_id = mi_idx.movie_id
AND it.id = mi_idx.info_type_id

执行三遍，第三遍执行计划相较第一遍性能提高1000倍左右。在NORMAL模式下可以用EXPLAIN（ADAPTCOST）关键词执行语句，查看每个算子对应的基数估计模型以及估计的基数。查看系统表gs_abo_model_statistic中对应模型的统计信息情况。

反馈代价矫正

数据库启动，开启参数enable_adaptive_cost。
```
SET enable_adaptive_cost=on;
```
设置adaptive_costest_strategy为L1，即信任默认的基数估计。
```
SET adaptive_costest_strategy="L1";
```
默认收集数据的滑动窗口长度设置为5（即训练集5，测试集5），TPCH执行2轮后可累计收集10以上个算子信息，即可纠正计划，获得1倍左右的性能提升。
```
SET cost_update_window_size=5;
```

启动数据库，使用gs_costmodel_calibration_manual()手动触发代价收集和代价模型更新，执行SQL（来源于TPCH Q10，数据规模1GB数据量）。

gaussdb=# SELECT gs_costmodel_calibration_manual();
gaussdb=# EXPLAIN (analyze,adaptcost,buffers) SELECT
        c_custkey,
        c_name,
        sum(l_extendedprice * (1 - l_discount)) as revenue,
        c_acctbal,
        n_name,
        c_address,
        c_phone,
        c_comment
FROM
        customer,
        orders,
        lineitem,
        nation
WHERE
        c_custkey = o_custkey
        and l_orderkey = o_orderkey
        and o_orderdate >= date '1994-10-01'
        and o_orderdate < date '1994-10-01' + interval '3' month
        and l_returnflag = 'R'
        and c_nationkey = n_nationkey
GROUP BY
        c_custkey,
        c_name,
        c_acctbal,
        c_phone,
        n_name,
        c_address,
        c_comment
ORDER BY
        revenue desc
LIMIT 100;

原计划

Limit
->  Sort
Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC
Sort Method: top-N heapsort  Memory: 80kB
->  HashAggregate
Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment
->  Hash Join
Hash Cond: (customer.c_nationkey = nation.n_nationkey)
->  Hash Join
Hash Cond: (orders.o_custkey = customer.c_custkey)
->  Hash Join
Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
->  Seq Scan on lineitem
Filter: (l_returnflag = 'R'::bpchar)
Rows Removed by Filter: 4522345
->  Hash
Buckets: 65536  Batches: 1  Memory Usage: 2208kB
->  Seq Scan on orders  (cost=1.20..48704.00 rows=56336 width=8)
Filter: ((o_orderdate >= '1995-01-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1443494
->  Hash
Buckets: 262144  Batches: 1  Memory Usage: 26813kB
->  Seq Scan on customer
->  Hash
Buckets: 32768  Batches: 1  Memory Usage: 2kB
->  Seq Scan on nation
Total runtime: 5968.483 ms

新计划

Limit
->  Sort
Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC
Sort Method: top-N heapsort  Memory: 77kB
->  HashAggregate
Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment
->  Hash Join
Hash Cond: (customer.c_nationkey = nation.n_nationkey)
->  Hash Join
Hash Cond: (orders.o_custkey = customer.c_custkey)
->  Nested Loop
->  Seq Scan on orders
Filter: ((o_orderdate >= '1994-10-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-01-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 1442781
->  Index Scan using lineitem_pkey on lineitem
Index Cond: (l_orderkey = orders.o_orderkey)
Filter: (l_returnflag = 'R'::bpchar)
Rows Removed by Filter: 114403
->  Hash
Buckets: 262144  Batches: 1  Memory Usage: 26813kB
->  Seq Scan on customer
->  Hash
Buckets: 32768  Batches: 1  Memory Usage: 2kB
->  Seq Scan on nation
Total runtime: 2826.489 ms

原计划，总执行时间5968ms；新计划，总执行时间2622ms。

当两表连接的基数在HashJoin和NestLoop的决策边界时，代价矫正可以起到效果，选择更好的连接方式，否则一般不改变计划。

更多详情请参考GaussDB 文档中心：https://doc.hcs.huawei.com/db/zh-cn/gaussdbqlh/24.1.30/productdesc/qlh_03_0001.html

posted @ 2024-11-18 14:46 jerrywang1983 阅读(34) 评论(0) 收藏举报

刷新页面返回顶部

jerrywang1983

GaussDB-最佳实践

GaussDB-最佳实践

反馈基数估计

反馈代价矫正

公告