GaussDB-最佳实践
GaussDB-最佳实践
反馈基数估计
- 数据库启动,开启参数enable_adaptive_cost,其他参数保持默认设置。
set enable_adaptive_cost=on;
- 打开计时功能。
\timing on
- 执行由于错误的基数估计导致次优计划的语句,如含有多种连接路径和连接类型的查询SQL,观察执行计划在多次执行中是否迭代出更优计划。以下面语句举例:
EXPLAIN (ANALYZE, ADAPTCOST) SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'bottom 10 rank' AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%' AND t.production_year BETWEEN 2003 AND 2010 AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id
- 执行三遍,第三遍执行计划相较第一遍性能提高1000倍左右。在NORMAL模式下可以用EXPLAIN(ADAPTCOST)关键词执行语句,查看每个算子对应的基数估计模型以及估计的基数。查看系统表gs_abo_model_statistic中对应模型的统计信息情况。
反馈代价矫正
- 数据库启动,开启参数enable_adaptive_cost。
SET enable_adaptive_cost=on;
- 设置adaptive_costest_strategy为L1,即信任默认的基数估计。
SET adaptive_costest_strategy="L1";
- 默认收集数据的滑动窗口长度设置为5(即训练集5,测试集5),TPCH执行2轮后可累计收集10以上个算子信息,即可纠正计划,获得1倍左右的性能提升。
SET cost_update_window_size=5;
- 启动数据库,使用gs_costmodel_calibration_manual()手动触发代价收集和代价模型更新,执行SQL(来源于TPCH Q10,数据规模1GB数据量)。
gaussdb=# SELECT gs_costmodel_calibration_manual(); gaussdb=# EXPLAIN (analyze,adaptcost,buffers) SELECT c_custkey, c_name, sum(l_extendedprice * (1 - l_discount)) as revenue, c_acctbal, n_name, c_address, c_phone, c_comment FROM customer, orders, lineitem, nation WHERE c_custkey = o_custkey and l_orderkey = o_orderkey and o_orderdate >= date '1994-10-01' and o_orderdate < date '1994-10-01' + interval '3' month and l_returnflag = 'R' and c_nationkey = n_nationkey GROUP BY c_custkey, c_name, c_acctbal, c_phone, n_name, c_address, c_comment ORDER BY revenue desc LIMIT 100;
- 原计划
Limit -> Sort Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC Sort Method: top-N heapsort Memory: 80kB -> HashAggregate Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment -> Hash Join Hash Cond: (customer.c_nationkey = nation.n_nationkey) -> Hash Join Hash Cond: (orders.o_custkey = customer.c_custkey) -> Hash Join Hash Cond: (lineitem.l_orderkey = orders.o_orderkey) -> Seq Scan on lineitem Filter: (l_returnflag = 'R'::bpchar) Rows Removed by Filter: 4522345 -> Hash Buckets: 65536 Batches: 1 Memory Usage: 2208kB -> Seq Scan on orders (cost=1.20..48704.00 rows=56336 width=8) Filter: ((o_orderdate >= '1995-01-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 1443494 -> Hash Buckets: 262144 Batches: 1 Memory Usage: 26813kB -> Seq Scan on customer -> Hash Buckets: 32768 Batches: 1 Memory Usage: 2kB -> Seq Scan on nation Total runtime: 5968.483 ms
- 新计划
Limit -> Sort Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC Sort Method: top-N heapsort Memory: 77kB -> HashAggregate Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment -> Hash Join Hash Cond: (customer.c_nationkey = nation.n_nationkey) -> Hash Join Hash Cond: (orders.o_custkey = customer.c_custkey) -> Nested Loop -> Seq Scan on orders Filter: ((o_orderdate >= '1994-10-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-01-01 00:00:00'::timestamp without time zone)) Rows Removed by Filter: 1442781 -> Index Scan using lineitem_pkey on lineitem Index Cond: (l_orderkey = orders.o_orderkey) Filter: (l_returnflag = 'R'::bpchar) Rows Removed by Filter: 114403 -> Hash Buckets: 262144 Batches: 1 Memory Usage: 26813kB -> Seq Scan on customer -> Hash Buckets: 32768 Batches: 1 Memory Usage: 2kB -> Seq Scan on nation Total runtime: 2826.489 ms
原计划,总执行时间5968ms;新计划,总执行时间2622ms。
当两表连接的基数在HashJoin和NestLoop的决策边界时,代价矫正可以起到效果,选择更好的连接方式,否则一般不改变计划。
- 原计划