GaussDB-最佳实践

GaussDB-最佳实践

反馈基数估计

  1. 数据库启动,开启参数enable_adaptive_cost,其他参数保持默认设置。

     

    set enable_adaptive_cost=on;
     

     

  2. 打开计时功能。

     

    \timing on
     

     

  3. 执行由于错误的基数估计导致次优计划的语句,如含有多种连接路径和连接类型的查询SQL,观察执行计划在多次执行中是否迭代出更优计划。以下面语句举例:

     

    EXPLAIN (ANALYZE, ADAPTCOST) SELECT MIN(mc.note) AS production_note,
    MIN(t.title) AS movie_title,
    MIN(t.production_year) AS movie_year
    FROM company_type AS ct,
    info_type AS it,
    movie_companies AS mc,
    movie_info_idx AS mi_idx,
    title AS t
    WHERE ct.kind = 'production companies'
    AND it.info = 'bottom 10 rank'
    AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
    AND t.production_year BETWEEN 2003 AND 2010
    AND ct.id = mc.company_type_id
    AND t.id = mc.movie_id
    AND t.id = mi_idx.movie_id
    AND mc.movie_id = mi_idx.movie_id
    AND it.id = mi_idx.info_type_id
     

     

  4. 执行三遍,第三遍执行计划相较第一遍性能提高1000倍左右。在NORMAL模式下可以用EXPLAIN(ADAPTCOST)关键词执行语句,查看每个算子对应的基数估计模型以及估计的基数。查看系统表gs_abo_model_statistic中对应模型的统计信息情况。

反馈代价矫正

  1. 数据库启动,开启参数enable_adaptive_cost。

     

    SET enable_adaptive_cost=on;
     

     

  2. 设置adaptive_costest_strategy为L1,即信任默认的基数估计。

     

    SET adaptive_costest_strategy="L1";
     

     

  3. 默认收集数据的滑动窗口长度设置为5(即训练集5,测试集5),TPCH执行2轮后可累计收集10以上个算子信息,即可纠正计划,获得1倍左右的性能提升。

     

    SET cost_update_window_size=5;
     

     

  4. 启动数据库,使用gs_costmodel_calibration_manual()手动触发代价收集和代价模型更新,执行SQL(来源于TPCH Q10,数据规模1GB数据量)。

     

    gaussdb=# SELECT gs_costmodel_calibration_manual();
    gaussdb=# EXPLAIN (analyze,adaptcost,buffers) SELECT
            c_custkey,
            c_name,
            sum(l_extendedprice * (1 - l_discount)) as revenue,
            c_acctbal,
            n_name,
            c_address,
            c_phone,
            c_comment
    FROM
            customer,
            orders,
            lineitem,
            nation
    WHERE
            c_custkey = o_custkey
            and l_orderkey = o_orderkey
            and o_orderdate >= date '1994-10-01'
            and o_orderdate < date '1994-10-01' + interval '3' month
            and l_returnflag = 'R'
            and c_nationkey = n_nationkey
    GROUP BY
            c_custkey,
            c_name,
            c_acctbal,
            c_phone,
            n_name,
            c_address,
            c_comment
    ORDER BY
            revenue desc
    LIMIT 100;
     
    • 原计划
      Limit
      ->  Sort
      Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC
      Sort Method: top-N heapsort  Memory: 80kB
      ->  HashAggregate
      Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment
      ->  Hash Join
      Hash Cond: (customer.c_nationkey = nation.n_nationkey)
      ->  Hash Join
      Hash Cond: (orders.o_custkey = customer.c_custkey)
      ->  Hash Join
      Hash Cond: (lineitem.l_orderkey = orders.o_orderkey)
      ->  Seq Scan on lineitem
      Filter: (l_returnflag = 'R'::bpchar)
      Rows Removed by Filter: 4522345
      ->  Hash
      Buckets: 65536  Batches: 1  Memory Usage: 2208kB
      ->  Seq Scan on orders  (cost=1.20..48704.00 rows=56336 width=8)
      Filter: ((o_orderdate >= '1995-01-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-04-01 00:00:00'::timestamp without time zone))
      Rows Removed by Filter: 1443494
      ->  Hash
      Buckets: 262144  Batches: 1  Memory Usage: 26813kB
      ->  Seq Scan on customer
      ->  Hash
      Buckets: 32768  Batches: 1  Memory Usage: 2kB
      ->  Seq Scan on nation
      Total runtime: 5968.483 ms
       
    • 新计划
      Limit
      ->  Sort
      Sort Key: (sum((lineitem.l_extendedprice * (1::numeric - lineitem.l_discount)))) DESC
      Sort Method: top-N heapsort  Memory: 77kB
      ->  HashAggregate
      Group By Key: customer.c_custkey, customer.c_name, customer.c_acctbal, customer.c_phone, nation.n_name, customer.c_address, customer.c_comment
      ->  Hash Join
      Hash Cond: (customer.c_nationkey = nation.n_nationkey)
      ->  Hash Join
      Hash Cond: (orders.o_custkey = customer.c_custkey)
      ->  Nested Loop
      ->  Seq Scan on orders
      Filter: ((o_orderdate >= '1994-10-01 00:00:00'::timestamp(0) without time zone) AND (o_orderdate < '1995-01-01 00:00:00'::timestamp without time zone))
      Rows Removed by Filter: 1442781
      ->  Index Scan using lineitem_pkey on lineitem
      Index Cond: (l_orderkey = orders.o_orderkey)
      Filter: (l_returnflag = 'R'::bpchar)
      Rows Removed by Filter: 114403
      ->  Hash
      Buckets: 262144  Batches: 1  Memory Usage: 26813kB
      ->  Seq Scan on customer
      ->  Hash
      Buckets: 32768  Batches: 1  Memory Usage: 2kB
      ->  Seq Scan on nation
      Total runtime: 2826.489 ms
       

    原计划,总执行时间5968ms;新计划,总执行时间2622ms。

    当两表连接的基数在HashJoin和NestLoop的决策边界时,代价矫正可以起到效果,选择更好的连接方式,否则一般不改变计划。

     

 
posted @ 2024-11-18 14:46  jerrywang1983  阅读(34)  评论(0)    收藏  举报