PostgreSQL执行计划解析(下)

PostgreSQL执行计划解析(下)

Whoami:5年+金融、政府、医疗领域工作经验的DBA
Certificate:OCP、PCP
Skill:Oracle、Mysql、PostgreSQL
Platform:CSDN、墨天伦、公众号(呆呆的私房菜)

业务范围:数据库安装部署、日常维护、主备切换、故障处理、性能优化、技术培训等。
需要的伙伴或者商业合作请移步 公众号【呆呆的私房菜】获取联系方式。

阅读本文可以了解PostgreSQL数据库优化器相关技术,了解如何帮助优化器生成更高效的执行计划。

01 前言

上一篇文章《PostgreSQL执行计划解析(上)》,我们介绍了SQL执行过程、优化器及其如何查看执行计划等内容。SQL调优最常用的手段是使用explain查看执行计划然后再进行分析,这个过程中我们往往只关注了执行计划的结果,而缺少深入理解执行计划是如何生成的。

所以这篇文章我们来学习一下优化器的内容!优化器是数据库的核心功能,了解优化器如何进行查询优化可以帮助我们更好地去优化SQL。

02 查询重写

SQL执行过程包含解析、重写、优化、执行及其返回结果。在PostgreSQL数据库中,DDL语句无需优化,会直接到utility模块中处理;但对于DML语句,优化器会进行查询优化,生成最优执行计划后再执行SQL。

查询重写是指在解析查询之后,重写系统会对查询进行转换和简化。其中包含视图展开、变量重叠、子查询消除等。查询重写在于将查询转换成更简单、更容易优化的形式。我们可以通过如下案例来了解一下:

1. 视图展开:指的是在执行涉及视图的查询时,数据库查询优化器将视图的SQL定义展开成实际的查询语句
testdb=> CREATE TABLE employees (
testdb(>     employee_id INT PRIMARY KEY,
testdb(>     department_name VARCHAR(255),
testdb(>     salary INT
testdb(> );
CREATE TABLE
testdb=> INSERT INTO employees (employee_id, department_name, salary) VALUES
testdb-> (1, 'HR', 50000),
testdb-> (2, 'Engineering', 60000),
testdb-> (3, 'HR', 55000),
testdb-> (4, 'Marketing', 55000);
INSERT 0 4
testdb=> CREATE VIEW hr_employees AS
testdb-> SELECT employee_id, salary
testdb-> FROM employees
testdb-> WHERE department_name = 'HR';
CREATE VIEW
testdb=> SELECT * FROM hr_employees WHERE salary > 52000;
 employee_id | salary
-------------+--------
           3 |  55000
(1 行记录)
testdb=> explain (analyze,buffers) SELECT * FROM hr_employees WHERE salary > 52000;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Seq Scan on employees  (cost=0.00..12.10 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
   Filter: ((salary > 52000) AND ((department_name)::text = 'HR'::text))
   Rows Removed by Filter: 3
   Buffers: shared hit=1
 Planning Time: 0.053 ms
 Execution Time: 0.027 ms
(6 行记录)
# 可以看到视图的SQL展开成了实际的SQL语句;

2. 变量重叠:指的是在生成多个可能的执行计划时,这些计划在某些方面具有相似性和重叠性
testdb=> CREATE TABLE products (
testdb(>     product_id INT PRIMARY KEY,
testdb(>     category VARCHAR(255),
testdb(>     price DECIMAL(10, 2)
testdb(> );
CREATE TABLE
testdb=> INSERT INTO products (product_id, category, price) VALUES
testdb-> (101, 'Electronics', 299.99),
testdb-> (102, 'Books', 19.95),
testdb-> (103, 'Electronics', 199.99),
testdb-> (104, 'Clothing', 39.95);
INSERT 0 4
testdb=>
testdb=> explain (analyze, buffers) SELECT category, SUM(price) AS total_sales
FROM products
WHERE price > 25
GROUP BY category;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11.98..12.57 rows=47 width=548) (actual time=0.022..0.023 rows=2 loops=1)
   Group Key: category
   Batches: 1  Memory Usage: 24kB
   Buffers: shared hit=1
   ->  Seq Scan on products  (cost=0.00..11.75 rows=47 width=532) (actual time=0.011..0.013 rows=3 loops=1)
         Filter: (price > '25'::numeric)
         Rows Removed by Filter: 1
         Buffers: shared hit=1
 Planning Time: 0.074 ms
 Execution Time: 0.044 ms
(10 行记录)

testdb=>
testdb=> explain (analyze, buffers) SELECT category, AVG(price) AS average_price
FROM products
WHERE price > 25
GROUP BY category;
                                                 QUERY PLAN
------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=11.98..12.57 rows=47 width=548) (actual time=0.020..0.021 rows=2 loops=1)
   Group Key: category
   Batches: 1  Memory Usage: 24kB
   Buffers: shared hit=1
   ->  Seq Scan on products  (cost=0.00..11.75 rows=47 width=532) (actual time=0.010..0.011 rows=3 loops=1)
         Filter: (price > '25'::numeric)
         Rows Removed by Filter: 1
         Buffers: shared hit=1
 Planning Time: 0.068 ms
 Execution Time: 0.039 ms
(10 行记录)


3. 子查询消除:指的是优化器在处理包含子查询的查询时,尝试将子查询的逻辑结构转换成更高效的查询结构。有些情况下直接删除子查询,将其转换成更直接的查询操作。
testdb=> CREATE TABLE orders (
testdb(>     order_id INT PRIMARY KEY,
testdb(>     product_id INT,
testdb(>     quantity INT,
testdb(>     order_date DATE,
testdb(>     FOREIGN KEY (product_id) REFERENCES products(product_id)
testdb(> );
CREATE TABLE
testdb=> INSERT INTO orders (order_id, product_id, quantity, order_date) VALUES
testdb-> (1, 101, 2, '2024-01-10'),
testdb-> (2, 102, 5, '2024-01-11'),
testdb-> (3, 101, 1, '2024-01-12'),
testdb-> (4, 103, 3, '2024-01-13');
INSERT 0 4
testdb=> SELECT p.product_id, p.category, SUM(o.quantity) AS total_quantity
testdb-> FROM products p
testdb-> JOIN orders o ON p.product_id = o.product_id
testdb-> WHERE p.price > (SELECT AVG(price) FROM products WHERE category = 'Electronics')
testdb-> GROUP BY p.product_id, p.category;
 product_id |  category   | total_quantity
------------+-------------+----------------
        101 | Electronics |              3
(1 行记录)

testdb=> explain (analyze, buffers) SELECT p.product_id, p.category, SUM(o.quantity) AS total_quantity
FROM products p
JOIN orders o ON p.product_id = o.product_id
WHERE p.price > (SELECT AVG(price) FROM products WHERE category = 'Electronics')
GROUP BY p.product_id, p.category;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=60.66..61.13 rows=47 width=528) (actual time=0.031..0.033 rows=1 loops=1)
   Group Key: p.product_id
   Batches: 1  Memory Usage: 24kB
   Buffers: shared hit=3
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=11.75..11.76 rows=1 width=32) (actual time=0.010..0.011 rows=1 loops=1)
           Buffers: shared hit=1
           ->  Seq Scan on products  (cost=0.00..11.75 rows=1 width=16) (actual time=0.003..0.004 rows=2 loops=1)
                 Filter: ((category)::text = 'Electronics'::text)
                 Rows Removed by Filter: 2
                 Buffers: shared hit=1
   ->  Hash Join  (cost=12.34..45.79 rows=621 width=524) (actual time=0.027..0.029 rows=2 loops=1)
         Hash Cond: (o.product_id = p.product_id)
         Buffers: shared hit=3
         ->  Seq Scan on orders o  (cost=0.00..28.50 rows=1850 width=8) (actual time=0.004..0.005 rows=4 loops=1)
               Buffers: shared hit=1
         ->  Hash  (cost=11.75..11.75 rows=47 width=520) (actual time=0.017..0.018 rows=1 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               Buffers: shared hit=2
               ->  Seq Scan on products p  (cost=0.00..11.75 rows=47 width=520) (actual time=0.015..0.016 rows=1 loops=1)
                     Filter: (price > $0)
                     Rows Removed by Filter: 3
                     Buffers: shared hit=2
 Planning:
   Buffers: shared hit=2
 Planning Time: 0.685 ms
 Execution Time: 0.070 ms
(27 行记录)
# 可以看到原始的子查询被转换成一个单独的计划(InitPlan 1),这个计划在主查询开始之前执行;
# 通过这种方式,子查询的结果计算1次,然后在整个主查询中被重用,避免对product表的重复扫描,提高了效率。

03 提升子链

提升子链是一种数据库查询优化技术,特别是在处理相关子查询的时候。相关子查询指的是查询中引用了外部查询的表或列。

1. exists子句递归提升:
testdb=> CREATE TABLE customers (
testdb(>     customer_id SERIAL PRIMARY KEY,
testdb(>     name VARCHAR(100),
testdb(>     email VARCHAR(100),
testdb(>     registration_date DATE
testdb(> );
CREATE TABLE
testdb=> CREATE TABLE orders (
testdb(>     order_id SERIAL PRIMARY KEY,
testdb(>     customer_id INT REFERENCES customers(customer_id),
testdb(>     order_date DATE,
testdb(>     amount DECIMAL(10, 2)
testdb(> );
CREATE TABLE
testdb=> INSERT INTO customers (name, email, registration_date)
testdb-> SELECT
testdb->     CASE WHEN s % 20 = 0 THEN 'John Doe' ELSE 'Customer ' || s::TEXT END,
testdb->     'customer' || s::TEXT || '@example.com',
testdb->     CASE
testdb->         WHEN s % 100 > 0 THEN '2024-01-01'::DATE + (s % 100)::INT * INTERVAL '1 day'
testdb->         ELSE '2024-01-01'::DATE
testdb->     END
testdb-> FROM generate_series(1, 200) s;
INSERT 0 200
testdb=> INSERT INTO orders (customer_id, order_date, amount)
testdb-> SELECT
testdb->     customers.customer_id,
testdb->     '2024-01-01'::DATE + (customers.customer_id % 90) * interval '1 day',
testdb->     ROUND((customers.customer_id * 1.23)::DECIMAL, 2)
testdb-> FROM customers
testdb-> ORDER BY customers.customer_id
testdb-> LIMIT 1000; -- 限制结果为1000个订单
INSERT 0 200

testdb=> explain (analyze, buffers) SELECT c.name, o.order_date
FROM customers c
LEFT JOIN orders o ON c.customer_id = o.customer_id
WHERE EXISTS (
    SELECT 1
    FROM orders sub_o
    WHERE sub_o.customer_id = c.customer_id AND sub_o.amount > 100
);
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=12.49..18.87 rows=119 width=16) (actual time=0.102..0.137 rows=119 loops=1)
   Hash Cond: (c.customer_id = sub_o.customer_id)
   Buffers: shared hit=6
   ->  Hash Right Join  (cost=6.50..11.04 rows=200 width=20) (actual time=0.037..0.074 rows=200 loops=1)
         Hash Cond: (o.customer_id = c.customer_id)
         Buffers: shared hit=4
         ->  Seq Scan on orders o  (cost=0.00..4.00 rows=200 width=8) (actual time=0.002..0.010 rows=200 loops=1)
               Buffers: shared hit=2
         ->  Hash  (cost=4.00..4.00 rows=200 width=16) (actual time=0.032..0.032 rows=200 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 18kB
               Buffers: shared hit=2
               ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.002..0.016 rows=200 loops=1)
                     Buffers: shared hit=2
   ->  Hash  (cost=4.50..4.50 rows=119 width=4) (actual time=0.036..0.036 rows=119 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 13kB
         Buffers: shared hit=2
         ->  Seq Scan on orders sub_o  (cost=0.00..4.50 rows=119 width=4) (actual time=0.012..0.026 rows=119 loops=1)
               Filter: (amount > '100'::numeric)
               Rows Removed by Filter: 81
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.196 ms
 Execution Time: 0.158 ms
(24 行记录)
# 从执行计划上来看,上述案例中优化器没有自动将相关子查询转换成连接操作,而是使用了hash semi join和hash right join结合seq scan方式进行查询。
# 如果要手动优化这个查询,可以考虑重写,可以直接在join条件中应用过滤逻辑,提升查询性能。

testdb=> EXPLAIN (ANALYZE, BUFFERS)
testdb-> SELECT c.name, o.order_date
testdb-> FROM customers c
testdb-> JOIN orders o ON c.customer_id = o.customer_id
testdb-> WHERE o.amount > 100;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.50..11.32 rows=119 width=16) (actual time=0.049..0.077 rows=119 loops=1)
   Hash Cond: (o.customer_id = c.customer_id)
   Buffers: shared hit=4
   ->  Seq Scan on orders o  (cost=0.00..4.50 rows=119 width=8) (actual time=0.012..0.026 rows=119 loops=1)
         Filter: (amount > '100'::numeric)
         Rows Removed by Filter: 81
         Buffers: shared hit=2
   ->  Hash  (cost=4.00..4.00 rows=200 width=16) (actual time=0.032..0.032 rows=200 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 18kB
         Buffers: shared hit=2
         ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.002..0.015 rows=200 loops=1)
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.221 ms
 Execution Time: 0.095 ms
(16 行记录)
# 可以看到执行时间大大缩小,查询性能得到提升。


2. in子句递归提升
注意:IN 子句和 EXISTS 子句在功能上是等价的。
      当子查询返回多个值时优化器可能不会将 IN 子句自动转换为连接操作,特别是当子查询的结果集较大或没有适当的索引来支持这种转换时。手动重写查询为连接操作通常可以提供更好的性能和更清晰的查询逻辑。
testdb=> EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE c.customer_id IN (
    SELECT sub_o.customer_id
    FROM orders sub_o
    WHERE sub_o.amount > 100
);
                                                         QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=12.49..18.87 rows=119 width=16) (actual time=0.089..0.122 rows=119 loops=1)
   Hash Cond: (c.customer_id = sub_o.customer_id)
   Buffers: shared hit=6
   ->  Hash Join  (cost=6.50..11.04 rows=200 width=24) (actual time=0.035..0.069 rows=200 loops=1)
         Hash Cond: (o.customer_id = c.customer_id)
         Buffers: shared hit=4
         ->  Seq Scan on orders o  (cost=0.00..4.00 rows=200 width=8) (actual time=0.002..0.010 rows=200 loops=1)
               Buffers: shared hit=2
         ->  Hash  (cost=4.00..4.00 rows=200 width=16) (actual time=0.030..0.030 rows=200 loops=1)
               Buckets: 1024  Batches: 1  Memory Usage: 18kB
               Buffers: shared hit=2
               ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.002..0.015 rows=200 loops=1)
                     Buffers: shared hit=2
   ->  Hash  (cost=4.50..4.50 rows=119 width=4) (actual time=0.034..0.034 rows=119 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 13kB
         Buffers: shared hit=2
         ->  Seq Scan on orders sub_o  (cost=0.00..4.50 rows=119 width=4) (actual time=0.011..0.024 rows=119 loops=1)
               Filter: (amount > '100'::numeric)
               Rows Removed by Filter: 81
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=8
 Planning Time: 0.207 ms
 Execution Time: 0.141 ms
(24 行记录)

testdb=> explain (analyze, buffers) SELECT c.name, o.order_date
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
AND o.amount > 100;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.50..11.32 rows=119 width=16) (actual time=0.043..0.068 rows=119 loops=1)
   Hash Cond: (o.customer_id = c.customer_id)
   Buffers: shared hit=4
   ->  Seq Scan on orders o  (cost=0.00..4.50 rows=119 width=8) (actual time=0.010..0.023 rows=119 loops=1)
         Filter: (amount > '100'::numeric)
         Rows Removed by Filter: 81
         Buffers: shared hit=2
   ->  Hash  (cost=4.00..4.00 rows=200 width=16) (actual time=0.029..0.029 rows=200 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 18kB
         Buffers: shared hit=2
         ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.002..0.014 rows=200 loops=1)
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.130 ms
 Execution Time: 0.083 ms
(16 行记录)

04 提升子查询

  1. 提升子查询是一种数据库查询优化技术,它指的是将子查询中的内容提升到外部查询中,以此来减少子查询的执行次数和提高查询效率。这种优化主要针对相关子查询,即子查询中引用了外部查询的表或列。
testdb=> explain (analyze, buffers) SELECT c.name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id AND o.amount > 100
);
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Hash Semi Join  (cost=5.99..11.84 rows=119 width=12) (actual time=0.049..0.070 rows=119 loops=1)
   Hash Cond: (c.customer_id = o.customer_id)
   Buffers: shared hit=4
   ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.004..0.014 rows=200 loops=1)
         Buffers: shared hit=2
   ->  Hash  (cost=4.50..4.50 rows=119 width=4) (actual time=0.033..0.033 rows=119 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 13kB
         Buffers: shared hit=2
         ->  Seq Scan on orders o  (cost=0.00..4.50 rows=119 width=4) (actual time=0.009..0.022 rows=119 loops=1)
               Filter: (amount > '100'::numeric)
               Rows Removed by Filter: 81
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.149 ms
 Execution Time: 0.085 ms
(16 行记录)
# 从执行计划上看,也没有明显的子查询提升发生,查询计划显示使用l hash semi join而不是直接将exists子句中的内容转换成更简单的连接操作。

testdb=> EXPLAIN (ANALYZE, BUFFERS)
SELECT c.name
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.amount > 100;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.50..11.32 rows=119 width=12) (actual time=0.043..0.072 rows=119 loops=1)
   Hash Cond: (o.customer_id = c.customer_id)
   Buffers: shared hit=4
   ->  Seq Scan on orders o  (cost=0.00..4.50 rows=119 width=4) (actual time=0.011..0.025 rows=119 loops=1)
         Filter: (amount > '100'::numeric)
         Rows Removed by Filter: 81
         Buffers: shared hit=2
   ->  Hash  (cost=4.00..4.00 rows=200 width=16) (actual time=0.029..0.029 rows=200 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 18kB
         Buffers: shared hit=2
         ->  Seq Scan on customers c  (cost=0.00..4.00 rows=200 width=16) (actual time=0.002..0.014 rows=200 loops=1)
               Buffers: shared hit=2
 Planning:
   Buffers: shared hit=4
 Planning Time: 0.162 ms
 Execution Time: 0.089 ms
(16 行记录)

05 化繁从简

常量传递:

a1 = a2 and a2 = 100 简化成:
a1 = 100 and a2 = 100

表达式计算:

a1 = 1 + 2 简化成:
a1 = 3

去除多余括号:

(a and b) and (c and d) 简化为:
a and b and c and d

or简化:

false or a > 1 简化为:
a > 1

06 外连接消除

left join:返回包含左表中的所有记录和右表中连接字段相等的记录,如果右表没有匹配的记录,那么右表将会以NULL值代替。

right join:返回右表中的所有记录,以及左表中与右表匹配的记录。如果左表中没有与右表某行匹配的记录,那么结果集中左表的部分将为 NULL。

full join:结果是两个表的行的并集,包括所有匹配和不匹配的行。


# 这里以左连接为示例
testdb=> create table t1 (ida int, cola varchar(10));
CREATE TABLE
testdb=> create table t2 (idb int, colb varchar(10));
CREATE TABLE

testdb=> insert into t1 values (1, 't11');
INSERT 0 1
testdb=> insert into t1 values (2, 't12');
INSERT 0 1
testdb=> insert into t2 values (1, 't21');
INSERT 0 1
testdb=> insert into t2 values (2, NULL);
INSERT 0 1

testdb=> select * from t1 left join t2 on t1.ida = t2.idb;
 ida | cola | idb | colb
-----+------+-----+------
   1 | t11  |   1 | t21
   2 | t12  |   2 |
(2 行记录)

# 查看执行计划
testdb=> explain (analyze, buffers) select * from t1 left join t2 on t1.ida = t2.idb;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Merge Left Join  (cost=161.29..268.01 rows=6728 width=84) (actual time=0.021..0.023 rows=2 loops=1)
   Merge Cond: (t1.ida = t2.idb)
   Buffers: shared hit=2
   ->  Sort  (cost=80.64..83.54 rows=1160 width=42) (actual time=0.013..0.014 rows=2 loops=1)
         Sort Key: t1.ida
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1
         ->  Seq Scan on t1  (cost=0.00..21.60 rows=1160 width=42) (actual time=0.009..0.010 rows=2 loops=1)
               Buffers: shared hit=1
   ->  Sort  (cost=80.64..83.54 rows=1160 width=42) (actual time=0.003..0.003 rows=2 loops=1)
         Sort Key: t2.idb
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1
         ->  Seq Scan on t2  (cost=0.00..21.60 rows=1160 width=42) (actual time=0.002..0.002 rows=2 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.072 ms
 Execution Time: 0.057 ms
(17 行记录)

# 消除外连接时,需要where和join条件保证右表不会有NULL值的行产生!
testdb=> explain (analyze, buffers) select * from t1 left join t2 on t1.ida = t2.idb where t2.idb is not null;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=160.94..267.10 rows=6693 width=84) (actual time=0.022..0.024 rows=2 loops=1)
   Merge Cond: (t2.idb = t1.ida)
   Buffers: shared hit=2
   ->  Sort  (cost=80.29..83.18 rows=1154 width=42) (actual time=0.017..0.017 rows=2 loops=1)
         Sort Key: t2.idb
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1
         ->  Seq Scan on t2  (cost=0.00..21.60 rows=1154 width=42) (actual time=0.011..0.012 rows=2 loops=1)
               Filter: (idb IS NOT NULL)
               Buffers: shared hit=1
   ->  Sort  (cost=80.64..83.54 rows=1160 width=42) (actual time=0.003..0.003 rows=2 loops=1)
         Sort Key: t1.ida
         Sort Method: quicksort  Memory: 25kB
         Buffers: shared hit=1
         ->  Seq Scan on t1  (cost=0.00..21.60 rows=1160 width=42) (actual time=0.002..0.002 rows=2 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.073 ms
 Execution Time: 0.043 ms
(18 行记录)

06 条件下推

条件下推指的是将过滤条件(或谓词)应用到查询计划的更早阶段,通常是在表的扫描阶段。这样做的目的是尽早过滤掉不满足条件的行,从而减少需要处理的数据量,提高查询效率。

testdb=> CREATE TABLE orders (
testdb(>     order_id SERIAL PRIMARY KEY,
testdb(>     customer_id INT,
testdb(>     order_date DATE,
testdb(>     amount NUMERIC
testdb(> );
CREATE TABLE

testdb=> DO $$
testdb$> DECLARE
testdb$>     r RECORD;
testdb$>     i INT := 0;
testdb$> BEGIN
testdb$>     WHILE i < 1000000 LOOP
testdb$>         i := i + 1;
testdb$>         INSERT INTO orders (customer_id, order_date, amount)
testdb$>         SELECT
testdb$>             -- 这里使用了一些简单的逻辑来生成数据
testdb$>             -- 您可能需要根据实际情况调整生成逻辑
testdb$>             i % 1000, -- 假设有1000个客户
testdb$>             '2024-01-01'::date + i, -- 从2024-01-01开始的日期
testdb$>             (i % 100) * 10 + 50; -- 随机生成50到150之间的金额
testdb$>     END LOOP;
testdb$> END $$;
DO

testdb=> CREATE INDEX idx_orders_order_date ON orders (order_date);
CREATE INDEX

testdb=> analyze orders;
ANALYZE

testdb=> explain (analyze, buffers) SELECT *
FROM orders
WHERE order_date < '2024-02-01'
  AND amount < 100;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Index Scan using idx_orders_order_date on orders  (cost=0.42..9.08 rows=2 width=17) (actual time=0.003..0.007 rows=4 loops=1)
   Index Cond: (order_date < '2024-02-01'::date)
   Filter: (amount < '100'::numeric)
   Rows Removed by Filter: 26
   Buffers: shared hit=4
 Planning:
   Buffers: shared hit=20
 Planning Time: 0.126 ms
 Execution Time: 0.015 ms
(9 行记录)


# 结论: 
1. 优化器已经将 order_date 的过滤条件下推到索引扫描中,这意味着索引扫描只会返回order_date小于2024-02-01的行。
2. filter是索引扫描之后应用的,因为它不是索引的一部分,所以不能在索引扫描阶段被下推。
# 在PostgreSQL中谓词下推经常自动发生,优化器会根据索引和过滤条件的类型来决定是否可以将条件下推到索引索引或表扫描中。

07 语义优化

当表中字段存在约束键时,PostgreSQL会将对其进行语义优化,因为查询条件中有可能已经隐含满足或者不满足。

testdb=> set constraint_exclusion = on;
SET
testdb=> create table t3 (id int not null);
CREATE TABLE
testdb=> explain select * from t3 where id is null;
                QUERY PLAN
------------------------------------------
 Result  (cost=0.00..0.00 rows=0 width=0)
   One-Time Filter: false
(2 行记录)

# 表t3的id字段已经隐含不为null,所以id=null这种条件可以直接返回false,PostgreSQL数据库默认并没有开启约束优化,所以需要设置constraint_exclusion这个参数。

08 MAX/MIN优化

针对已经创建索引的字段,通过访问索引最大值和最小值都可以直接获取到。因此PostgreSQL数据库针对max/min函数做了进一步的转化。

testdb=> create table t4 (id int primary key, col1 varchar(10));
CREATE TABLE
testdb=> insert into t4 values (generate_series(1,1000),'aaaaa'||generate_series(1,1000));
INSERT 0 1000
testdb=> explain (analyze, buffers) select min(id) from t4;
                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=0.32..0.33 rows=1 width=4) (actual time=0.016..0.016 rows=1 loops=1)
   Buffers: shared hit=3
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.28..0.32 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
           Buffers: shared hit=3
           ->  Index Only Scan using t4_pkey on t4  (cost=0.28..46.77 rows=1000 width=4) (actual time=0.012..0.012 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Heap Fetches: 1
                 Buffers: shared hit=3
 Planning:
   Buffers: shared hit=14
 Planning Time: 0.136 ms
 Execution Time: 0.030 ms
(13 行记录)

# 观察到 select min(id) from t4 被转化了,相当于:
# select id from t4 order by id limit 1 这种写法。
testdb=> explain (analyze, buffers) select id from t4 order by id limit 1;
                                                        QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..0.32 rows=1 width=4) (actual time=0.021..0.021 rows=1 loops=1)
   Buffers: shared hit=3
   ->  Index Only Scan using t4_pkey on t4  (cost=0.28..44.27 rows=1000 width=4) (actual time=0.019..0.020 rows=1 loops=1)
         Heap Fetches: 1
         Buffers: shared hit=3
 Planning Time: 0.068 ms
 Execution Time: 0.034 ms
(7 行记录)

09 group by优化

GROUP BY 是 SQL 中的一个子句,它允许您根据一个或多个列的值对结果集进行分组,并在每个组上应用聚合函数来计算汇总数据。如果不对group by优化,那么将会需要对结果进行sort或者hash,但是如果表中数据原本就是排序好的,那么将可以对齐进行优化。

testdb=> create table t5(id int primary key, name varchar(255));
CREATE TABLE

testdb=> insert into t5 select generate_series(1,10000),md5(random()::varchar);
INSERT 0 10000


testdb=> explain (analyze, buffers)select id from t5 group by id;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=0.29..295.29 rows=10000 width=4) (actual time=0.035..1.472 rows=10000 loops=1)
   Group Key: id
   Buffers: shared hit=30
   ->  Index Only Scan using t5_pkey on t5  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.034..0.796 rows=10000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=30
 Planning Time: 0.090 ms
 Execution Time: 2.302 ms
(8 行记录)


testdb=> explain (analyze, buffers)select name from t5 group by name;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=209.00..309.00 rows=10000 width=33) (actual time=1.993..3.003 rows=10000 loops=1)
   Group Key: name
   Batches: 1  Memory Usage: 1169kB
   Buffers: shared hit=84
   ->  Seq Scan on t5  (cost=0.00..184.00 rows=10000 width=33) (actual time=0.009..0.507 rows=10000 loops=1)
         Buffers: shared hit=84
 Planning Time: 0.048 ms
 Execution Time: 3.728 ms

10 order by优化

使用索引消除order by

testdb=> explain select id from t5 group by id;
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Group  (cost=0.29..295.29 rows=10000 width=4)
   Group Key: id
   ->  Index Only Scan using t5_pkey on t5  (cost=0.29..270.29 rows=10000 width=4)
(3 行记录)

order by下推

testdb=> EXPLAIN (ANALYZE, BUFFERS)
SELECT value
FROM sample_data
ORDER BY value ASC
LIMIT 10;
                                                                      QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..0.77 rows=10 width=4) (actual time=0.021..0.023 rows=10 loops=1)
   Buffers: shared hit=3
   ->  Index Only Scan using idx_sample_data_value on sample_data  (cost=0.29..4768.29 rows=100000 width=4) (actual time=0.019..0.021 rows=10 loops=1)
         Heap Fetches: 10
         Buffers: shared hit=3
 Planning Time: 0.065 ms
 Execution Time: 0.035 ms
(7 行记录)

11 distinct优化

distinct类似于group by优化,distinct将会从sort和hash中选择最优的,如果字段中有索引,

# 有索引情况下:
testdb=> explain (analyze, buffers)select id from t5 group by id;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Group  (cost=0.29..295.29 rows=10000 width=4) (actual time=0.076..1.682 rows=10000 loops=1)
   Group Key: id
   Buffers: shared hit=30
   ->  Index Only Scan using t5_pkey on t5  (cost=0.29..270.29 rows=10000 width=4) (actual time=0.075..0.707 rows=10000 loops=1)
         Heap Fetches: 0
         Buffers: shared hit=30
 Planning Time: 0.053 ms
 Execution Time: 1.940 ms
(8 行记录)


# 无索引情况下:
testdb=> explain (analyze, buffers)select name from t5 group by name;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=209.00..309.00 rows=10000 width=33) (actual time=2.191..2.835 rows=10000 loops=1)
   Group Key: name
   Batches: 1  Memory Usage: 1169kB
   Buffers: shared hit=84
   ->  Seq Scan on t5  (cost=0.00..184.00 rows=10000 width=33) (actual time=0.010..0.583 rows=10000 loops=1)
         Buffers: shared hit=84
 Planning Time: 0.079 ms
 Execution Time: 3.306 ms
(8 行记录)

12 集合操作优化

集合操作优化指的是在数据库查询中对涉及集合运算(如UNION、INTERSECT、EXCEPT)的查询进行性能提升的方法。这些操作用于合并或比较两个或多个查询结果集。集合操作优化的目标是减少计算量、提高查询效率并最小化资源消耗。

testdb=> CREATE TABLE collection_demo (
testdb(>     id SERIAL PRIMARY KEY,
testdb(>     group_id INTEGER NOT NULL,
testdb(>     value INTEGER NOT NULL
testdb(> );
CREATE TABLE

testdb=> INSERT INTO collection_demo (group_id, value)
testdb-> SELECT (i % 10) + 1, (i % 100)
testdb-> FROM generate_series(1, 1000000) s(i);
INSERT 0 1000000

testdb=> CREATE INDEX idx_collection_demo_group_id ON collection_demo (group_id);
CREATE INDEX

testdb=> EXPLAIN (ANALYZE, BUFFERS)
testdb-> SELECT group_id, value
testdb-> FROM collection_demo
testdb-> WHERE group_id = 1
testdb-> UNION ALL
testdb-> SELECT group_id, value
testdb-> FROM collection_demo
testdb-> WHERE group_id = 2
testdb-> LIMIT 100;
                                                                          QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1133.24..1140.94 rows=100 width=8) (actual time=4.494..4.520 rows=100 loops=1)
   Buffers: shared hit=6 read=87
   ->  Append  (cost=1133.24..16566.00 rows=200600 width=8) (actual time=4.493..4.514 rows=100 loops=1)
         Buffers: shared hit=6 read=87
         ->  Bitmap Heap Scan on collection_demo  (cost=1133.24..7805.08 rows=101267 width=8) (actual time=4.493..4.507 rows=100 loops=1)
               Recheck Cond: (group_id = 1)
               Heap Blocks: exact=6
               Buffers: shared hit=6 read=87
               ->  Bitmap Index Scan on idx_collection_demo_group_id  (cost=0.00..1107.93 rows=101267 width=0) (actual time=3.983..3.983 rows=100000 loops=1)
                     Index Cond: (group_id = 1)
                     Buffers: shared read=87
         ->  Bitmap Heap Scan on collection_demo collection_demo_1  (cost=1110.26..7757.92 rows=99333 width=8) (never executed)
               Recheck Cond: (group_id = 2)
               ->  Bitmap Index Scan on idx_collection_demo_group_id  (cost=0.00..1085.42 rows=99333 width=0) (never executed)
                     Index Cond: (group_id = 2)
 Planning:
   Buffers: shared hit=29
 Planning Time: 0.164 ms
 Execution Time: 4.553 ms
(19 行记录)
# 结论:集合操作union换成append方式。

总结:文本介绍了几种常见的PostgreSQL优化器对SQL优化的方法,这些方法更着重于SQL逻辑优化,也就是尽量对SQL进行等价或者推倒变换,以达到更有效率的执行计划。

posted @ 2025-04-07 20:10  呆呆的私房菜  阅读(49)  评论(1)    收藏  举报