PG优化实战系列-为什么走索引比顺序扫描慢?
在某现场进行优化过程中,开发拿来一个有意思的现象,一个SQL不加order by速度快数倍。查看两者执行计划发现不加order by的SQL走的全表扫描,而加了order by的SQL走的索引,速度反而比前者慢很多。
不带ORDER BY SQL:
explain (ANALYZE true,buffers true) SELECT 字段1,字段2,字段3.... FROM T1 WHERE 1=1 AND point (gis_x, gis_y) <@ polygon(path ('(13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507)')) LIMIT 3 OFFSET 0; 对应执行计划: Limit (cost=0.00..284.56 rows=3 width=1316) (actual time=218.323..643.163 rows=3 loops=1) Buffers: shared hit=427577 -> Gather (cost=0.00..840958.94 rows=8866 width=1316) (actual time=218.322..643.158 rows=3 loops=1) Workers Planned: 8 Workers Launched: 8 Buffers: shared hit=427577 -> Parallel Seq Scan on T1 (cost=0.00..840958.94 rows=1108 width=1316) (actual time=244.172..436.927 rows=4 loops=9) Filter: (point(gis_x, gis_y) <@ '((13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507))'::polygon) Rows Removed by Filter: 263023 Buffers: shared hit=427577 Planning time: 0.119 ms Planning time: 0.119 ms Execution time: 643.199 ms
带ORDER BY SQL:
explain (ANALYZE true,buffers true) SELECT 字段1,字段2,字段3.... FROM T1 WHERE 1=1 AND point (gis_x, gis_y) <@ polygon(path ('(13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507)')) ORDER BY jjrqsj desc LIMIT 3 OFFSET 0; 对应执行计划: Limit (cost=0.58..1115.49 rows=3 width=1316) (actual time=1588.362..3011.507 rows=3 loops=1) Buffers: shared hit=4589932 read=1571 -> Gather Merge (cost=0.58..3294942.79 rows=8866 width=1316) (actual time=1588.360..3011.502 rows=3 loops=1) Workers Planned: 8 Workers Launched: 8 Buffers: shared hit=4589932 read=1571 -> Parallel Index Scan Backward using T1_jjrqsj on T1 (cost=0.43..3294779.96 rows=1108 width=1316) (actual time=538.801..2366.443 rows=4 loops=9) Filter: (point(gis_x, gis_y) <@ '((13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507))'::polygon) Rows Removed by Filter: 514932 Buffers: shared hit=4589932 read=1571 Planning time: 0.137 ms Execution time: 3011.562 ms
从上面执行计划可以看出,加了order by的SQL走了索引,但是执行时间相比全表扫描反而增加了5倍。
问题分析:
从上面能看到两个SQL均走了并行;我们知道开启并行需要满足条件,且存在额外代价;
查看数据库当前代价值:
show parallel_setup_cost; --0 默认1000
show parallel_tuple_cost; --0 默认0.1
并行代价为0,故而执行计划优化器认为并行能提高性能。
添加代价:
set parallel_setup_cost =1000;
set parallel_tuple_cost =0.1;
(全局生效需要修改postgresql.conf 然后pg_ctl reload)
效果如下:
explain (ANALYZE true,buffers true) SELECT 字段1,字段2,字段3.... FROM T1 WHERE 1=1 AND point (gis_x, gis_y) <@ polygon(path ('(13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507)')) ORDER BY jjrqsj desc LIMIT 3 OFFSET 0; 对应执行计划: Limit (cost=0.43..1154.67 rows=3 width=1316) (actual time=366.062..484.891 rows=3 loops=1) Buffers: shared hit=315899 -> Index Scan Backward using T1_jjrqsj on T1 (cost=0.43..3411140.55 rows=8866 width=1316) (actual time=366.060..484.886 rows=3 loops=1) Filter: (point(gis_x, gis_y) <@ '((13.65097544735745,213.082022839820084),(13.65119929096986,223.082051117844507))'::polygon) Rows Removed by Filter: 308128 Buffers: shared hit=315899 Planning time: 0.123 ms Execution time: 484.921 ms
可以看到执行计划没采用并行,一个进程就能处理得相对很快,但是看执行计划通过 索引“T1_jjrqsj”过滤出来的数据又被“Filter”过滤;
这一块给业务的建议是使用post_gis插件,利用GIST索引进行优化。
最终解决方案:
1、参数调整回默认
2、使用post_gis插件
问题总结:
并行虽然能够利用多个进程处理,但是并行也是需要代价的,从加order by的两个执行计划可以看到两者的cost代价为“1115”几乎一致,这肯定是有问题的;
至于为什么未加order by的并行顺序扫描速度较加order by的并行索引扫描要快很多,原因可能是:走索引的逻辑是先在索引上找到对应的行所在数据块位置,然后进行回表走的是随机读;而顺序扫描可以理解为顺序读;STAT盘上随机读与顺序读速度差别很大,在总体数据量不是特别大的情况下,走并行的索引扫描代价比执行计划显示的要高很多。
此类数据库参数调整需要谨慎,至少要知道调整此类参数会对数据库有何影响。

浙公网安备 33010602011771号