Oracle Execute Plan原理分析与实例分享之一
特此说明:本文引自http://f.dataguru.cn/forum.php?mod=viewthread&tid=38865&fromuid=303
"执行计划的执行顺序":
先从开头一直往右看,一直看到最右边有并列代码部分。如果遇到并列的,就从上往下看。对于并列的步骤,靠上的先执行;对于不并列的步骤,靠右的先执行。
引言:oracle执行计划对我们并不陌生,往往我们在工作中只会在发生性能瓶颈时看一下,没有想过执行计划是如何生成的。下面用实例来模拟数据访问方式和数据处理方式的演变。
1.执行计划--通过表访问数据 TABLE ACCESS FULL
1 SQL> conn jack_lin/jack 2 Connected. 3 SQL> create table leo1 as select * from dba_objects; --我们创建一张表leo1 4 5 Table created. 6 7 SQL> select count(*) from leo1; --这张表有92254条记录 8 9 COUNT(*) 10 ---------- 11 72254 12 13 SQL> set autotrace trace exp; --启动执行计划 14 SQL> select * from leo1; 15 16 Execution Plan 17 ---------------------------------------------------------- 18 Plan hash value: 2716644435 19 20 -------------------------------------------------------------------------- 21 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 22 -------------------------------------------------------------------------- 23 | 0 | SELECT STATEMENT | | 77166 | 15M| 288 (1)| 00:00:04 | 24 | 1 | TABLE ACCESS FULL| LEO1 | 77166 | 15M| 288 (1)| 00:00:04 | 25 -------------------------------------------------------------------------- 26 27 Note 28 ----- 29 - dynamic sampling used for this statement (level=2)
数据访问方式:走的是全表扫描,因为没有创建索引,所以没办法走索引,这是一种效率不高的数据访问方式,在实际应用中比较少。
2.执行计划--通过表并行访问数据 PARALLEL
1 SQL> select /*+ parallel */ count(*) from leo1; --自动评估并行度 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 452265093 6 7 -------------------------------------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | 9 -------------------------------------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 1 | 160 (1)| 00:00:02 | | | | 11 | 1 | SORT AGGREGATE | | 1 | | | | | | 12 | 2 | PX COORDINATOR | | | | | | | | 13 | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) | 14 | 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | | 15 | 5 | PX BLOCK ITERATOR | | 77166 | 160 (1)| 00:00:02 | Q1,00 | PCWC | | 16 | 6 | TABLE ACCESS FULL| LEO1 | 77166 | 160 (1)| 00:00:02 | Q1,00 | PCWP | | 17 -------------------------------------------------------------------------------------------------------- 18 19 Note 20 ----- 21 - dynamic sampling used for this statement (level=2) 22 - automatic DOP: Computed Degree of Parallelism is 2
如果不指定并行度,优化器自动评估并行度为2。
1 SQL> select /*+ parallel(leo1 4) */ count(*) from leo1; 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 452265093 6 7 -------------------------------------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib | 9 -------------------------------------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 1 | 80 (0)| 00:00:01 | | | | 11 | 1 | SORT AGGREGATE | | 1 | | | | | | 12 | 2 | PX COORDINATOR | | | | | | | | 13 | 3 | PX SEND QC (RANDOM) | :TQ10000 | 1 | | | Q1,00 | P->S | QC (RAND) | 14 | 4 | SORT AGGREGATE | | 1 | | | Q1,00 | PCWP | | 15 | 5 | PX BLOCK ITERATOR | | 77166 | 80 (0)| 00:00:01 | Q1,00 | PCWC | | 16 | 6 | TABLE ACCESS FULL| LEO1 | 77166 | 80 (0)| 00:00:01 | Q1,00 | PCWP | | 17 -------------------------------------------------------------------------------------------------------- 18 19 Note 20 ----- 21 - dynamic sampling used for this statement (level=2)
数据访问方式:这次的访问方式采用了并行机制,并行比非并行的效率是截然不同的,我们指定了4个并行度那么就会有4个进程来分割整个表数据,每个进程分别处理1/4数据,这样理论上提升了4倍的效率(并行度的个数和cpu数据匹配)。我们来看一下执行计划的执行顺序,首先全表扫描LEO1->并行迭代方式访问块->SORT AGGREGETE把检索出来的结果进行统计->PX SEND QC (RANDOM)串行的把4个进程的结果逐个发送到QC并行协调器->PX COORDINATOR并行协调器进行结果合并->SORT AGGREGATE再次统计结果->最后把结果返回给用户。
3.执行计划--通过索引唯一扫描访问数据 INDEX UNIQUE SCAN
1 SQL> alter table leo1 add constraint pk_leo1 primary key (object_id); --给leo1表object_id列添加主键 2 3 Table altered. 4 5 SQL> select * from leo1 where object_id=100; --查看id=100时数据访问方式 6 7 Execution Plan 8 ---------------------------------------------------------- 9 Plan hash value: 2711624550 10 11 --------------------------------------------------------------------------------------- 12 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 13 --------------------------------------------------------------------------------------- 14 | 0 | SELECT STATEMENT | | 1 | 207 | 2 (0)| 00:00:01 | 15 | 1 | TABLE ACCESS BY INDEX ROWID| LEO1 | 1 | 207 | 2 (0)| 00:00:01 | 16 |* 2 | INDEX UNIQUE SCAN | PK_LEO1 | 1 | | 1 (0)| 00:00:01 | 17 --------------------------------------------------------------------------------------- 18 19 Predicate Information (identified by operation id): 20 --------------------------------------------------- 21 22 2 - access("OBJECT_ID"=100) --谓词条件object_id=100,就是你根据什么条件生成执行计划
数据访问方式:这条sql语句大家很容易看出,首先执行INDEX UNIQUE SCAN索引唯一扫描,因为你选择的是等值范围,优化器可以直接定位你的索引块,又因为你要的是id=100这条记录的所有字段值(*),因此TABLE ACCESS BY INDEX ROWID还要通过索引键值找到对应的ROWID,再去访问ROWID所在数据块找到需要的记录。这是一种比较快速的访问方式,扫描的块少,资源占用率也小,是一种推荐使用的方式。
1 SQL> select object_id from leo1 where object_id=100; 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 1889847647 6 7 ----------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 9 ----------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 1 | 13 | 1 (0)| 00:00:01 | 11 |* 1 | INDEX UNIQUE SCAN| PK_LEO1 | 1 | 13 | 1 (0)| 00:00:01 | 12 ----------------------------------------------------------------------------- 13 14 Predicate Information (identified by operation id): 15 --------------------------------------------------- 16 17 1 - access("OBJECT_ID"=100)
注:select object_id from leo1 where object_id=100;如果是执行这条sql语句,那么我们只需扫描索引键值即可得到结果,无需再去访问数据块了(因为索引块上就保存了id=100数据),这种方式又加快了检索的速度。
4.执行计划--通过索引范围扫描访问数据 INDEX RANGE SCAN
1 SQL> select * from leo1 where object_id>10 and object_id<100; 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 2612250437 6 7 --------------------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 9 --------------------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 89 | 18423 | 4 (0)| 00:00:01 | 11 | 1 | TABLE ACCESS BY INDEX ROWID| LEO1 | 89 | 18423 | 4 (0)| 00:00:01 | 12 |* 2 | INDEX RANGE SCAN | PK_LEO1 | 89 | | 2 (0)| 00:00:01 | 13 --------------------------------------------------------------------------------------- 14 15 Predicate Information (identified by operation id): 16 --------------------------------------------------- 17 18 2 - access("OBJECT_ID">10 AND "OBJECT_ID"<100) --谓词条件object_id>10 and object_id<100 19 20 Note 21 ----- 22 - dynamic sampling used for this statement (level=2) 动态采样用于此语句
数据访问方式:由于where条件是object_id>10 and object_id<100一个范围(而索引块按顺序排序的,也是按顺序扫描的)因此优化器采用了INDEX RANGE SCAN索引范围扫描,把符合条件的索引块拿出来,找到索引键值对应的ROWID,再去访问ROWID所在的数据块找到需要的记录。这种方式虽然比索引唯一扫描效率低一点,但大大优于全表扫描。也是推荐的一种数据访问方法。
5.执行计划--通过快速索引全扫描访问数据 INDEX FAST FULL SCAN
原理:把索引链切割成很多区域,多索引块并行扫描,这样比INDEX FULL SCAN效率要高。
1 SQL> select count(*) from leo1; 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 173418543 6 7 ------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Cost (%CPU)| Time | 9 ------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 1 | 46 (0)| 00:00:01 | 11 | 1 | SORT AGGREGATE | | 1 | | | 12 | 2 | INDEX FAST FULL SCAN| PK_LEO1 | 77166 | 46 (0)| 00:00:01 | 13 ------------------------------------------------------------------------- 14 15 Note 16 ----- 17 - dynamic sampling used for this statement (level=2)
数据访问方式:我们的目的想知道leo1表一共有多少条记录,我们又知道表上创建了索引,索引的条数和数据行时一一对应的。那么我们扫描一遍索引块要比扫描一遍表数据块是不是要快,因为扫描的数据量少,在索引块里只需扫描有多少索引键值就知道对应有多少条记录,同时又启动了并行扫描方式。SORT AGGREGATE 对检索出来的结果进行统计。
6.执行计划--通过索引全扫描访问数据 INDEX FULL SCAN
1 SQL> select object_id from leo1 order by object_id; 2 3 Execution Plan 4 ---------------------------------------------------------- 5 Plan hash value: 1595913726 6 7 ---------------------------------------------------------------------------- 8 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 9 ---------------------------------------------------------------------------- 10 | 0 | SELECT STATEMENT | | 77166 | 979K| 164 (0)| 00:00:02 | 11 | 1 | INDEX FULL SCAN | PK_LEO1 | 77166 | 979K| 164 (0)| 00:00:02 | 12 ---------------------------------------------------------------------------- 13 14 Note 15 ----- 16 - dynamic sampling used for this statement (level=2)
数据访问方式:我们要对记录进行一次排序,索引块就是按照顺序存储的,也是按照顺序扫描的。排序工作是串行化的,因此不能并行操作(也就不适应INDEX FAST FULL SCAN)所以我们把索引键值全部扫描一遍就相当于排好序了,根本用不着去访问表数据块。
7.执行计划--通过索引跳跃扫描访问数据 INDEX SKIP SCAN
解释:所谓的索引跳跃扫描,是指跳过前导字段进行扫描,例如表上有一个复合索引,而在查询中有除了索引中第一列(前导字段)的其他列作为条件,并且优化器是CBO,这时候执行计划就可能走INDEX SKIP SCAN
1 SQL> create table leo3(x number,y varchar2(30),z varchar2(30)); --创建一张表,有三个字段 2 3 Table created. 4 5 SQL> create index compound_idx_leo3 on leo3(x,y); --创建一个复合索引 6 7 Index created. 8 9 SQL> begin --插入10W条记录 10 2 for i in 1..100000 loop 11 3 insert into leo3 values(mod(i,30),to_char(i),to_char(i)); 12 4 end loop; 13 5 commit; 14 6 end; 15 7 / 16 17 PL/SQL procedure successfully completed. 18 19 SQL> analyze table leo3 compute statistics; --对表进行整理数据分析 20 21 Table analyzed. 22 23 SQL> set autotrace trace explain; 24 25 SQL> select * from leo3 where y='1000'; 26 27 Execution Plan 28 ---------------------------------------------------------- 29 Plan hash value: 1334303583 30 31 ------------------------------------------------------------------------------------------------- 32 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 33 ------------------------------------------------------------------------------------------------- 34 | 0 | SELECT STATEMENT | | 1 | 12 | 32 (0)| 00:00:01 | 35 | 1 | TABLE ACCESS BY INDEX ROWID| LEO3 | 1 | 12 | 32 (0)| 00:00:01 | 36 |* 2 | INDEX SKIP SCAN | COMPOUND_IDX_LEO3 | 1 | | 31 (0)| 00:00:01 | 37 ------------------------------------------------------------------------------------------------- 38 39 Predicate Information (identified by operation id): --谓词条件跳过前导字段(x)进行扫描生成 40 --------------------------------------------------- 41 42 2 - access("Y"='1000') 43 filter("Y"='1000')
数据访问方式:如果要想使用索引跳跃扫描需要几个前提条件:
a.跳过前导字段;
b.optimizer是CBO;
c.对表数据块进行分析,让CBO优化器了解数据的分布情况;
d.还需要保证第一列的distinct value非常小,表上要有正确的统计数据。
有了上述条件,我们在进行数据扫描时就有可能会走INDEX SKIP SCAN。
8.执行计划--数据处理方式 哈希关联 HASH JOIN
HASH JOIN特点:没有索引时HASH的效果更好,HASH需要一定的计算所以会消耗些CPU资源
1 SQL> create table leo2 as select * from dba_objects where rownum<20000; --创建leo2表 2 3 Table created. 4 5 SQL> set autotrace off --关闭执行计划 6 SQL> select count(*) from leo2; --查询表leo2中的记录 7 8 COUNT(*) 9 ---------- 10 19999 11 12 SQL> set autotrace trace exp; --启动执行计划 13 SQL> select leo1.* from leo1,leo2 where leo1.object_id=leo2.object_id; 14 15 Execution Plan 16 ---------------------------------------------------------- 17 Plan hash value: 2290691545 18 19 --------------------------------------------------------------------------- 20 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 21 --------------------------------------------------------------------------- 22 | 0 | SELECT STATEMENT | | 15648 | 3361K| 368 (1)| 00:00:05 | 23 |* 1 | HASH JOIN | | 15648 | 3361K| 368 (1)| 00:00:05 | 24 | 2 | TABLE ACCESS FULL| LEO2 | 15648 | 198K| 79 (0)| 00:00:01 | 25 | 3 | TABLE ACCESS FULL| LEO1 | 77166 | 15M| 288 (1)| 00:00:04 | 26 --------------------------------------------------------------------------- 27 28 Predicate Information (identified by operation id): 29 --------------------------------------------------- 30 31 1 - access("LEO1"."OBJECT_ID"="LEO2"."OBJECT_ID") --谓词条件2个表中object_id相等的行 32 33 Note 34 ----- 35 - dynamic sampling used for this statement (level=2)
数据处理方式:查询2个表中object_id相等的行,HASH JOIN特点先把小表build到内存中,再和大表精确匹配,select leo1.* from leo2,leo1 where leo1.object_id=leo2.object_id;不管from leo2,leo1如何排序,都会先扫描小表LEO2(记录少),在扫描大表LEO1(记录多),扫描完2个表之后,把leo2 build到内存中,再和leo1进行hash join。
9.执行计划--数据处理方式 嵌套循环关联 NESTED LOOP JOIN
NESTED LOOP JOIN特定:两张表最好有索引,通过索引键值进行匹配效率较高。
1 SQL> alter table leo2 add constraint pk_leo2 primary key (object_id); --leo2表添加主键 2 3 Table altered. 4 5 SQL> select leo1.* from leo1,leo2 where leo1.object_id=leo2.object_id; --关联匹配 6 7 Execution Plan 8 ---------------------------------------------------------- 9 Plan hash value: 1603444237 10 11 ------------------------------------------------------------------------------ 12 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | 13 ------------------------------------------------------------------------------ 14 | 0 | SELECT STATEMENT | | 15648 | 3361K| 293 (3)| 00:00:04 | 15 | 1 | NESTED LOOPS | | 15648 | 3361K| 293 (3)| 00:00:04 | 16 | 2 | TABLE ACCESS FULL| LEO1 | 77166 | 15M| 288 (1)| 00:00:04 | 17 |* 3 | INDEX UNIQUE SCAN| PK_LEO2 | 1 | 13 | 0 (0)| 00:00:01 | 18 ------------------------------------------------------------------------------ 19 20 Predicate Information (identified by operation id): 21 --------------------------------------------------- 22 23 3 - access("LEO1"."OBJECT_ID"="LEO2"."OBJECT_ID") 24 25 Note 26 ----- 27 - dynamic sampling used for this statement (level=2)
数据处理方式:从leo1表里面拿出一条记录到leo2表里进行匹配(当然是通过索引匹配),遍历整个leo2表,发现匹配的行就取出来。从leo1表里面拿出几条记录,就要遍历leo2表几次,所以2张表最好有索引才会走NESTED LOOP JOIN。
10.执行计划--数据处理方式 合并关联 MERGE JOIN
1 SQL> alter table leo1 drop constraint pk_leo1; 2 3 Table altered. 4 5 SQL> alter table leo2 drop constraint pk_leo2; 6 7 Table altered. 8 --删除leo1,leo2表上的主键 9 SQL> select l1.* from (select * from leo1 order by object_id) l1,(select * from leo2 order by object_id) l2 where l1.object_id=l2.object_id; 10 11 Execution Plan 12 ---------------------------------------------------------- 13 Plan hash value: 463394885 14 15 ------------------------------------------------------------------------------------- 16 | Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | 17 ------------------------------------------------------------------------------------- 18 | 0 | SELECT STATEMENT | | 77167 | 16M| | 3916 (1)| 00:00:47 | 19 | 1 | MERGE JOIN | | 77167 | 16M| | 3916 (1)| 00:00:47 | 20 | 2 | VIEW | | 77166 | 15M| | 3762 (1)| 00:00:46 | 21 | 3 | SORT ORDER BY | | 77166 | 15M| 18M| 3762 (1)| 00:00:46 | 22 | 4 | TABLE ACCESS FULL| LEO1 | 77166 | 15M| | 288 (1)| 00:00:04 | 23 |* 5 | SORT JOIN | | 15648 | 198K| 632K| 155 (2)| 00:00:02 | 24 | 6 | TABLE ACCESS FULL | LEO2 | 15648 | 198K| | 79 (0)| 00:00:01 | 25 ------------------------------------------------------------------------------------- 26 27 Predicate Information (identified by operation id): 28 --------------------------------------------------- 29 30 5 - access("L1"."OBJECT_ID"="LEO2"."OBJECT_ID") 31 filter("L1"."OBJECT_ID"="LEO2"."OBJECT_ID") 32 33 Note 34 ----- 35 - dynamic sampling used for this statement (level=2)
数据处理方式:所谓MERGE JOIN方式,是先对leo1,leo2表整体排序,在逐条进行匹配。通常MERGE JOIN方式效率不高,因为先要有排序过程。顺序:leo1表全表扫描->SORT ORDER BY 排序->VIEW排好序的结果集->leo2表全表扫描->SORT JOIN关联排序->MERGE JOIN。