11 The Query Optimizer (over view) 精细
/*
11 The Query Optimizer
Overview of the Query Optimizer
Overview of Optimizer Access Paths
Overview of Joins
Reading and Understanding Execution Plans
Controlling Optimizer Behavior
*/
11.1 Overview of the Query Optimizer
11.1.2 Components of the Query Optimizer
SQL在执行时, 发生了以下一些替换:
View Merging : 视图替换成基本表的join
Subquery Unnesting : 会将子查询, 替换为表连接
Query Rewrite with Materialized Views : 可以被物化视图取代情况, 会被物化视图取代
11.1.3 Bind Variable Peeking
Adaptive Cursor Sharing : 我们这里要说明的Adaptive Cursor Sharing 特性,其允许一个使用绑定变量的SQL语句使用多个执行计划对于同一个SQL, 为了得到合适的查询,oracle 会监控使用不同绑定变量的情况,已确保对不同绑定变量值的cursor(执行计划)都是最优的。比如因为数据倾斜的原因对绑定变量值A 使用执行计划A,对绑定变量值B 使用执行计划B。 虽然他们的SQL 是相同的,但执行计划不同
当有绑定变量时, 11G版本会每次进行绑定变量窥探, 并根据窥探值, 看是否在之前的SQLtext中
存在使用该值的没有使用绑定变量执行SQL语句, 调用该执行计划, 来实现调优. 这是一个特性, 包括三个阶段:
is_bind_sensitive(绑定是否敏感) : 表示该子游标中是否使用了绑定变量要素, 且采用 bind peeking 方法进行执行计划生成, 如果执行计划依赖于窥探值, 此处为Y, 否则为N. 首次硬解析, 肯定为 Y.
is_bind_aware(绑定是否可知): 表示该子游标是否使用了 extended cursor sharing 技术, 是为Y, 否则为N, 如果为N, 则该游标将废弃, 不可再用. 第二次以后, 才有可能被设置成Y.
is_bind_share(是否可共享): 表示该子游标可以被下次解析共享, 可共享为Y, 否则为N.
----------------------------------------------------------------------------
EX:
假设我们有一个表, 并且在 deptno上有100000行记录
SQL> DESCRIBE emp
Name Null? Type
---------------------- -------- ----------------------------------
ENAME VARCHAR2(20)
EMPNO NUMBER
PHONE VARCHAR2(20)
DEPTNO NUMBER
deptno=10 有99.9%的记录, 其他值有0.1%的数据
VARIABLE deptno NUMBER
EXEC :deptno := 9
SELECT /*ACS_1*/ count(*), max(empno) -- 这不是一个hint, 只是一个普通注释, 目的是方便找到这条SQL语句
FROM emp
WHERE deptno = :deptno;
以上带有绑定变量的执行语句, 优化器会计算它的cost参照以下SQL语句:
select /*ACS_1*/ count(*), max(empno)
from emp
where deptno = 9;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR);
result:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time|
----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
| 2 | TABLE ACCESS BY INDEX ROWID| EMP | 1 | 16 | 2 (0)| 00:00:01|
|* 3 | INDEX RANGE SCAN | EMP_I1 | 1 | | 1 (0)| 00:00:01|
看上边的执行计划, 绑定变量用9的时候, 使用了索引范围扫描
同样, 你可以看看V$SQL视图:
SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BIND_SENSI",
IS_BIND_AWARE AS "BIND_AWARE", IS_SHAREABLE AS "BIND_SHARE"
FROM V$SQL
WHERE SQL_TEXT LIKE 'select /*ACS_1%';
result:
CHILD_NUMBER EXECUTIONS BUFFER_GETS BIND_SENSI BIND_AWARE BIND_SHARE
------------ ---------- ----------- ---------- ---------- ----------
0 1 56 Y N Y
For each execution of the query with a new bind value, the database records the execution statistics for the
new value and compares them to the execution statistics for the previous value. If execution statistics vary
greatly, then the database marks the cursor bind-aware.
执行上边的SQL语句, 并且设置条件 deptno = 10
结果:
COUNT(*) MAX(EMPNO)
---------- ----------
99900 100000
Because the cursor for this statement is bind-sensitive, the optimizer assumes that the cursor can be shared.
Consequently, the optimizer uses the same index range scan for the value 10 as for the value 9.
The V$SQL output shows that the same bind-sensitive cursor was executed a second time (the query using 10)
and required many more buffer gets than the first execution:
SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BIND_SENSI",
IS_BIND_AWARE AS "BIND_AWARE", IS_SHAREABLE AS "BIND_SHARE"
FROM V$SQL
WHERE SQL_TEXT LIKE 'select /*ACS_1%';
CHILD_NUMBER EXECUTIONS BUFFER_GETS BIND_SENSI BIND_AWARE BIND_SHARE
------------ ---------- ----------- ---------- ---------- ----------
0 2 1010 Y N Y
Now you execute the query using the value 10 a second time. The database compares statistics for previous
executions and marks the cursor as bind-aware. (第3次执行这条SQL语句, 绑定变量为9,10,10)In this case, the
the optimizer decides that a new plan is warranted, so it performs a hard parse of the statement and
generates a new plan. The new plan uses a full table scan instead of an index range scan:
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 208 (100)| |
| 1 | SORT AGGREGATE | | 1 | 16 | | |
|* 2 | TABLE ACCESS FULL| EMP | 95000 | 1484K| 208 (1)| 00:00:03 |
SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BIND_SENSI",
IS_BIND_AWARE AS "BIND_AWARE", IS_SHAREABLE AS "BIND_SHARE"
FROM V$SQL
WHERE SQL_TEXT LIKE 'select /*ACS_1%';
CHILD_NUMBER EXECUTIONS BUFFER_GETS BIND_SENSI BIND_AWARE BIND_SHARE
------------ ---------- ----------- ---------- ---------- ----------
0 2 1010 Y N Y [deptno=9,10 index scan]
1 2 1522 Y Y Y [deptno=10, full table scan]
注意: V$SQL这个表里是同一个SQL语句产生不同的执行计划, 每个执行计划对应一行, 而不是每次执行对应一行.
接下来, 如果你再执行 deptno=9的查询, 你会看到这个表V$SQL变成了:
CHILD_NUMBER EXECUTIONS BUFFER_GETS BIND_SENSI BIND_AWARE BIND_SHARE
------------ ---------- ----------- ---------- ---------- ----------
0 2 1010 Y N N
1 1 1522 Y Y Y
2 1 7 Y Y Y
因为oracle已经找到了针对9这个值得比较合理的执行计划, 所以这里就会发生软解析, 就是第三条, 这又是一个新的执行计划.
11.2 Overview of Optimizer Access Paths
11.2.1 Full Table Scans
During a full table scan, all blocks in the table that are under the high water mark are scanned.
When Oracle Database performs a full table scan, the blocks are read sequentially. Because the blocks are adjacent,
什么时候适合 full table scan :
不能使用索引
查询表中绝大部分的值
small table : If a table contains less than [DB_FILE_MULTIBLOCK_READ_COUNT] blocks under the high water mark.
注意: oracle一次I/O可以读取多少块, DB_FILE_MULTIBLOCK_READ_COUNT这个参数控制的.
High Degree of Parallelism 高并行执行:
Full Table Scan Hints: Use the hint FULL(table alias)
11.2.2 Rowid Scans
获取单行数据的最快方式.
什么时候适合 Rowid scan:
一般, 这种情况发生在第2步, 第一步基本都是先通过索引找到rowid, 再通过rowid找到该行.
11.2.3 Index Scans
如果你要获取的列全部在索引中, 那么直接就索引返回你想要的, 不用再去根据索引值找rowid来定位该行了.
The index contains not only the indexed value, but also the rowids of rows in the table having that value.
什么时候适合 Index Scans :
Assessing I/O for Blocks, not Rows:
评估I/O情况, 而不是table里有多少行, 一般情况, 表中的记录都是集中存储到一些块的,
这种分布的情况, 有个专有名词叫index clustering factor, 这个因子低, 说明索引值相近的项都分布在一个block里或者是一起相邻,
而这个因子高, 说明索引值相近的行, 分布到非常广的block里, 这时, 使用 index range scan 扫描代价就要相对高一些.
分布集中的情况, 比如有1列上有一个非唯一索引, 这个表只有9行, 分3个block存储
Block 1 Block 2 Block 3
------- ------- -------
A A A B B B C C C
通过上图, 我们可以看到, 索引值相同的, 都在一个Block里, 如果我们通过索引查询 A, 我们只需要读取一个block, index range scan 效率高.
分布分散的情况, index range scan 效率低, 如果这时候我们想搜索A, 需要读取3个block.
Block 1 Block 2 Block 3
------- ------- -------
A B C A B C A B C
index unique scan:
一般情况下, 返回一个rowid, 一般情况下是有 unique 或者 primary key 约束, oracle才会使用 index unique scan.
注意, 我们一般不会使用 hint 针对 index unique scan.
index range scan :
一般情况下, 返回的数据会使用rowid排序, 但是你不要依赖这个, 如果你想返回的是有序的记录, 那么最好加上order by, 如果oracle发现已经通过
rowid排序和你想要的通过order by命令的排序是一致的, 那么Oracle会忽略order by, 不会带来性能方面的影响, 总之, 如果你想返回的记录是有序的,
那么你就要加上 order by 语句.
什么时候cost会使用 index range scan :
一般用在, 比如你创建一个索引有3列, A,B,C 而查询时, 你用到了列A作为条件, 注意: 一定是列A, 列B单独使用不行, 除非是 列A, 列B一起使用, 这里
有个leading columns的概念, 领导的列.
有时候我们需要使用hint来提醒oracle使用 INDEX(table_alias index_name).
Index Range Scans Descending:
与index range scan 基本一样, 只是返回的记录的顺序是, 逆序rowid.
the database uses this scan when ordering data in a descending order to return the most recent data first, or when seeking a value less than a specified value.
有时候我们需要使用hint来提醒oracle使用 INDEX_DESC(table_alias index_name)
Index Skip Scans:
跳跃索引, 比较牛逼, 可以不用首列作为索引条件, Skip scanning lets a composite index be split logically into smaller subindexes.
一般使用这种索引的条件是, 比如在2column上建立了索引, 而leading column 有比较明显的值, 比如, 男/女 等, 而你在查询时, 没有写leading column的条件, 而直接使用了第2列
作为查询的条件, 那么oracle就可能选择skip scan.
例如:
SELECT * FROM sh.customers WHERE cust_email = 'Abbey@company.com';
CREATE INDEX customers_gender_email ON sh.customers (cust_gender, cust_email);
在 cust_gender 只有两个值 M/F, 虽然上边的查询, 我们只使用了一个条件, cust_email = 'Abbey@company.com', 但是, 它实际上可以走 skip scan.
为什么走了skip索引, 实际上, 上边的查询会被替换为:
SELECT * FROM sh.customers WHERE cust_gender = 'F'
AND cust_email = 'Abbey@company.com'
UNION ALL
SELECT * FROM sh.customers WHERE cust_gender = 'M'
AND cust_email = 'Abbey@company.com';
full index scan:
使用 full index scan 的情况:
order by clause:
All of the columns in the ORDER BY clause must be in the index.
The order of the columns in the ORDER BY clause must match the order of the leading index columns.
# The ORDER BY clause can contain all of the columns in the index or a subset of the columns in the index.
The query requires a sort merge join.
All of the columns referenced in the query must be in the index.
The order of the columns referenced in the query must match the order of the leading index columns.
# The query can contain all of the columns in the index or a subset of the columns in the index.
A GROUP BY clause is present in the query, and the columns in the GROUP BY clause are present in the index.
The columns do not need to be in the same order in the index and the GROUP BY clause.
The GROUP BY clause can contain all of the columns in the index or a subset of the columns in the index.
Fast Full Index Scans:
另外一种全表扫描, 只适用于你所要的列全部在索引里, and at least one column in the index key has the NOT NULL constraint
A fast full scan accesses the data in the index itself, without accessing the table.
The database cannot use this scan to eliminate a sort operation because the data is not ordered by the index key.
The fast full scan has a special index hint, INDEX_FFS
Index Joins:
If the database uses an index join, then table access is not needed because the database can retrieve all the relevant column values from the indexes.
The database cannot use an index join to eliminate a sort operation.
You can specify an index join with the INDEX_JOIN hint.
Bitmap Indexes
A bitmap join uses a bitmap for key values and a mapping function that converts each bit position to a rowid.
11.2.7 How the Query Optimizer Chooses an Access Path
When choosing an access path, the query optimizer is influenced by the following:
Optimizer Hints
Old Statistics
11.3 Overview of Joins
11.3.1 How the Query Optimizer Executes Join Statements
To choose an execution plan for a join statement, the optimizer must make these interrelated decisions:
Access Paths
Join Method
Join Order
11.3.2 How the Query Optimizer Chooses Execution Plans for Joins
The optimizer estimates costs in the following ways:
The cost of a nested loops operation is based on the cost of reading each selected row of the outer table and each of its
matching rows of the inner table into memory. The optimizer estimates these costs using the statistics in the data dictionary.
The cost of a sort merge join is based largely on the cost of reading all the sources into memory and sorting them.
The cost of a hash join is based largely on the cost of building a hash table on one of the input sides to the join and using the rows from the other of the join to probe it.
11.3.3 Nested Loop Joins
Nested loop joins are useful when the following conditions are true:
The database joins small subsets of data.
The join condition is an efficient method of accessing the second table.
It is important to ensure that the inner table is driven from (dependent on) the outer table.
If the inner table's access path is independent of the outer table, then the same rows are retrieved for every iteration of the outer loop,
degrading performance considerably. In such cases, hash joins joining the two independent row sources perform better.'
A nested loop join involves the following steps:
1. The optimizer determines the driving table and designates it as the outer table.
2. The other table is designated as the inner table.
3. 外层每走1 row, 内存都走了一遍全表.
If the optimizer chooses to use some other join method, then you can use the USE_NL(table1 table2) hint, where table1 and table2 are the aliases of the tables being joined.
11.3.4 Hash Joins
The database uses hash joins to join large data sets. The optimizer uses the smaller of two
tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
11.3.4.1 When the Optimizer Uses Hash Joins
A large amount of data must be joined.
A large fraction of a small table must be joined.
Apply the USE_HASH hint to instruct the optimizer to use a hash join when joining two tables together.
11.3.5 Sort Merge Joins
Sort merge joins can join rows from two independent sources. Hash joins generally perform better than sort merge joins.
However, sort merge joins can perform better than hash joins if both of the following conditions exist:
The row sources are sorted already.
A sort operation does not have to be done.
However, if a sort merge join involves choosing a slower access method (an index scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality condition such as <, <=, >, or >=.
Sort merge joins perform better than nested loop joins for large data sets. You cannot use hash joins unless there is an equality condition.
In a merge join, there is no concept of a driving table. The join consists of two steps:
1. ort join operation: Both the inputs are sorted on the join key.
2. Merge join operation: The sorted lists are merged together.
11.3.5.1 When the Optimizer Uses Sort Merge Joins
The join condition between two tables is not an equijoin.
Because of sorts required by other operations, the optimizer finds it is cheaper to use a sort merge than a hash join.
To instruct the optimizer to use a sort merge join, apply the USE_MERGE hint. You might also need to give hints to force an access path.
11.4 Reading and Understanding Execution Plans
An execution plan includes an access path for each table that the statement accesses and an ordering of the tables (the join order) with the appropriate join method.
使用 EXPLAIN PLAN, explain plan for select * from dual; 个人还是推荐直接使用工具中提供的 explain plan
另外, 要注意执行计划的步骤.

浙公网安备 33010602011771号