Efficient SQL Statements

The query optimizer uses statistics gathered on tables and indexes when determining the optimal execution plan. If these statistics have not been gathered, or if the statistics are no longer representative of the data stored within the database, then the optimizer does not have sufficient information to generate the best plan.

Things to check:

If you gather statistics for some tables in your database, then it is probably best to gather statistics for all tables. This is especially true if your application includes SQL statements that perform joins.
If the optimizer statistics in the data dictionary are no longer representative of the data in the tables and indexes, then gather new statistics. One way to check whether the dictionary statistics are stale is to compare the real cardinality (row count) of a table to the value of DBA_TABLES.NUM_ROWS. Additionally, if there is significant data skew on predicate columns, then consider using histograms.

当选择执行计划的时候，查询优化器使用在表或者索引上面收集的统计资料，如果这些统计资料没有被收集，或者这些统计资料已经不能代表在数据库中数据（即过时的），那么优化器将没有充足的信息去生成最佳的执行计划。

1、如果你为数据库中的部分表进行了统计信息的收集，那么最好对所有的表进行统计信息的收集，尤其是你的SQL语句执行了连接操作。

2、当在数据字典中统计信息不再能代表数据库中的表的数据的时候（即统计已经过时），那么重新收集统计信息，检查统计信息是否过时的一个方法是比较当前表中的实际行数与DBA_TABLES中的NUM_ROWS是否是相同的。除此之外，如果当前对于谓词列有足够交叉值的（where条件中的列），那么可以考虑使用直方图。

skew data：For table columns that contain values with large variations in number of duplicates, called skewed data，you should collect histograms。对于一列数据，他包含有很多的重复值，叫做shewed data，
这时候，收集表的统计信息，最好使用直方图进行统计信息的收集。

二、Reviewing the Execution Plan

When tuning (or writing) a SQL statement in an OLTP environment, the goal is to drive from the table that has the most selective filter. This means that there are fewer rows passed to the next step. If the next step is a join, then this means that fewer rows are joined. Check to see whether the access paths are optimal. When examining the optimizer execution plan, look for the following:
　　■ The plan is such that the driving table has the best filter.
　　■ The join order in each step means that the fewest number of rows are being returned to the next step (that is, the join order should reflect, where possible, going to the best not-yet-used filters).
　　■ The join method is appropriate for the number of rows being returned. For example, nested loop joins through indexes may not be optimal when many rows are being returned.
　　■ Views are used efficiently. Look at the SELECT list to see whether access to the view is necessary.
　　■ There are any unintentional Cartesian products (even with small tables).
　　■ Each table is being accessed efficiently:

Consider the predicates in the SQL statement and the number of rows in the table. Look for suspicious activity, such as a full table scans on tables with large number of rows, which have predicates in the where clause. Determine why an index is not used for such a selective predicate.

A full table scan does not mean inefficiency. It might be more efficient to perform a full table scan on a small table, or to perform a full table scan to leverage a better join method (for example, hash_join) for the number of rows returned.
If any of these conditions are not optimal, then consider restructuring the SQL statement or the indexes available on the tables.

当在OLTP进行SQL语句的优化或者开发，最终目标的驱动是具有选择性很高的过滤，这个意味着，又很少的行传递给下一个操作（解释：例如在进行连接的SQL语句书写的时候，假如你需要取出的是一个表中1%的数据，那么，在JOIN的时候，最好JOIN一个结果集，这个结果集内部把数据进行了过滤，举例：你需要关联一个表A，需要关联的是10部门的数据，而这个表有100000条数据，10部门只有100条，那么我们可以使用结果集把100000条数据进行过滤，然后使用结果集进行JOIN，而不需要在JOIN后面的WHERE条件添加条件DEPT_ID = 10,这样子很影响效率，写SQL语句的原则是，在进行下一个操作的时候，尽量把数据进行优先过滤掉！），如果下一个是一个连接，那么意味着，少量的行数据进行JOIN，检查执行路径是否是最优的，当检查优化器的执行路径的时候，可以检查以下几个方面：

1、执行计划的driving table 有最好的选择性；
driving table：参考网址：
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:192812348072
The 'driving' table is the table we will join FROM -- that is JOIN TO other tables.

2、每一步的连接顺序意味着更少的行返回给下一步（意味着，连接顺序能反映出哪里需要最适宜的尚未使用的过滤条件）；

3、使用适合的连接方法返回行，例如：使用索引的nest loop join对于返回大量的行是最好的；

4、有效的使用视图，查看select的列表，判断是否需要使用视图；

5、是否存在笛卡尔积的连接，即使很小的表；

6、每个表连接的都很有效；

考虑表中的行数和where条件中的过滤条件，查询一些可以的活动，例如对于一个有大量数据的表，他在where条件中有过滤条件，他为什么使用了全表扫描，查看为什么这个选择条件没有使用索引！

全表扫描并不意味着是效率低下的，对于数据比较少的表，全表扫描可能更加有效，使用全表扫描去平衡一个连接方式（HASH JOIN）来返回行。

如果上述的条件不是最优的，那么考虑去重新构建SQL语句和表中的所以!

三、Restructuring the SQL Statements

Often, rewriting an inefficient SQL statement is easier than modifying it. If you understand the purpose of a given statement, then you might be able to quickly and easily write a new statement that meets the requirement.

1、Compose Predicates Using AND and =

To improve SQL efficiency, use equijoins whenever possible. Statements that perform
equijoins on untransformed column values are the easiest to tune.

2、Avoid Transformed Columns in the WHERE Clause

对此TOM给出了一句非常经典的话：never apply a function to a database column for a search like that unless you *have* to.

Use untransformed column values. For example, use:

WHERE a.order_no = b.order_no

rather than:

WHERE TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))
= TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))

Do not use SQL functions in predicate clauses or WHERE clauses. Any expression using a column, such as a function having the column as its argument, causes the optimizer to ignore the possibility of using an index on that column, even a unique index, unless
there is a function-based index defined that can be used. Avoid mixed-mode expressions, and beware of implicit type conversions. When you
want to use an index on the VARCHAR2 column charcol, but the WHERE clause looks like this:

    AND charcol = numexpr

where numexpr is an expression of number type (for example, 1, USERENV('SESSIONID'), numcol, numcol+0,...), Oracle translates that expression into:

    AND TO_NUMBER(charcol) = numexpr

Avoid the following kinds of complex expressions:
　　■ col1 = NVL (:b1,col1)
　　■ NVL (col1,-999) = ….
　　■ TO_DATE(), TO_NUMBER(), and so on

These expressions prevent the optimizer from assigning valid cardinality or selectivity estimates and can in turn affect the overall plan and the join method.
Add the predicate versus using NVL() technique.
For example:

SELECT EMPLOYEE_NUM,
       FULL_NAME NAME,
       EMPLOYEE_ID
  FROM MTL_EMPLOYEES_CURRENT_VIEW
 WHERE (EMPLOYEE_NUM = NVL(:B1, EMPLOYEE_NUM))
   AND (ORGANIZATION_ID = :1)
 ORDER BY EMPLOYEE_NUM;

Also:

SELECT EMPLOYEE_NUM,
       FULL_NAME NAME,
       EMPLOYEE_ID
  FROM MTL_EMPLOYEES_CURRENT_VIEW
 WHERE (EMPLOYEE_NUM = :B1)
   AND (ORGANIZATION_ID = :1)
 ORDER BY EMPLOYEE_NUM;

When you need to use SQL functions on filters or join predicates, do not use them on the columns on which you want to have an index; rather, use them on the opposite side of the predicate, as in the following statement:
TO_CHAR(numcol) = varcol
rather than
varcol = TO_CHAR(numcol)

通常，重写一个低效的SQL比修改它容易多了，如果你了解给出的SQL语句的需求，那么可以很快的写出一个满足要求的SQL语句。

组合条件使用AND和=（尽量避免使用OR和LIKE等）

为了提高SQL的效率，在任何情况下，使用等值连接，执行等值连接的非转换语句是非常容易调整的（untransformed column是指在这一列为使用函数：例如没有使用类似NVL(列名)）。

避免转换列出现在WHERE条件中（尽量不要再列上面使用函数）

如下使用为转换的列：

 WHERE a.order_no = b.order_no

而不是：

WHERE TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))
= TO_NUMBER (SUBSTR(a.order_no, INSTR(b.order_no, '.') - 1))

不要在谓词（即in/like/>/<等操作符，网址：http://en.wikipedia.org/wiki/Where_(SQL)#Predicates）语句或者WHERE条件中使用函数，使用任何表达式的列，例如一个函数将一列作为它的参数，将会导致优化器忽略使用该列上面存在的索引，即使是一个等值的索引，除非这里定义了一个函数索引，才可以使用索引。

对于上述的结论，TOM曾经说过一句话：do not call plsql from sql unless you have to.在SQL中不要调用PL/SQL，除非你必须这么做，在使用游标的时候，尽量使用FOR游标，避免使用显式游标
网址：http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:1448404423206

在这里，对于数学表达式：USERENV('SESSIONID'), numcol, numcol+0,...),Oracle将把它们转换成

  AND TO_NUMBER(charcol) = numexp

避免使用以下各种合成的表达式：

1、col1 = NVL (:b1,col1)
2、NVL (col1,-999) = ….
3、TO_DATE(), TO_NUMBER(), and so on

上述的表达式，将会阻止优化器分配有效的基数和估计语句的选择度，并且会影响最终的执行计划和连接方式。

使用谓词和NVL进行对比：下面是使用了NVL的SQL

1 SELECT EMPLOYEE_NUM,
2        FULL_NAME NAME,
3        EMPLOYEE_ID
4   FROM MTL_EMPLOYEES_CURRENT_VIEW
5  WHERE (EMPLOYEE_NUM = NVL(:B1, EMPLOYEE_NUM))
6    AND (ORGANIZATION_ID = :1)
7  ORDER BY EMPLOYEE_NUM;

使用谓词：

SELECT EMPLOYEE_NUM,
       FULL_NAME NAME,
       EMPLOYEE_ID
  FROM MTL_EMPLOYEES_CURRENT_VIEW
 WHERE (EMPLOYEE_NUM = :B1)
   AND (ORGANIZATION_ID = :1)
 ORDER BY EMPLOYEE_NUM;

当你需要在过滤器或者连接的谓词中使用函数，那么不要在包含索引的列上面使用，在它们相对的一面使用，例如，

varcol = TO_CHAR(numcol)

而是

TO_CHAR(numcol) = varcol

因为后面的写法可以使用函数索引

Write Separate SQL Statements for Specific Tasks

SQL is not a procedural language. Using one piece of SQL to do many different things usually results in a less-than-optimal result for each task. If you want SQL to accomplish different things, then write various statements, rather than writing one statement to do different things depending on the parameters you give it.

It is always better to write separate SQL statements for different tasks, but if you must use one SQL statement, then you can make a very complex statement slightly less complex by using the UNION ALL operator.
Optimization (determining the execution plan) takes place before the database knows what values will be substituted into the query. An execution plan cannot, therefore, depend on what those values are. For example:

1 SELECT info
2   FROM tables
3  WHERE .. .
4     AND somecolumn BETWEEN decode(:loval, 'ALL', somecolumn, :loval) AND
5         decode(:hival, 'ALL', somecolumn, :hival);

Written as shown, the database cannot use an index on the somecolumn column, because the expression involving that column uses the same column on both sides of the BETWEEN.
This is not a problem if there is some other highly selective, indexable condition you can use to access the driving table. Often, however, this is not the case. Frequently, you might want to use an index on a condition like that shown but need to know the values of :loval, and so on, in advance. With this information, you can rule out the ALL case, which should not use the index.If you want to use the index whenever real values are given for :loval and :hival (if you expect narrow ranges, even ranges where :loval often equals :hival), then you can rewrite the example in the following logically equivalent form:

 1 SELECT /* change this half of UNION ALL if other half changes */
 2  info
 3   FROM tables
 4  WHERE .. .
 5     AND somecolumn BETWEEN :loval AND :hival
 6     AND (:hival != 'ALL' AND :loval != 'ALL')
 7 UNION ALL
 8 SELECT /* Change this half of UNION ALL if other half changes. */
 9  info
10   FROM tables
11  WHERE .. .
12     AND (:hival = 'ALL' OR :loval = 'ALL');

If you run EXPLAIN PLAN on the new query, then you seem to get both a desirable and an undesirable execution plan. However, the first condition the database evaluates for either half of the UNION ALL is the combined condition on whether :hival and :loval are ALL. The database evaluates this condition before actually getting any rows from the execution plan for that part of the query. When the condition comes back false for one part of the UNION ALL query, that part is not evaluated further. Only the part of the execution plan that is optimum for the values provided is actually carried out. Because the final conditions on :hival and :loval are guaranteed to be mutually exclusive, only one half of the UNION ALL actually returns rows. (The ALL in UNION ALL is logically valid because of this exclusivity. It allows the plan to be carried out without an expensive sort to rule out duplicate rows for the two halves of the query.)

为单独的任务写单独的SQL语句

SQL不是过程化的语句，使用一个SQL语句完成一系列的事情，经常会导致每一个任务的结果都不是最佳的，如果你想使用SQL语句完成多件事情，那么写各个情况的SQL语句，而不是根据给定的不同参数来决定完成不同的事情。

通常为不同的任务写不同的SQL语句是最佳的，但是假如你必须使用一个SQL语句，那么你可以使用UNION ALL来将一个非常复杂的语句变得更加简单。

优化器（决定执行计划）发生在具体的值将被使用在SQL语句之前，一个执行计划是不受具体的值影响的，例如：

1 SELECT info
2   FROM tables
3  WHERE .. .
4     AND somecolumn BETWEEN decode(:loval, 'ALL', somecolumn, :loval) AND
5         decode(:hival, 'ALL', somecolumn, :hival);

如上所示，数据库将不会使用在somecolumn列上的索引，因为在BETWEEN的两端对该列使用了表达式。

这可能不是一个问题，假如其他的条件具有很高的选择度，索引条件你可以使用去关联driving table（驱动表），通常这不是一个问题，通常你可能需要在不知道：loval的具体的值时候，使用索引，在这种情况下，你可以去掉ALL语句，因为他不会使用索引。

如果，你想在无论给定：loval和：hival什么值的情况下，使用索引（即使在范围很小的情况下，例如loval等于hival的值），那么你可以使用如下的模式去写SQL语句

 1 SELECT /* change this half of UNION ALL if other half changes */
 2  info
 3   FROM tables
 4  WHERE .. .
 5     AND somecolumn BETWEEN :loval AND :hival
 6     AND (:hival != 'ALL' AND :loval != 'ALL')
 7 UNION ALL
 8 SELECT /* Change this half of UNION ALL if other half changes. */
 9  info
10   FROM tables
11  WHERE .. .
12     AND (:hival = 'ALL' OR :loval = 'ALL');

如果对新的语句使用执行计划，那么你能一个想要的和一个不想要的执行计划，对于union all的两个部分，数据库执行的第一个检查条件都是loval和hival是否等于all，对于每一部分语句，使用执行计划，返回行之前，Oracle已经对行进行了判断。

当union all的一部分的条件判断后，假如不成立，那么这一部分将不再继续的执行，由于loval和hival是人工手动的进行赋值，那么union all实际上只有一半可以返回结果集，因为ALL在union all中是排他性的，这将使执行计划不需要对两部分结果集进行昂贵的排序而去除重复的结果集。

Controlling the Access Path and Join Order with Hints

You can influence the optimizer's choices by setting the optimizer approach and goal, and by gathering representative statistics for the query optimizer. Sometimes, the application designer, who has more information about a particular application's data than is available to the optimizer, can choose a more effective way to execute a SQL statement. You can use hints in SQL statements to instruct the optimizer about how the statement should be executed. Hints, such as /*+FULL */ control access paths. For example:

SELECT /*+ FULL(e) */
 e.last_name
  FROM employees e
 WHERE e.job_id = 'CLERK';

Join order can have a significant effect on performance. The main objective of SQL tuning is to avoid performing unnecessary work to access rows that do not affect the result. This leads to three general rules:
　　■ Avoid a full-table scan if it is more efficient to get the required rows through an index.
　　■ Avoid using an index that fetches 10,000 rows from the driving table if you could instead use another index that fetches 100 rows.
　　■ Choose the join order so as to join fewer rows to tables later in the join order.
The following example shows how to tune join order effectively:

SELECT info
  FROM taba a,
       tabb b,
       tabc c
 WHERE a.acol BETWEEN 100 AND 200
   AND b.bcol BETWEEN 10000 AND 20000
   AND c.ccol BETWEEN 10000 AND 20000
   AND a.key1 = b.key1
   AND a.key2 = c.key2;

1. Choose the driving table and the driving index (if any).

The first three conditions in the previous example are filter conditions applying to only a single table each. The last two conditions are join conditions.

Filter conditions dominate the choice of driving table and index. In general, the driving table is the one containing the filter condition that eliminates the highest percentage of the table. Thus, because the range of 100 to 200 is narrow compared with the range of acol, but the ranges of 10000 and 20000 are relatively large, taba is the driving table, all else being equal.

With nested loop joins, the joins all happen through the join indexes, the indexes on the primary or foreign keys used to connect that table to an earlier table in the join tree. Rarely do you use the indexes on the non-join conditions, except for the driving table. Thus, after taba is chosen as the driving table, use the indexes on b.key1 and c.key2 to drive into tabb and tabc, respectively.
2. Choose the best join order, driving to the best unused filters earliest.

The work of the following join can be reduced by first joining to the table with the best still-unused filter. Thus, if "bcol BETWEEN ..." is more restrictive (rejects a higher percentage of the rows seen) than "ccol BETWEEN ...", the last join can be made easier (with fewer rows) if tabb is joined before tabc.

3. You can use the ORDERED or STAR hint to force the join order.

Use Caution When Managing Views

Be careful when joining views, when performing outer joins to views, and when reusing an existing view for a new purpose.

Use Caution When Joining Complex Views Joins to complex views are not recommended, particularly joins from one complex view to another. Often this results in the entire view being instantiated, and then the query is run against the view data.

For example, the following statement creates a view that lists employees and departments：

CREATE OR REPLACE VIEW emp_dept
AS
SELECT d.department_id,
       d.department_name,
       d.location_id,
       e.employee_id,
       e.last_name,
       e.first_name,
       e.salary,
       e.job_id
  FROM departments d,
       employees   e
 WHERE e.department_id(+) = d.department_id;

The following query finds employees in a specified state:

SELECT v.last_name,
       v.first_name,
       l.state_province
  FROM locations l,
       emp_dept  v
 WHERE l.state_province = 'California'
   AND v.location_id = l.location_id(+);

In the following plan table output, note that the emp_dept view is instantiated:

Do Not Recycle Views

Beware of writing a view for one purpose and then using it for other purposes to which it might be ill-suited. Querying from a view requires all tables from the view to be accessed for the data to be returned. Before reusing a view, determine whether all tables in the view need to be accessed to return the data. If not, then do not use the view. Instead, use the base table(s), or if necessary, define a new view. The goal is to refer to the minimum number of tables and views necessary to return the required data. Consider the following example:

SELECT department_name FROM emp_dept WHERE department_id = 10;

The entire view is first instantiated by performing a join of the employees and departments tables and then aggregating the data. However, you can obtaindepartment_name and department_id directly from the departments table. It is inefficient to obtain this information by querying the emp_dept view.

Use Caution When Unnesting Subqueries Subquery unnesting merges the body of the subquery into the body of the statement that contains it, allowing the optimizer to consider them together when evaluating access paths and joins.

Use Caution When Performing Outer Joins to Views In the case of an outer join to a multi-table view, the query optimizer (in Release 8.1.6 and later) can drive from an outer join column, if an equality predicate is defined on it.

An outer join within a view is problematic because the performance implications of the outer join are not visible.

Store Intermediate Results

Intermediate, or staging, tables are quite common in relational database systems, because they temporarily store some intermediate results. In many applications they are useful, but Oracle requires additional resources to create them. Always consider whether the benefit they could bring is more than the cost to create them. Avoid staging tables when the information is not reused multiple times. Some additional considerations:
　　■ Storing intermediate results in staging tables could improve application performance. In general, whenever an intermediate result is usable by multiple following queries, it is worthwhile to store it in a staging table. The benefit of not retrieving data multiple times with a complex statement already at the second usage of the intermediate result is better than the cost to materialize it.
　　■ Long and complex queries are hard to understand and optimize. Staging tables can break a complicated SQL statement into several smaller statements, and then store the result of each step.
　　■ Consider using materialized views. These are precomputed tables comprising aggregated or joined data from fact and possibly dimension tables.

使用hint控制访问路径和连接顺序

可以通过设置优化器的路径和目标来影响优化器的选择，并且可以为优化器收集一些代表性的统计信息，有时，程序的设计者可能比优化器有更多有用的信息，可以选择一个高效的方式，去执行SQL语句，可以使用hint去指导优化器如何去执行SQL语句。

　hint，像/*+FULL */控制访问路径，例如：

1 SELECT /*+ FULL(e) */
2  e.last_name
3   FROM employees e
4  WHERE e.job_id = 'CLERK';

连接次序对性能有非常显著的影响，SQL优化的主要目的就是消除一些不必要的访问结果集的操作，他不会影响最终结果
他有三个原则：

1、如果通过索引去访问行数据是更加高效的话，那么避免使用全表扫描。

2、避免出现使用一个索引获取超过10000行的记录，相反的可以使用另一个索引获取100行记录的情况

3、当选择连接顺序的时候，应该将数据集最少的表放在最后连接。

下面的实例展示了如何有效的优化表连接的次序：

SELECT info
  FROM taba a,
       tabb b,
       tabc c
 WHERE a.acol BETWEEN 100 AND 200
   AND b.bcol BETWEEN 10000 AND 20000
   AND c.ccol BETWEEN 10000 AND 20000
   AND a.key1 = b.key1
   AND a.key2 = c.key2;

1、选择driving table 和driving index（如果有必要的话）

在上面的SQL语句出现的前三个条件，都是应用在每个表上的过滤条件，后面两个条件是连接条件，（Filter Condition）过滤条件决定了driving table和driving index的选择，一般情况下，Driving table是包含一个可以除去表中大部分数据的条件的表，因而，100-200这个范围和acol相比是一个很小的范围，相比之下，10000-20000是一个很大的范围，所以taba是driving table，其他的都一样。

Driving Table的例子，（Tom给出的解释）例如：

SELECT ename,
       dname
  FROM emp,
       dept
 WHERE emp.deptno = dept.deptno;

可以理解为如下的方式：

Here we will use DEPT as the driving table, the psuedo code for that query plan is，下面是伪代码：

for x in ( select * from dept ) /* table access (full) dept -- driving table */
loop
   do an index lookup to find matching rows in emp
   for each row found in that index lookup
         output data
   end loop
end loop

对于内循环的连接，连接都发生在连接索引上，这个索引是用来在连接树中连接前面的表的主键或者外键。除了Driving table之外，很少有情况使用索引进行不等值的连接，因而，当taba被选择为driving table之后，分别使用索引bkey1和ckey2去分别的驱动表tabb和tabc。

2、选择最佳的连接顺序，使用最佳的未连接的过滤器

可以使用尚未使用的过滤器中最佳的过滤器可以减少之后连接的活动，因而，如果b.bcol BETWEEN 比c.ccol BETWEEN更加严格（拒绝返回大比例的行：选择度较高），如果tabb在tabc之前连接，那么最后一个连接会变得更加容易。

3、可以使用ordered 和 stat的hint来强制的执行连接顺序。

当使用视图的时候，要小心

当连接视图的时候、外联接一个视图，使用已经存在的视图用做其他的事的时候，需要小心。

1、当连接复杂视图的时候，需要小心

不推荐连接一个复杂的视图，尤其是一个复杂的视图连接另一个复杂的视图，通常视图中的结果集已经被实例化了，随后的查询是不需要和这些数据和不相符的。
例如，如下的视图：

列出了员工和部门信息:

CREATE OR REPLACE VIEW emp_dept
AS
SELECT d.department_id,
       d.department_name,
       d.location_id,
       e.employee_id,
       e.last_name,
       e.first_name,
       e.salary,
       e.job_id
  FROM departments d,
       employees   e
 WHERE e.department_id(+) = d.department_id;

下面的语句查询员工所在的具体的州：

SELECT v.last_name,
       v.first_name,
       l.state_province
  FROM locations l,
       emp_dept  v
 WHERE l.state_province = 'California'
   AND v.location_id = l.location_id(+);

注意下面的执行计划，其中视图需要被初始化：

2、不要重复利用视图

注意：当写了一个视图用于一个目的，之前把它又作为另一个目的，这个是不合适的，查询一个视图的时候，为了数据的返回，需要访问视图中的每一个表，在重新利用视图之前，考虑，是否视图中的每个表都需要访问用来返回数据，如果不是这样子的话，那么不要重复利用这个视图，相反的，使用最原始的表，或者新建一个视图，我们的目的是使用最少数量的表或者必要的视图，返回需要的数据。
考虑如下的情况：

SELECT department_name FROM emp_dept WHERE department_id = 10;

当执行employees表和departments表的连接的时候，整个视图就初始化了，然后聚合数据，然后，可以直接从departments表中获取department_name和department_id这两列，上述的SQL语句通过视图查询，获取结果集是低效的。

3、当使用关联子查询的时候，要小心

子查询解嵌套将子查询合并在主查询中语句中，使用优化器选择执行路径的时候，将它们考虑为一个整体。

网址：

http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/queries008.htm

http://caohong286.iteye.com/blog/1370976

http://space.itpub.net/27064837/viewspace-746501

4、当外连接一个视图的时候，要小心

在一个连接多个表关联的视图中的情况下，如果定义了一个=的谓词连接的话查询优化器（从8.1.6开始的版本）可以使用外连接的一列来进行驱动。

在视图中，使用外连接是有问题的，因为外连接的性能是不可见的，举例说明：存在视图X，它是由A left join B构成，现在Y表 left join A视图，那么对于优化器来说，B和A的性能是不可见的（通过视图合并可以 merge view）

Restructuring the Indexes

Often, there is a beneficial impact on performance by restructuring indexes. This can involve the following:
　　■ Remove nonselective indexes to speed the DML.
　　■ Index performance-critical access paths.
　　■ Consider reordering columns in existing concatenated indexes.
　　■ Add columns to the index to improve selectivity.
Do not use indexes as a panacea. Application developers sometimes think that performance will improve if they create more indexes. If a single programmer creates an appropriate index, then this might indeed improve the application's performance. However, if 50 programmers each create an index, then application performance will probably be hampered.

重新组织索引

通常，重新组织索引会对性能有一个显著的影响，主要包含以下几个方面：

去除选择度低的索引，加快DML操作的速度
索引至关重要的性能的访问路径
考虑重新组织存在的索引列的顺序
向索引中增加列提高选择度

不要将索引认为是万能药，程序开发者经常认为如果它们建立更多的索引的话，性能将会提高，如果一个程序员创建了一个合适的索引，那么这可能确实会提高程序的性能，但是，如果50个程序员，每个都创建一个索引，那么性能可能会下降。

Modifying or Disabling Triggers and Constraints

Using triggers consumes system resources. If you use too many triggers, then you can find that performance is adversely affected and you might need to modify or disable them.

修改或者禁用触发器和索引

使用触发器会消耗系统资源，如果你使用了过多的触发器，那么你会发现性能会受到不利的影响，你可能需要考虑修改或者禁止这些触发器或者约束。

Restructuring the Data

After restructuring the indexes and the statement, consider restructuring the data:
■ Introduce derived values. Avoid GROUP BY in response-critical code.
■ Review your data design. Change the design of your system if it can improve performance.
■ Consider partitioning, if appropriate.

重新组织数据

当重新的组织了索引和SQL语句后，考虑重新的组织数据。

使用驱动值，避免在需要快速反应的代码中使用GROUP BY
重新审视数据设计，如果可以提高性能，那么重新修改你的系统设计
在合适的时候，考虑对表进行分区

Maintaining Execution Plans Over Time

You can maintain the existing execution plan of SQL statements over time either using stored statistics or SQL plan baselines. Storing optimizer statistics for tables will apply to all SQL statements that refer to those tables. Storing an execution plan as a SQL plan baseline maintains the plan for set of SQL statements. If both statistics and a SQL plan baseline are available for a SQL statement, the optimizer will first use a cost-based search method to build a best-cost plan, then it will try to find a matching plan in the SQL plan baseline. If a match is found, the optimizer will proceed using this plan. Otherwise, it will evaluate the cost of each of the accepted plans in the SQL plan baseline and select the plan with the lowest cost.

不断的去维护执行计划

你可以使用存储的统计信息和SQL计划的基线（固定执行计划）去不断维护已经存在的SQL语句的执行计划，对于表的优化统计方面的信息将对任何引用了该表的SQL语句生效，不断的为SQL语句集维护执行计划作为它的SQL语句的固定执行计划，优化器将会首先使用基于成本的搜索方法去建立一个最好的执行计划。随后将会在SQL执行计划的固定执行计划中找一个匹配的计划，如果匹配成功，那么优化器将会使用这个计划进行处理，否则，他将会对每个接受了的固定执行计划的执行计划和刚刚基于成本的搜索方法去建立的执行计划进行评估（并将它加入到固定执行计划中），并选择成本最低的执行计划。

SQL plan baselines:http://nyoug.org/Presentations/2011/March/Malik_SQL_Plan_Management.pdf

A SQL plan baselines are set of all accepted plans in the plan history that the database maintains for each repeatable SQL statement that the database executes. The database defines a plan as acceptable when it verifies that the plan doesn't lead to performance regression when compared to the other plan in the plan history. The very first execution plan the database generates for a SQL statement is always considered acceptable by the optimizer and becomes the original SQL plan baseline as well as the plan history for the statement. Later execution plans will not be included in the SQL baseline unless the database verifies that they don't lead to performance regression。

Visiting Data as Few Times as Possible

　　Applications should try to access each row only once. This reduces network traffic and reduces database load. Consider doing the following:
　　　　■ Combine Multiples Scans with CASE Statements
　　　　■ Use DML with RETURNING Clause
　　　　■ Modify All the Data Needed in One Statement

Combine Multiples Scans with CASE Statements

Often, it is necessary to calculate different aggregates on various sets of tables. Usually, this is done with multiple scans on the table, but it is easy to calculate all the aggregates with one single scan. Eliminating n-1 scans can greatly improve performance.

Combining multiple scans into one scan can be done by moving the WHERE condition of each scan into a CASE statement, which filters the data for the aggregation. For each aggregation, there could be another column that retrieves the data. The following example asks for the count of all employees who earn less then 2000, between 2000 and 4000, and more than 4000 each month. This can be done with three separate queries:

SELECT COUNT(*) FROM employees WHERE salary < 2000;
SELECT COUNT(*) FROM employees WHERE salary BETWEEN 2000 AND 4000;
SELECT COUNT(*) FROM employees WHERE salary > 4000;

However, it is more efficient to run the entire query in a single statement. Each number is calculated as one column. The count uses a filter with the CASE statement to count only the rows where the condition is valid. For example:

 1 SELECT COUNT(CASE
 2                WHEN salary < 2000 THEN
 3                 1
 4                ELSE
 5                 NULL
 6              END) count1,
 7        COUNT(CASE
 8                WHEN salary BETWEEN 2001 AND 4000 THEN
 9                 1
10                ELSE
11                 NULL
12              END) count2,
13        COUNT(CASE
14                WHEN salary > 4000 THEN
15                 1
16                ELSE
17                 NULL
18              END) count3
19   FROM employees;

This is a very simple example. The ranges could be overlapping, the functions for the aggregates could be different, and so on.

Use DML with RETURNING Clause

　　When appropriate, use INSERT, UPDATE, or DELETE... RETURNING to select and modify data with a single call. This technique improves performance by reducing the number of calls to the database.

Modify All the Data Needed in One Statement

When possible, use array processing. This means that an array of bind variable values is passed to Oracle for repeated execution. This is appropriate for iterative processes in which multiple rows of a set are subject to the same operation.
For example:

1 BEGIN
2   FOR pos_rec IN (SELECT * FROM order_positions WHERE order_id = :id) LOOP
3     DELETE FROM order_positions
4      WHERE order_id = pos_rec.order_id
5        AND order_position = pos_rec.order_position;
6   END LOOP;
7   DELETE FROM orders WHERE order_id = :id;
8 END;

Alternatively, you could define a cascading constraint on orders. In the previous example, one SELECT and n DELETEs are executed. When a user issues the DELETE on orders DELETE FROM orders WHERE order_id = :id, the database automatically deletes the positions with a single DELETE statement.

尽可能少的去访问数据

应用程序应该对每行数据只访问一次，可以减少网络流量和数据库的负载，考虑如下的几个例子：

1、将多次扫描使用case when语句代替

2、在DML语句中使用returning 子句

3、在一个语句中修改所有要修改的数据

将多次扫描使用case when语句代替

通常，需要计算一个表的多种结果集的不同的聚合值，通常，我们会多次扫描表，但是我们可以很容易的使用一个语句来计算出它们的聚合值，减少了n-1的扫描表，提高了性能。

将多次扫描合并成一次扫描需要将where语句中的条件写到每个case语句中，这个case语句用来过滤聚合数据，对于每个聚合值，都需要从不同的列中提取数据。

如下的SQL语句，需要求出多个区间范围内的数据：

SELECT COUNT(*) FROM employees WHERE salary < 2000;
SELECT COUNT(*) FROM employees WHERE salary BETWEEN 2000 AND 4000;
SELECT COUNT(*) FROM employees WHERE salary > 4000;

然而，将上述的语句合并到一个语句中是非常高效的，每个数据都从不同的列聚合而来，count语句使用case中过滤条件计算满足条件的行数据。

SELECT COUNT(CASE
               WHEN salary < 2000 THEN
                1
               ELSE
                NULL
             END) count1,
       COUNT(CASE
               WHEN salary BETWEEN 2001 AND 4000 THEN
                1
               ELSE
                NULL
             END) count2,
       COUNT(CASE
               WHEN salary > 4000 THEN
                1
               ELSE
                NULL
             END) count3
  FROM employees;

这是一个非常简单的例子，他的区间范围还可以发生重叠，聚合的函数可以不同例如，使用sum

在DML语句中使用RETURING子句

在合适的时候，使用INSERT/UPDATE/DELELE...RETURING子句用于在一个语句中选择或修改数据，这种方法通过减少与数据库的交互而提高性能。

在一个语句中修改所有要修改的数据

在可能的情况下，使用结合处理数据，这意味着在集合的每个绑定变量的值重复的传递给数据库执行，这适合于循环的对一个结果集进行处理。

例如；

BEGIN
  FOR pos_rec IN (SELECT * FROM order_positions WHERE order_id = :id) LOOP
    DELETE FROM order_positions
     WHERE order_id = pos_rec.order_id
       AND order_position = pos_rec.order_position;
  END LOOP;
  DELETE FROM orders WHERE order_id = :id;
END;

可以在orders表中定义一个约束，当orders表的数据删除后，系统自动的把另外一个表的数据删除。

posted on 2012-11-18 11:38 Coldest Winter 阅读(365) 评论(0) 收藏举报