Oracle's Solution for Bind Variables on Skewed Columns

2019-08-27 00:32  AlfredZhao

1. Background

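    As we know, Oracle strongly recommends bind variables in traditional OLTP (online transaction processing) systems, because they effectively reduce hard parsing and so increase the system's capacity for concurrent work. In some legacy systems that were developed without bind variables, where concurrency later grew and the application could no longer be changed, the operations DBA may even be forced to set cursor_sharing=force so that Oracle replaces literals with system-generated bind variables (this is a last resort, not a best practice).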

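    Bind variables bring OLTP systems great benefits, but they also bring some thorny problems. The most typical one: because the SQL text contains bind variables, the optimizer cannot know the actual values they stand for and has to fall back on default selectivity, so it may misjudge selectivity and choose the wrong execution plan. Oracle has offered a remedy since 9i: bind peeking. With this feature enabled, when a SQL statement with bind variables is hard parsed for the first time, the optimizer peeks at the actual values, estimates selectivity accurately, and chooses an appropriate plan. But the feature introduces another thorny problem. After the first hard parse, every later execution is a soft parse (or soft-soft parse), so the bind values are never peeked again. If the values in that column are unevenly distributed, performance problems are very likely (and worse still if the first peeked value represents the minority case). So although Oracle enables this feature by default, many customers' production best practices have long been to turn it off.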
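
    The stale-plan effect of bind peeking without ACS can be pictured with a toy Python model. This is only a sketch: the row counts are those of the T_SKEW test table built in section 2, while the 5% cut-off between the two plans and the class and function names are invented for illustration and are not how the optimizer actually costs plans.

```python
# Toy model only: row counts come from the T_SKEW test table in section 2;
# the 5% cut-off between the two plans is made up for illustration.
ROW_COUNTS = {2: 1, 3: 86412}
TOTAL_ROWS = sum(ROW_COUNTS.values())
FULL_SCAN_THRESHOLD = 0.05  # hypothetical, not an Oracle constant


def choose_plan(value):
    """Pick a plan from the value's selectivity, as a hard parse with peeking would."""
    selectivity = ROW_COUNTS.get(value, 0) / TOTAL_ROWS
    if selectivity < FULL_SCAN_THRESHOLD:
        return "INDEX RANGE SCAN"
    return "INDEX FAST FULL SCAN"


class PeekOnceCursor:
    """Peeks the bind value on the first execution only, then reuses the plan."""

    def __init__(self):
        self.plan = None

    def execute(self, value):
        if self.plan is None:      # hard parse: peek the bind value
            self.plan = choose_plan(value)
        return self.plan           # soft parse: the cached plan is reused as-is


cursor = PeekOnceCursor()
print(cursor.execute(2))  # plan chosen for the rare value
print(cursor.execute(3))  # the same plan is reused for the common value
```

    Whichever value happens to arrive first decides the plan for every later execution, which is exactly the risk described above.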

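    Only in Oracle 11g did the ACS (adaptive cursor sharing) feature arrive; combined with bind peeking, it finally addresses this problem properly. It is still not perfect: ACS itself adds extra hard parses and produces more child cursors, so soft parses take longer to scan the chain and the shared pool needs more space. Early releases also had many bugs. So even though Oracle enables ACS by default as well, many customers turn it off in production.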

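    Against this background, I consulted Zhao Yong, our company's SQL tuning expert. His advice: when bind variables are used on a skewed column, talk to the developers early about whether bind variables can be dropped on columns with severely skewed data. If the column has many distinct values, so that dropping bind variables would cause a large number of hard parses, the application can instead check the incoming value before issuing the SQL. If it is an atypical value, use the SQL with a literal; if it is a typical value, use the statement with a bind variable.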
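
    The application-side check suggested above might look roughly like the following Python sketch. The helper name, the set of atypical values, and the query text are assumptions for illustration (the table and column are borrowed from the test case in section 2); a real application would need its own way of knowing which values are atypical.

```python
# Hypothetical application-side helper; the table, column, and the set of
# atypical values are illustrative and borrowed from this post's test case.
ATYPICAL_VALUES = {2}  # assumed: known in advance, small, and stable


def build_query(object_id):
    """Return (sql, binds): a literal for atypical values, a bind otherwise."""
    object_id = int(object_id)  # validate before inlining a literal into SQL text
    if object_id in ATYPICAL_VALUES:
        # Only a few distinct atypical values exist, so few extra hard parses.
        return f"select count(*) from t_skew where object_id = {object_id}", {}
    # Typical values share one cursor through the bind variable.
    return "select count(*) from t_skew where object_id = :v1", {"v1": object_id}


print(build_query(2))
print(build_query(3))
```

    The returned pair can be handed to whichever database driver the application uses. Converting the value with int() before inlining it keeps the literal branch from becoming an injection risk.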

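    What if the application cannot be changed? The options I can think of so far are these. Either sacrifice execution efficiency for the atypical values (to keep an atypical value from being peeked first and causing worse performance consequences, the plan can be fixed to the one that suits the typical values). Or simply try enabling both bind peeking and ACS, and verify by actual testing that this solves the problem without causing other performance issues (on a production system where these features are already disabled, enable them only after careful testing).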

2. Building the test case

    The following simple test case demonstrates the solution Oracle provides for this scenario (bind peeking + ACS):

--Create table T_SKEW with severely skewed data:
create table jingyu.t_skew as select * from dba_objects;
create index jingyu.idx_t_skew on jingyu.t_skew(object_id);
update jingyu.t_skew set object_id=3 where object_id>3;
commit;

--Check how skewed column OBJECT_ID is:
select object_id, count(*) from jingyu.t_skew group by object_id;

 OBJECT_ID   COUNT(*)
---------- ----------
         2          1
         3      86412

--Gather statistics:
exec dbms_stats.gather_table_stats('JINGYU','T_SKEW');

--Check the histogram on column OBJECT_ID:
select owner, table_name, column_name, histogram from dba_tab_col_statistics where table_name = 'T_SKEW' and column_name = 'OBJECT_ID';
OWNER                                                        TABLE_NAME      COLUMN_NAME               HISTOGRAM
------------------------------------------------------------ --------------- ------------------------- ------------------------------
JINGYU                                                       T_SKEW          OBJECT_ID                 FREQUENCY

    Using the script provided in MOS note SCRIPT - Select to show Optimizer Statistics for CBO (Doc ID 31412.1) to query the statistics:

SQL> @sosi
SQL> set echo off

Please enter Name of Table Owner (Null = SYS): jingyu
Please enter Table Name to show Statistics for: t_skew

***********
Table Level
***********


Table                       Number                 Empty Average    Chain Average Global User               Sample Date
Name                       of Rows   Blocks       Blocks   Space    Count Row Len Stats  Stats                Size MM-DD-YYYY
--------------- ------------------ -------- ------------ ------- -------- ------- ------ ------ ------------------ ----------
T_SKEW                      86,413    1,262            0       0        0      96 YES    NO                 86,413 08-26-2019

Column                    Column                       Distinct          Number     Number Global User               Sample Date
Name                      Details                        Values Density Buckets      Nulls Stats  Stats                Size MM-DD-YYYY
------------------------- ------------------------ ------------ ------- ------- ---------- ------ ------ ------------------ ----------
OWNER                     VARCHAR2(30)                       27       0       1          0 YES    NO                 86,413 08-26-2019
OBJECT_NAME               VARCHAR2(128)                  51,864       0       1          0 YES    NO                 86,413 08-26-2019
SUBOBJECT_NAME            VARCHAR2(30)                       87       0       1     86,152 YES    NO                    261 08-26-2019
OBJECT_ID                 NUMBER(22)                          2       0       2          0 YES    NO                  5,389 08-26-2019
DATA_OBJECT_ID            NUMBER(22)                      8,670       0       1     77,703 YES    NO                  8,710 08-26-2019
OBJECT_TYPE               VARCHAR2(19)                       44       0       1          0 YES    NO                 86,413 08-26-2019
CREATED                   DATE                              904       0       1          0 YES    NO                 86,413 08-26-2019
LAST_DDL_TIME             DATE                              995       0       1          0 YES    NO                 86,413 08-26-2019
TIMESTAMP                 VARCHAR2(19)                    1,036       0       1          0 YES    NO                 86,413 08-26-2019
STATUS                    VARCHAR2(7)                         2       1       1          0 YES    NO                 86,413 08-26-2019
TEMPORARY                 VARCHAR2(1)                         2       1       1          0 YES    NO                 86,413 08-26-2019
GENERATED                 VARCHAR2(1)                         2       1       1          0 YES    NO                 86,413 08-26-2019
SECONDARY                 VARCHAR2(1)                         2       1       1          0 YES    NO                 86,413 08-26-2019
NAMESPACE                 NUMBER(22)                         20       0       1          0 YES    NO                 86,413 08-26-2019
EDITION_NAME              VARCHAR2(30)                        0       0       0     86,413 YES    NO                        08-26-2019

                              B                                            Average     Average
Index                      Tree Leaf       Distinct             Number Leaf Blocks Data Blocks      Cluster Global User               Sample Date
Name            Unique    Level Blks           Keys            of Rows     Per Key     Per Key       Factor Stats  Stats                Size MM-DD-YYYY
--------------- --------- ----- ---- -------------- ------------------ ----------- ----------- ------------ ------ ------ ------------------ ----------
IDX_T_SKEW      NONUNIQUE     1  298              2             86,413         149         617        1,234 YES    NO                 86,413 08-26-2019

Index           Column                     Col Column
Name            Name                       Pos Details
--------------- ------------------------- ---- ------------------------
IDX_T_SKEW      OBJECT_ID                    1 NUMBER(22)

***************
Partition Level
***************

***************
SubPartition Level
***************
SQL> 

3. Scenario test

3.1 First confirm that bind peeking and ACS are both enabled

--Query the hidden parameters:
set linesize 333
col name for a35
col description for a66
col value for a30
SELECT   i.ksppinm name,  
   i.ksppdesc description,  
   CV.ksppstvl VALUE
FROM   sys.x$ksppi i, sys.x$ksppcv CV  
   WHERE   i.inst_id = USERENV ('Instance')  
   AND CV.inst_id = USERENV ('Instance')  
   AND i.indx = CV.indx  
   AND i.ksppinm LIKE '%&param%' 
ORDER BY   REPLACE (i.ksppinm, '_', '');  

--Default values of the relevant hidden parameters (bind peeking and ACS are both enabled):
NAME                                DESCRIPTION                                                        VALUE
----------------------------------- ------------------------------------------------------------------ ------------------------------
_optim_peek_user_binds              enable peeking of user binds                                       TRUE
_optimizer_adaptive_cursor_sharing  optimizer adaptive cursor sharing                                  TRUE
_optimizer_extended_cursor_sharing  optimizer extended cursor sharing                                  UDO
_optimizer_extended_cursor_sharing_ optimizer extended cursor sharing for relational operators         SIMPLE
rel

3.2 Test case and results

--1) Test case
alter session set current_schema=jingyu;
alter session set statistics_level=all;
set lines 200 pages 200

var v1 number;
exec :v1 := 2;
select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

exec :v1 := 3;
select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

select count(*) from t_skew where object_id = :v1;
select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

--2) Test results
SQL> alter system flush shared_pool;
SQL> alter session set current_schema=jingyu;
SQL> alter session set statistics_level=all;
SQL> set lines 200 pages 200
SQL> 
--Bind value 2, first execution: the INDEX RANGE SCAN plan is used, Plan hash value: 3167530345:
SQL> var v1 number;
SQL> exec :v1 := 2;
SQL> select count(*) from t_skew where object_id = :v1;

  COUNT(*)
----------
         1
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1

Plan hash value: 3167530345

------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      1 |        |      1 |00:00:00.01 |       2 |
|   1 |  SORT AGGREGATE   |            |      1 |      1 |      1 |00:00:00.01 |       2 |
|*  2 |   INDEX RANGE SCAN| IDX_T_SKEW |      1 |     16 |      1 |00:00:00.01 |       2 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:V1)

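--Bind value 3, first execution: the INDEX RANGE SCAN plan is reused, Plan hash value: 3167530345
--(the 'allstats' format reports cumulative figures, so Starts=2 and A-Rows cover both executions of child 0):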
SQL> exec :v1 := 3;
SQL> select count(*) from t_skew where object_id = :v1;

  COUNT(*)
----------
     86412
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1

Plan hash value: 3167530345

------------------------------------------------------------------------------------------
| Id  | Operation         | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |            |      2 |        |      2 |00:00:00.10 |     301 |
|   1 |  SORT AGGREGATE   |            |      2 |      1 |      2 |00:00:00.10 |     301 |
|*  2 |   INDEX RANGE SCAN| IDX_T_SKEW |      2 |     16 |  86413 |00:00:00.06 |     301 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=:V1)

--Bind value 3, second execution: the plan changes to INDEX FAST FULL SCAN, Plan hash value: 2333720604:
SQL> select count(*) from t_skew where object_id = :v1;

  COUNT(*)
----------
     86412
SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7mz2mhz0nq92n, child number 1
-------------------------------------
select count(*) from t_skew where object_id = :v1

Plan hash value: 2333720604

----------------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      1 |00:00:00.07 |     502 |
|   1 |  SORT AGGREGATE       |            |      1 |      1 |      1 |00:00:00.07 |     502 |
|*  2 |   INDEX FAST FULL SCAN| IDX_T_SKEW |      1 |  86389 |  86412 |00:00:00.04 |     502 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID"=:V1)

SQL> 

    As shown, on the second execution of the SQL with bind value 3, the execution plan adapted.

3.3 In-depth analysis of the test

You can use the V$ views for adaptive cursor sharing to see selectivity ranges, cursor information (such as whether a cursor is bind-aware or bind-sensitive), and execution statistics:
V$SQL shows whether a cursor is bind-sensitive or bind-aware
V$SQL_CS_HISTOGRAM shows the distribution of the execution count across a three-bucket execution history histogram
V$SQL_CS_SELECTIVITY shows the selectivity ranges stored for every predicate containing a bind variable if the selectivity was used to check cursor sharing
V$SQL_CS_STATISTICS summarizes the information that the optimizer uses to determine whether to mark a cursor bind-aware.

    Query v$sql for the child_number, executions, buffer_gets, bind-sensitive, bind-aware, and is_shareable information of the SQL (SQL_ID = '7mz2mhz0nq92n'):

SQL> SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BS",
  2         IS_BIND_AWARE AS "BA", IS_SHAREABLE AS "SH", PLAN_HASH_VALUE
  3  FROM   V$SQL
  4  WHERE  SQL_ID = '7mz2mhz0nq92n';

CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
           0          2         348 Y  N  N       3167530345
           1          1         502 Y  Y  Y       2333720604

--Run the SQL again, first with bind value 3 and then with bind value 2:
SQL> select count(*) from t_skew where object_id = :v1;

  COUNT(*)
----------
     86412
SQL> exec :v1 := 2;
SQL> select count(*) from t_skew where object_id = :v1;

  COUNT(*)
----------
         1

--Query v$sql again
CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
           0          2         348 Y  N  N       3167530345
           1          2        1004 Y  Y  Y       2333720604
           2          1           2 Y  Y  Y       3167530345

    The parent cursor of this SQL now has three child cursors (0, 1, and 2). Children 1 and 2 have SH = Y, meaning shareable; child 0 has SH = N, meaning not shareable.

    Query the v$sql_cs_* views for ACS-related information:

--V$SQL_CS_HISTOGRAM
SQL> select * from V$SQL_CS_HISTOGRAM where sql_id = '7mz2mhz0nq92n';

ADDRESS          HASH_VALUE SQL_ID                     CHILD_NUMBER  BUCKET_ID      COUNT
---------------- ---------- -------------------------- ------------ ---------- ----------
0000000087F34700 3242927188 7mz2mhz0nq92n                         2          0          1
0000000087F34700 3242927188 7mz2mhz0nq92n                         2          1          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         2          2          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         1          0          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         1          1          2
0000000087F34700 3242927188 7mz2mhz0nq92n                         1          2          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         0          0          1
0000000087F34700 3242927188 7mz2mhz0nq92n                         0          1          1
0000000087F34700 3242927188 7mz2mhz0nq92n                         0          2          0

--V$SQL_CS_SELECTIVITY
SQL> col PREDICATE for a30
SQL> select * from V$SQL_CS_SELECTIVITY where sql_id = '7mz2mhz0nq92n';

ADDRESS          HASH_VALUE SQL_ID                     CHILD_NUMBER PREDICATE                        RANGE_ID LOW                  HIGH
---------------- ---------- -------------------------- ------------ ------------------------------ ---------- -------------------- --------------------
0000000087F34700 3242927188 7mz2mhz0nq92n                         2 =V1                                     0 0.000167             0.000204
0000000087F34700 3242927188 7mz2mhz0nq92n                         1 =V1                                     0 0.899749             1.099694
SQL> 
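
    The LOW and HIGH values in V$SQL_CS_SELECTIVITY can be roughly reproduced by hand. The Python sketch below assumes that the range is the optimizer's estimated selectivity plus or minus 10%, which matches what is commonly observed but is an assumption here, and it uses the E-Rows figures from the plans in section 3.2 as the estimates. The function name is made up for this example.

```python
TOTAL_ROWS = 86413  # row count of T_SKEW from the statistics in section 2


def selectivity_range(estimated_rows, total_rows=TOTAL_ROWS, margin=0.10):
    """Return (low, high) around the estimated selectivity; the margin is assumed."""
    selectivity = estimated_rows / total_rows
    return selectivity * (1 - margin), selectivity * (1 + margin)


# E-Rows from the two plans in section 3.2: 16 (value 2) and 86389 (value 3)
for e_rows in (16, 86389):
    low, high = selectivity_range(e_rows)
    print(f"E-Rows={e_rows}: low={low:.6f} high={high:.6f}")
```

    The results land within about one unit in the sixth decimal place of the values shown by the view, presumably because E-Rows is itself a rounded figure. A later execution whose estimated selectivity falls inside a stored range can reuse that child cursor.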

--V$SQL_CS_STATISTICS 
SQL> select * from V$SQL_CS_STATISTICS where sql_id = '7mz2mhz0nq92n';

ADDRESS          HASH_VALUE SQL_ID                     CHILD_NUMBER BIND_SET_HASH_VALUE PE EXECUTIONS ROWS_PROCESSED BUFFER_GETS   CPU_TIME
---------------- ---------- -------------------------- ------------ ------------------- -- ---------- -------------- ----------- ----------
0000000087F34700 3242927188 7mz2mhz0nq92n                         2          2064090006 Y           1              4           2          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         1          2706503459 Y           1         172826         502          0
0000000087F34700 3242927188 7mz2mhz0nq92n                         0          2064090006 Y           1              4          49          0
SQL>
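
    V$SQL_CS_HISTOGRAM counts executions per bucket. A rule of thumb reported by practitioners, which Oracle does not document and which is only assumed here, is that an execution goes to bucket 0 below 1,000 rows processed, bucket 1 below 1,000,000, and bucket 2 above that. The Python sketch below applies that assumed rule to the ROWS_PROCESSED values shown in V$SQL_CS_STATISTICS above; the thresholds and the function name are not taken from Oracle documentation.

```python
# Assumed thresholds: reported by practitioners, not documented by Oracle.
BUCKET_LIMITS = (1_000, 1_000_000)


def bucket_id(rows_processed):
    """Map an execution's rows processed to a histogram bucket (0, 1, or 2)."""
    for bucket, limit in enumerate(BUCKET_LIMITS):
        if rows_processed < limit:
            return bucket
    return len(BUCKET_LIMITS)


# ROWS_PROCESSED values reported by V$SQL_CS_STATISTICS in this test
for rows in (4, 172826):
    print(rows, "-> bucket", bucket_id(rows))
```

    Under this assumption the execution with bind value 2 counts toward bucket 0 and the executions with bind value 3 toward bucket 1, which is consistent with the counts in the V$SQL_CS_HISTOGRAM output above.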

4. Summary

    A summary of the points touched on in this experiment:
4.1 Purging the execution plan of a single SQL statement

--Query the SQL's ADDRESS and HASH_VALUE
SQL> select sql_id, ADDRESS, HASH_VALUE from v$sqlarea where sql_id = '7mz2mhz0nq92n';

SQL_ID                     ADDRESS          HASH_VALUE
-------------------------- ---------------- ----------
7mz2mhz0nq92n              0000000087F34700 3242927188

--Purge the SQL's execution plan
SQL> exec sys.DBMS_SHARED_POOL.PURGE('0000000087F34700,3242927188','C');

4.2 Disabling bind peeking and ACS

--All of these are dynamic parameters
--bind peeking
alter system set "_optim_peek_user_binds"=false;

--acs(adaptive cursor sharing)
alter system set "_optimizer_extended_cursor_sharing_rel"=NONE;
alter system set "_optimizer_extended_cursor_sharing"=NONE;
alter system set "_optimizer_adaptive_cursor_sharing"=false;

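    Important: if bind peeking is disabled, ACS effectively does nothing. For example, here I set only the _optim_peek_user_binds parameter to false and repeated the same experiment following the steps in section 3.2. The query result below shows that ACS is not used, even though I did not explicitly disable the ACS parameters: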

SQL> SELECT CHILD_NUMBER, EXECUTIONS, BUFFER_GETS, IS_BIND_SENSITIVE AS "BS",
  2         IS_BIND_AWARE AS "BA", IS_SHAREABLE AS "SH", PLAN_HASH_VALUE
  3  FROM   V$SQL
  4  WHERE  SQL_ID = '7mz2mhz0nq92n';

CHILD_NUMBER EXECUTIONS BUFFER_GETS BS BA SH PLAN_HASH_VALUE
------------ ---------- ----------- -- -- -- ---------------
           0          3        1506 N  N  Y       2333720604

--All three executions use the same plan: under the influence of OPT_PARAM('_optim_peek_user_binds' 'false'), the INDEX FAST FULL SCAN plan is used, Plan hash value: 2333720604:

SQL> select * from table(dbms_xplan.display_cursor('7mz2mhz0nq92n',0,'advanced'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  7mz2mhz0nq92n, child number 0
-------------------------------------
select count(*) from t_skew where object_id = :v1

Plan hash value: 2333720604

------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |       |       |    82 (100)|          |
|   1 |  SORT AGGREGATE       |            |     1 |     3 |            |          |
|*  2 |   INDEX FAST FULL SCAN| IDX_T_SKEW | 43207 |   126K|    82   (0)| 00:00:01 |
------------------------------------------------------------------------------------

Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------

   1 - SEL$1
   2 - SEL$1 / T_SKEW@SEL$1

Outline Data
-------------

  /*+
      BEGIN_OUTLINE_DATA
      IGNORE_OPTIM_EMBEDDED_HINTS
      OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
      DB_VERSION('11.2.0.4')
      OPT_PARAM('_optim_peek_user_binds' 'false')
      ALL_ROWS
      OUTLINE_LEAF(@"SEL$1")
      INDEX_FFS(@"SEL$1" "T_SKEW"@"SEL$1" ("T_SKEW"."OBJECT_ID"))
      END_OUTLINE_DATA
  */

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OBJECT_ID"=:V1)

Column Projection Information (identified by operation id):
-----------------------------------------------------------

   1 - (#keys=0) COUNT(*)[22]

    So when checking whether ACS is enabled, also check the bind peeking setting.
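
    That combined check can be written down as a small Python helper. This is a sketch with a made-up function name; it takes the parameter values as a plain dictionary (for example, as read from the hidden-parameter query in section 3.1). It conservatively treats ACS as fully enabled only when bind peeking is on and none of the three ACS switches from section 4.2 is turned off. Whether each of those switches disables ACS on its own is an assumption that this post does not test.

```python
def acs_fully_enabled(params):
    """params maps hidden parameter names to their current values as strings."""
    def get(name):
        return str(params.get(name, "")).upper()

    # Bind peeking must be on, otherwise ACS never comes into play.
    peeking_on = get("_optim_peek_user_binds") == "TRUE"
    # Assumption: any one of the switches from section 4.2 being off counts as off.
    acs_on = (
        get("_optimizer_adaptive_cursor_sharing") == "TRUE"
        and get("_optimizer_extended_cursor_sharing") != "NONE"
        and get("_optimizer_extended_cursor_sharing_rel") != "NONE"
    )
    return peeking_on and acs_on


defaults = {
    "_optim_peek_user_binds": "TRUE",
    "_optimizer_adaptive_cursor_sharing": "TRUE",
    "_optimizer_extended_cursor_sharing": "UDO",
    "_optimizer_extended_cursor_sharing_rel": "SIMPLE",
}
print(acs_fully_enabled(defaults))
print(acs_fully_enabled({**defaults, "_optim_peek_user_binds": "FALSE"}))
```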