Oracle Update语句中多表关联中被关联表多次全表扫描问题

 

Oracle Update语句中多表关联中被关联表多次全表扫描问题

前言

最近优化了一个update语句中,多表关联导致表多次全表扫描的性能问题。

尝试用merge into改写后发现原来不知道多久能运行完的语句达到秒级别执行完,因为merge into可以避免多次的全表扫描。

比较好模拟,接下来模拟一下,也好记录一下有些遇到的小问题。

 

事故模拟

这个是我实际生产优化的数据和语句的模拟。

drop table t1 purge;
drop table t2 purge;
create table t1 as select * from dba_objects where rownum<=100;
create table t2 as select * from dba_objects where rownum<=50;
View Code
10:34:30 SYS@zkm(1)> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
10:35:27 SYS@zkm(1)> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.01
10:35:27 SYS@zkm(1)> create table t1 as select * from dba_objects where rownum<=100;

Table created.

Elapsed: 00:00:00.01
10:35:27 SYS@zkm(1)> create table t2 as select * from dba_objects where rownum<=50;

Table created.

Elapsed: 00:00:00.01

 

update的语句如下:

update t1
   set (t1.owner, t1.object_name) =
       (select t2.owner, t2.object_name
          from t2
         where t1.object_id = t2.object_id)
 where exists (select 1 from t2 where t1.object_id = t2.object_id);

 

开启statistics_level为all,看看执行后对被关联表t2的执行次数是多少次。

10:43:41 SYS@zkm(1)> set pagesize 9999 long 9999 line 500 timing on
10:45:14 SYS@zkm(1)> alter session set statistics_level=all;

Session altered.

Elapsed: 00:00:00.00
10:45:14 SYS@zkm(1)> update t1
10:45:19   2     set (t1.owner, t1.object_name) =
10:45:19   3         (select t2.owner, t2.object_name
10:45:19   4            from t2
10:45:19   5           where t1.object_id = t2.object_id)
10:45:19   6   where exists (select 1 from t2 where t1.object_id = t2.object_id);

50 rows updated.

Elapsed: 00:00:00.00
10:45:20 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------
SQL_ID  aq3ptnvdvmnpq, child number 0
-------------------------------------
update t1    set (t1.owner, t1.object_name) =        (select t2.owner,
t2.object_name           from t2          where t1.object_id =
t2.object_id)  where exists (select 1 from t2 where t1.object_id =
t2.object_id)

Plan hash value: 1571583455

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT      |         |      1 |        |      0 |00:00:00.01 |     209 |       |       |          |
|   1 |  UPDATE               | T1      |      1 |        |      0 |00:00:00.01 |     209 |       |       |          |
|*  2 |   HASH JOIN RIGHT SEMI|         |      1 |    100 |     50 |00:00:00.01 |       7 |  2440K|  2440K| 1515K (0)|
|   3 |    VIEW               | VW_SQ_1 |      1 |     50 |     50 |00:00:00.01 |       3 |       |       |          |
|   4 |     TABLE ACCESS FULL | T2      |      1 |     50 |     50 |00:00:00.01 |       3 |       |       |          |
|   5 |    TABLE ACCESS FULL  | T1      |      1 |    100 |    100 |00:00:00.01 |       4 |       |       |          |
|*  6 |   TABLE ACCESS FULL   | T2      |     50 |      1 |     50 |00:00:00.01 |     150 |       |       |          |
----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("T1"."OBJECT_ID"="ITEM_1")
   6 - filter("T2"."OBJECT_ID"=:B1)

Note
-----
   - dynamic sampling used for this statement (level=2)


31 rows selected.

Elapsed: 00:00:00.01
10:45:22 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.01
10:45:24 SYS@zkm(1)> 

可以看到id=6的starts的值为50,表示id为6的t2全表扫描被执行了50次,在生产事故中,t2表是比较大的,并且扫描次数相对较多,因此update执行时间是非常久的。

这里id=1的子步骤为id=2和id=6,此处的id=1的UPDATE操作类似和“NESTED LOOPS”一样,id=2出有多少条符合条件的记录,则id=6就执行多少次。

比如这里的,t1被update的记录通过“where exists (select 1 from t2 where t1.object_id = t2.object_id)”后,仍有50行符合可以被update,因此这50行中每行都会去全表扫t2一次。

10:56:00 SYS@zkm(1)> select count(*) from t1 where exists (select 1 from t2 where t1.object_id = t2.object_id);

  COUNT(*)
----------
        50

Elapsed: 00:00:00.00

 

 

使用merge into改造后,效率变高(数据量小看不出执行时间的差别,你可以自己构造巨量数据去比较,这里主要看t2全表扫描的次数),

merge into t1
using t2
on (t1.object_id = t2.object_id)
when matched then
  update set t1.owner = t2.owner, t1.object_name = t2.object_name;
View Code
11:00:23 SYS@zkm(1)> merge into t1
11:00:23   2  using t2
11:00:23   3  on (t1.object_id = t2.object_id)
11:00:23   4  when matched then
11:00:23   5    update set t1.owner = t2.owner, t1.object_name = t2.object_name;

50 rows merged.

Elapsed: 00:00:00.00
11:00:23 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------
SQL_ID  agt8fym6y8hug, child number 0
-------------------------------------
merge into t1 using t2 on (t1.object_id = t2.object_id) when matched
then   update set t1.owner = t2.owner, t1.object_name = t2.object_name

Plan hash value: 2683531971

------------------------------------------------------------------------------------------------------------------
| Id  | Operation            | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------
|   0 | MERGE STATEMENT      |      |      1 |        |      0 |00:00:00.01 |      59 |       |       |          |
|   1 |  MERGE               | T1   |      1 |        |      0 |00:00:00.01 |      59 |       |       |          |
|   2 |   VIEW               |      |      1 |        |     50 |00:00:00.01 |       7 |       |       |          |
|*  3 |    HASH JOIN         |      |      1 |     50 |     50 |00:00:00.01 |       7 |   870K|   870K| 1289K (0)|
|   4 |     TABLE ACCESS FULL| T2   |      1 |     50 |     50 |00:00:00.01 |       3 |       |       |          |
|   5 |     TABLE ACCESS FULL| T1   |      1 |    100 |    100 |00:00:00.01 |       4 |       |       |          |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - access("T1"."OBJECT_ID"="T2"."OBJECT_ID")

Note
-----
   - dynamic sampling used for this statement (level=2)


27 rows selected.

Elapsed: 00:00:00.01
11:00:26 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00

 

对比可以知道,merge into的t2只扫描了一次,并且其他维度比如Buffers,OMem,1Mem,Used-Mem都减少了。

 

merge into的限制

重新构造另外的数据环境。

drop table t1 purge;
drop table t2 purge;

create table t1 ( id number,name varchar2(20));
create table t2 ( id number,name varchar2(20));

insert into t1 values (1,'t1a');
insert into t1 values (2,'t1b');
insert into t1 values (3,'t1c');
insert into t1 values (4,'t1d');
insert into t1 values (5,'t1e');
insert into t1 values (6,'t1f');
 

insert into t2 values (2,'t2a');
insert into t2 values (3,'t2b');
insert into t2 values (4,'t2c');
insert into t2 values (5,'t2d');
insert into t2 values (6,'t2e');
insert into t2 values (7,'t2f');
commit;
View Code
11:04:36 SYS@zkm(1)> drop table t1 purge;

Table dropped.

Elapsed: 00:00:00.01
11:04:42 SYS@zkm(1)> drop table t2 purge;

Table dropped.

Elapsed: 00:00:00.01
11:04:43 SYS@zkm(1)> create table t1 ( id number,name varchar2(20));

Table created.

Elapsed: 00:00:00.01
11:04:49 SYS@zkm(1)> create table t2 ( id number,name varchar2(20));

Table created.

Elapsed: 00:00:00.00
11:04:49 SYS@zkm(1)> insert into t1 values (1,'t1a');

1 row created.

Elapsed: 00:00:00.00
11:04:52 SYS@zkm(1)> insert into t1 values (2,'t1b');

1 row created.

Elapsed: 00:00:00.00
11:04:52 SYS@zkm(1)> insert into t1 values (3,'t1c');

1 row created.

Elapsed: 00:00:00.01
11:04:52 SYS@zkm(1)> insert into t1 values (4,'t1d');

1 row created.

Elapsed: 00:00:00.00
11:04:52 SYS@zkm(1)> insert into t1 values (5,'t1e');

1 row created.

Elapsed: 00:00:00.00
11:04:52 SYS@zkm(1)> insert into t1 values (6,'t1f');

1 row created.

Elapsed: 00:00:00.00
11:04:53 SYS@zkm(1)> insert into t2 values (2,'t2a');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> insert into t2 values (3,'t2b');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> insert into t2 values (4,'t2c');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> insert into t2 values (5,'t2d');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> insert into t2 values (6,'t2e');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> insert into t2 values (7,'t2f');

1 row created.

Elapsed: 00:00:00.00
11:04:56 SYS@zkm(1)> commit;

Commit complete.

Elapsed: 00:00:00.00
11:04:57 SYS@zkm(1)> select * from t1;

        ID NAME
---------- ------------------------------------------------------------
         1 t1a
         2 t1b
         3 t1c
         4 t1d
         5 t1e
         6 t1f

6 rows selected.

Elapsed: 00:00:00.00
11:05:15 SYS@zkm(1)> select * from t2;

        ID NAME
---------- ------------------------------------------------------------
         2 t2a
         3 t2b
         4 t2c
         5 t2d
         6 t2e
         7 t2f

6 rows selected.

Elapsed: 00:00:00.00

 

对于update语句,

update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);

 无法改造成merge into,如下:

merge into t1
using t2
on (t1.id = t2.id)
when matched then
  update set t1.id = t2.id, t1.name = t2.name;

 

因为merge into中不能对匹配字段id进行update,会报ORA-38104,

11:12:44 SYS@zkm(1)> merge into t1
11:13:18   2  using t2
11:13:18   3  on (t1.id = t2.id)
11:13:18   4  when matched then
11:13:18   5    update set t1.id = t2.id, t1.name = t2.name;
on (t1.id = t2.id)
    *
ERROR at line 3:
ORA-38104: Columns referenced in the ON Clause cannot be updated: "T1"."ID"


Elapsed: 00:00:00.00

 

这种情况下,我只能想到在t2的id字段创建索引,避免多次的全表扫描。

11:15:33 SYS@zkm(1)> set pagesize 9999 long 9999 line 500 timing on
11:15:34 SYS@zkm(1)> alter session set statistics_level=all;

Session altered.

Elapsed: 00:00:00.00
11:15:34 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);

5 rows updated.

Elapsed: 00:00:00.00
11:15:35 SYS@zkm(1)> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'));

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  c124snsjck4d6, child number 1
-------------------------------------
update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where
t1.id=t2.id) where exists (select t2.id,t2.name from t2 where
t1.id=t2.id)

Plan hash value: 2611650519

-------------------------------------------------------------------------------------------------
| Id  | Operation                    | Name   | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------------------
|   0 | UPDATE STATEMENT             |        |      1 |        |      0 |00:00:00.01 |      22 |
|   1 |  UPDATE                      | T1     |      1 |        |      0 |00:00:00.01 |      22 |
|*  2 |   FILTER                     |        |      1 |        |      5 |00:00:00.01 |       9 |
|   3 |    TABLE ACCESS FULL         | T1     |      1 |      6 |      6 |00:00:00.01 |       3 |
|*  4 |    INDEX RANGE SCAN          | IDX_ID |      6 |      1 |      5 |00:00:00.01 |       6 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T2     |      5 |      1 |      5 |00:00:00.01 |      10 |
|*  6 |    INDEX RANGE SCAN          | IDX_ID |      5 |      1 |      5 |00:00:00.01 |       5 |
-------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter( IS NOT NULL)
   4 - access("T2"."ID"=:B1)
   6 - access("T2"."ID"=:B1)

Note
-----
   - dynamic sampling used for this statement (level=2)


31 rows selected.

Elapsed: 00:00:00.01
11:15:36 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00

 

注意点

一开始调试的时候,我以为where条件是可以去掉的,但其实不对。

什么意思呢?比如

update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);

 我以为是和

update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id);

 等价的,但是其实不等价。

实际执行看看就知道了,

11:19:34 SYS@zkm(1)> drop index idx_id;

Index dropped.

Elapsed: 00:00:00.01
11:22:37 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);

5 rows updated.

Elapsed: 00:00:00.01
11:22:39 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00
11:22:42 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id);

6 rows updated.

Elapsed: 00:00:00.00
11:22:46 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00

 

 

 

为什么?

where exists实际上是锁定其批量更新数据的范围,update的时候是根据被update的表t1的每一条数据去做更新的。

比如语句update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id),没有任何条件的话那么肯定是全部的行都要被进行update的,在子查询中如果不符合t1.id=t2.id的行,被更新字段是会被值为null的,如下:

11:37:31 SYS@zkm(1)> select * from t1;

        ID NAME
---------- ------------------------------------------------------------
         1 t1a
         2 t1b
         3 t1c
         4 t1d
         5 t1e
         6 t1f

6 rows selected.

Elapsed: 00:00:00.00
11:37:33 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id);

6 rows updated.

Elapsed: 00:00:00.00
11:37:36 SYS@zkm(1)> select * from t1;

        ID NAME
---------- ------------------------------------------------------------

         2 t2a
         3 t2b
         4 t2c
         5 t2d
         6 t2e

6 rows selected.

Elapsed: 00:00:00.00
11:37:39 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00

 

 

因为id=1在t2中匹配不到(可以理解为t2表中为null),但是又必须被更新(前边说了没限定条件是要全部被更新的),所以被设置为null。

可以这么看,当取出t1表中第一行(id=1,name=t1a)后,将该行中t1.id和t2.id进行等价匹配(这次进行一次t2的全表扫描),发现没有符合条件的数据,因此只能都设置为null。

因此,where exists的限定是必须要有的,我们来看看,

11:40:55 SYS@zkm(1)> select * from t1;

        ID NAME
---------- ------------------------------------------------------------
         1 t1a
         2 t1b
         3 t1c
         4 t1d
         5 t1e
         6 t1f

6 rows selected.

Elapsed: 00:00:00.00
11:41:09 SYS@zkm(1)> update t1 set (t1.id,t1.name)=(select t2.id,t2.name from t2 where t1.id=t2.id) where exists (select t2.id,t2.name from t2 where t1.id=t2.id);

5 rows updated.

Elapsed: 00:00:00.00
11:41:10 SYS@zkm(1)> select * from t1;

        ID NAME
---------- ------------------------------------------------------------
         1 t1a
         2 t2a
         3 t2b
         4 t2c
         5 t2d
         6 t2e

6 rows selected.

Elapsed: 00:00:00.00
11:41:14 SYS@zkm(1)> rollback;

Rollback complete.

Elapsed: 00:00:00.00

 

可以这么看,对t1全表扫描后,取出的数据首先判断是否满足where exists的条件。

比如第一行(id=1,name=t1a)不满足where exists (select t2.id,t2.name from t2 where t1.id=t2.id),因此,不对这一行做update。

剩下的5行全部满足条件,所以进行update。

 

posted @ 2020-08-19 11:17  PiscesCanon  阅读(797)  评论(1编辑  收藏  举报