代码改变世界

ORACLE中死锁的知识点总结

2017-06-01 12:20  潇湘隐者  阅读(20245)  评论(5编辑  收藏  举报

 

死锁的概念

 

    什么是死锁呢? 其实我们生活中也有很多类似死锁的例子。 我先举一个生活中的例子:过年回家,父亲买了一把水弹枪,儿子和侄子争抢着要先玩,谁也不让谁,拆开包装后,一个抢了枪, 一个逮住了子弹和弹夹。两个都争着要先玩,但是都互不相让。结果两个人都玩不了。如果儿子要先玩,就必须让侄子把子弹和弹夹给他,如果侄子要先玩,就必须让儿子把枪给侄子。他们就这样对峙了十几分钟,互不相让。 我出来调停,让儿子把枪先给侄子玩,每个人玩十分钟。然后两个人开开心心一起玩起来。其实这就是一个活生生的死锁(Dead Lock)的例子。

 

   我们再来看看数据库死锁的概念, 所谓死锁,是指两个会话,每个会话都持有另外一个会话想要的资源,因争夺资源而造成的一种互相等待的现象,此时就会出现死锁,若无外力作用,它们都将无法推进下去。此时称系统处于死锁状态或系统产生了死锁,这些永远在互相等待的进程称为死锁进程。Oracle对于死锁采取的策略是回滚其中一个事务,让另外一个事务顺利进行。

 

   英文关于deadlock的概念如下:

 

A deadlock occurs when a session (A) wants a resource held by another session (B) , but that session also wants a resource held by the first session (A). There can be more than 2 sessions involved but the idea is the same.

 

   

死锁的模拟

 

 

   上面了解了死锁的概念,接下来,我们先人工构造一个简单的死锁(Dead Lock)案例来加深理解一下死锁(Dead Lock),如下所示,我们先准备测试案例使用的表和数据,测试环境为Oracle Database 10g Release 10.2.0.5.0

 

SQL> create table dead_lock_test( id number(10), name varchar2(32));
 
Table created.
 
SQL> insert into dead_lock_test values(101, 'kerry');
 
1 row created.
 
SQL> insert into dead_lock_test values(102, 'ken');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> 

 

 

在会话1SID788)中执行下面SQL语句:

 

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1; 
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       788          0          1
 
SQL> update dead_lock_test set name='kerry1_101' where id=101;
 
1 row updated.
 
SQL> 

 

clip_image001

 

 

 

然后在会话2SID770)中执行下面SQL语句:

 

SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       770          0          1
 
SQL> update dead_lock_test set name='kerry2_102' where id=102;
 
已更新 1 行。
 
SQL> update dead_lock_test set name='kerry2_101' where id=101;

 

 

 

clip_image002

 

如上所示,会话2SID770)更新id=101这条记录时,会话被阻塞了。然后我们在会话1SID788)中执行下面SQL语句:

 

SQL>  update dead_lock_test set name='kerry1_102' where id=102;

 

clip_image003

 

此时你会立马看到会话2SID770)出现ORA-00060错误,如下所示:

 

clip_image004

 

 

如果对上面的操作过程的流程有点不直观,那么可以参下面表格:

 

 

image

     

 

当然,如果你以下面这样的顺序更新,那么会话1就会出现ORA-0060的错误,会话1会被当做牺牲的会话进行回滚。

 

 

image

 

                                

 

 

此时在告警日志中就会出现trc文件。注意RAC环境和单机环境稍有不同。在RAC环境中,是由LMDLock Manager Daemon)进程统一管理各个节点之间的锁资源的,所以,RAC环境中trace文件是由LMD进程来生成的。

 

Tue Mar 28 15:36:30 CST 2017

ORA-00060: Deadlock detected. More info in file /u01/app/oracle/admin/SCM2/bdump/scm2_s000_15815.trc

 

trace文件的部分内容如下所示:

 

*** 2017-03-28 15:36:30.917
*** ACTION NAME:() 2017-03-28 15:36:30.917
*** MODULE NAME:(SQL*Plus) 2017-03-28 15:36:30.917
*** SERVICE NAME:(SCM2) 2017-03-28 15:36:30.917
*** SESSION ID:(770.8) 2017-03-28 15:36:30.917
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
                       ---------Blocker(s)--------  ---------Waiter(s)---------
Resource Name          process session holds waits  process session holds waits
TX-0006002e-001e409f        15     770     X             16     788           X
TX-0007002c-001f6346        16     788     X             15     770           X
session 770: DID 0001-0010-00000002     session 788: DID 0001-000F-00000001
session 788: DID 0001-000F-00000001     session 770: DID 0001-0010-00000002
Rows waited on:
Session 788: obj - rowid = 00094900 - AACUkAABEAACLUeAAB
  (dictionary objn - 608512, file - 68, block - 570654, slot - 1)
Session 770: obj - rowid = 00094900 - AACUkAABEAACLUeAAA
  (dictionary objn - 608512, file - 68, block - 570654, slot - 0)
Information on the OTHER waiting sessions:
Session 788:
  sid: 788 ser: 9 audsid: 201878652 user: 132/TEST
    flags: (0xe1) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
    flags2: (0x8)
  pid: 16 O/S info: user: oracle, term: UNKNOWN, ospid: 15817
    image: oracle@getlnx14uat.xxxx.com (S001)
  O/S info: user: oracle, term: pts/2, ospid: 23047, machine: DB-Server.localdomain
            program: sqlplus@DB-Server.localdomain (TNS V1-V3)
  application name: SQL*Plus, hash value=3669949024
  Current SQL Statement:
  update dead_lock_test set name='kerry1_102' where id=102
End of information on OTHER waiting sessions.
Current SQL statement for this session:
update dead_lock_test set name='kerry2_101' where id=101
===================================================

 

 

死锁的检测

 

关于死锁的检测,对于单实例来说,基本上秒级完成,对于RAC环境,Oracle 10g基本上是1分钟, Oracle 11g10秒,这个是通过隐含参数_lm_dd_interval控制的。这个参数可以修改,但是不建议修改。

 

COL NAME FOR A32;
COL KSPPDESC FOR A32;
COL KSPPSTVL FOR A32;
SELECT A.INDX,
        A.KSPPINM NAME,
        A.KSPPDESC,
       B.KSPPSTVL
FROM   X$KSPPI  A,
        X$KSPPCV B
WHERE  A.INDX = B.INDX
     AND LOWER(A.KSPPINM) LIKE  LOWER('%&PARAMETER%');

 

clip_image005

 

clip_image006

 

 

 

 

死锁的分析(DeadLock Troubleshooting)

 

 

以上面的例子来说,数据库一旦出现死锁,立马会在告警日志里面生成这样一条记录ORA-00060: Deadlock detected. More info in file xxxxx,那么从trc文件能分析出什么信息呢? 下面我们以上面的例子来简单分析一下

 

clip_image007

 

其实trc文件里面最重要、最有用的信息是Deadlock graph。从这部分,我们可以分析得到下面一些有用信息:

 

 

1: 产生死锁的两个会话信息

 

 

Deadlock graph:

                       ---------Blocker(s)--------  ---------Waiter(s)---------

Resource Name          process session holds waits  process session holds waits

TX-0006002e-001e409f        15     770     X             16     788           X

TX-0007002c-001f6346        16     788     X             15     770           X

session 770: DID 0001-0010-00000002     session 788: DID 0001-000F-00000001

session 788: DID 0001-000F-00000001     session 770: DID 0001-0010-00000002

 

从上面可以看到Blockers)Waiters)的相关信息

 

Resource Name   被持有或等待的锁资源名字

          锁资源名字由三部分组成 Type-ID1-ID2ID1ID2代表的意思由锁类型决定。

         具体可以参考v$lock_type

process   :  V$PROCESS.PID

session   :  V$SESSION.SID

holds    :  锁持有的模式(Mode the lock is held in

waits    :  锁等待的模式(Mode the lock is requested in (waiting for)

 

解读以上死锁的案例:

 

SID 770 (Process 15) 以排它模式持有锁:TX-0006002e-001e409f ,以排它模式请求锁:TX-0007002c-001f6346

SID 788 (Process 16) 以排它模式持有锁:TX-0007002c-001f6346 ,以排它模式请求锁:TX-0006002e-001e409f

 

这一段可以看到,778 阻塞了770, 然后770又阻塞了778 刚好构成了死锁的条件。这里要看生成的记录是两行还是一行,是TX还是TM,如果只有一行那么说明是同一个SESSION,可能是自治事务引起的死锁。

 

 

2:死锁发生在那个对象?

 

Rows waited on:

Session 788: obj - rowid = 00094900 - AACUkAABEAACLUeAAB

  (dictionary objn - 608512, file - 68, block - 570654, slot - 1)

Session 770: obj - rowid = 00094900 - AACUkAABEAACLUeAAA

  (dictionary objn - 608512, file - 68, block - 570654, slot - 0)

 

clip_image008

 

 

3:会话的的机器、应用程序等信息

 

Session 788:

  sid: 788 ser: 9 audsid: 201878652 user: 132/TEST

    flags: (0xe1) USR/- flags_idl: (0x1) BSY/-/-/-/-/-

    flags2: (0x8)

  pid: 16 O/S info: user: oracle, term: UNKNOWN, ospid: 15817

    image: oracle@xxxxxx.xxxx.com (S001)

  O/S info: user: oracle, term: pts/2, ospid: 23047, machine: DB-Server.localdomain

            program: sqlplus@DB-Server.localdomain (TNS V1-V3)

  application name: SQL*Plus, hash value=3669949024

  Current SQL Statement:

  update dead_lock_test set name='kerry1_102' where id=102

End of information on OTHER waiting sessions.

Current SQL statement for this session:

update dead_lock_test set name='kerry2_101' where id=101

 

从上面我们可以看到会话788是从机器DB-Server.localdomain上的SQL*Plus应用程序发出的SQL,如果是正式环境,你会看到相关的机器和应用程序名称。这个会话最后执行的SQL语句为update dead_lock_test set name='kerry1_102' where id=102

另外一个会话执行的最后语句为update dead_lock_test set name='kerry2_101' where id=101,但是如何找到对应的机器、应用程序信息呢?

 

如下截图所示,我们在PROCESS STATE部分,找到对应的SID 770的事务,可以看到usertermmachineprogram信息。剩下的事情,就是你和开发人员分析脚本,缕清细节,然后如何避免死锁的问题。

 

clip_image009

 

 

 

 

死锁的分类

 

死锁如何分类呢?在Metalink上这篇文章中"How to Identify ORA-00060 Deadlock Types Using Deadlock Graphs in Trace (文档 ID 1507093.1)"有关于死锁的分类:如下所示:

 

 

"Key Signature"

Lock Type

Requested
Lock Mode

Deadlock Graph

Likely
Deadlock Type

Comments

Type TX Lock Requesting Mode X (6)

TX

X(6)

TX X X
TX
 
X X

Application

TX Lock Held in Mode X (6) Requesting Mode X (6)

Type TM Lock Requesting Mode SSX (5)

TM

SSX (5)

TM SX SSX SX SSX
TM
 SX SSX 
SX SSX

Missing Index on Foreign Key (FK) Constraint

TM  Lock Held in Mode SX (3) Held SSX (5) Requested

Type TX Lock Requesting Mode S(4)

TX

S(4)

TX X S
TX
 
X S

Insufficient Interested Transaction List (ITL) Provision
OR
Bitmap Index
OR
PK/UK Index

TX Lock Held in Mode X (6) Requesting Mode S (4)

ITL, Bitmap Index and PK/UK Index Signatures are the Same. Further Investigation will be required to identify absolute cause

Type TX Lock Requesting Mode X (6)
Single Row in Deadlock Graph

TX

X(6)

TX X X
Single Row in Deadlock Graph

Self Deadlock
OR
Autonomous Transaction Self Deadlock

This looks the same as a standard application deadlock except that there is only a single row in the deadlock graph.

Type UL Lock in Deadlock Graph

UL

ANY

UL ? ?
?

Application Deadlock Featuring User Defined Locks

This is very similar to the standard application deadlock except that it features User Defined Locks

 

 

 

 

李华荣这篇博客Oracle死锁(DeadLock)的分类及其模拟里面对死锁进行了一个分类,个人觉得是一个通俗、很赞的一个死锁分类。本文很多地方也是参考、借鉴他博客的内容。

 

 

clip_image011

 

 

 

那么我们接下来看看这些死锁产生的场景,并进行一些分析,很多知识点都是参考Metalink上的一些知识点。

 

 

 

 

1:应用程序死锁(Application Deadlock

 

 

其实最上面那个死锁的例子,就属于Application Deadlock,这个Application Deadlock是发生在同一个表,下面我们介绍一下Application Deadlock发生在两个表之间不同顺序相互更新操作引起的死锁。下面开始我们的实验。创建两个测试表,并初始化数据。

 

 

SQL> select * from v$version;       
 
BANNER
----------------------------------------------------------------
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
PL/SQL Release 10.2.0.5.0 - Production
CORE    10.2.0.5.0      Production
TNS for Linux: Version 10.2.0.5.0 - Production
NLSRTL Version 10.2.0.5.0 - Production
 
SQL> create table lock_test_one(name varchar(32));
 
Table created.
 
SQL> create table lock_test_two(name varchar(32));
 
Table created.
 
SQL> insert into lock_test_one values('aaaaaaaa');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> insert into lock_test_two values('bbbbbbbb');
 
1 row created.
 
SQL> commit;
 
Commit complete.

 

在会话1SID=685)里更新lock_test_one的记录,但是不提交更新(下面是一个模拟死锁出现的过程)

 

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       685          0          1
 
SQL> update lock_test_one set name='cccccccc' where name='aaaaaaaa';
 
1 row updated.

 

 

在会话2(SID=719)里面更新lock_test_two的记录,也不提交更新:

 

 

SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       719          0          1
 
SQL> update lock_test_two set name='dddddddd' where name='bbbbbbbb';
 
已更新 1 行。

 

 

接下来在会话1SID=685)里面更新lock_test_two,你会发现会话1被回话2阻塞了(开启另外一个会话查询,就会发现阻塞信息,如下截图所示)

 

 

SQL> update lock_test_two set name='eeeeeeee' where name='bbbbbbbb';

 

 

clip_image012

 

 

接下来在会话2里面更新lock_test_one, 模拟死锁就会出现了

 

 

SQL> update lock_test_one set name='ffffffff' where name='aaaaaaaa';

 

 

 

最后,在会话2(SID=719)里面执行上面语句后,你会发现会话1立即就会出现ORA-00060的死锁错误信息。

 

 

clip_image013

 

 

 

 

那么接下来我们来分析一下死锁日志,如下所示,死锁有两行记录,发生在两个会话之间,可以判断为Application Deadlock. objn的值来看,

 

有两个值,意味着两个表之间不同顺序相互更新操作引起了死锁,典型的应用程序死锁有下面特征:

 

    1Deadlock graph中的记录多于1行。

    2Deadlock graph中至少有一行锁类型是TX-X-X

 

clip_image014

 

SQL> col object_name for a32;
SQL> col object_type for a32;
SQL> select object_name, object_type from dba_objects
  2  where object_id=608789;
 
OBJECT_NAME                      OBJECT_TYPE
-------------------------------- --------------------------------
LOCK_TEST_ONE                    TABLE
 
SQL> select object_name, object_type from dba_objects
  2  where object_id=608790;
 
OBJECT_NAME                      OBJECT_TYPE
-------------------------------- --------------------------------
LOCK_TEST_TWO                    TABLE
 
SQL> 

 

还有其它一些有用的信息,例如你想知道死锁发生在哪一行,那么可以通过rowid等位到死锁发生在哪一行。

 

 

 

Rows waited on:

Session 719: obj - rowid = 00094A15 - AACUoVABEAACLUnAAB

  (dictionary objn - 608789, file - 68, block - 570663, slot - 1)

Session 685: obj - rowid = 00094A16 - AACUoWABEAACLUvAAA

  (dictionary objn - 608790, file - 68, block - 570671, slot - 0)

 

 

 

SQL> select * from lock_test_one where rowid='AACUoVABEAACLUnAAB';
 
NAME
--------------------------------
cccccccc
 
SQL> select * from lock_test_two where rowid='AACUoWABEAACLUvAAA';
 
NAME
--------------------------------
bbbbbbbb

 

 

  clip_image015

 

 

 

2:缺少外键索引引起的死锁。

 

 

创建表dead_lock_parentdead_lock_foreign,两者存在主外键关系,分布插入两条测试数据: 

 

 

SQL> create table dead_lock_parent( id number primary key, name varchar2(32));
 
Table created.
 
SQL> create table dead_lock_foreign(fid  number, fname varchar2(32), foreign key(fid) references dead_lock_parent);
 
Table created.
 
SQL> insert into dead_lock_parent values( 1, 'kerry');
 
1 row created.
 
SQL> insert into dead_lock_foreign values(1, 'kerry_fk');  
 
1 row created.
 
SQL> insert into dead_lock_parent values(2, 'jimmy');
 
1 row created.
 
SQL> insert into dead_lock_foreign values(2, 'jimmy_fk');
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> 

 

1:在会话1(会话ID789)里面执行下面SQL语句:

 

 

 

SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       789          0          1
 
SQL> delete from dead_lock_foreign where fid=1;
 
已删除 1 行。

 

 

 

2:在会话2(会话ID766)里面执行下面SQL语句:

 

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       766          0          1
 
SQL> delete from dead_lock_foreign where fid=2;
 
1 row deleted.

 

 

 

3:接着在会话1(会话ID789)里执行删除dead_lock_parentid1的记录:

 

 

SQL> delete from dead_lock_parent where id=1;

 

 

此时你会发现会话被阻塞了,我们可以用下面SQL查询具体的阻塞信息。

 

COL MODE_HELD FOR A14;
COL LOCK_TYPE FOR A8;
COL MODE_REQUESTED FOR A10;
COL OBJECT_TYPE FOR A14;
COL OBJECT_NAME FOR A20;
SELECT LK.SID,
       DECODE(LK.TYPE,
              'TX',
              'Transaction',
              'TM',
              'DML',
              'UL',
              'PL/SQL User Lock',
              LK.TYPE) LOCK_TYPE,
       DECODE(LK.LMODE,
              0,
              'None',
              1,
              'Null',
              2,
              'Row-S (SS)',
              3,
              'Row-X (SX)',
              4,
              'Share',
              5,
              'S/Row-X (SSX)',
              6,
              'Exclusive',
              TO_CHAR(LK.LMODE)) MODE_HELD,
       DECODE(LK.REQUEST,
              0,
              'None',
              1,
              'Null',
              2,
              'Row-S (SS)',
              3,
              'Row-X (SX)',
              4,
              'Share',
              5,
              'S/Row-X (SSX)',
              6,
              'Exclusive',
              TO_CHAR(LK.REQUEST)) MODE_REQUESTED, 
       OB.OBJECT_TYPE,
       OB.OBJECT_NAME,
       LK.BLOCK,
       SE.LOCKWAIT
  FROM V$LOCK LK, DBA_OBJECTS OB, V$SESSION SE
 WHERE LK.TYPE IN ('TM', 'UL')
   AND LK.SID = SE.SID
   AND LK.ID1 = OB.OBJECT_ID(+)
 AND SE.SID IN (766,789)
 ORDER BY SID;

 

 

clip_image016

 

 

 

接着在会话2(会话ID766)里面执行下面SQL,删除主表中id=2的记录

 

 

SQL> delete from dead_lock_parent where id=2;

 

 

你会发现会话1(会话ID789)就会出现Deadlock

 

 

 

clip_image017

 

trace文件里面,你能看到如下一些信息:

 

 

clip_image018

 

特征:

 

1:在Deadlock Graph里面有多行记录。

2:在Deadlock Graph至少一行的锁类型为TM,并且Waiter等待类型为SSX"(Share Row-eXclusive: Mode 5)

 

 

此时需要去查找对应的主外键约束的表,我们可以在PROCESS STATE里面找到对应objectobject_id, 注意此时这个object_id不是十进制数,而是十六进制数。如下所示:

 

 

 

SQL> col object_name for a32;
SQL> select object_id, object_name from dba_objects where object_name =upper('dead_lock_foreign');
 
 OBJECT_ID OBJECT_NAME
---------- --------------------------------
    608975 DEAD_LOCK_FOREIGN
 
SQL> select to_char(608975,'xxxxx') from dual;
 
TO_CHA
------
 94acf

 

 

 

clip_image019

 

 

 

找到对应的外键表,然后添加索引,然后你再重复上面操作,你就会发现死锁不会出现了。

 

 

SQL> create index idx_dead_lock_foreign_n1 on dead_lock_foreign(fid);

 

Index created.

 

 

 

关于外键缺少索引,可以详细参考我这篇博客"ORACLE中关于外键缺少索引的探讨和总结"

 

 

 

3:主键或唯一索引更新引起的死锁

 

 

SQL>  create table dead_lock_primary(id number(10) primary key, name varchar2(32));

 

Table created.

 

 

首先在会话(会话ID930)下执行下面SQL

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       930          0          1
 
SQL> insert into dead_lock_primary values(2, 'jimmy');
 
1 row created.

 

 

然后在会话2(会话ID926)里下面执行下面SQL

 

 

SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum =1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       926          0          1
 
SQL> insert into dead_lock_primary values(1, 'kerry');
 
已创建 1 行。

 

 

 

然后在会话1(会话ID930)下插入id=1的记录。

 

 

SQL> insert into dead_lock_primary values(1, 'kkk');

 

 

此时在会话3下执行下面SQL,你会看到会话2被阻塞了, 如下所示:

 

 

 

SQL> @get_locked_objects_rpt.sql
Enter value for 1: 5
old  42:    AND locks_t.blocked_secs > &1
new  42:    AND locks_t.blocked_secs > 5
========= $Revision: 1.4 $ ($Date: 2013/09/16 13:15:22 $) ===========
Locked object : MONI_DDL_LOG
Locked row#   : AACg2tAAsAAAdaaAAA
Blocked for   : 238 seconds
Blocker info. : TEST@xxxx\GET253194(SID=926) [sqlplus.exe/PID=14144:10392]
Blocked info. : TEST@DB-Server.localdomain(SID=930)
[sqlplus@DB-Server.localdomain (TNS V1-V3)/PID=6806]
Blocked SQL   : insert into dead_lock_primary values(1, 'kkk')
Found 1 blocked session(s).
Disconnected from Oracle Database 10g Release 10.2.0.5.0 - 64bit Production

 

接着在会话2(会话ID926)下执行下面SQL,死锁立马出现:

 

 

SQL> insert into dead_lock_primary values(2, 'jimmy');

 

 

 

clip_image020

 

 

 

clip_image021

 

特征:

 

·         More than one row in the deadlock graph

·         At Least 1 Row in the Deadlock graph is "TX X S" 

·         i.e. The Lock type is TX and the holder holds this in "X" (eXclusive: Mode 6) and waits for nothing. The waiter waits for "S" (Share:Mode 4) and waits for nothing

·         At least one of the Involved Objects is a Primary or Unique Key Index

·         Data is being Inserted or changed in an inconsistent order

 

 

4:位图索引更新引起的死锁。

 

 

下面我们来看看位图索引的更新为什么会导致死锁。我们先构造下面测试所需的例子:

 

 

SQLcreate table dead_lock_bitmap_index
  2  (
  3     id number(10),
  4     name varchar(30),
  5     sex number(1)
  6  );
 
Table created.
 
SQL> insert into dead_lock_bitmap_index 
  2  values(1000, 'kerry', 1);
 
1 row created.
 
SQL> insert into dead_lock_bitmap_index
  2  values(1001, 'Merry', 0);
 
1 row created.
 
SQL> insert into dead_lock_bitmap_index
  2  values(1002, 'Richard', 1);
 
1 row created.
 
SQL> insert into dead_lock_bitmap_index
  2  values(1003, 'Ken', 0);
 
1 row created.
 
SQL> commit;
 
Commit complete.
 
SQL> 
SQLcreate bitmap index idx_dead_lock_bitmap_index on dead_lock_bitmap_index(sex);
 
Index created.
 
SQL> 

 

在会话1(会话ID14)中执行下面SQL,此时sex=1的所有行都会被锁定。

 

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
        14          0          0
 
SQL> update dead_lock_bitmap_index set sex=3 where id=1000 and sex=1;
 
1 row updated.

 

 

 

在会话2(会话ID69)中执行下面SQL,此时sex=0的所有行都会被锁定

 

 

SQL> show user;
USER 为 "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
        69          0          0
 
SQL> update dead_lock_bitmap_index set sex=4 where id=1001 and sex=0;
 
已更新 1 行。

 

 

 

在会话1(会话ID14)中执行下面SQL,此时,这个SQL语句将会被阻塞

 

 

SQL> update dead_lock_bitmap_index set sex=4 where  id=1003 and sex=0;

 

 

 

新建会话,执行下面SQL语句查看具体阻塞情况,如下所示:

 

 

 

SQL> COL USERNAME FOR A14;
SQL> COL MACHINE  FOR A26;
SQL> COL OBJECT_NAME FOR A26;
SQL> SELECT S.SID                             SID, 
  2           S.USERNAME                      USERNAME, 
  3           S.MACHINE                       MACHINE, 
  4           L.TYPE                          TYPE, 
  5           O.OBJECT_NAME                   OBJECT_NAME, 
  6           DECODE(L.LMODE, 0, 'None', 
  7                           1, 'Null', 
  8                           2, 'Row Share', 
  9                           3, 'Row Exlusive', 
 10                           4, 'Share', 
 11                           5, 'Sh/Row Exlusive', 
 12                           6, 'Exclusive')   lmode, 
 13      DECODE(L.REQUEST, 0, 'None', 
 14                             1, 'Null', 
 15                             2, 'Row Share', 
 16                             3, 'Row Exlusive', 
 17                             4, 'Share', 
 18                             5, 'Sh/Row Exlusive', 
 19                             6, 'Exclusive') request, 
 20           L.BLOCK                           BLOCK 
 21    FROM   V$LOCK L, 
 22           V$SESSION S, 
 23           DBA_OBJECTS O 
 24    WHERE  L.SID = S.SID 
 25           AND USERNAME != 'SYSTEM' 
 26           AND O.OBJECT_ID(+) = L.ID1 
 27           AND S.SID IN ( 14,69) 
 28    ORDER  BY S.SID; 
 
       SID USERNAME       MACHINE                    TY OBJECT_NAME                LMODE           REQUEST              BLOCK
---------- -------------- -------------------------- -- -------------------------- --------------- --------------- ----------
        14 TEST           myvlnx14uat.localt.com     TM DEAD_LOCK_BITMAP_INDEX     Row Exlusive    None                     0
        14 TEST           myvlnx14uat.localt.com     TX                            None            Share                    0
        14 TEST           myvlnx14uat.localt.com     TX MRAC_OLAP2_AW_DIMENSIONS_V Exclusive       None                     0
        14 TEST           myvlnx14uat.localt.com     AE ORA$BASE                   Share           None                     0
        69 TEST           xxxx\GET253194             TM DEAD_LOCK_BITMAP_INDEX     Row Exlusive    None                     0
        69 TEST           xxxx\GET253194             TX                            Exclusive       None                     1
        69 TEST           xxxx\GET253194             AE ORA$BASE                   Share           None                     0
 
7 rows selected.
 
SQL> 

 

 

 

在会话2(会话ID69)中执行下面SQL,此时,在会话1(会话ID14)中,你就会立马看见死锁出现:

 

SQL> update dead_lock_bitmap_index set sex=3 where id=1002 and sex=1;

 

 

死锁情况如下所示:

 

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
        14          0          0
 
SQL> update dead_lock_bitmap_index set sex=3 where id=1000 and sex=1;
 
1 row updated.
 
SQL> 
SQL> update dead_lock_bitmap_index set sex=4 where  id=1003 and sex=0;
update dead_lock_bitmap_index set sex=4 where  id=1003 and sex=0
       *
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
 
 
SQL> 

 

 

有位图索引存在的表上面,非常容易就引发阻塞与死锁。这个阻塞不是发生在表上面,而是发生在索引上。因为位图索引锁定的范围远远比普通的b-tree索引锁定的范围大。

 

 

特征:

 

    位图索引更新引起的死锁的特征跟主键或唯一索引更新引起的死锁的特征有部分是类似的。

 

 

The typical deadlock graph for an issue caused by Bitmap Indexes locking multiple contending rows is characterised by the following:

More than one row in the deadlock graph

 

    At Least 1 Row in the Deadlock graph is "TX X S" i.e. The Lock type is TX and the holder holds this in "X" (eXclusive: Mode 6) and waits for nothing. The waiter waits for "S" (Share:Mode 4) and waits for nothing

    At Least one of the Involved Objects is a Bitmap Indexes

 

 

COLUMN OBJECT_NAME FORMAT A15
COLUMN OWNER FORMAT A10
SELECT O.OBJECT_ID,
  O.OWNER,
  O.OBJECT_NAME,
  I.INDEX_TYPE
FROM DBA_OBJECTS O,
  DBA_INDEXES I
WHERE I.OWNER(+)   =O.OWNER
AND I.INDEX_NAME(+)=O.OBJECT_NAME
AND O.OBJECT_TYPE  = 'INDEX'
AND O.OBJECT_ID    =  &OBJECT_ID 
/

 

 

clip_image022

 

 

另外关于位图索引的选择是有一些场景。谨慎选择:

 

 

 

B 树索引更适合索引动态表的 OLTP 环境,而位图索引更适合在大型静态表上使用复杂查询的数据仓库环境

 

位图索引被存储为压缩的索引值,其中包含了一个范围内的ROWID,因此ORACLE必须针对一个给定值锁定所有范围内的ROWID,不支持行级别的锁定。

 

位图索引被存储为压缩的索引值,其中包含了一个范围内的ROWID,因此ORACLE必须针对一个给定值锁定所有范围内的ROWID,不支持行级别的锁定。 

  

    换一种描述方法:使用位图索引时,一个键指向多行(成百上千),如果更新一个位图索引键,会同时将其他行对应位图索引字段进行锁定! 

 

  较之B-Tree索引优点: 

  位图以一种压缩格式存放,因此占用的磁盘空间比B-Tree索引要小得多 

 

  较之B-Tree索引缺点: 

  这种锁定的代价很高,会导致一些DML语句出现锁等待,严重影响插入、更新和删除的效率,对于高并发的系统不适用。 

 

  位图索引使用原则: 

  位图索引主要用于决策支持系统或静态数据,不支持索引行级锁定。

 

 

5:自治事务引发的死锁

 

  ORACLE自制事务是指的存储过程和函数可以自己处理内部事务不受外部事务的影响,用PRAGMA AUTONOMOUS_TRANSACTION来声明,要创建一个自治事务,您必须在匿名块的最高层或者存储过程、函数、数据包或触发的定义部分中,使用PL/SQL中的PRAGMA AUTONOMOUS_TRANSACTION语句。在这样的模块或过程中执行的SQL语句都是自治的。

 

我们先准备测试环境,如下所示。

 

 

Connected to:
Oracle Database 10g Release 10.2.0.5.0 - 64bit Production
 
SQL> create table pat_deadlock_test(id number, name varchar(32));
 
Table created.
 
SQL> insert into pat_deadlock_test   
  2  select 1001, 'kerry' from dual union all
  3  select 1002, 'ken'   from dual;
 
2 rows created.
 
SQL> commit;
 
Commit complete.
 
SQL> --创建存储过程,也可以用触发器、匿名块等。
SQL> create or replace procedure prc_test_dead_lock as 
  2  pragma autonomous_transaction;
  3  begin
  4       update pat_deadlock_test set name='richard' where id=1002;
  5       commit;
  6  end;
  7  /
 
Procedure created.
 
SQL> 
 

 

接下来演示一下自治事务发生死锁的情况。如下所示

 

SQL> show user;
USER is "TEST"
SQL> select * from v$mystat where rownum=1;
 
       SID STATISTIC#      VALUE
---------- ---------- ----------
       837          0          1
 
SQL> update pat_deadlock_test set name='kkk' where id=1002;
 
1 row updated.
 
SQL> exec prc_test_dead_lock;
BEGIN prc_test_dead_lock; END;
 
*
ERROR at line 1:
ORA-00060: deadlock detected while waiting for resource
ORA-06512: at "TEST.PRC_TEST_DEAD_LOCK", line 4
ORA-06512: at line 1

 

 

检查告警日志,就能发现告警日志里面提示死锁生成了trace文件 ORA-00060: Deadlock detected. More info in file /u01/app/oracle/admin/SCM2/udump/scm2_ora_30818.trc.

 

 

clip_image023

 

clip_image024

 

 

自治事务引起死锁的特征:

 

  This type of deadlock occurs when a single session manages to deadlock itself and is characterised by the following:

·         Only one row in the deadlock graph

·         The Row in the Deadlock graph is "TX X X" i.e. The Lock type is TX and the holder holds this in "X" (eXclusive: Mode 6) and waits for nothing. The waiter waits in "X" (eXclusive: Mode 6) and holds nothing.

·         The session is NOT involved in an autonomous transaction

 

      You can determine whether the session is involved in an autonomous transaction by searching for an "Autonomous Transaction Frames" section in the trace file:

 

 

 

6ITL死锁

 

 

ITL概念

 

ITLInterested Transaction List)是Oracle数据块内部的一个组成部分,用来记录该块所有发生的事务,有的时候也叫ITL槽位。如果一个事务一直没有提交,那么,这个事务将一直占用一个ITL槽位,ITL里面记录了事务信息、回滚段的入口和事务类型等等。如果这个事务已经提交,那么,ITL槽位中还保存有这个事务提交时候的SCN号。ITL的个数受表的存储参数INITRANS控制,在一个块内部,默认分配了2ITL的个数,如果这个块内还有空闲空间,那么Oracle是可以利用这些空闲空间再分配ITL。如果没有了空闲空间,那么,这个块因为不能分配新的ITL,所以,就可能发生ITL等待。如果在并发量特别大的系统中,那么最好分配足够的ITL个数,或者设置足够的PCTFREE,保证ITL能扩展,但是PCTFREE有可能是被行数据给消耗掉的,例如UPDATE,所以,也有可能导致块内部的空间不够而导致ITL等待,出现了ITL等待就可能导致ITL死锁。

 

本来想在此总结一下ITL死锁,不过个人看了李华荣写的Oracle死锁(DeadLock)的分类及其模拟ITL死锁,感觉他已经把别人想写的都写完了,所以,如果对TIL死锁部分感兴趣,就参考他的博客吧。

 

 

The typical ITL deadlock graph is characterised by the following:

  • More than one row in the deadlock graph
  • At Least 1 Row in the Deadlock graph is "TX X S"?
    i.e. The Lock type is TX and the holder holds this in "X" (eXclusive: Mode 6) and waits for nothing. The waiter waits for "S" (Share:Mode 4) and waits for nothing
  • None of the Involved Objects are Bitmap Indexes

Process state is likely to contain waiting for 'enq: TX - allocate ITL entry' as the current wait or as a previous wait in the recent wait stack:

clip_image026clip_image026[1]clip_image028

 

 

 

避免发生死锁

 

 数据库的死锁产生是有一定条件的。死锁产生的四个必要条件:

 

1Mutual exclusion(互斥):资源不能被共享,只能由一个进程使用。

2Hold and wait(请求并保持):已经得到资源的进程可以再次申请新的资源。

3No pre-emption(不可剥夺):已经分配的资源不能从相应的进程中被强制地剥夺。

4Circular wait(循环等待条件):系统中若干进程组成环路,该环路中每个进程都在等待相邻进程正占用的资源。

 

 

其实死锁的产生都是应用程序设计问题导致,我们一般通过分析trace文件分析死锁产生的原因后,就可以去破坏死锁产生的条件,来防止死锁产生,从而解决死锁问题。例如,上面案例中,调整更新表的顺序,外键增加索引等等。

 

 

 

 

 

参考资料:

 

 

http://www.cnblogs.com/lhrbest/p/6005702.html

http://www.cnblogs.com/lhrbest/articles/5388514.html

https://support.oracle.com/epmos/faces/DocumentDisplay?_afrLoop=253326631915825&id=62365.1&displayIndex=2&_afrWindowMode=0&_adf.ctrl-state=aoa3r7ym5_105