enq锁排查

1、enq: TX - row lock contention

通常,产生enq: TX - row lock contention事件的原因有以下几种可能:
1、不同的session更新或删除同一条记录;
2、唯一索引有重复索引;
3、位图索引同时被更新或同时并发的向位图索引字段上插入相同字段值;
4、并发的对同一个数据块上的数据进行update操作;
5、等待索引块完成分裂;

公司用户反馈一系统在14:00~15:00(2016-08-16)这个时间段反应比较慢,于是生成了这个时间段的AWR报告

 使用awrddrpt.sql生成了15号与16号同一时段的AWR对比报告。

排查过程

1、select event,sid,p1,p2,p3, from gv$session_wait where event='enq: TX - row lock contention';     ---查看会话当中的enq等待时间
select event,count(*) from gv$session group by event; ---会话等待事件统计
2、select ROW_WAIT_OBJ#,ROW_WAIT_FILE#,ROW_WAIT_BLOCK#,ROW_WAIT_ROW# from gv$session where event='enq: TX - row lock contention'; ---查询等待对象
3、select object_name from dba_objects where object_id in (row_wait_obj#); ---查询对象名字
4、select sid,sql_text from gv$session a,gv$sql b where sid in(4,12,41,212,232) and (b.sql_id=a.sql_id or b.sql_id=a.prev_sql_id); ---查询会话sql

---历史记录排查
备注:
1.在高并发的情况下,可能会出现多个可疑SQL
2.UPDATE执行过后,又继续执行新的SQL,就悲催了,这个时候要自己把所有SQL抓出来,按照时间线排序,CHECK

SQL_ID是空的,sql_id为null 的session 有2种,一种是后台进程,还有就是inactive的session。也许有人会说,那我去查询PREV_SQL_ID,恩你去试一试吧,那个SQL_ID是事物的SQL_ID,并不是UPDATE的SQL_ID

可以查找那个时间段sid执行的sql

SELECT SQL_ID,SQL_TEXT,LAST_ACTIVE_TIME,MODULE,ACTION FROM GV$SQL WHERE INST_ID=1 AND LAST_ACTIVE_TIME=TO_DATE('2015-04-10 18:01:44','YYYY-MM-DD HH24:MI:SS'); 

v$active_session_history 这个表只是个取样数据,按秒进行,只有在那一秒采样点处于on cpu或非idle等待的session统计在内。所以可能会不全,有些执行很短的SQL会忽略。这个视图无法还原完整的session历史。

SELECT D.SQL_ID,CHR(BITAND(P1, -16777216) / 16777215) || CHR(BITAND(P1, 16711680) / 65535) "Lock",BITAND(P1, 65535) "Mode",
 COUNT(1),COUNT(DISTINCT d.session_id ) 
 FROM DBA_HIST_ACTIVE_SESS_HISTORY D 
 WHERE D.SAMPLE_TIME BETWEEN TO_DATE('2020-04-09 00:00:00', 'YYYY-MM-DD HH24:MI:SS') AND TO_DATE('2020-04-09 18:00:00', 'YYYY-MM-DD HH24:MI:SS') 
 AND D.EVENT = 'enq: TX - row lock contention' 
 GROUP BY D.SQL_ID,(CHR(BITAND(P1, -16777216) / 16777215) || CHR(BITAND(P1, 16711680) / 65535)),(BITAND(P1, 65535));

--
-查询锁对象 select /* +rule*/ t.INST_ID,decode(request,0,'holder','waiter') role,t.SID,t.TYPE,t.REQUEST,t.LMODE,t.BLOCK,t.CTIME,t.ID1,t.ID2 from gv$lock t
where (t.ID1,t.ID2,t.TYPE) in (select t.ID1,t.ID2,t.TYPE from gv$lock where t.REQUEST>0) order by ctime desc,role;

--wrh$sql_stat 观察过去几小时 SQL 的执行频率、执行计划、单次响应时间、逻辑读等执行统计信息 发现 PLAN_HASH_VALUE 未改变意味着执行计划未曾变化,但是 SQL 单次响应时间发生了数量级的变化,
从百分秒上升到了秒级,逻辑读也有上升,时间在 CPUtime 上花费较少,较多发生到了 Application 类型等待和 Cluster 类型等待上。
-- 另外 ROWS_PROC 即处理行数,逐步呈上升趋势,看起来是应用端发生了堵塞,每次update的行数变大了
select a.snap_id,a.instance_number inst,to_char(b.end_interval_time,'yyyymmdd hh24:mi:ss') end_time,
a.version_count v_cnt,a.plan_hash_value plan_hash,
a.executions_delta exec,
round(a.elapsed_time_delta/decode(a.elapsed_time_delta,0,1,a.elapsed_time_delta)/1000000,6) elap,
round(a.cpu_time_delta/decode(a.executions_delta,0,1,a.elapsed_time_delta)/1000000,6) cput,
round(a.buffer_gets_delta/decode(a.executions_delta,0,1,a.executions_delta)) bget,
round(a.rows_processed_delta/decode(a.executions_delta,0,1,a.executions_delta),3)rows_proc,
round(a.apwait_delta/decode(a.executions_delta,0,1,a.executions_delta)) apwait,
round(a.clwait_delta/decode(a.executions_delta,0,1,a.executions_delta)) clwait,
round(a.ccwait_delta/decode(a.executions_delta,0,1,a.executions_delta)) ccwait
 from sys.wrh$_sqlstat a,sys.wrm$_snapshot b
 where a.dbid=b.dbid
 and a.snap_id=b.snap_id
 and a.instance_number=b.instance_number
 and a.dbid=(select dbid from v$database)
 and a.sql_id='csz0ruth7zbu7'
 order by 4,1,3 desc;

建议

应用层面:
    1、约束通常是为了保证数据完整性,在并发场景下,应充分考虑事务进行的逻辑顺序,避免多个会话事务交叉进行,触发约束冲突在事务级别发生竞争;
    2、要提高并发效率,应当尽可能拆分大事务为小事务,提高 tx enqueue 锁的获取释放速度;
    3、如果要使用悲观锁(for update),应尽可能减少锁定的行范围;
4、
系统要生成唯一并且连续的单号(前缀+数字),为了获取唯一并且连续的单号,所以使用SELECT FOR UPDATE这种设计来实现,没有使用SEQUENCE(因为SEQUENCE可能会跳号,造成单号不连续)
数据库层面:
    1、在 dml 频繁的表上建立适当的索引,提高 SQL 执行的效率,减少 tx enqueue 锁持有的时间;避免全表扫描这种,容易造成 IO 开销巨大,热块竞争,会话堆积的访问方式。
    2、在 dml 频繁的表上不应使用位图索引;
    3、对于 dml 频繁的表,不应使用 IOT 表,物化视图等;
    4、RAC 环境下,对于批量的 dml 操作,尽可能固定在单一节点,尽量降低网络开销、集群竞争、一致性块获取和日志刷盘等带来的影响。
5、SELECT * FROM QRTZ_LOCKS WHERE LOCK_NAME = :1 FOR UPDATE no wait ---无需等待

2、enq: TX - allocate ITL entry

由于缺省情况下创建的表的INITRANS参数为1,索引的INITRANS参数值为2.当有太多的并发DML操作同时操作相同的数据块或索引块就会出现这个等待事件,可以通过查看Segments by ITL Waits部分的信息来了解出现大量并发DML操作的对象

3、enq: TM - contention

一般是执行DML期间,为防止对与DML相关的对象进行修改,执行DML的进程必须对该表获得TM锁,就可能产生enq: TM - contention等待事件,若在获得TM锁的过程中发生争用,则等待enq: HW - contention事件

1.在OPS(早期的RAC)中LGWR会以ID1=0 & ID2=0去申请该队列锁来检查 DML_LOCKS 在所有实例中是全0还是全非0。
2. 当一个单表或分区需要做不同的表/分区操作时,ORACLE需要协调这些操作,所以需要申请该队列锁。
3. 启用参考约束referential constraints
4. 修改约束从DIASABLE NOVALIDATE 到DISABLE VALIDATE
5. 重建IOT
6. 创建视图或者修改ALTER视图时可能需要申请该队列锁
7. 分析表统计信息或validate structure时
8. 一些PDML并行DML操作
9. 所有可能调用kkdllk()函数的操作

4、常用语句

查询阻塞(时间自行更改)
col event_chain for a70
col sid_chain for a30
set lines 180
col final_blocking FOR 9999999
WITH ASH AS
 (SELECT INST_ID,
         SESSION_ID,
         EVENT,
         BLOCKING_SESSION,
         PROGRAM,
         TO_CHAR(SAMPLE_TIME, 'yyyymmdd hh24:mi:ss') SAMPLE_TIME,
         SAMPLE_ID,
         BLOCKING_INST_ID
    FROM GV$ACTIVE_SESSION_HISTORY
   WHERE SAMPLE_TIME > TO_DATE('20210617 10:40:00', 'yyyymmdd hh24:mi:ss')
     AND SAMPLE_TIME < TO_DATE('20210617 11:40:00', 'yyyymmdd hh24:mi:ss'))
SELECT *
  FROM (SELECT SAMPLE_TIME,
               BLOCKING_SESSION FINAL_BLOCKING,
               SYS_CONNECT_BY_PATH(SESSION_ID, ',') SID_CHAIN,
               SYS_CONNECT_BY_PATH(EVENT, ',') EVENT_CHAIN
          FROM ASH
         START WITH SESSION_ID IS NOT NULL
        CONNECT BY PRIOR BLOCKING_SESSION = SESSION_ID
               AND PRIOR INST_ID = BLOCKING_INST_ID
               AND SAMPLE_ID = PRIOR SAMPLE_ID) A
 WHERE INSTR(SID_CHAIN, FINAL_BLOCKING) = 0
   AND NOT EXISTS (SELECT 1
          FROM ASH B
         WHERE A.FINAL_BLOCKING = B.SESSION_ID
           AND B.BLOCKING_SESSION IS NOT NULL)
 ORDER BY SAMPLE_TIME;---查看用户活动事务锁
col sid for 99999
col serial# for 99999999
col username for a15
col sql_id for a15
col prev_sql_id for a15
col trx_status for a6
col used_ublk for 99999999
col used_urec for 99999999
col start_time for a25
col table_owner for a10
col table_name for a15
col lmode for 99999
col request for 99999
set lines 200 pages 200
select s.sid
        ,s.serial#
        ,s.username
        ,s.sql_id
        ,s.prev_sql_id
        ,t.status AS trx_status
        ,t.USED_UBLK
        ,t.USED_UREC
        ,t.START_TIME
        ,d.owner AS table_owner
        ,d.object_name AS table_name
        ,l.lmode
        ,l.type
        ,l.request
from gv$transaction t,gv$session s,dba_objects d,gv$lock l
where t.ses_addr = s.saddr
and l.sid = s.sid
and l.id1 = d.object_id
and t.status = 'ACTIVE'
and d.owner = 'BCM';


查看SID
select sid, serial#
  from gv$session
   where sid in (select blocking_session
                  from gv$session 
                   where blocking_session is not null);

       SID    SERIAL# USERNAME
---------- ---------- ------------------------------
    47      117 SYS

查看锁表sql
SELECT /*+ ORDERED */ 
 sql_text
  FROM v$sqltext a
 WHERE (a.hash_value, a.address) IN
       (SELECT DECODE(sql_hash_value, 0, prev_hash_value, sql_hash_value),
               DECODE(sql_hash_value, 0, prev_sql_addr, sql_address)
          FROM v$session b
         WHERE b.sid = '13')  /* 此处47 为SID*/
 ORDER BY piece ASC;

alter system kill session '49,5';
终止会话,不释放资源

alter system kill session '49,5,@1' immediate;
终止会话,释放资源


---从操作系统层面解决锁冲突
基于会话的SID查看服务器进程的PID
SELECT spid, osuser, s.program  FROM v$session s, v$process p  WHERE s.paddr = p.addr AND s.sid=&sid;
SPID             OSUSER             PROGRAM
------------------------ ------------------------------ ------------------------------------------------
55440             oracle             sqlplus@oracle (TNS V1-V3)

查询SPID对应的进程
[root@oracle ~]# ps -ef | grep 55440
oracle    55440  55439  0 03:36 ?        00:00:00 oraclefsdb (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
root      56006  55978  0 04:14 pts/5    00:00:00 grep 55440


---通过操作系统pid查找sql
SELECT /*+ ORDERED */ 
 sql_text
  FROM v$sqltext a
 WHERE (a.hash_value, a.address) IN
       (SELECT DECODE(sql_hash_value, 0, prev_hash_value, sql_hash_value),
               DECODE(sql_hash_value, 0, prev_sql_addr, sql_address)
          FROM v$session b
         WHERE b.paddr = (select addr from v$process c where c.spid='&pid'))  
 ORDER BY piece ASC;

---通过操作系统pid查找sid
select sid,serial#,username,machine from v$session b where b.paddr = (select addr from v$process c where c.spid = '&pid');


---查看锁情况
SELECT mm.inst_id, 
       ee.username,
       ee.machine,
       mm.sid, 
       ee.serial#,
       ee.sql_id,
       ee.sql_exec_start,
       ee.prev_sql_id,
       mm.TYPE, 
       mm.id1, 
       mm.id2, 
       mm.block,
       LPAD(TRUNC(mm.ctime / 60 / 60), 3) || ' Hour ' || LPAD(TO_CHAR(TRUNC(mm.ctime / 60) - TRUNC(mm.ctime / 60 / 60) * 60, 'fm09'), 2) || ' Min ' || LPAD(TO_CHAR(mm.ctime - TRUNC(mm.ctime / 60) * 60, 'fm09'), 2) || ' Sec' ctime, 
       CASE WHEN mm.block = 1 
   AND mm.lmode != 0 THEN 'holder' 
            WHEN mm.block = 0 
   AND mm.request != 0 THEN 'waiter' 
            ELSE NULL END role, 
       CASE WHEN ee.blocking_session IS NOT NULL THEN 'waiting for SID ' || ee.blocking_session 
            ELSE NULL END blocking_session, 
       dd.sql_text sql_text, 
       cc.event wait_event 
  FROM gv$lock mm, 
       gv$session ee, 
       gv$sqlarea dd, 
       gv$session_wait cc 
 WHERE mm.sid IN (SELECT nn.sid 
                    FROM (SELECT tt.*, 
                                 COUNT(1) OVER (PARTITION BY tt.TYPE, 
   tt.id1, 
   tt.id2) cnt, MAX(tt.lmode) OVER (PARTITION BY tt.TYPE, 
   tt.id1, 
   tt.id2) lmod_flag, MAX(tt.request) OVER (PARTITION BY tt.TYPE, 
   tt.id1, 
   tt.id2) request_flag 
                            FROM gv$lock tt) nn 
                   WHERE nn.cnt > 1 
                     AND nn.lmod_flag != 0 
                     AND nn.request_flag != 0) 
   AND mm.sid = ee.sid (+) 
   AND ee.sql_id = dd.sql_id (+) 
   AND mm.sid = cc.sid (+) 
   AND ((mm.block = 1 
            AND mm.lmode != 0) 
         OR (mm.block = 0 
            AND mm.request != 0)) 
 ORDER BY mm.TYPE, mm.id1, mm.id2, mm.lmode DESC, 
          mm.ctime DESC;
                   

---排查锁原因
select sql_exec_start,session_id,session_serial#,session_type,sql_id,top_level_sql_id,sql_opname,sql_PLAN_OPERATION,
event,p1text,p1,p2text,p2,p3text,p3,wait_time,session_state,blocking_session_status,blocking_session,
blocking_inst_id,top_level_call_name,XID,machine,port FROM gv$active_session_history where session_id='506' 

select inst_id,sql_text from gv$sql where sql_id='dksm4sq280a4b';

select sql_text from dba_hist_sqltext where sql_id IN('bkygnvj8ackj2');

select sql_text from v$sqlarea a,gv$session b where a.sql_id=b.prev_sql_id and b.sid='3627';

SELECT prev_exec_start,username,MODULE,action FROM gv$session WHERE inst_id=1 AND SID='3627'

SELECT a.inst_id,a.sid,b.sql_id,b.sql_text,b.last_active_time,b.MODULE,b.action FROM gv$sql b,gv$session a  WHERE a.inst_id=b.inst_id 
 AND a.SID='3627' 
select DISTINCT b.sql_id,c.blocked_sql_id
  from DBA_HIST_ACTIVE_SESS_HISTORY b,
       (select a.sql_id as blocked_sql_id,
       a.blocking_session,
               a.blocking_session_serial#,
               count(a.blocking_session)
          from DBA_HIST_ACTIVE_SESS_HISTORY a
         where event like '%enq: TX - row lock contention%'
           and snap_id between 18835 and 18836
         group by a.blocking_session, a.blocking_session_serial#,a.sql_id
        having count(a.blocking_session) > 100
         order by 3 desc) c
 where b.session_id = c.blocking_session
   and b.session_serial# = c.blocking_session_serial#
   and b.snap_id between 18835 and 18836;      ---由于2xb71ufa5wmrh

SELECT LK.SID,
DECODE(LK.TYPE,
'TX',
'Transaction',
'TM',
'DML',
'UL',
'PL/SQL User Lock',
LK.TYPE) LOCK_TYPE,
DECODE(LK.LMODE,
0,
'None',
1,
'Null',
2,
'Row-S (SS)',
3,
'Row-X (SX)',
4,
'Share',
5,
'S/Row-X (SSX)',
6,
'Exclusive',
TO_CHAR(LK.LMODE)) MODE_HELD,
DECODE(LK.REQUEST,
0,
'None',
1,
'Null',
2,
'Row-S (SS)',
3,
'Row-X (SX)',
4,
'Share',
5,
'S/Row-X (SSX)',
6,
'Exclusive',
TO_CHAR(LK.REQUEST)) MODE_REQUESTED,
OB.OBJECT_TYPE,
OB.OBJECT_NAME,
LK.BLOCK,
SE.LOCKWAIT
FROM V$LOCK LK, DBA_OBJECTS OB, V$SESSION SE
WHERE LK.TYPE IN ('TM', 'UL')
AND LK.SID = SE.SID
AND LK.ID1 = OB.OBJECT_ID(+)
AND SE.SID IN (156,191)
ORDER BY SID;

posted @ 2021-09-06 10:52  harrison辉  阅读(582)  评论(0)    收藏  举报