ORA-01595 ORA-00600触发的bug

环境说明:

两个节点的rac(11.2.0.4.0 PSU 160419)跑在rhel 6.5的PC上面

问题描述:

一个高频更新查询的生产场景,被一个异常挡住了前进的脚步

SMON: following errors trapped and ignored: 
ORA-01595: error freeing extent (7294) of rollback segment (945)) 
ORA-00600: internal error code, arguments: [ktugnb:clschk_kcbnew_14], [0], [1906], [4], [], [], [], [], [], [], [], [] 

通过trace文件查看当时执行的脚本和异常情况:

Dump file /opt/app/oracle/diag/rdbms/**/incident/incdir_1320754/**_ora_34345_i1320754.trc 
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production 
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, 
Data Mining and Real Application Testing options 
ORACLE_HOME = /opt/app/oracle/product/11.2.0/dbhome_1 
System name:    Linux 
Node name:    *** 
Release:    2.6.32-431.el6.x86_64 
Version:    #1 SMP Sun Nov 10 22:19:54 EST 2013 
Machine:    x86_64 
Instance name: ***
Redo thread mounted by this instance: 1 
Oracle process number: 83 
Unix process pid: 34345, image: oracle@*** 


*** 2018-01-30 06:49:55.387 
*** SESSION ID:(287.1) 2018-01-30 06:49:55.387 
*** CLIENT ID:() 2018-01-30 06:49:55.387 
*** SERVICE NAME:(***) 2018-01-30 06:49:55.387 
*** MODULE NAME:(actara@*** (TNS V1-V3)) 2018-01-30 06:49:55.387 
*** ACTION NAME:() 2018-01-30 06:49:55.387 

Dump continued from file: /opt/app/oracle/diag/rdbms***/trace/***_ora_34345.trc 
ORA-00600: 内部错误代码, 参数: [ktugnb:clschk_kcbnew_14], [0], [1906], [4], [], [], [], [], [], [], [], [] 

========= Dump for incident 1320754 (ORA 600 [ktugnb:clschk_kcbnew_14]) ======== 
----- Beginning of Customized Incident Dump(s) ----- 
BH (0xc7f566958) file#: 3 rdba: 0x00ca2118 (3/663832) class: 7 ba: 0xc7d9e6000 
set: 196 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25 
dbwrid: 5 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f 
hash: [0x143f285cb0,0x1bca8c0700] lru: [0x80ee769e8,0x33eed2bc8] 
ckptq: [NULL] fileq: [NULL] objq: [0x165ed59230,0x3aec4cb50] objaq: [0xbdf0dac00,0x64ec06880] 
st: SCURRENT md: NULL fpin: 'ktewh90: kteopshrink' tch: 1 le: 0x14efba8868 
flags: 
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] 
Dump of buffer cache at level 10 for tsn=2 rdba=13246744 
BH (0xc7f566958) file#: 3 rdba: 0x00ca2118 (3/663832) class: 7 ba: 0xc7d9e6000 
set: 196 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25 
dbwrid: 5 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f 
hash: [0x143f285cb0,0x1bca8c0700] lru: [0x80ee769e8,0x33eed2bc8] 
ckptq: [NULL] fileq: [NULL] objq: [0x165ed59230,0x3aec4cb50] objaq: [0xbdf0dac00,0x64ec06880] 
st: SCURRENT md: NULL fpin: 'ktewh90: kteopshrink' tch: 1 le: 0x14efba8868 
flags: 
LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] 
GLOBAL CACHE ELEMENT DUMP (address: 0x14efba8868): 
id1: 0xa2118 id2: 0x3 pkey: INVALID block: (3/663832) 
lock: X rls: 0x0 acq: 0x0 latch: 23 
flags: 0x20 fair: 0 recovery: 0 fpin: 'ktewh101: kteopfmtemb' 
bscn: 0xe5b.c779ffa1 bctx: (nil) write: 0 scan: 0x0 
lcp: (nil) lnk: [NULL] lch: [0xc7f566a90,0xc7f566a90] 
seq: 119 hist: 67 453 121 113 238 180 143:0 85 16 352 32 197 48 121 227 
LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT: 
flg: 0x00000000 state: SCURRENT tsn: 2 tsh: 1 
addr: 0xc7f566958 obj: INVALID cls: EXTENT MAP bscn: 0xe5b.c779ffa1 
buffer tsn: 2 rdba: 0x00ca2118 (3/663832) 
scn: 0x0e5b.c779ffa1 seq: 0x01 flg: 0x04 tail: 0xffa12501 
frmt: 0x02 chkval: 0xe4a9 type: 0x25=EXTENT MAP BLOCK OF SYSTEM MANAGED UNDO SEGMENT 

----- Current SQL Statement for this session (sql_id=gjvp8cwwbx2nn) ----- 

update dat_real_fee_551_201801 a set a.charge = a.charge+:v1,a.counts = a.counts+:v2,a.calls = a.calls+:v3, 
a.flux = a.flux+:v4,a.duration = a.duration+:v5,a.billing_duration = a.billing_duration+:v6,a.acct_id = :v7,a.billing_cycle_id = :v8 
where a.dat_real_fee_id = :v9 

Bind#0 value=0 
Bind#1 value=4 
Bind#2 value=1 
Bind#3 value=4 
Bind#4 value=3600 
Bind#5 value=4 
Bind#6 value=13917998 
Bind#7 value=201801 
Bind#8 value=8266601758 

----- Call Stack Trace ----- 

skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp 
<- ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl <- dbgeEndDDEInvocation 
<- kcbnew <- ktugnb <- ktugur <- ktuchg2 <- ktbchg2 
<- kddchg <- kduovw <- kduurp <- kdusru <- kauupd 
<- updrow <- qerupFetch <- updaul <- updThreePhaseExe <- updexe 
<- opiexe <- opiodr <- ttcpip <- opitsk <- opiino 
<- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain 
<- main <- libc_start_main <- start 

PROCESS STATE 
------------- 
Process global information: 
process: 0x1bc1aeb6f8, call: 0x1a6f7fb758, xact: 0x1b49456d78, curses: 0x1bd1efa530, usrses: 0x1bd1efa530 
in_exception_handler: no 
---------------------------------------- 
SO: 0x1bc1aeb6f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 
proc=0x1bc1aeb6f8, name=process, file=ksu.h LINE:12721, pg=0 
(process) Oracle pid:83, ser:1, calls cur/top: 0x1a6f7fb758/0x1a6f7fb758 
flags : (0x0) - 
flags2: (0x800), flags3: (0x10) 
intr error: 0, call error: 0, sess error: 0, txn error 0 
intr queue: empty 
ksudlp FALSE at location: 0 
(post info) last post received: 0 0 51 
last post received-location: ksv2.h LINE:1679 ID:ksvpst: workdone 
last process to post me: 0x1ba1c6bd30 1 2 
last post sent: 0 0 42 
last post sent-location: ksv2.h LINE:1652 ID:ksvpst: checkpool 
last process posted by me: 0x1ba1c6bd30 1 2 
(latch info) wait_event=0 bits=0x0 
Process Group: DEFAULT, pseudo proc: 0x1bb1dbff90 
O/S info: user: grid, term: UNKNOWN, ospid: 34345 
OSD pid info: Unix process pid: 34345, image: oracle@sshfdb1 

SO: 0x1bd1efa530, type: 4, owner: 0x1bc1aeb6f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 
proc=0x1bc1aeb6f8, name=session, file=ksu.h LINE:12729, pg=0 
(session) sid: 287 ser: 1 trans: 0x1b49456d78, creator: 0x1bc1aeb6f8 
flags: (0x100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/- 
flags2: (0x40009) -/-/INC 
DID: , short-term DID: 
txn branch: (nil) 
edition#: 100 oct: 6, prv: 0, sql: 0x1a1e6fbec0, psql: 0x1a1e6fbec0, user: 85/**
ksuxds FALSE at location: 0 
service name: ***
client details: 
O/S info: user: ***, term: , ospid: 29820074 
machine: *** program: actara@*** (TNS V1-V3) 
client info: 134.64.206.151 
application name: actara@*** (TNS V1-V3), hash value=4183407071 
Current Wait Stack: 
Not in wait; last wait ended 0.791724 sec ago

可以看出触发异常的是一个更新语句

而ORA-00600 [ktugnb:clschk_kcbnew_14] happens on "EXTENT MAP BLOCK OF SYSTEM MANAGED UNDO SEGMENT"

这个问题是由于一个还未公开发布的bug Bug 18000762 - ASM HIT ORA-600 [KTUGNB:CLSCHK_KCBNEW_14] 导致

可以从补丁Patch 18000762来解决这个问题。

 

因为我的环境目前无停机维护窗口,只能暂时扩容undo表空间和undo retention来缓解,后来一直未再出现。


 
posted @ 2018-02-09 08:41  tonnytangy  阅读(1115)  评论(0编辑  收藏