ORA-01595 ORA-00600触发的bug
环境说明:
两个节点的rac(11.2.0.4.0 PSU 160419)跑在rhel 6.5的PC上面
问题描述:
一个高频更新查询的生产场景,被一个异常挡住了前进的脚步
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (7294) of rollback segment (945))
ORA-00600: internal error code, arguments: [ktugnb:clschk_kcbnew_14], [0], [1906], [4], [], [], [], [], [], [], [], []
通过trace文件查看当时执行的脚本和异常情况:
Dump file /opt/app/oracle/diag/rdbms/**/incident/incdir_1320754/**_ora_34345_i1320754.trc Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /opt/app/oracle/product/11.2.0/dbhome_1 System name: Linux Node name: *** Release: 2.6.32-431.el6.x86_64 Version: #1 SMP Sun Nov 10 22:19:54 EST 2013 Machine: x86_64 Instance name: *** Redo thread mounted by this instance: 1 Oracle process number: 83 Unix process pid: 34345, image: oracle@*** *** 2018-01-30 06:49:55.387 *** SESSION ID:(287.1) 2018-01-30 06:49:55.387 *** CLIENT ID:() 2018-01-30 06:49:55.387 *** SERVICE NAME:(***) 2018-01-30 06:49:55.387 *** MODULE NAME:(actara@*** (TNS V1-V3)) 2018-01-30 06:49:55.387 *** ACTION NAME:() 2018-01-30 06:49:55.387 Dump continued from file: /opt/app/oracle/diag/rdbms***/trace/***_ora_34345.trc ORA-00600: 内部错误代码, 参数: [ktugnb:clschk_kcbnew_14], [0], [1906], [4], [], [], [], [], [], [], [], [] ========= Dump for incident 1320754 (ORA 600 [ktugnb:clschk_kcbnew_14]) ======== ----- Beginning of Customized Incident Dump(s) ----- BH (0xc7f566958) file#: 3 rdba: 0x00ca2118 (3/663832) class: 7 ba: 0xc7d9e6000 set: 196 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25 dbwrid: 5 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f hash: [0x143f285cb0,0x1bca8c0700] lru: [0x80ee769e8,0x33eed2bc8] ckptq: [NULL] fileq: [NULL] objq: [0x165ed59230,0x3aec4cb50] objaq: [0xbdf0dac00,0x64ec06880] st: SCURRENT md: NULL fpin: 'ktewh90: kteopshrink' tch: 1 le: 0x14efba8868 flags: LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] Dump of buffer cache at level 10 for tsn=2 rdba=13246744 BH (0xc7f566958) file#: 3 rdba: 0x00ca2118 (3/663832) class: 7 ba: 0xc7d9e6000 set: 196 pool: 3 bsz: 8192 bsi: 0 sflg: 1 pwc: 0,25 dbwrid: 5 obj: -1 objn: 0 tsn: 2 afn: 3 hint: f hash: [0x143f285cb0,0x1bca8c0700] lru: [0x80ee769e8,0x33eed2bc8] ckptq: [NULL] fileq: [NULL] objq: [0x165ed59230,0x3aec4cb50] objaq: [0xbdf0dac00,0x64ec06880] st: SCURRENT md: NULL fpin: 'ktewh90: kteopshrink' tch: 1 le: 0x14efba8868 flags: LRBA: [0x0.0.0] LSCN: [0x0.0] HSCN: [0xffff.ffffffff] HSUB: [65535] GLOBAL CACHE ELEMENT DUMP (address: 0x14efba8868): id1: 0xa2118 id2: 0x3 pkey: INVALID block: (3/663832) lock: X rls: 0x0 acq: 0x0 latch: 23 flags: 0x20 fair: 0 recovery: 0 fpin: 'ktewh101: kteopfmtemb' bscn: 0xe5b.c779ffa1 bctx: (nil) write: 0 scan: 0x0 lcp: (nil) lnk: [NULL] lch: [0xc7f566a90,0xc7f566a90] seq: 119 hist: 67 453 121 113 238 180 143:0 85 16 352 32 197 48 121 227 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT: flg: 0x00000000 state: SCURRENT tsn: 2 tsh: 1 addr: 0xc7f566958 obj: INVALID cls: EXTENT MAP bscn: 0xe5b.c779ffa1 buffer tsn: 2 rdba: 0x00ca2118 (3/663832) scn: 0x0e5b.c779ffa1 seq: 0x01 flg: 0x04 tail: 0xffa12501 frmt: 0x02 chkval: 0xe4a9 type: 0x25=EXTENT MAP BLOCK OF SYSTEM MANAGED UNDO SEGMENT ----- Current SQL Statement for this session (sql_id=gjvp8cwwbx2nn) ----- update dat_real_fee_551_201801 a set a.charge = a.charge+:v1,a.counts = a.counts+:v2,a.calls = a.calls+:v3, a.flux = a.flux+:v4,a.duration = a.duration+:v5,a.billing_duration = a.billing_duration+:v6,a.acct_id = :v7,a.billing_cycle_id = :v8 where a.dat_real_fee_id = :v9 Bind#0 value=0 Bind#1 value=4 Bind#2 value=1 Bind#3 value=4 Bind#4 value=3600 Bind#5 value=4 Bind#6 value=13917998 Bind#7 value=201801 Bind#8 value=8266601758 ----- Call Stack Trace ----- skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- ksfdmp <- dbgexPhaseII <- dbgexExplicitEndInc <- dbgeEndDDEInvocationImpl <- dbgeEndDDEInvocation <- kcbnew <- ktugnb <- ktugur <- ktuchg2 <- ktbchg2 <- kddchg <- kduovw <- kduurp <- kdusru <- kauupd <- updrow <- qerupFetch <- updaul <- updThreePhaseExe <- updexe <- opiexe <- opiodr <- ttcpip <- opitsk <- opiino <- opiodr <- opidrv <- sou2o <- opimai_real <- ssthrdmain <- main <- libc_start_main <- start PROCESS STATE ------------- Process global information: process: 0x1bc1aeb6f8, call: 0x1a6f7fb758, xact: 0x1b49456d78, curses: 0x1bd1efa530, usrses: 0x1bd1efa530 in_exception_handler: no ---------------------------------------- SO: 0x1bc1aeb6f8, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x1bc1aeb6f8, name=process, file=ksu.h LINE:12721, pg=0 (process) Oracle pid:83, ser:1, calls cur/top: 0x1a6f7fb758/0x1a6f7fb758 flags : (0x0) - flags2: (0x800), flags3: (0x10) intr error: 0, call error: 0, sess error: 0, txn error 0 intr queue: empty ksudlp FALSE at location: 0 (post info) last post received: 0 0 51 last post received-location: ksv2.h LINE:1679 ID:ksvpst: workdone last process to post me: 0x1ba1c6bd30 1 2 last post sent: 0 0 42 last post sent-location: ksv2.h LINE:1652 ID:ksvpst: checkpool last process posted by me: 0x1ba1c6bd30 1 2 (latch info) wait_event=0 bits=0x0 Process Group: DEFAULT, pseudo proc: 0x1bb1dbff90 O/S info: user: grid, term: UNKNOWN, ospid: 34345 OSD pid info: Unix process pid: 34345, image: oracle@sshfdb1 SO: 0x1bd1efa530, type: 4, owner: 0x1bc1aeb6f8, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x1bc1aeb6f8, name=session, file=ksu.h LINE:12729, pg=0 (session) sid: 287 ser: 1 trans: 0x1b49456d78, creator: 0x1bc1aeb6f8 flags: (0x100041) USR/- flags_idl: (0x1) BSY/-/-/-/-/- flags2: (0x40009) -/-/INC DID: , short-term DID: txn branch: (nil) edition#: 100 oct: 6, prv: 0, sql: 0x1a1e6fbec0, psql: 0x1a1e6fbec0, user: 85/** ksuxds FALSE at location: 0 service name: *** client details: O/S info: user: ***, term: , ospid: 29820074 machine: *** program: actara@*** (TNS V1-V3) client info: 134.64.206.151 application name: actara@*** (TNS V1-V3), hash value=4183407071 Current Wait Stack: Not in wait; last wait ended 0.791724 sec ago
可以看出触发异常的是一个更新语句
而ORA-00600 [ktugnb:clschk_kcbnew_14] happens on "EXTENT MAP BLOCK OF SYSTEM MANAGED UNDO SEGMENT"
这个问题是由于一个还未公开发布的bug Bug 18000762 - ASM HIT ORA-600 [KTUGNB:CLSCHK_KCBNEW_14] 导致
可以从补丁Patch 18000762来解决这个问题。
因为我的环境目前无停机维护窗口,只能暂时扩容undo表空间和undo retention来缓解,后来一直未再出现。