troubleshooting ORA-600[6302] & ORA-600 [6200] corrupted index block

今天同事的一套数据库遇到了alert 日志 一堆的ora-600,这是一套10.2.0.5 db 2nodes RAC on AIX , 找我帮着看看, 最终确认为一个索引出现了block corrupted 勘误块导致, 重新创建的了索引后解决, 下面简单记录一下我的诊断过程 。

# db alert log file #

Sun Feb 01 07:14:46 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:15:07 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:15:07 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [ksu_register_tac-1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:15:09 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [ksu_register_tac-1], [], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:15:38 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:15:46 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [17182], [0x1105B14C8], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:16:09 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0x1105B14C8], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:16:40 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0x1105B14C8], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 07:16:46 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_16449844.trc:
ORA-07445: exception encountered: core dump [kghrst+020c] [SIGSEGV] [Address not mapped to object] [0xF08C704140910] [] []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [17182], [0x1105B14C8], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kghrcdepth:ds], [0x1105B9D20], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []

===============
Sun Feb 01 12:26:54 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_56230004.trc:
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [kgh_heap_sizes:ds], [0x110583508], [], [], [], [], [], []
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Sun Feb 01 12:27:53 BEIST 2015
SMON: Parallel transaction recovery tried
Sun Feb 01 12:28:04 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_53215708.trc:
ORA-00600: internal error code, arguments: [6200], [18446744073709551615], [], [], [], [], [], []
Sun Feb 01 12:28:04 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_53215708.trc:
ORA-07445: exception encountered: core dump [leftmost+000c] [SIGSEGV] [Address not mapped to object] [0x105C4172A2B1034] [] []
ORA-00600: internal error code, arguments: [6200], [18446744073709551615], [], [], [], [], [], []
Sun Feb 01 12:28:04 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_53215708.trc:
ORA-07445: exception encountered: core dump [leftmost+000c] [SIGSEGV] [Address not mapped to object] [0x105C4172A2B1034] [] []
ORA-07445: exception encountered: core dump [leftmost+000c] [SIGSEGV] [Address not mapped to object] [0x105C4172A2B1034] [] []
ORA-00600: internal error code, arguments: [6200], [18446744073709551615], [], [], [], [], [], []
Sun Feb 01 12:28:04 BEIST 2015

==============
Sun Feb 01 14:28:37 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_51249546.trc:
ORA-00600: internal error code, arguments: [6300], [], [], [], [], [], [], []
Sun Feb 01 14:28:37 BEIST 2015
Errors in file /oracle/app/oracle/admin/acctc/udump/acctc2_ora_51249546.trc:
ORA-07445: exception encountered: core dump [leftmost+000c] [SIGSEGV] [Address not mapped to object] [0xD2007308C704241D] [] []
ORA-00600: internal error code, arguments: [6300], [], [], [], [], [], [], []
TIP:
错误从早上7点开始,一直到下午都在不断有ORA-600 ORA-7445 的错误信息写入alert,错误信息眼花缭乱,当多个错误出现是通常是从最早产生的错误查起,也许后面是附加的错误, 我们先从第一个ora-600 开始,ora-600 6302 是验证index key 相关的错误。

# ora-600 6302 trace file #

key1 (15104):
ab 08 c7 04 24 15 01 42 59 46 05 c4 11 04 5f 34 00 00 18 4d f0 ef 00 0f 08
c7 04 24 15 01 42 59 46 05 c4 11 3d 45 2e 00 00 5b 47 7d 8b 00 0e 08 c7 04
24 15 01 42 59 46 05 c4 12 32 33 0a 00 00 5c 4b 0b 1c 00 82 08 c7 04 24 15
01 42 59 46 05 c4 12 3f 37 13 00 00 5e 4d 44 e3 00 e4 08 c7 04 24 15 01 42
59 46 05 c4 12 41 4c 2d 00 00 5d 46 cb 5f 00 59 08 c7 04 24 15 01 42 59 46
...
04 24 15 01 42 58 61 05 c4 12 45 25 3c 00 00 19 0d 8b 5c 00 4b 08 c7 04 24
15 01 42 58
key2 (4): 24 15 01 42
*** 2015-02-01 07:14:46.313
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [6302], [1], [], [], [], [], [], []
Current SQL statement for this session:
MERGE INTO TEST t1
USING (SELECT :mincycle as MIN_CYCLE_ID,
:maxcycle as MAX_CYCLE_ID,
:freevol as FREE_VOLUME,
:usagevol as USAGE_VOLUME,
:overvol as OVER_VOLUME,
:tariffplan as TARIFF_PLAN_ID,
to_date(:endtime , 'YYYYMMDDhh24miss') as CDR_END_TIME,
:key_id as KEY_ID,
:counterid as COUNTER_ID,
:attrid as ATTR_ID,
:subsid as SUBS_ID
from dual) t2
ON (t1.KEY_ID = t2.KEY_ID and t1.COUNTER_ID = t2.COUNTER_ID and t1.ATTR_ID = t2.ATTR_ID and t1.SUBS_ID = t2.SUBS_ID)
WHEN MATCHED THEN
update
set t1.MIN_CYCLE_ID = t2.MIN_CYCLE_ID,
t1.MAX_CYCLE_ID = t2.MAX_CYCLE_ID,
t1.FREE_VOLUME = t2.FREE_VOLUME,
t1.USAGE_VOLUME = t2.USAGE_VOLUME,
t1.OVER_VOLUME = t2.OVER_VOLUME,
t1.TARIFF_PLAN_ID = t2.TARIFF_PLAN_ID,
t1.CDR_END_TIME = t2.CDR_END_TIME
WHEN NOT MATCHED THEN
insert
(t1.MIN_CYCLE_ID,
t1.MAX_CYCLE_ID,
t1.FREE_VOLUME,
t1.USAGE_VOLUME,
t1.OVER_VOLUME,
t1.TARIFF_PLAN_ID,
t1.CDR_END_TIME,
t1.KEY_ID,
t1.COUNTER_ID,
t1.ATTR_ID,
t1.SUBS_ID)
values
(t2.MIN_CYCLE_ID,
t2.MAX_CYCLE_ID,
t2.FREE_VOLUME,
t2.USAGE_VOLUME,
t2.OVER_VOLUME,
t2.TARIFF_PLAN_ID,
t2.CDR_END_TIME,
t2.KEY_ID,
t2.COUNTER_ID,
t2.ATTR_ID,
t2.SUBS_ID)
----- Call Stack Trace -----
kgeriv kgeasi kdksep kdxtms kdisle kdiins0 kdiinsp kauxsin insidx insflush qerusfetch updaul upsexe

...
Object id on Block? Y
seg/obj: 0x1adb65 csc: 0xce8.dbee5287 itc: 3 flg: E typ: 1 - DATA
brn: 0 bdba: 0x5c4e4d05 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0f47.03a.000b6058 0x5e027858.67b7.06 --U- 3 fsc 0x0000.dbee5ae5
0x02 0x1074.040.000858a2 0x010b3d2f.4bab.06 ---- 5 fsc 0x0000.00000000
0x03 0x0ff1.000.000cda32 0x5e04bd1d.6e0b.8d C--- 0 scn 0x0ce8.dbe90591
...
Tip:
可以看到是有一条merge insert 引起,在MOS 中search一圈没有找到相关信息,我们知道是与index相关的,那么trace 中的key 是不是就是index key呢?先查一下trace中的错误对象

SQL> select to_number('1adb65','xxxxxxxxxxx') from dual;

TO_NUMBER('1ADB65','XXXXXXXXXXX')
---------------------------------
1760101

select owner,object_name,object_type from dba_objects where object_id=1760101;
OWNER OBJECT_NAME OBJECT_TYPE
-------------------- ------------------------------ -------------------------
USERINFO TEST TABLE PARTITION
TIP:
确定是我们SQL语句中的表,那下一步查询表上的所有索引,重点是两列的组合索引

SQL> select
2 c.table_owner ind_table_owner1,
3 c.table_name ind_table_name1,
4 c.index_name ind_index_name1,
5 c.column_position ind_column_position1,
6 c.column_name ind_column_name1,
7 decode(c.descend,'DESC','DESC',null) ind_desc1,i.index_type,t.data_type,i.status
8 from
9 dba_ind_columns c
10 join dba_indexes i on c.index_name=i.index_name
11 join dba_tab_columns t on t.owner=i.table_owner and t.table_name=i.table_name and t.column_name=c.column_name
12 where
13 i.table_owner='USERINFO' and i.table_name='TEST' order by 2,3;

IND_TABLE_OWNER IND_TABLE_NAME1 IND_INDEX_NAME1 IND_COLUMN_POSITION1 IND_COLUMN_NAME IND_ INDEX_TY DATA_TYPE STATUS
--------------- --------------- -------------------- -------------------- --------------- ---- -------- ---------- ----------
USERINFO TEST IDX1_TEST 1 MIN_CYCLE_ID NORMAL NUMBER N/A
USERINFO TEST IDX1_TEST 2 MAX_CYCLE_ID NORMAL NUMBER N/A
USERINFO TEST INX_TEST 1 KEY_ID NORMAL NUMBER N/A
USERINFO TEST INX_TEST 2 COUNTER_ID NORMAL NUMBER N/A
Tip:
表上有两个两列的组合索引,不过列上都是number 类型, 因为第二个key 值比较短,那我们比较容易转换成真实的十进制数值,然后确认是哪个索引,转换方法参考我以前的笔记还原data block dumps实际值

key2 (4): 24 15 01 42

SQL> select to_number('24','xxxxxxxxxxxxx') from dual;
TO_NUMBER('24','XXXXXXXXXXXXX')
-------------------------------
36
SQL> c/24/15
1* select to_number('15','xxxxxxxxxxxxx') from dual
SQL> /

TO_NUMBER('15','XXXXXXXXXXXXX')
-------------------------------
21
SQL> c/15/01
1* select to_number('01','xxxxxxxxxxxxx') from dual
SQL> /
TO_NUMBER('01','XXXXXXXXXXXXX')
-------------------------------
1
SQL> c/01/42
1* select to_number('42','xxxxxxxxxxxxx') from dual
SQL> /
TO_NUMBER('42','XXXXXXXXXXXXX')
-------------------------------
66
TIP:
那拼进来就是3621166, 然后这也就是索引第二列的值,因为目前我们还没确认是哪个索引,所以尝试把两个索引的第二列都查一下,看哪个值存在?

select min_cycle_id from userinfo.TEST where max_cycle_id=3621166 ;

SQL> select key_id from userinfo.TEST where counter_id=3621166;

KEY_ID
----------
3.1508E+12
3.1908E+12
3.3520E+12
Tip:
那到这里我们就可以初步判断是INX_TEST这个索引出了问题。我们再找另一个ora-600 确认下,ora-600 6300的trace文件没有可用信息,看ora-600 6200这是一个index block corrupt相关的错误

# ora-600 6200 trace file #

Block Checking: DBA = 526172808, Block Type = KTB-managed data block
**** row 19: key out of order
**** row 40: key out of order
**** row 101: key out of order
**** row 123: row length 65543 past end of block
**** row 123: row skipped so other stats may be wrong
**** row 196: row length 65543 past end of block
**** row 196: row skipped so other stats may be wrong
**** row 197: key out of order
**** row 243: row length 65543 past end of block
**** row 243: row skipped so other stats may be wrong
**** row 250: row length 65543 past end of block
**** row 250: row skipped so other stats may be wrong
**** row 251: row length 65543 past end of block
**** row 251: row skipped so other stats may be wrong
**** row 252: row length 65543 past end of block
**** row 252: row skipped so other stats may be wrong
**** row 253: row length 65543 past end of block
**** row 253: row skipped so other stats may be wrong
**** row 254: row length 65543 past end of block
**** row 254: row skipped so other stats may be wrong
**** row 255: row length 65543 past end of block
**** row 255: row skipped so other stats may be wrong
**** row 256: row length 65543 past end of block
**** row 256: row skipped so other stats may be wrong
**** row 257: key out of order
**** row 518: row length 65543 past end of block
**** row 518: row skipped so other stats may be wrong
**** row 519: row length 65543 past end of block
**** row 519: row skipped so other stats may be wrong
**** row 520: row length 65543 past end of block
**** row 520: row skipped so other stats may be wrong
**** actual rows marked deleted = 1 != kdxlende = 0
**** key (begin=0x23fd, len=71) overlaps with another
begin = 0x2414 len = 23
---- end index block validation
Block header dump: 0x1f5cc288
Object id on Block? Y
seg/obj: 0x1b0086 csc: 0xce8.dbee5f7f itc: 3 flg: E typ: 2 - INDEX
brn: 3 bdba: 0x1d8d9e05 ver: 0x01 opc: 0
inc: 0 exflg: 0

Itl Xid Uba Flag Lck Scn/Fsc
0x01 0x0032.056.01c81b4c 0x01070ee9.78ea.01 CBU- 0 scn 0x0ca6.df9c8ce1
0x02 0x1090.00a.0004c828 0x5e015267.174a.32 C--- 0 scn 0x0ce8.db5c9770
0x03 0x0ff4.062.00107720 0x0100e967.7562.4e C--- 0 scn 0x0ce8.d449ede4

TIP:
这是一个索引块,下面我们来确认错误对象

SQL> select to_number('1b0086','xxxxxxxxxx') from dual;
TO_NUMBER('1B0086','XXXXXXXXXX')
--------------------------------
1769606

SQL> select owner,object_name,object_type from dba_objects where object_id=1769606;
OWNER OBJECT_NAME OBJECT_TYPE
---------- ------------------------------ -------------------
USERINFO INX_TEST INDEX PARTITION
Note:
和上面我们用ora-600 6302 trace 推断的对象是一样的,下面我们根据dba 确认一下数据块的位置

SQL> select to_char(526172808,'xxxxxxxxxxxx') from dual;
TO_CHAR(52617
-------------
1f5cc288

SQL> @dba 1f5cc288
RFILE# BLOCK# BIGFILE_BLOCK# DUMP_CMD
---------- ---------- -------------- ---------------
125 1884808 526172808 -- alter system dump datafile 125 block 1884808

SQL> select name from v$datafile where RFILE#=125;
NAME
--------------------------------------------
/dev/rzwc_lv30_088

> dbv file=/dev/rzwc_lv30_088 blocksize=16384

DBVERIFY: Release 10.2.0.5.0 - Production on Sun Feb 1 15:58:23 2015

Copyright (c) 1982, 2007, Oracle. All rights reserved.

DBVERIFY - Verification starting : FILE = /dev/rzwc_lv30_088
Block Checking: DBA = 526172808, Block Type = KTB-managed data block
**** row 19: key out of order
**** row 40: key out of order
**** row 101: key out of order
**** row 123: row length 65543 past end of block
**** row 123: row skipped so other stats may be wrong
**** row 196: row length 65543 past end of block
**** row 196: row skipped so other stats may be wrong
**** row 197: key out of order
**** row 243: row length 65543 past end of block
**** row 243: row skipped so other stats may be wrong
**** row 250: row length 65543 past end of block
**** row 250: row skipped so other stats may be wrong
**** row 251: row length 65543 past end of block
**** row 251: row skipped so other stats may be wrong
**** row 252: row length 65543 past end of block
**** row 252: row skipped so other stats may be wrong
**** row 253: row length 65543 past end of block
**** row 253: row skipped so other stats may be wrong
**** row 254: row length 65543 past end of block
**** row 254: row skipped so other stats may be wrong
**** row 255: row length 65543 past end of block
**** row 255: row skipped so other stats may be wrong
**** row 256: row length 65543 past end of block
**** row 256: row skipped so other stats may be wrong
**** row 257: key out of order
**** row 518: row length 65543 past end of block
**** row 518: row skipped so other stats may be wrong
**** row 519: row length 65543 past end of block
**** row 519: row skipped so other stats may be wrong
**** row 520: row length 65543 past end of block
**** row 520: row skipped so other stats may be wrong
**** actual rows marked deleted = 1 != kdxlende = 0
**** key (begin=0x23fd, len=71) overlaps with another
begin = 0x2414 len = 23
---- end index block validation
Page 1884808 failed with check code 6401


DBVERIFY - Verification complete

Total Pages Examined : 1966016
Total Pages Processed (Data) : 0
Total Pages Failing (Data) : 0
Total Pages Processed (Index): 1923445
Total Pages Failing (Index): 1
Total Pages Processed (Other): 24369
Total Pages Processed (Seg) : 0
Total Pages Failing (Seg) : 0
Total Pages Empty : 18202
Total Pages Marked Corrupt : 0
Total Pages Influx : 0
Highest block SCN : 4043020581 (3304.4043020581)
kdzwc2_bds:/home/oracle>

Note:
最终我们确认了是数据文件/dev/rzwc_lv30_088 block# 1884808上存的index block corrupt导致的这一系列ora-600 ora-7445, 解决方法可以考虑drop index, re-create index。 后来也得到确认索引重建后错误没有再出现。这类corrupte block可能是因为突然断电,内存勘误,写错误,oracle bug等原因,bug问题可以提SR 确认。

posted @ 2015-12-03 16:48  travel_jack  阅读(761)  评论(0编辑  收藏  举报