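Contact: mobile/WeChat (+86 17813235971) QQ (107644445)
Title: ORA-15335 ORA-15130 ORA-15066 ORA-15196
Author: 惜分飞 © All rights reserved [May not be reproduced in any form without the author's consent; the author reserves the right to pursue legal action.]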
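A customer reported that their database would not start. The ASM alert log showed that the DATA disk group mounted successfully and then dismounted on its own shortly afterwards.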
Mon Sep 26 16:40:14 2022
SQL> /* ASMCMD */ALTER DISKGROUP data MOUNT
NOTE: cache registered group DATA number=2 incarn=0x9dfa705f
NOTE: cache began mount (first) of group DATA number=2 incarn=0x9dfa705f
NOTE: Assigning number (2,1) to disk (/dev/oracleasm/disks/DATA02)
NOTE: Assigning number (2,0) to disk (/dev/oracleasm/disks/DATA01)
Mon Sep 26 16:40:20 2022
NOTE: GMON heartbeating for grp 2
GMON querying group 2 at 68 for pid 25, osid 14650
NOTE: cache opening disk 0 of grp 2: DATA_0000 path:/dev/oracleasm/disks/DATA01
NOTE: F1X0 found on disk 0 au 2 fcn 0.0
NOTE: cache opening disk 1 of grp 2: DATA_0001 path:/dev/oracleasm/disks/DATA02
NOTE: cache mounting (first) external redundancy group 2/0x9DFA705F (DATA)
Mon Sep 26 16:40:20 2022
* allocate domain 2, invalid = TRUE
kjbdomatt send to inst 2
Mon Sep 26 16:40:20 2022
NOTE: attached to recovery domain 2
NOTE: cache recovered group 2 to fcn 0.321845
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Mon Sep 26 16:40:20 2022
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (DATA)
NOTE: LGWR found thread 1 closed at ABA 20.3546
NOTE: LGWR mounted thread 1 for diskgroup 2 (DATA)
NOTE: LGWR opening thread 1 at fcn 0.321845 ABA 21.3547
NOTE: cache mounting group 2/0x9DFA705F (DATA) succeeded
NOTE: cache ending mount (success) of group DATA number=2 incarn=0x9dfa705f
Mon Sep 26 16:40:20 2022
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 2
SUCCESS: diskgroup DATA was mounted
SUCCESS: /* ASMCMD */ALTER DISKGROUP data MOUNT
Mon Sep 26 16:40:22 2022
WARNING: failed to online diskgroup resource ora.DATA.dg (unable to communicate with CRSD/OHASD)
Mon Sep 26 16:40:47 2022
NOTE: client xff1:xff registered, osid 14742, mbr 0x0
Mon Sep 26 16:40:57 2022
WARNING: cache read a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: a corrupted block from group DATA was dumped to /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc
WARNING: cache read (retry) a corrupt block: group=2(DATA) dsk=1 blk=257 disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc:
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ERROR: cache failed to read group=2(DATA) dsk=1 blk=257 from disk(s): 1(DATA_0001)
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
NOTE: cache initiating offline of disk 1 group DATA
NOTE: process _user14778_+asm1 (14778) initiating offline of disk 1.3916071178 (DATA_0001) with mask 0x7e in group 2
NOTE: initiating PST update: grp = 2, dsk = 1/0xe96a810a, mask = 0x6a, op = clear
Mon Sep 26 16:40:58 2022
GMON updating disk modes for group 2 at 70 for pid 28, osid 14778
ERROR: Disk 1 cannot be offlined, since diskgroup has external redundancy.
ERROR: too many offline disks in PST (grp 2)
Mon Sep 26 16:40:58 2022
NOTE: cache dismounting (not clean) group 2/0x9DFA705F (DATA)
WARNING: Offline for disk DATA_0001 in mode 0x7f failed.
NOTE: messaging CKPT to quiesce pins Unix process pid: 14782, image: oracle@oracle11grac1 (B000)
Mon Sep 26 16:40:58 2022
NOTE: halting all I/Os to diskgroup 2 (DATA)
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_ora_14778.trc (incident=144548):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
Incident details in: /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
Sweep [inc][144548]: completed
System State dumped to trace file /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548/+ASM1_ora_14778_i144548.trc
Mon Sep 26 16:40:58 2022
NOTE: AMDU dump of disk group DATA created at /opt/grid/diag/asm/+asm/+ASM1/incident/incdir_144548
Mon Sep 26 16:41:00 2022
NOTE: LGWR doing non-clean dismount of group 2 (DATA)
NOTE: LGWR sync ABA=21.3550 last written ABA 21.3550
Mon Sep 26 16:41:00 2022
Sweep [inc2][144548]: completed
Mon Sep 26 16:41:00 2022
ERROR: ORA-15130 in COD recovery for diskgroup 2/0x9dfa705f (DATA)
ERROR: ORA-15130 thrown in RBAL for group number 2
Errors in file /opt/grid/diag/asm/+asm/+ASM1/trace/+ASM1_rbal_5162.trc:
ORA-15130: diskgroup "DATA" is being dismounted
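The log shows that the disk group cannot stay mounted because ASM has to perform COD recovery and that recovery fails. The root cause is a logically corrupt block on an ASM disk: the storage looks healthy at the physical level, but the data is invalid as seen by ASM. Because DATA is an external redundancy disk group, the affected disk cannot be taken offline, so the whole group is dismounted.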
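To make the location of the bad block concrete, the "cache read a corrupt block" warning in the ASM alert log can be parsed and turned into a byte offset on the disk device. The following is only a minimal sketch: the function names are hypothetical, and the 1 MiB allocation unit and 4096-byte metadata block sizes are assumptions (common ASM defaults) that should be checked against the actual disk group before relying on the result.

```python
import re

# Matches both the first read and the "(retry)" variant of the warning.
CORRUPT_RE = re.compile(
    r"cache read(?: \(retry\))? a corrupt block: "
    r"group=(?P<group>\d+)\((?P<group_name>\w+)\) "
    r"dsk=(?P<dsk>\d+) blk=(?P<vblk>\d+) "
    r"disk=\d+ \((?P<disk_name>\w+)\) "
    r"incarn=(?P<incarn>\d+) au=(?P<au>\d+) blk=(?P<blk>\d+) "
    r"count=(?P<count>\d+)"
)

TEXT_FIELDS = {"group_name", "disk_name"}


def parse_corrupt_block(line):
    """Return the fields of a corrupt-block warning as a dict, or None."""
    match = CORRUPT_RE.search(line)
    if match is None:
        return None
    return {
        key: (value if key in TEXT_FIELDS else int(value))
        for key, value in match.groupdict().items()
    }


def byte_offset(au, blk, au_size=1024 * 1024, block_size=4096):
    """Byte offset of block `blk` inside allocation unit `au`.

    au_size and block_size are ASSUMED defaults, not values taken
    from the log; verify them for the disk group in question.
    """
    return au * au_size + blk * block_size


if __name__ == "__main__":
    sample = (
        "WARNING: cache read a corrupt block: group=2(DATA) dsk=1 blk=257 "
        "disk=1 (DATA_0001) incarn=3916071178 au=113792 blk=1 count=1"
    )
    info = parse_corrupt_block(sample)
    print(info["disk_name"], info["au"], info["blk"])
    print(byte_offset(info["au"], info["blk"]))
```

The AU number and the block-within-AU number are the two values one would normally need when inspecting the block with a tool such as kfed.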
Errors reported in the database alert log:
Mon Sep 26 16:40:52 2022
Successful mount of redo thread 1, with mount id 1097279951
Database mounted in Shared Mode (CLUSTER_DATABASE=TRUE)
Lost write protection disabled
Completed: alter database mount
alter database open
This instance was first to open
Picked broadcast on commit scheme to generate SCNs
LGWR: STARTING ARCH PROCESSES
Mon Sep 26 16:40:56 2022
ARC0 started with pid=40, OS id=14761
ARC0: Archival started
LGWR: STARTING ARCH PROCESSES COMPLETE
ARC0: STARTING ARCH PROCESSES
Mon Sep 26 16:40:57 2022
ARC1 started with pid=41, OS id=14764
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 1 (???? 1) ???
Mon Sep 26 16:40:57 2022
ARC2 started with pid=42, OS id=14766
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_lgwr_14479.trc:
ORA-00313: ??????? 2 (???? 1) ???
Mon Sep 26 16:40:57 2022
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Mon Sep 26 16:40:57 2022
ARC3 started with pid=44, OS id=14770
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-00313: open failed for members of log group 1 of thread 1
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc2_14766.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc1_14764.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc (incident=180281):
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ARC3: Archival started
ARC0: STARTING ARCH PROCESSES COMPLETE
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc0_14761.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_arc3_14770.trc:
ORA-00313: 无法打开日志组 1 (用于线程 1) 的成员
ORA-00312: 联机日志 1 线程 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-17503: ksfdopn: 2 未能打开文件 +DATA/xff/onlinelog/group_1.271.1025610215
ORA-15130: diskgroup "DATA" is being dismounted
Unable to create archive log file '+DATA'
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-19816: WARNING: Files may exist in db_recovery_file_dest that are not known to database.
ORA-17502: ksfdcre:4 Failed to create file +DATA
ORA-15335: ASM metadata corruption detected in disk group 'DATA'
ORA-15130: diskgroup "DATA" is being dismounted
ORA-15066: offlining disk "DATA_0001" in group "DATA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
ORA-15196: invalid ASM block header [kfc.c:26368] [endian_kfbh] [2147483649] [257] [0 != 1]
*************************************************************
WARNING: A file of type ARCHIVED LOG may exist in
db_recovery_file_dest that is not known to the database.
Use the RMAN command CATALOG RECOVERY AREA to re-catalog
any such files. If files cannot be cataloged, then manually
delete them using OS command. This is most likely the
result of a crash during file creation.
*************************************************************
ARCH: Error 19504 Creating archive log file to '+DATA'
NOTE: Deferred communication with ASM instance
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-15130: diskgroup "DATA" is being dismounted
NOTE: deferred map free for map id 23
Errors in file /opt/oracle/diag/rdbms/xff/xff1/trace/xff1_ora_14732.trc:
ORA-16038: log 1 sequence# 14235 cannot be archived
ORA-19504: failed to create file ""
ORA-00312: online log 1 thread 1: '+DATA/xff/onlinelog/group_1.271.1025610215'
ORA-00312: online log 1 thread 1: '+ARCH/xff/onlinelog/group_1.279.1025610217'
Mon Sep 26 16:40:58 2022
Sweep [inc][180281]: completed
Sweep [inc2][180281]: completed
USER (ospid: 14732): terminating the instance due to error 16038
Mon Sep 26 16:40:59 2022
System state dump requested by (instance=1, osid=14732), summary=[abnormal instance termination].
Instance terminated by USER, pid = 14732
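The database alert log repeats the same handful of errors many times, which makes the chain hard to see. As a rough aid, the distinct ORA- codes can be listed in order of first appearance. This is a small sketch with a hypothetical helper name; it only scans text and assumes nothing about the log beyond the standard five-digit "ORA-nnnnn" format.

```python
import re

# Oracle error codes are written as "ORA-" followed by five digits.
ORA_RE = re.compile(r"ORA-\d{5}")


def ora_codes_in_order(text):
    """Return distinct ORA- codes in the order they first appear."""
    seen = []
    for code in ORA_RE.findall(text):
        if code not in seen:
            seen.append(code)
    return seen


if __name__ == "__main__":
    excerpt = (
        "ORA-00313: open failed for members of log group 1 of thread 1\n"
        "ORA-15335: ASM metadata corruption detected in disk group 'DATA'\n"
        'ORA-15130: diskgroup "DATA" is being dismounted\n'
        "ORA-00313: open failed for members of log group 1 of thread 1\n"
        "ORA-16038: log 1 sequence# 14235 cannot be archived\n"
    )
    for code in ora_codes_in_order(excerpt):
        print(code)
```

Run against the full alert log, a listing like this makes it easier to separate the ASM-side errors (ORA-15335, ORA-15130, ORA-15066, ORA-15196) from the database-side consequences such as ORA-00313 and ORA-16038.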
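This kind of failure is relatively easy to handle: patch the ASM metadata so that the DATA disk group mounts and stays mounted, then open the database and migrate the data out. In this case the recovery was complete, with zero data loss.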