ORA-63999 data file suffered media failure causes an instance crash
KCF: read, write or open error, block=0xb79ab online=1
        file=85 '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
        error=27063 txt: 'HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 8192'
Encountered write error

*** 2015-06-17 20:06:22.714
DDE rules only execution for : ORA 1110
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
----- START DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (Async) -----
Successfully dispatched
----- END DDE Action: 'DB_STRUCTURE_INTEGRITY_CHECK' (SUCCESS, 0 csec) -----
Executing ASYNC actions
----- END DDE Actions Dump (total 0 csec) -----
error 63999 detected in background process
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 85 (block # 752043)
ORA-01110: data file 85: '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
ORA-27063: number of bytes read/written is incorrect
HPUX-ia64 Error: 11: Resource temporarily unavailable
Additional information: -1
Additional information: 8192
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+544<-kjzdssdmp()+400<-kjzduptcctx()+432<-kjzdicrshnfy()+128<-$cold_ksuitm()+5872<-$cold_ksbrdp()+2704<-opirip()+1296<-opidrv()+1152<-sou2o()+256<-opimai_real()+352<-ssthrdmain()+576<-main()+336<-main_opd_entry()+80
----- End of Abridged Call Stack Trace -----

*** 2015-06-17 20:06:23.172
DBW1 (ospid: 5833): terminating the instance due to error 63999
ksuitm: waiting up to [5] seconds before killing DIAG(5807)
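An I/O error on part of the device that holds the data file brought the instance down. Because the trace reported ORA-27063: number of bytes read/written is incorrect, the first suspicion was a corrupt block, but block verification found no corruption. Taking the HPUX-ia64 Error: 11: Resource temporarily unavailable message into account, the working hypothesis is that some internal operation on the HP system left the device /dev/vgpmesdb12/rLV_FEM_PRD_I02 temporarily inaccessible.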
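As a cross-check when reading the trace, the following minimal Python sketch pulls out the ORA error stack and the OS error, and confirms that the hexadecimal block address in the KCF line is the same block that ORA-01114 reports in decimal. The `TRACE` excerpt is copied from the log above; the `summarize` helper and its regular expressions are illustrative assumptions for this one trace, not an Oracle tool or a general trace parser.

```python
import re

# Excerpt copied from the trace above (only the lines this sketch needs).
TRACE = """\
KCF: read, write or open error, block=0xb79ab online=1
        file=85 '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
ORA-63999: data file suffered media failure
ORA-01114: IO error writing block to file 85 (block # 752043)
ORA-01110: data file 85: '/dev/vgpmesdb12/rLV_FEM_PRD_I02'
ORA-27063: number of bytes read/written is incorrect
HPUX-ia64 Error: 11: Resource temporarily unavailable
"""


def summarize(trace: str) -> dict:
    """Hypothetical helper: extract the key facts from a trace excerpt."""
    # ORA error stack, in the order it appears in the trace.
    ora_codes = re.findall(r"^ORA-(\d{5}):", trace, flags=re.MULTILINE)
    # Block address as printed by the KCF line (hexadecimal).
    kcf_block = re.search(r"block=0x([0-9a-fA-F]+)", trace)
    # Block number as printed by ORA-01114 (decimal).
    ora_block = re.search(r"\(block # (\d+)\)", trace)
    # OS-level error line, e.g. "HPUX-ia64 Error: 11: ...".
    os_error = re.search(r"^\S+ Error: (\d+): (.+)$", trace, flags=re.MULTILINE)
    return {
        "ora_codes": ora_codes,
        "kcf_block": int(kcf_block.group(1), 16) if kcf_block else None,
        "ora_block": int(ora_block.group(1)) if ora_block else None,
        "os_errno": int(os_error.group(1)) if os_error else None,
        "os_message": os_error.group(2) if os_error else None,
    }


if __name__ == "__main__":
    summary = summarize(TRACE)
    print(summary)
    # Both lines should point at the same block of file 85.
    print("same block:", summary["kcf_block"] == summary["ora_block"])
```

Both values refer to the same block (0xb79ab is 752043 in decimal), so the KCF write error and the ORA-01114 entry describe a single failed write to file 85 rather than two separate failures.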
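A similar bug was then found on the official Oracle site: Bug 16884689 : DATABASE CRASH DUE TO ORA-27063 HPUX-IA64 ERROR: 11. Overall, the diagnosis indicates that the root cause is still an I/O problem, and the cluster only detected the database resource problem after the I/O error had occurred. The next step is therefore to determine whether the HP operating system or one of the modules in the I/O stack is at fault. The scenario here differs from the one described in that bug.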
Whether the instance goes down and restarts when a data file I/O error occurs is controlled by the _datafile_write_errors_crash_instance parameter.