这次的ORA-600 16703的故障比较特殊,客户的一套rac运行了5年多没有重启,这次由于异常导致其中一个节点重启,然后触发了tab$被清空,异常节点启动报ORA-600 16703错误.朋友在故障之后,第一时间没有对在运行的节点进行重启(虽然也无法对外提供业务服务),使得恢复工作相对简单一些,恢复效果也是最完美的.这个是我在对于软件安装介质注入恶意脚本,300天之后重启触发tab$被清空的相关恢复case中,最完美的一次(以前遇到过一次客户是虚拟化环境通过cdp回退然后类似方法处理ORA-600 16703直接把orachk备份表插入到tab$恢复),凸显了这位朋友在故障发生之后对于问题的准确判断和果断的应对能力.
有朋友和我反馈,他们数据库突然报大量ORA-600错误,业务无法正常操作,我分析相关日志确认:节点2重启之后节点1开始报大量ORA-600错误,但是节点一直处于open状态
Fri Jul 25 15:28:53 2025Decreasing number of real time LMS from 3 to 0Fri Jul 25 15:29:18 2025Reconfiguration started (old inc 13, new inc 15)List of instances: 1 2 (myinst: 1) Global Resource Directory frozen Communication channels reestablished Master broadcasted resource hash value bitmaps Non-local Process blocks cleaned outFri Jul 25 15:29:18 2025 LMS 1: 0 GCS shadows cancelled, 0 closed, 0 Xw survivedFri Jul 25 15:29:18 2025 LMS 2: 0 GCS shadows cancelled, 0 closed, 0 Xw survivedFri Jul 25 15:29:18 2025 LMS 0: 0 GCS shadows cancelled, 0 closed, 0 Xw survived Set master node info Submitted all remote-enqueue requests Dwn-cvts replayed, VALBLKs dubious All grantable enqueues granted Submitted all GCS remote-cache requests Fix write in gcs resourcesReconfiguration completeFri Jul 25 15:29:20 2025minact-scn: Master returning as live inst:2 has inc# mismatch instinc:0 cur:15 errcnt:0Fri Jul 25 15:30:07 2025Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_252634.trc (incident=77234):ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [],[],[],[],[],[],[],[]Incident details in: /u01/oracle/diag/rdbms/orcl/orcl1/incident/incdir_77234/orcl1_ora_252634_i77234.trcFri Jul 25 15:30:18 2025Use ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Fri Jul 25 15:30:19 2025Sweep [inc][77234]: completedSweep [inc2][77234]: completedFri Jul 25 15:30:27 2025Errors in file /u01/oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_261587.trc (incident=76487):ORA-00600: internal error code, arguments: [ktsircinfo_num1],[0],[0],[0], [], [], [], [], [], [], [], [] |
通过grep筛选报错信息
[root@iZbp11c0qyuuo1gr7j98upZ tmp]# egrep "ORA-00600|ORA-07445" alert_1.txt |sort -uORA-00600: internal error code, arguments: [25027], [0], [0], [], [], [], [], [], [], [], [], []ORA-00600: internal error code, arguments: [kkpo_rcinfo_defstg:delseg], [28941391], [], [], [], []ORA-00600: internal error code, arguments: [ktsircinfo_num1], [0], [0], [0], [], [], [], [], [], []ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [DEL], [0], [35038924], [], [], [], [], [], [], []ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [277736], [], [], [], [], [], [], []ORA-00600: 内部错误代码, 参数: [16659], [kqldtu], [INS], [0], [28829570], [], [], [], [], [], [], []ORA-07445: exception encountered: core dump [qknSetParent()+9] [SIGSEGV] [ADDR:0x10354] [PC:0x1A48B9B] [Address not mapped to object] []ORA-07445: exception encountered: core dump [qksxaMoveQbAnnotations()+168] [SIGSEGV] [ADDR:0x20304] [PC:0x1594954] [Address not mapped to object] []ORA-07445: 出现异常错误: 核心转储 [qknExpRegIni_int()+87] [SIGSEGV] [ADDR:0x8C] [PC:0x1A4D729] [Address not mapped to object] []ORA-07445: 出现异常错误: 核心转储 [qksxaMoveQbAnnotations()+168] [SIGSEGV] [ADDR:0x0] [PC:0x1594954] [SI_KERNEL(general_protection)] [] |
既然是由于节点2重启导致节点1报错,那分析节点2重启相关情况,第一次重启成功之后,数据库开始报ORA-600错误
Fri Jul 25 15:29:29 2025QMNC started with pid=46, OS id=363757 Fri Jul 25 15:29:31 2025minact-scn: Inst 2 is a slave inc#:15 mmon proc-id:363622 status:0x2minact-scn status: grec-scn:0x0000.00000000 gmin-scn:0x0000.00000000 gcalc-scn:0x0000.00000000Fri Jul 25 15:29:33 2025Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc (incident=248519):ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248519/orcl2_m003_363779_i248519.trcStarting background process SMCOFri Jul 25 15:29:35 2025SMCO started with pid=57, OS id=363802 Fri Jul 25 15:29:35 2025Completed: ALTER DATABASE OPEN /* db agent *//* {2:23784:2} */Use ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc (incident=248520):ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248520/orcl2_m003_363779_i248520.trcUse ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Exception [type: SIGSEGV, Address not mapped to object][ADDR:0x10] [PC:0x2FDA4BB,kgmdelsis()+219][flags:0x0,count:1]Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_m003_363779.trc (incident=248521):ORA-07445: exception encountered: core dump [kgmdelsis()+219] [SIGSEGV] [ADDR:0x10] [PC:0x2FDA4BB] [Address not mapped to object] []ORA-00600: internal error code, arguments: [kgmfvmi#3], [], [], [], [], [], [], [], [], [], [], []Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_248521/orcl2_m003_363779_i248521.trcUse ADRCI or Support Workbench to package the incident.Fri Jul 25 15:29:47 2025Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_363808.trc (incident=248559):ORA-00600: internal error code, arguments: [kkposds2], [18446744073709551615], [18446744073709551615], [18446744073709551615], [], [], [], [], [], [], [], [] |
然后第二次重启数据库无法open成功,而是报ORA-600 16703错误
ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */Picked broadcast on commit scheme to generate SCNsARCH: STARTING ARCH PROCESSESFri Jul 25 15:41:23 2025ARC0 started with pid=39, OS id=369231 ARC0: Archival startedARCH: STARTING ARCH PROCESSES COMPLETEARC0: STARTING ARCH PROCESSESFri Jul 25 15:41:24 2025ARC1 started with pid=40, OS id=369242 Fri Jul 25 15:41:24 2025ARC2 started with pid=41, OS id=369244 Fri Jul 25 15:41:24 2025ARC3 started with pid=42, OS id=369246 ARC1: Archival startedARC2: Archival startedARC1: Becoming the 'no FAL' ARCHARC1: Becoming the 'no SRL' ARCHARC2: Becoming the heartbeat ARCHThread 2 opened at log sequence 33585 Current log# 7 seq# 33585 mem# 0: +DATA/orcl/onlinelog/group_7.269.1011373611Successful open of redo thread 2MTTR advisory is disabled because FAST_START_MTTR_TARGET is not setSMON: enabling cache recoveryErrors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc (incident=260494):ORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []Incident details in: /u01/oracle/diag/rdbms/orcl/orcl2/incident/incdir_260494/orcl2_ora_369210_i260494.trcARC3: Archival startedARC0: STARTING ARCH PROCESSES COMPLETESUCCESS: diskgroup FRA was mountedFri Jul 25 15:41:30 2025Use ADRCI or Support Workbench to package the incident.See Note 411.1 at My Oracle Support for error and packaging details.Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:ORA-00704: bootstrap process failureORA-00704: bootstrap process failureORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []Errors in file /u01/oracle/diag/rdbms/orcl/orcl2/trace/orcl2_ora_369210.trc:ORA-00704: bootstrap process failureORA-00704: bootstrap process failureORA-00600: internal error code, arguments: [16703], [1403], [20], [], [], [], [], [], [], [], [], []Error 704 happened during db open, shutting down databaseUSER (ospid: 369210): terminating the instance due to error 704Instance terminated by USER, pid = 369210ORA-1092 signalled during: ALTER DATABASE OPEN /* db agent *//* {2:21799:2} */...opiodr aborting process unknown ospid (369210) as a result of ORA-1092Fri Jul 25 15:41:31 2025ORA-1092 : opitsk aborting process |
到这一步基本上就清晰了,大概率是遭遇到以前恢复的类似case,tab$数据被清空导致,类似案例
ORA-600 16703故障解析—tab$表被清空
警告:互联网中有oracle介质被注入恶意程序导致—ORA-600 16703
通过在故障主机上找到安装介质,验证md5确认该程序是被注入恶意代码程序
这个库由于还有一个节点处于open状态,相对处理比较简单,直接把备份的表数据反向插入回去即可
SYS@orcl1> select count(1) from ORACHK3C08C86E063530510ACD937; COUNT(1)---------- 20696SYS@orcl1> insert into tab$ select * from ORACHK3C08C86E063530510ACD937;20696 rows created.SYS@orcl1> commit;Commit complete.SYS@orcl1> select object_name,to_char(CREATED,'yyyy-mm-dd hh24:mi:ss') from dba_objects 2 where object_name in('DBMS_SUPPORT_DBMONITOR','DBMS_SUPPORT_DBMONITORP');OBJECT_NAME TO_CHAR(CREATED,'YY-------------------------------------------- -------------------DBMS_SUPPORT_DBMONITORP 2019-06-19 17:06:46DBMS_SUPPORT_DBMONITOR 2019-06-19 17:06:46 |
这次故障能够快速顺利的恢复,和客户发现故障之后保留第一现场,没有把一个open的节点也重启有很大关系,open的节点也重启了,那后续恢复工作会麻烦很多,效果可能也没有这样的完美.

浙公网安备 33010602011771号