OGG Study Notes 03: Handling Simple Failures in One-Way Replication

2017-01-19 17:10 AlfredZhao

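Environment: see OGG Study Notes 02: One-Way Replication Configuration Example
Goal: understand the basic approach to handling simple OGG failures.

1. Symptom
Symptom: some time after the Extract and Data Pump processes on the OGG source were started, both were found to have abended. Restarting them brings them back to RUNNING only briefly before they abend again.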

GGSCI (oradb30) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     ABENDED     LPJY1       00:00:00      47:39:54    
EXTRACT     ABENDED     LXJY1       00:00:00      47:40:00    


GGSCI (oradb30) 2> start extract lxjy1

Sending START request to MANAGER ...
EXTRACT LXJY1 starting


GGSCI (oradb30) 3> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     ABENDED     LPJY1       00:00:00      47:40:50    
EXTRACT     RUNNING     LXJY1       00:00:00      47:40:55    


GGSCI (oradb30) 4> start extract lpjy1

Sending START request to MANAGER ...
EXTRACT LPJY1 starting


GGSCI (oradb30) 5> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     LPJY1       00:00:00      47:40:58    
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:04    


GGSCI (oradb30) 6> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:15    
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:21    


GGSCI (oradb30) 7> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:19    
EXTRACT     RUNNING     LXJY1       00:00:00      47:41:25    


GGSCI (oradb30) 8> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     ABENDED     LPJY1       00:00:00      47:41:41    
EXTRACT     ABENDED     LXJY1       00:00:00      47:41:47    

2. Check the log
Check the OGG log, ggserr.log, to find out why the processes abended.

[ogg@oradb30 ogg]$ cd $GG_HOME
[ogg@oradb30 ogg]$ tail -200f ggserr.log
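The log shows that the Data Pump process lpjy1 abended because it could not connect to the target OGG, and that the Extract process lxjy1 abended because it could not find the archived log for sequence 160, thread 1.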

2017-01-19 14:51:46  INFO    OGG-00993  Oracle GoldenGate Capture for Oracle, lpjy1.prm:  EXTRACT LPJY1 started.
2017-01-19 14:51:49  ERROR   OGG-01224  Oracle GoldenGate Capture for Oracle, lpjy1.prm:  TCP/IP error 113 (No route to host).
2017-01-19 14:51:49  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, lpjy1.prm:  PROCESS ABENDING.
2017-01-19 14:52:28  ERROR   OGG-00446  Oracle GoldenGate Capture for Oracle, lxjy1.prm:  Could not find archived log for sequence 160 thread 1 under default destinations SQL <SELECT  name    FROM v$archived_log   WHERE sequence# = :ora_seq_no AND         thread# = :ora_thread AND         resetlogs_id = :ora_resetlog_id AND         archived = 'YES' AND         deleted = 'NO' AND         name not like '+%'         AND standby_dest = 'NO' >, error retrieving redo file name for sequence 160, archived = 1, use_alternate = 0Not able to establish initial position for sequence 160, rba 7758352.
2017-01-19 14:52:28  ERROR   OGG-01668  Oracle GoldenGate Capture for Oracle, lxjy1.prm:  PROCESS ABENDING.
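
When ggserr.log is long, it can help to narrow it down to the ERROR entries before reading the details. The following is a minimal sketch that assumes the line layout shown in the excerpt above (date, time, severity, message code, text); the function names ogg_errors and ogg_error_counts are hypothetical helpers, not part of OGG.

```shell
# Print only the ERROR entries from a GoldenGate error log.
# Assumes each entry starts with: date, time, severity, message code.
ogg_errors() {
    # $1: path to the log file (defaults to ./ggserr.log)
    awk '$3 == "ERROR"' "${1:-ggserr.log}"
}

# Count ERROR entries per message code (e.g. OGG-01668), most frequent first.
ogg_error_counts() {
    awk '$3 == "ERROR" { count[$4]++ }
         END { for (code in count) print count[code], code }' \
        "${1:-ggserr.log}" | sort -rn
}
```

For example, running `ogg_errors "$GG_HOME/ggserr.log" | tail -20` shows the most recent errors, and `ogg_error_counts "$GG_HOME/ggserr.log"` gives a quick view of which message codes dominate.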

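Investigation showed that the archived logs had been deleted by the RMAN backup policy once they had been backed up. Since a backup exists, the next step is simply to restore sequence 160 and all later archived logs from the backup set.
This also shows why the source database for OGG should run in archivelog mode: otherwise, if the changes in the source's online redo logs are not captured before those logs are overwritten, replication cannot continue.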

3. Resolve the problem
For the lxjy1 process (Extract), restore the archived logs from sequence 160 onward from the RMAN backup set:

$ rman target /
RMAN> restore archivelog from logseq 160;

Then start the lxjy1 process again.
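
As a rough sketch, the restore step could be scripted so that the sequence and thread numbers from the OGG-00446 message are passed in rather than typed by hand. The function name make_restore_cmds and the file name restore_arch.rman are hypothetical. The thread clause and the preliminary list backup check are additions that are not part of the original steps, and the generated commands assume the backup pieces are still available to RMAN.

```shell
# Emit RMAN commands that first list, then restore, archived logs
# starting at a given log sequence number.
make_restore_cmds() {
    logseq=$1          # first sequence to restore, e.g. 160
    thread=${2:-1}     # redo thread, defaults to 1
    cat <<EOF
list backup of archivelog from logseq ${logseq} thread ${thread};
restore archivelog from logseq ${logseq} thread ${thread};
EOF
}

# Usage (requires an Oracle environment with rman on the PATH):
#   make_restore_cmds 160 1 > restore_arch.rman
#   rman target / cmdfile=restore_arch.rman
```

Listing the backups first makes it easier to see whether sequence 160 is actually covered by a backup set before the restore is attempted.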

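For the lpjy1 process (Data Pump), confirm that the host running the target OGG is up and reachable over the network, then start the target database and the target OGG, including its mgr and replicat processes. Once the target is ready, start lpjy1 again on the source.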
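
Before restarting lpjy1, it may be worth checking from the source host that the target Manager port accepts TCP connections, since the OGG-01224 error was a network-level failure. The sketch below is one possible check, not an OGG feature: the function name check_target_mgr is hypothetical, and port 7809 is only an assumed value, so use the port configured for the target Manager in your own setup. It relies on bash's /dev/tcp redirection and the coreutils timeout command.

```shell
# Return 0 if a TCP connection to host:port can be opened within 3 seconds.
check_target_mgr() {
    host=$1
    port=${2:-7809}   # assumption: Manager port; replace with your configured value
    # bash opens the socket via /dev/tcp; timeout bounds the attempt.
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Usage (oradb31 is the target host name that appears in this post):
#   if check_target_mgr oradb31 7809; then
#       echo "target Manager reachable"
#   else
#       echo "target Manager unreachable"
#   fi
```

A successful connection only shows that something is listening on that port; the GGSCI info all output on the target remains the real confirmation that mgr and replicat are running.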

Finally, confirm that all OGG processes on both the source and the target are RUNNING:
Source OGG:

GGSCI (oradb30) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
EXTRACT     RUNNING     LPJY1       00:00:00      00:00:03    
EXTRACT     RUNNING     LXJY1       00:00:00      00:00:00    

Target OGG:

GGSCI (oradb31) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    RUNNING     RJY1        00:00:00      00:00:01    

OGG study notes, basics series: