
Using XAG to Configure GoldenGate High Availability in a RAC Cluster Environment

2021-09-14 18:59  AlfredZhao

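Background: this article is based on a real customer's test requirements. In the customer's environment the OGG hosts run only a GI cluster and the database is deployed elsewhere, so some details differ from a typical setup, but the core approach is the same. Everything here has been fully tested and is shared for reference.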

1. Preparation

RAC environment

  DB: 19.12.0
  GI: 19.12.0
  OS: RHEL 7.6 or later, or Oracle Linux 7.7 or later

OGG software

  Oracle GoldenGate 19.1.0.0.4 for Oracle on Linux x86-64

XAG software

  Patch 31215432: XAG 10.2 BUG FIX MLR

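At the time of writing the latest RU is 19.12. Also download the matching latest OPatch release, then use that OPatch to apply the 19.12 RU.
After the 19.12 RU has been applied successfully, check that the ACFS-related kernel modules are working properly.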
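A quick way to do the ACFS module check from the shell is sketched below. It assumes the usual Linux module names oracleoks, oracleadvm and oracleacfs and only parses `lsmod` output; the `acfsdriverstate` calls in the comments are the GI utility for the same check.

```shell
#!/bin/bash
# Sketch: report which ACFS/ADVM kernel modules are missing from `lsmod` output.
# Assumption: on Linux the modules are named oracleoks, oracleadvm and oracleacfs.
# Reads lsmod output on stdin; prints one missing module per line (no output = all loaded).
missing_acfs_modules() {
    local mods m
    mods=$(awk 'NR > 1 { print $1 }')   # first column, skipping the "Module Size Used by" header
    for m in oracleoks oracleadvm oracleacfs; do
        printf '%s\n' "$mods" | grep -qx "$m" || echo "$m"
    done
}

# Usage on each node after the RU is applied:
#   lsmod | missing_acfs_modules
#   $GRID_HOME/bin/acfsdriverstate loaded
#   $GRID_HOME/bin/acfsdriverstate version
```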

2. Create the ACFS file system

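Because this environment has only the GI owner grid, the ACFS file system is owned by grid:oinstall. Create the ACFS file system with the ASMCA GUI; as long as it shows up normally afterwards, there is usually nothing to worry about.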
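The Clusterware name of the new ACFS resource is needed later for `agctl --filesystems`. The helper below is a sketch that assumes the lowercase ora.<diskgroup>.<volume>.acfs pattern seen in this cluster (ora.data.oggsou.acfs). The commented commands outline a command-line alternative to ASMCA; the 10G size and the `<volume_device>` placeholder are hypothetical.

```shell
#!/bin/bash
# Sketch: derive the Clusterware resource name of an ACFS file system from its
# diskgroup and volume. Assumption: the lowercase ora.<diskgroup>.<volume>.acfs
# pattern observed in this cluster (DATA + OGGSOU -> ora.data.oggsou.acfs).
acfs_res_name() {
    printf 'ora.%s.%s.acfs\n' "$1" "$2" | tr '[:upper:]' '[:lower:]'
}

# Rough command-line alternative to ASMCA (size and device are hypothetical;
# take the real "Volume Device" from volinfo):
#   asmcmd volcreate -G DATA -s 10G oggsou                                   # as grid
#   asmcmd volinfo -G DATA oggsou                                            # note the Volume Device
#   /sbin/mkfs -t acfs <volume_device>                                       # as root
#   srvctl add filesystem -device <volume_device> -path /oggsou -user grid   # as root
#   srvctl start filesystem -volume oggsou -diskgroup data
#   chown grid:oinstall /oggsou                                              # as root, once mounted
```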

3. Install the GoldenGate software

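For this installation choose Oracle GoldenGate for Oracle Database 19c. Because everything is installed as the grid user, the ORACLE_HOME (database location) in the installer must be changed manually to the GRID_HOME path. GRID_HOME then also provides the client functionality, so no separate client installation is needed.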
--ogg install
[grid@db193 media]$ unzip V983658-01.zip
[grid@db193 ~]$ cd /u01/media/fbo_ggs_Linux_x64_shiphome/Disk1/
[grid@db193 Disk1]$ ls
install  response  runInstaller  stage
[grid@db193 Disk1]$ ./runInstaller

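Installation succeeded. Note in particular that the default ORACLE_HOME value in the installer GUI was changed manually!
That change was only needed because this customer's setup is somewhat unusual: there is no oracle user and no database software directory.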
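Without a GUI, OUI can also install OGG silently from the response file template shipped in the media (response/oggcore.rsp). The sketch below edits a copy of that template and fails loudly if a key is missing. The key names are assumed to match the 19c template, so verify them against your copy; START_MANAGER=false is an assumption (XAG starts mgr later), and the template comments may indicate that DATABASE_LOCATION is only used when START_MANAGER is true.

```shell
#!/bin/bash
# Sketch: prepare a silent-install response file from the media template.
# Assumption: the template contains uncommented KEY= lines for the keys set below.

# Set KEY=VALUE in a response file, replacing the existing line for KEY.
set_rsp() {
    local key=$1 value=$2 file=$3
    grep -q "^$key=" "$file" || { echo "key $key not found in $file" >&2; return 1; }
    sed -i "s|^$key=.*|$key=$value|" "$file"
}

# Values for this environment: OGG home on ACFS, GRID_HOME as the "database"
# location (the manual change described above).
prepare_ogg_rsp() {
    local file=$1 gg_home=$2 db_home=$3
    set_rsp INSTALL_OPTION ORA19c "$file" &&
    set_rsp SOFTWARE_LOCATION "$gg_home" "$file" &&
    set_rsp START_MANAGER false "$file" &&
    set_rsp DATABASE_LOCATION "$db_home" "$file" &&
    set_rsp UNIX_GROUP_NAME oinstall "$file"
}

# Usage (as grid, in Disk1):
#   cp response/oggcore.rsp /tmp/oggcore.rsp
#   prepare_ogg_rsp /tmp/oggcore.rsp /oggsou "$GRID_HOME"
#   ./runInstaller -silent -nowait -responseFile /tmp/oggcore.rsp
```

Editing the shipped template, rather than writing a file from scratch, keeps its header and version line intact.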

4. Install the XAG software

Unzip the XAG media, create the XAG directory, install XAG and set the environment variables:

[root@db193 media]# ls -lrth
total 531M
-rwxr-xr-x 1 root root 213K Sep 14 09:23 p31215432_190000_Generic.zip
-rw-r--r-- 1 root root 531M Sep 14 09:24 V983658-01.zip

For convenience, set the GRID_HOME variable for both the root and grid users:

export GRID_HOME=/u01/app/19.3.0/grid

Installing XAG: first decide on the installation directory:

[root@db195 ~]# cd /u01/app
[root@db195 app]# mkdir xag
[root@db195 app]# chown grid:oinstall xag

Note: make sure the xag directory has been created successfully on all nodes, with the same, correct ownership and permissions.
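With more nodes it helps to script the directory creation. A minimal sketch, assuming root can ssh to every node without a password (not part of the original setup); by default it only prints the commands.

```shell
#!/bin/bash
# Sketch: create the XAG home with the same owner on every node.
# Assumptions: run as root with passwordless ssh to each node.
# DRY_RUN=1 (the default) only prints the commands.
mk_xag_dir() {
    local dir=$1 node cmd
    shift
    cmd="mkdir -p $dir && chown grid:oinstall $dir && ls -ld $dir"
    for node in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "ssh $node '$cmd'"
        else
            ssh "$node" "$cmd"
        fi
    done
}

# Usage:
#   DRY_RUN=0 mk_xag_dir /u01/app/xag db193 db195
```

The trailing `ls -ld` makes it easy to compare owner and mode across nodes in one pass.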

Syntax:
xagsetup.sh --install --directory <installdir> [{--nodes <node1,node2[,...]> | --all_nodes}]
In this environment:
xagsetup.sh --install --directory /u01/app/xag --all_nodes

[grid@db193 media]$ unzip p31215432_190000_Generic.zip
[grid@db193 xag]$ pwd
/u01/media/xag
[grid@db193 xag]$ ./xagsetup.sh --install --directory /u01/app/xag --all_nodes
Installing Oracle Grid Infrastructure Agents on: db193
Installing Oracle Grid Infrastructure Agents on: db195
Updating XAG resources.
Successfully updated XAG resources.

Set the environment variable:

export XAG_HOME=/u01/app/xag

Also add $XAG_HOME/bin to PATH so that agctl can be called directly.
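For example, the following lines could go into the ~/.bash_profile of both root and grid. Adding $GRID_HOME/bin is an extra assumption so that the later `crsctl`/`srvctl` calls without a full path work; $XAG_HOME/bin is placed first so the standalone agctl takes precedence over any copy shipped with GI.

```shell
#!/bin/bash
# Sketch for ~/.bash_profile of root and grid (paths from this environment).
export GRID_HOME=/u01/app/19.3.0/grid
export XAG_HOME=/u01/app/xag

# Prepend a directory to PATH only if it is not already there.
path_prepend() {
    case ":$PATH:" in
        *":$1:"*) ;;
        *) PATH=$1:$PATH ;;
    esac
}

path_prepend "$GRID_HOME/bin"   # crsctl, srvctl
path_prepend "$XAG_HOME/bin"    # agctl; prepended last so it ends up first on PATH
export PATH
```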

5. Add the OGG resources to the cluster

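The OGG resources are added the same way on the source and target clusters. In this environment the database to be configured is not in this cluster; only the GI cluster software and the grid user are present: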

5.1 Pick an unused IP address and add it as a VIP

[grid@db193 admin]$ $GRID_HOME/bin/crsctl stat res -p |grep -ie .network -ie subnet |grep -ie name -ie subnet
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="asmnetwork" type="ResList">ora.asmnet1.asmnetwork</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="asmnetwork" type="ResList">ora.asmnet1.asmnetwork</Arg></xml>
SUBNET=10.10.1.0
REGISTRATION_INVITED_SUBNETS=
NAME=ora.asmnet1.asmnetwork(ora.asmgroup)
USR_ORA_SUBNET=10.10.1.0
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
NAME=ora.net1.network
USR_ORA_SUBNET=192.168.1.0
START_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>
STOP_DEPENDENCIES_RTE_INTERNAL=<xml><Arg name="network" type="Res">ora.net1.network</Arg></xml>

[root@db193 media]# $GRID_HOME/bin/appvipcfg create -network=1 -ip=192.168.1.198 -vipname=xag.gg_1-vip.vip -user=grid
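Before running appvipcfg, the candidate address can be checked against the public subnet reported by `crsctl stat res -p` (USR_ORA_SUBNET of ora.net1.network above). The sketch below is pure shell; the /24 prefix is an assumption (confirm the netmask with `srvctl config network`), and a failed ping only suggests, not proves, that the address is free.

```shell
#!/bin/bash
# Sketch: check that a candidate VIP lies in the public network subnet.

# Print USR_ORA_SUBNET of network resource $1 from `crsctl stat res -p` text on stdin.
net_subnet() {
    awk -F= -v res="$1" '
        $1 == "NAME" { cur = $2 }
        $1 == "USR_ORA_SUBNET" && cur == res { print $2 }'
}

# Convert a dotted IPv4 address to an integer using shell arithmetic.
ip2int() {
    local IFS=.
    set -- $1          # split the address on dots into $1..$4
    echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

# Return 0 if address $1 is inside network $2 with prefix length $3.
ip_in_subnet() {
    local ip net mask
    ip=$(ip2int "$1"); net=$(ip2int "$2")
    mask=$(( (0xFFFFFFFF << (32 - $3)) & 0xFFFFFFFF ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage (as grid), before running appvipcfg as root:
#   subnet=$($GRID_HOME/bin/crsctl stat res -p | net_subnet ora.net1.network)
#   ip_in_subnet 192.168.1.198 "$subnet" 24 &&
#     ! ping -c 2 -W 1 192.168.1.198 >/dev/null && echo "192.168.1.198 looks unused"
```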

5.2 Grant the grid user permissions on the VIP resource

[root@db193 media]# $GRID_HOME/bin/crsctl setperm resource xag.gg_1-vip.vip -u user:grid:r-x

5.3 Start the VIP and check its status

Start the VIP resource:

[grid@db193 ~]$ $GRID_HOME/bin/crsctl start resource xag.gg_1-vip.vip
CRS-2672: Attempting to start 'xag.gg_1-vip.vip' on 'db193'
CRS-2676: Start of 'xag.gg_1-vip.vip' on 'db193' succeeded

Check the VIP resource status:

[grid@db193 ~]$ $GRID_HOME/bin/crsctl status resource xag.gg_1-vip.vip
NAME=xag.gg_1-vip.vip
TYPE=app.appviptypex2.type
TARGET=ONLINE
STATE=ONLINE on db193
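When these steps are scripted, it can help to wait for the VIP to reach ONLINE before adding GoldenGate. A sketch that polls `crsctl status resource` for a single-instance resource; the 5-second interval and 60-second default timeout are arbitrary choices.

```shell
#!/bin/bash
# Sketch: wait for a single-instance resource such as the GoldenGate VIP to come ONLINE.
# Assumption: GRID_HOME is set as in section 4.

# Print "<state> [<node>]" from `crsctl status resource` output on stdin,
# e.g. "STATE=ONLINE on db193" -> "ONLINE db193".
res_state() {
    awk -F= '$1 == "STATE" { n = split($2, w, " "); if (n >= 3) print w[1], w[3]; else print w[1] }'
}

# Poll every 5 seconds until resource $1 is ONLINE or $2 seconds (default 60) pass.
wait_online() {
    local res=$1 timeout=${2:-60} waited=0 state
    while [ "$waited" -lt "$timeout" ]; do
        state=$("$GRID_HOME/bin/crsctl" status resource "$res" | res_state)
        case $state in
            ONLINE*) echo "$res: $state"; return 0 ;;
        esac
        sleep 5
        waited=$((waited + 5))
    done
    echo "$res is not ONLINE after ${timeout}s" >&2
    return 1
}

# Usage (as grid):
#   $GRID_HOME/bin/crsctl start resource xag.gg_1-vip.vip && wait_online xag.gg_1-vip.vip
```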

5.4 Add the GoldenGate instance and check its status

[grid@db193 grid]$ $XAG_HOME/bin/agctl add goldengate gg_1 --gg_home /oggsou --instance_type source --nodes db193,db195 --vip_name xag.gg_1-vip.vip --filesystems ora.data.oggsou.acfs --oracle_home /u01/app/19.3.0/grid

Check the status:
[grid@db193 grid]$ $XAG_HOME/bin/agctl status goldengate gg_1
Goldengate  instance 'gg_1' is not running

Start GoldenGate instance gg_1:
[grid@db193 grid]$ $XAG_HOME/bin/agctl start goldengate gg_1
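To check from a script where the instance is running after `agctl start`, the status line can be parsed. The "is running on <node>" wording is an assumption; only the "is not running" form appears in this article.

```shell
#!/bin/bash
# Sketch: extract the hosting node from `agctl status goldengate` output on stdin.
# Assumption: a running instance is reported as "... is running on <node>".
# Prints nothing when the instance is not running.
gg_node() {
    sed -n 's/.*is running on \([^ ,]*\).*/\1/p'
}

# Usage (as grid):
#   node=$($XAG_HOME/bin/agctl status goldengate gg_1 | gg_node)
#   [ -n "$node" ] && echo "gg_1 is running on $node" || echo "gg_1 is not running"
```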

5.5 Check the resource status

[grid@db195 oggsou]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.OGGSOU.advm
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.LISTENER.lsnr
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.chad
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.data.oggsou.acfs
               ONLINE  ONLINE       db193                    mounted on /oggsou,S
                                                             TABLE
               ONLINE  ONLINE       db195                    mounted on /oggsou,S
                                                             TABLE
ora.net1.network
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.ons
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
ora.proxy_advm
               ONLINE  ONLINE       db193                    STABLE
               ONLINE  ONLINE       db195                    STABLE
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.ASMNET1LSNR_ASM.lsnr(ora.asmgroup)
      1        ONLINE  ONLINE       db193                    STABLE
      2        ONLINE  ONLINE       db195                    STABLE
ora.CRS.dg(ora.asmgroup)
      1        ONLINE  ONLINE       db193                    STABLE
      2        ONLINE  ONLINE       db195                    STABLE
ora.DATA.dg(ora.asmgroup)
      1        ONLINE  ONLINE       db193                    STABLE
      2        ONLINE  ONLINE       db195                    STABLE
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       db195                    STABLE
ora.asm(ora.asmgroup)
      1        ONLINE  ONLINE       db193                    Started,STABLE
      2        ONLINE  ONLINE       db195                    Started,STABLE
ora.asmnet1.asmnetwork(ora.asmgroup)
      1        ONLINE  ONLINE       db193                    STABLE
      2        ONLINE  ONLINE       db195                    STABLE
ora.cvu
      1        ONLINE  ONLINE       db195                    STABLE
ora.db193.vip
      1        ONLINE  ONLINE       db193                    STABLE
ora.db195.vip
      1        ONLINE  ONLINE       db195                    STABLE
ora.jydb.cmdb1.svc
      2        ONLINE  ONLINE       db195                    STABLE
ora.jydb.db
      1        ONLINE  ONLINE       db193                    Open,HOME=/u01/app/o
                                                             racle/product/19.3.0
                                                             /db_1,STABLE
      2        ONLINE  ONLINE       db195                    Open,HOME=/u01/app/o
                                                             racle/product/19.3.0
                                                             /db_1,STABLE
ora.qosmserver
      1        ONLINE  ONLINE       db195                    STABLE
ora.scan1.vip
      1        ONLINE  ONLINE       db195                    STABLE
xag.gg_1-vip.vip
      1        ONLINE  ONLINE       db195                    STABLE
xag.gg_1.goldengate
      1        ONLINE  ONLINE       db195                    STABLE
--------------------------------------------------------------------------------

5.6 Switchover test

Relocate from node db193 to node db195:

[grid@db193 grid]$ $XAG_HOME/bin/agctl relocate goldengate gg_1 --node db195

[grid@db193 grid]$ crsctl stat res -t
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1-vip.vip
      1        ONLINE  ONLINE       db195                    STABLE
xag.gg_1.goldengate
      1        ONLINE  ONLINE       db195                    STABLE
--------------------------------------------------------------------------------

Relocate from node db195 back to node db193:

[grid@db193 grid]$ $XAG_HOME/bin/agctl relocate goldengate gg_1 --node db193

[grid@db193 grid]$ crsctl stat res -t
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1-vip.vip
      1        ONLINE  ONLINE       db193                    STABLE
xag.gg_1.goldengate
      1        ONLINE  ONLINE       db193                    STABLE
--------------------------------------------------------------------------------

Both relocations work as expected.

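Likewise, when host db195 is rebooted, the OGG VIP and GoldenGate resource fail over to db193 automatically, and vice versa. This confirms that GoldenGate high availability works.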
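The relocate checks above can be automated by confirming that the VIP and the GoldenGate resource are ONLINE on the same, expected node. A minimal sketch that parses `crsctl stat res -t`; it relies on resource names starting in column 1 and instance lines starting with the instance number, as in the outputs above.

```shell
#!/bin/bash
# Sketch: after a relocate, confirm that gg_1 and its VIP run on the expected node,
# parsing `crsctl stat res -t` text on stdin.

# Print the server of the first ONLINE/INTERMEDIATE instance of cluster resource $1.
res_server() {
    awk -v res="$1" '
        /^[^ -]/ { cur = $1; next }     # resource-name lines start in column 1
        cur == res && $1 ~ /^[0-9]+$/ && ($3 == "ONLINE" || $3 == "INTERMEDIATE") { print $4; exit }'
}

# Succeed if both resources run on node $1 (resource names default to this article's).
check_colocated() {
    local node=$1 vip_res=${2:-xag.gg_1-vip.vip} gg_res=${3:-xag.gg_1.goldengate} out vip gg
    out=$(cat)
    vip=$(printf '%s\n' "$out" | res_server "$vip_res")
    gg=$(printf '%s\n' "$out" | res_server "$gg_res")
    [ -n "$gg" ] && [ "$gg" = "$vip" ] && [ "$gg" = "$node" ]
}

# Usage (as grid):
#   $XAG_HOME/bin/agctl relocate goldengate gg_1 --node db195
#   crsctl stat res -t | check_colocated db195 && echo "relocation to db195 OK"
```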

6. Stopping and starting OGG on RAC

6.1 Common commands for stopping OGG

1. Stop the GoldenGate resource

[grid@db195 oggsou]$ agctl stop goldengate gg_1

[grid@db195 oggsou]$ crsctl stat res xag.gg_1.goldengate -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1.goldengate
      1        OFFLINE OFFLINE                               STABLE
--------------------------------------------------------------------------------


2. Stop the ACFS file system

[grid@db195 ~]$ srvctl stop filesystem -volume oggsou -diskgroup data

[grid@db195 ~]$ crsctl stat res -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.data.oggsou.acfs
               OFFLINE OFFLINE      db193                    admin unmounted /ogg
                                                             sou,STABLE
               OFFLINE OFFLINE      db195                    admin unmounted /ogg
                                                             sou,STABLE
--------------------------------------------------------------------------------


3. Stop CRS

[root@db195 ~]# crsctl stop has
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'db195'
CRS-2673: Attempting to stop 'ora.crsd' on 'db195'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'db195'
 <output omitted>
CRS-2677: Stop of 'ora.gipcd' on 'db195' succeeded
CRS-2677: Stop of 'ora.gpnpd' on 'db195' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'db195' has completed
CRS-4133: Oracle High Availability Services has been stopped.



4. Check that CRS has stopped completely

[root@db195 ~]#  crsctl stat res -t -init
CRS-4639: Could not contact Oracle High Availability Services
CRS-4000: Command Status failed, or completed with errors.
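The stop order above can be wrapped in a small script. A sketch, run as root, assuming grid's login profile puts agctl and srvctl on PATH; DRY_RUN=1 (the default) only prints the commands. Note that `srvctl stop filesystem` without a node list unmounts /oggsou on all nodes (both are OFFLINE in the output above), so this sequence suits full maintenance; for one node only, relocating gg_1 first and stopping just that node's stack may be enough.

```shell
#!/bin/bash
# Sketch of the 6.1 stop order on one node, run as root.
# Assumptions: names gg_1, oggsou and data as in this article; GRID_HOME is set.
# DRY_RUN=1 (the default) only prints the commands.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then echo "+ $*"; else "$@"; fi
}

stop_ogg_stack() {
    run su - grid -c "agctl stop goldengate gg_1" || return 1                             # 1. GoldenGate
    run su - grid -c "srvctl stop filesystem -volume oggsou -diskgroup data" || return 1 # 2. ACFS, all nodes
    run "$GRID_HOME/bin/crsctl" stop has || return 1                                     # 3. local stack
    run "$GRID_HOME/bin/crsctl" stat res -t -init                                        # 4. expect CRS-4639
    return 0
}

# Usage (as root):
#   DRY_RUN=0 stop_ogg_stack
```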

6.2 Common commands for starting OGG

1. Start CRS

[root@db195 ~]# crsctl start has
CRS-4123: Oracle High Availability Services has been started.


2. Start the ACFS file system

[grid@db195 ~]$ srvctl start filesystem -volume oggsou -diskgroup data

[grid@db195 ~]$ crsctl stat res ora.data.oggsou.acfs -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.data.oggsou.acfs
               ONLINE  ONLINE       db193                    mounted on /oggsou,S
                                                             TABLE
               ONLINE  ONLINE       db195                    mounted on /oggsou,S
                                                             TABLE
--------------------------------------------------------------------------------



3. Start the GoldenGate resource

[grid@db195 ~]$ agctl start goldengate gg_1

[grid@db195 ~]$ crsctl stat res xag.gg_1.goldengate -t
--------------------------------------------------------------------------------
Name           Target  State        Server                   State details
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
xag.gg_1.goldengate
      1        ONLINE  ONLINE       db193                    STABLE
--------------------------------------------------------------------------------
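After `crsctl start has` the stack needs some time, so a script should wait until the ACFS file system is mounted before starting GoldenGate. A sketch that polls `crsctl stat res <name> -t` until every node line shows ONLINE; the 10-second interval and 600-second timeout are arbitrary choices.

```shell
#!/bin/bash
# Sketch: wait until a local ACFS resource is ONLINE on every node before starting OGG.
# Assumption: GRID_HOME is set (section 4).

# Succeed if resource $1 has at least one node line and every node line shows STATE ONLINE,
# reading `crsctl stat res <name> -t` text on stdin (local layout: TARGET STATE SERVER).
all_online() {
    awk -v res="$1" '
        /^[^ -]/ { cur = $1; next }
        cur == res && ($1 == "ONLINE" || $1 == "OFFLINE") && NF >= 3 { n++; if ($2 != "ONLINE") bad++ }
        END { exit !(n > 0 && bad == 0) }'
}

# Poll every 10 seconds, for up to $2 seconds (default 600).
wait_acfs() {
    local res=${1:-ora.data.oggsou.acfs} left=${2:-600}
    while [ "$left" -gt 0 ]; do
        "$GRID_HOME/bin/crsctl" stat res "$res" -t 2>/dev/null | all_online "$res" && return 0
        sleep 10
        left=$((left - 10))
    done
    echo "$res is not ONLINE on all nodes" >&2
    return 1
}

# Usage (as grid, after `crsctl start has` as root):
#   srvctl start filesystem -volume oggsou -diskgroup data
#   wait_acfs ora.data.oggsou.acfs && agctl start goldengate gg_1
```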

7. Additional notes

Individual OGG processes can also be placed under cluster monitoring, so that a failed process shows up clearly in the cluster status:

[grid@db193 oggsou]$ agctl modify goldengate gg_1 --monitor_extracts extjy1
[grid@db193 oggsou]$ agctl config goldengate gg_1
Instance name: gg_1
Application GoldenGate location is: /oggsou
Goldengate MicroServices Architecture environment: no
GoldenGate instance type is: source
EXTRACT groups to monitor: extjy1
REPLICAT groups to monitor:
Critical EXTRACT groups:
Critical REPLICAT groups:
Autostart on DataGuard role transition to PRIMARY: no
Autostart JAgent: no
Configured to run on Nodes: db193 db195
ORACLE_HOME location is: /u01/app/19.3.0/grid
File System resources needed: ora.data.oggsou.acfs
VIP name: xag.gg_1-vip.vip

If a monitored process is not running, the status shows:

xag.gg_1-vip.vip
      1        ONLINE  ONLINE       db195                    STABLE
xag.gg_1.goldengate
      1        ONLINE  INTERMEDIATE db195                    ER(s) not running :
                                                             EXTJY1,STABLE
--------------------------------------------------------------------------------
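The INTERMEDIATE state can also be picked up by a monitoring script. The sketch below prints the STATE column of the GoldenGate resource and warns when it is anything other than ONLINE; `check_gg` is a hypothetical hook for illustration, not an XAG feature.

```shell
#!/bin/bash
# Sketch: detect a GoldenGate resource that is not fully ONLINE (for example INTERMEDIATE
# because a monitored EXTRACT is down) from `crsctl stat res -t` text on stdin.

# Print the STATE column of the first instance of cluster resource $1.
res_state_t() {
    awk -v res="$1" '
        /^[^ -]/ { cur = $1; next }
        cur == res && $1 ~ /^[0-9]+$/ { print $3; exit }'
}

# Hypothetical monitoring hook: print a warning unless the resource is ONLINE.
check_gg() {
    local res=${1:-xag.gg_1.goldengate} state
    state=$(res_state_t "$res")
    [ "$state" = ONLINE ] || echo "WARNING: $res is ${state:-not found}"
}

# Usage (as grid), e.g. from cron:
#   crsctl stat res -t | check_gg xag.gg_1.goldengate
```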

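The OGG manager (mgr) can be configured to start the other processes automatically (AUTOSTART ER *). For reference, this is the OGG configuration used in the test: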

GGSCI (db193) 1> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING
EXTRACT     RUNNING     EXTJY1      00:00:02      00:00:00


GGSCI (db193) 2> view param mgr

AUTORESTART ER *, RETRIES 5, WAITMINUTES 1, RESETMINUTES 60
AUTOSTART ER *
PORT 7809


GGSCI (db193) 3> view param extjy1

EXTRACT extjy1
USERID ggs_admin@prod, PASSWORD ggs_admin
TRANLOGOPTIONS DBLOGREADER
EXTTRAIL ./dirdat/sa
TABLE JY.T_SECOND_P;

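In the end, testing in this environment confirmed failover in every scenario: manual relocate, automatic failover on a CRS cluster failure, and automatic failover when a host is rebooted outright.
In my view, using XAG to configure OGG on RAC works very well and is well worth adopting; if you are interested, try it out and see for yourself.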