Dameng HA Explained

####sample 1   Local HA: symptoms and explanation

 

 

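###sample 2   What to check when restarting the DB locally

Before restarting the database, note two things in advance.


The monitor log matters most: cd /dddk/db_name/app/log
See "20250113 miqs_drs: analysis of the database SUSPEND problem caused by a confirm-monitor fault".

Possible cause 1 (check item 1: is MON_CONFIRM true in the log?)
How to check: inspect the dmmonitor log dmmonitor_20241027225545.log and see whether MON_CONFIRM is false. This was confirmed to be the cause.

4. After the restart, check the log dmmonitor_20241128133243.log: if the value is TRUE, everything is normal and the confirm monitor has started working.

Possible cause 2 (check item 2: are there error messages in the log?)
dmmonitor_20241027225545.log keeps printing error messages.


The confirm monitor is misconfigured in dmmonitor.ini and the configuration must be changed:
1. dmmonitor.ini has MON_INST_OGUID = 453331, which must be changed

2. The primary node is configured with 453332 in dmwatcher.ini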
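
The two check items above can be scripted. The following is only a minimal sketch: it assumes the flag appears in the log as `mon_confirm:TRUE` or `mon_confirm:FALSE` (the form seen in the sample log in sample 4), and the function name `check_monitor_log` is made up for this example. Real dmmonitor logs may format the field differently, so adjust the pattern to what your log actually prints.

```python
import re

# Assumption: the flag is printed as "mon_confirm:FALSE" / "mon_confirm:TRUE",
# as in the sample log in this post. Adjust if your log differs.
MON_CONFIRM_RE = re.compile(r"mon_confirm\s*[:=]\s*(true|false)", re.IGNORECASE)


def check_monitor_log(lines):
    """Return (last mon_confirm value seen or None, lines that mention 'error')."""
    last_confirm = None
    error_lines = []
    for line in lines:
        match = MON_CONFIRM_RE.search(line)
        if match:
            last_confirm = match.group(1).upper() == "TRUE"
        # Case-insensitive match, so status fields such as "(ERROR)" are caught too.
        if "error" in line.lower():
            error_lines.append(line.rstrip("\n"))
    return last_confirm, error_lines


if __name__ == "__main__":
    # File name taken from the post; run this inside the monitor log directory.
    with open("dmmonitor_20241128133243.log", encoding="utf-8", errors="replace") as log:
        confirm, errors = check_monitor_log(log)
    print("check item 1, MON_CONFIRM is TRUE:", confirm)
    print("check item 2, lines mentioning error:", len(errors))
```

A result of `None` for the first value means the flag never appeared in the file, which is worth investigating on its own.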

 

 

##sample 3  Cross-city asynchronous HA: checks and verification

 

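Scenario   Symptom   Conclusion   Version 162   Test result 2



Scenario 1
Symptom: In the archive configuration on ddds01 and ddds02, the ddds03 instance is set to REALTIME. That requires a GLOBAL-type watcher, and the primary and all standbys, including ddds03, must be started before the database can open. ddds03 itself is configured for asynchronous archiving, whose watcher is the LOCAL type, so after a restart the primary stays in MOUNT state.
Conclusion: In the final step, switching to primary (Shanwei dddS01), real-time standby (Shanwei dddS02), and asynchronous standby (Zhuhai dddS03), the dmarch.ini files swapped in on dddS01 and dddS02 were wrong. They carried the archive settings for dddS03 as a real-time standby, while dddS03 itself was configured for asynchronous archiving with a LOCAL-type watcher. As a result, after the cluster restart the primary dddS01 stayed in MOUNT state.

 

 

1) After deploying the switchover scripts, all configuration files must be checked: copy out every configuration file used by the switchover and cross-check them in a review. (If conditions allow, run a switchover test first.)

2) Check the scripts and the permissions of the configuration directories on the servers: confirm that the variables in the sh files called by the scripts, the configuration files their paths point to, and the permissions are all correct.


3) Follow-up improvement: a script tool has been written that automatically generates the ini configuration files for the 5 switchover scenarios. Once verified by testing it can be used on projects, avoiding slips from hand-editing a large number of configuration files.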



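5) Supplement: Dameng's design logic is: 1. The dmwatcher log level is higher than the dm log. 2. Startup priority: real-time standby is higher than asynchronous standby, which is higher than the primary.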
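
The cross-check recommended in 1) can be partly automated. The sketch below is an illustration, not the script tool mentioned in 3): it assumes the archive types and watcher types have already been collected from each node's dmarch.ini and dmwatcher.ini into plain dictionaries, and the function name `find_guard_mismatches` is hypothetical. It flags exactly the situation from scenario 1, a REALTIME archive that points at an instance whose own watcher is LOCAL.

```python
def find_guard_mismatches(arch_types, dw_types):
    """Find REALTIME archives that point at an instance guarded by a LOCAL watcher.

    arch_types: {source_instance: {destination_instance: ARCH_TYPE}}
    dw_types:   {instance: DW_TYPE}
    Returns a list of (source, destination) pairs.
    """
    problems = []
    for source, destinations in sorted(arch_types.items()):
        for dest, arch_type in sorted(destinations.items()):
            dest_is_local = dw_types.get(dest, "").upper() == "LOCAL"
            if arch_type.upper() == "REALTIME" and dest_is_local:
                problems.append((source, dest))
    return problems


if __name__ == "__main__":
    # The faulty state from scenario 1: dddS01/dddS02 still treat dddS03 as REALTIME.
    arch_types = {
        "dddS01": {"dddS02": "REALTIME", "dddS03": "REALTIME"},
        "dddS02": {"dddS01": "REALTIME", "dddS03": "REALTIME"},
    }
    dw_types = {"dddS01": "GLOBAL", "dddS02": "GLOBAL", "dddS03": "LOCAL"}
    for source, dest in find_guard_mismatches(arch_types, dw_types):
        print(f"{source}: archive to {dest} is REALTIME but {dest} runs a LOCAL watcher")
```

An empty result only means this one rule is satisfied; it does not replace the full review of paths and permissions described in 2).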

 

 

 

##### sample 4 Status explanations for common functions in the log
https://blog.csdn.net/OSlinux123/article/details/140668688

Clear all ep g_dw_status finished, Recovery finished!
switch sub_state to sub_stat_start!
设置GRP1守护进程为OPEN(SUB:STARTUP)状态
dm_connect_async connection 6 is in progress
非自动切换模式下20s没有收到远程守护进程消息
Local instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(DM01) 模式(PRIMARY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128083643) CLSN(1280836
Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128042324) CLSN(128042680) S
dm_connect_async connection 6 is timeout
dm_connect_async connection 6 is in progress
dm_connect_async connection 6 is timeout
dw2_send_port_set from dmmonitor vio(6) set, mid(1673602727), from name:dmmonitor, ip:::ffff:192.168.12.125, mon_confirm:FALSE
dw2_send_port_set to dmwatcher vio(8) set, mid(-1), to name:DM02, ip:192.168.12.126
ohis_inst_info_copy_low, inst(DM02) apply info changed, old info[p_db_magic:1486960128, n_apply_ep:1], new info to set[p_db_magic:0, n_apply_ep:0
远程实例的模式、状态或者归档状态发生变化,原状态是:
Instance: 守护进程状态(ERROR) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128042324) CLSN(128042680) S
远程实例的模式、状态或者归档状态发生变化,新状态是:
dw2_send_port_set from dmmonitor vio(10) set, mid(1673602730), from name:dmmonitor, ip:::ffff:192.168.12.125, mon_confirm:FALSE
远程实例的模式、状态或者归档状态发生变化,原状态是:
远程实例的模式、状态或者归档状态发生变化,新状态是:
Instance: 守护进程状态(STARTUP) 实例状态(OK) 实例名(DM02) 模式(UNKNOWN) 实例状态(SHUTDOWN) 归档状态(UNKNOWN) POCNT(0) FLSN(0) CLSN(0) SLSN(0) SSL
ohis_inst_info_copy_low, inst(DM02) apply info changed, old info[p_db_magic:0, n_apply_ep:0], new info to set[p_db_magic:1486960128, n_apply_ep:1
远程实例的模式、状态或者归档状态发生变化,原状态是:
Instance: 守护进程状态(STARTUP) 实例状态(OK) 实例名(DM02) 模式(UNKNOWN) 实例状态(SHUTDOWN) 归档状态(UNKNOWN) POCNT(0) FLSN(0) CLSN(0) SLSN(0) SSL
远程实例的模式、状态或者归档状态发生变化,新状态是:
Instance: 守护进程状态(UNIFY EP) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(MOUNT) 归档状态(UNKNOWN) POCNT(8) FLSN(128045793) CLSN(12804579
远程实例的模式、状态或者归档状态发生变化,原状态是:
Instance: 守护进程状态(UNIFY EP) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(MOUNT) 归档状态(UNKNOWN) POCNT(8) FLSN(128045793) CLSN(12804579
远程实例的模式、状态或者归档状态发生变化,新状态是:
Instance: 守护进程状态(STARTUP) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128045793) CLSN(128045793)
远程实例的模式、状态或者归档状态发生变化,原状态是:
Instance: 守护进程状态(STARTUP) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128045793) CLSN(128045793)
远程实例的模式、状态或者归档状态发生变化,新状态是:
Instance: 守护进程状态(OPEN) 实例状态(OK) 实例名(DM02) 模式(STANDBY) 实例状态(OPEN) 归档状态(UNKNOWN) POCNT(8) FLSN(128045793) CLSN(128045793) SL
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
switch sub_state to pre_set_dw_stat!
设置GRP1守护进程为RECOVERY(SUB:STARTUP)状态
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
dw2_notify_set_dw_stat, dseq = 1671462826, from_dw_stat: NONE, to_dw_stat: DW_RECOVERY
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_SET_DW_STAT状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=217, dseq=1671462826, code=0
dw2_clear_ep_cmd_info_low, clear ep(DM01) cmd info, and reset curr_ep to NULL.
notify ep(DM01) set dw_stat to DW_RECOVERY success!
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
检测到实例(DM02)可恢复,执行恢复流程
开始向实例(DM02)发送归档日志
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462827
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_SEND_ARCH状态
[ohis_check_can_recover, p_iname:DM01, n_p_apply=0, p_apply_db_magic=1486960128, p_apply_seqno_arr=[1162045, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
[ohis_check_can_recover, s_iname:DM02, n_s_apply=1, s_apply_db_magic=1486960128, s_apply_seqno_arr=[1141458, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462828
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462829
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462830
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462831
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462832
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
检测到实例(DM02)发送归档成功,设置为当前恢复实例
dw2_notify_sql_exec, dseq = 1671462833, sql: ALTER DATABASE SUSPEND
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_TO_SUSPEND状态
向实例(DM02)发送归档日志成功,实例(DM01)转入suspend状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=1, dseq=1671462833, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=1, dseq=1671462833, code=0
dw2_clear_ep_cmd_info_with_recv_inst_low, clear ep(DM01) cmd info, and reset curr_ep to NULL.
转入suspend状态后,再次发送归档日志
dw2_rarch_send to DM02[seqno: 0], dseq = 1671462834
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_SEND_ALL_ARCH状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=210, dseq=1671462834, code=0
发送归档完毕,设置实例(DM02)归档有效
dw2_notify_chg_arch_status, dseq = 1671462835, rstat = 0
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_SET_ARCH状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=100, dseq=1671462835, code=100
实例(DM02)归档状态发生变化:INVALID --> VALID
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=100, dseq=1671462835, code=0
dw2_clear_ep_cmd_info_with_recv_inst_low, clear ep(DM01) cmd info, and reset curr_ep to NULL.
设置实例(DM02)归档有效成功,通知实例(DM01)OPEN
dw2_notify_sql_exec, dseq = 1671462836, sql: ALTER DATABASE OPEN FORCE
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_TO_OPEN状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=1, dseq=1671462836, code=0
dw2_clear_ep_cmd_info_with_recv_inst_low, clear ep(DM01) cmd info, and reset curr_ep to NULL.
dw2_set_recover_info, instance:DM02, recover flag:TRUE, from monitor:FALSE, last_recv_time:1673602836, recover retry time:60
本地守护进程为RECOVERY状态,本机实例为PRIMARY & OPEN,实例(DM02)故障恢复完成
将实例(DM02)从恢复列表中删除
不存在可恢复备库
dw2_clear_ep_cmd_info_low, clear ep(DM01) cmd info.
设置GRP1守护进程子状态为SUB_STATE_CLEAR状态
Clear all ep dw_stat value!
dw2_notify_set_dw_stat, dseq = 1671462837, from_dw_stat: DW_RECOVERY, to_dw_stat: NONE
Send tcp msg to local ep DM01, hpc_seqno:0, code:0
设置GRP1守护进程子状态为WAIT_CLEAR状态
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=217, dseq=1671462837, code=100
dw2_group_get_curr_ep_retcode, ep(DM01) cmd_ret:cmd=217, dseq=1671462837, code=0
dw2_clear_ep_cmd_info_low, clear ep(DM01) cmd info, and reset curr_ep to NULL.
notify ep(DM01) set dw_stat to NONE success!
dw2_clear_ep_cmd_info_low, clear ep(DM01) cmd info.
Clear all ep g_dw_status finished, Recovery finished!
switch sub_state to sub_stat_start!
设置GRP1守护进程为OPEN(SUB:STARTUP)状态
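
Reading the log above, the watcher for GRP1 moves through RECOVERY(SUB:STARTUP), WAIT_SET_DW_STAT, WAIT_SEND_ARCH, WAIT_TO_SUSPEND, WAIT_SEND_ALL_ARCH, WAIT_SET_ARCH, WAIT_TO_OPEN, SUB_STATE_CLEAR, WAIT_CLEAR and back to OPEN(SUB:STARTUP) while DM02 is recovered. A small sketch like the one below can pull that timeline out of a long log so the repeated `ohis_check_can_recover` lines do not get in the way. The regular expressions are written against the message wording shown in this post; the function name `watcher_state_timeline` is made up for the example.

```python
import re

# Matches e.g. "设置GRP1守护进程为RECOVERY(SUB:STARTUP)状态" (watcher state change).
STATE_RE = re.compile(r"设置([A-Za-z0-9_]+)守护进程为([A-Z_]+)\(SUB:([A-Z_]+)\)状态")
# Matches e.g. "设置GRP1守护进程子状态为WAIT_SEND_ARCH状态" (watcher sub-state change).
SUB_STATE_RE = re.compile(r"设置([A-Za-z0-9_]+)守护进程子状态为([A-Z_]+)状态")


def watcher_state_timeline(lines):
    """Return the ordered list of watcher states and sub-states found in the log."""
    timeline = []
    for line in lines:
        state = STATE_RE.search(line)
        if state:
            timeline.append(f"{state.group(2)}(SUB:{state.group(3)})")
            continue
        sub_state = SUB_STATE_RE.search(line)
        if sub_state:
            timeline.append(sub_state.group(2))
    return timeline


if __name__ == "__main__":
    sample = [
        "设置GRP1守护进程为RECOVERY(SUB:STARTUP)状态",
        "Send tcp msg to local ep DM01, hpc_seqno:0, code:0",
        "设置GRP1守护进程子状态为WAIT_SET_DW_STAT状态",
        "设置GRP1守护进程为OPEN(SUB:STARTUP)状态",
    ]
    print(" -> ".join(watcher_state_timeline(sample)))
```

A recovery that stalls shows up as a timeline that stops at one of the WAIT_ sub-states instead of returning to OPEN.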

 

 

 

###sample 5 Manually changing a real-time standby into an asynchronous standby


Environment: Linux (CentOS/RedHat/Ubuntu, etc.), Dameng 2024 release


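Current structure:
Instance   Role   Sync mode
DMSVR1   Primary   Real-time
DMSVR2   Real-time standby   Real-time
DMSVR3   Real-time standby   Real-time

Target structure:
Instance   Role   Sync mode
DMSVR1   Primary   Real-time
DMSVR2   Real-time standby   Real-time
DMSVR3   Asynchronous standby   Asynchronous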


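Stop the target standby's services (DMSVR3)
Edit dmarch.ini and dmwatcher.ini on DMSVR3
Edit dmarch.ini, dmtimer.ini, and dm.ini on DMSVR1 and DMSVR2
Start the standby and check the logs
Verify asynchronous sync status on the primary
Test failover and the sync mechanism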


🔧 Step-by-step details
1. Log in to the target standby server (DMSVR3) (on the 16 dm02 node, change the real-time standby to an asynchronous standby)

ssh dmdba@<DMSVR3_IP>
su - dmdba
2. Stop the DMSVR3 database instance
DmWatcherServiceswitchtest stop

 

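3. Edit dmarch.ini on DMSVR3
vim dmarch.ini
Change the following key parameters: delete the REALTIME archive section. If it is left in place, the dmwatcher process cannot start.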
[ARCHIVE_LOCAL1]
ARCH_TYPE = LOCAL
ARCH_DEST = /ddivs/archivelog
ARCH_FILE_SIZE = 2048
ARCH_SPACE_LIMIT = 84480


4. In dmwatcher.ini, change the following key parameters:
DW_TYPE = LOCAL
DW_MODE = MANUAL
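
Steps 3 and 4 have to agree with each other: a LOCAL watcher cannot coexist with a REALTIME or TIMELY archive in the same node's dmarch.ini (dmwatcher prints exactly this complaint in step 7.2). The following sketch checks the two files before the watcher is started. It assumes the files can be read by Python's `configparser`, that the watcher parameters sit under a group section such as `[GRP1]`, and the function name `local_watcher_conflicts` and the `ARCHIVE_REALTIME1` / `SWITCHTEST01` values in the demo are hypothetical.

```python
import configparser


def _parse_ini(text):
    """Parse ini text; '#' starts a comment, duplicate keys are tolerated."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=("#",)
    )
    parser.read_string(text)
    return parser


def local_watcher_conflicts(dmwatcher_text, dmarch_text):
    """Return dmarch.ini sections that conflict with a DW_TYPE = LOCAL watcher."""
    watcher = _parse_ini(dmwatcher_text)
    is_local = any(
        watcher[group].get("DW_TYPE", "").upper() == "LOCAL"
        for group in watcher.sections()
    )
    if not is_local:
        return []
    arch = _parse_ini(dmarch_text)
    return [
        section
        for section in arch.sections()
        if arch[section].get("ARCH_TYPE", "").upper() in ("REALTIME", "TIMELY")
    ]


if __name__ == "__main__":
    dmwatcher_ini = "[GRP1]\nDW_TYPE = LOCAL\nDW_MODE = MANUAL\n"
    # Hypothetical leftover section that step 3 says must be deleted.
    dmarch_ini = (
        "[ARCHIVE_LOCAL1]\nARCH_TYPE = LOCAL\nARCH_DEST = /ddivs/archivelog\n"
        "[ARCHIVE_REALTIME1]\nARCH_TYPE = REALTIME\nARCH_DEST = SWITCHTEST01\n"
    )
    print("sections to delete:", local_watcher_conflicts(dmwatcher_ini, dmarch_ini))
```

An empty list means the two files are consistent on this point and the watcher should get past this particular check.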


5. On DMSVR1 and DMSVR2, change the archive parameters that point to DMSVR3

vim dmarch.ini
[ARCHIVE_ASYNC1]
ARCH_TYPE = ASYNC
ARCH_DEST = SWITCHTEST02
ARCH_TIMER_NAME=RT_TIMER

cat dmtimer.ini
[RT_TIMER] ##must match ARCH_TIMER_NAME in dmarch.ini
TYPE=2
FREQ_MONTH_WEEK_INTERVAL=1
FREQ_SUB_INTERVAL=0
FREQ_MINUTE_INTERVAL=5 #archive send interval, in minutes
START_TIME=00:00:00
END_TIME=00:00:00
DURING_START_DATE=2020-06-11 10:36:09
DURING_END_DATE=9999-12-31 23:59:59
NO_END_DATE_FLAG=1
DESCRIBE=RT_TIMER
IS_VALID=1

cat dm.ini
TIMER_INI = 1 #dmtimer.ini

DmWatcherServiceswitchtest stop
DmServiceswitchtest restart
DmWatcherServiceswitchtest start
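
The three files edited in step 5 depend on each other: every ASYNC archive in dmarch.ini names a timer, that timer must exist as a section in dmtimer.ini, and dm.ini must have TIMER_INI = 1. The sketch below checks those links before the restart. It is an illustration under assumptions: the function names are made up, timer names are compared exactly as written, and dm.ini is read as plain `KEY = VALUE` lines.

```python
import configparser


def _parse_ini(text):
    """Parse ini text; '#' starts a comment, duplicate keys are tolerated."""
    parser = configparser.ConfigParser(
        interpolation=None, strict=False, inline_comment_prefixes=("#",)
    )
    parser.read_string(text)
    return parser


def async_archives_missing_timer(dmarch_text, dmtimer_text):
    """Return (archive section, timer name) pairs whose timer is not in dmtimer.ini."""
    arch = _parse_ini(dmarch_text)
    timers = set(_parse_ini(dmtimer_text).sections())
    missing = []
    for section in arch.sections():
        if arch[section].get("ARCH_TYPE", "").upper() != "ASYNC":
            continue
        timer = arch[section].get("ARCH_TIMER_NAME", "")
        if timer not in timers:
            missing.append((section, timer))
    return missing


def timer_ini_enabled(dm_ini_text):
    """Return True when dm.ini contains TIMER_INI = 1."""
    for line in dm_ini_text.splitlines():
        key, sep, value = line.split("#", 1)[0].partition("=")
        if sep and key.strip().upper() == "TIMER_INI":
            return value.strip() == "1"
    return False


if __name__ == "__main__":
    dmarch_ini = "[ARCHIVE_ASYNC1]\nARCH_TYPE = ASYNC\nARCH_DEST = SWITCHTEST02\nARCH_TIMER_NAME=RT_TIMER\n"
    dmtimer_ini = "[RT_TIMER]\nTYPE=2\nFREQ_MINUTE_INTERVAL=5\n"
    print("missing timers:", async_archives_missing_timer(dmarch_ini, dmtimer_ini))
    print("TIMER_INI enabled:", timer_ini_enabled("TIMER_INI = 1 #dmtimer.ini\n"))
```

A non-empty "missing timers" list corresponds to the "timer RT_TIMER not found" startup error shown in step 7.1.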


6. Start the DMSVR3 database instance

cd /opt/dmdbms/bin
DmServiceswitchtest restart
DmWatcherServiceswitchtest start


7. Check the logs to confirm startup status
7.1) Check the alert log: database startup errors can be seen in this log.


cd /ddddk/dm/switchtest
tail -f /opt/dmdbms/log/dmalert_DMSVR3.log
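Watch for error messages. A database startup error shows up here, for example when the timer named by an ASYNC archive cannot be found: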
Archive [ARCHIVE_ASYNC2] is ASYNC type, but timer RT_TIMER not found, please check dmtimer.ini configuration and TIMER_INI in dm.ini.


7.2) When DmWatcher fails to start, the log shows no error message. It has to be started in the foreground to see the specific error.
[dmdba@localhost ~]$ DmWatcherServiceswitchtest start
Starting DmWatcherServiceswitchtest: [ FAILED ]

Start it in the foreground to see the specific error:
/ddddk/dm/switchtest/bin/dmwatcher /dmdata/switchtest/data/switchtest/dmwatcher.ini
DMWATCHER[4.0] V8
Local dmwatcher's DW_TYPE is LOCAL in dmwatcher.ini, cannot configured REALTIME or TIMELY archive in dmarch.ini, you need modify one of them!
fail to read ini file

After a normal start, output similar to the following is expected:
Instance started successfully.
Current mode: Standby (Async)


8. Verify sync status on the primary
Log in to the primary:

cd /opt/dmdbms/bin
./disql sysdba/SYSDBA@localhost:5236
Run SQL to query sync status:

-- Check standby connection status


✅ Verification and testing
1. Insert test data
Insert test data on the primary:

INSERT INTO test_table VALUES(1, 'test');
COMMIT;
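Wait a while (asynchronous archives are sent on the timer, every 5 minutes with the FREQ_MINUTE_INTERVAL above, plus network latency), then query on DMSVR3 to see whether the data has synced: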

SELECT * FROM test_table WHERE id=1;
2. Failover test (optional)
Simulate a primary outage and observe whether DMSVR2 takes over as primary and whether DMSVR3 keeps syncing asynchronously.

📦 Back up the configuration files
It is recommended to back up all modified configuration files:

cp /opt/dmdbms/data/DMSVR3/dm.ini /opt/dmdbms/data/DMSVR3/dm.ini.bak
cp /opt/dmdbms/data/DMSVR3/dmmal.ini /opt/dmdbms/data/DMSVR3/dmmal.ini.bak
cp /opt/dmdbms/data/DMSVR3/dmwatcher.ini /opt/dmdbms/data/DMSVR3/dmwatcher.ini.bak
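
Plain `.bak` copies get overwritten the next time the same commands are run. As an alternative, here is a minimal sketch that writes timestamped copies and silently skips files that do not exist on the node. The function name `backup_configs` and the list of file names are choices made for this example, not part of any Dameng tool.

```python
import shutil
import time
from pathlib import Path


def backup_configs(config_dir, names, suffix=None):
    """Copy each existing file to '<name>.<suffix>.bak' in the same directory.

    suffix defaults to the current time as YYYYmmddHHMMSS.
    Returns the list of backup paths that were written.
    """
    suffix = suffix or time.strftime("%Y%m%d%H%M%S")
    copies = []
    for name in names:
        source = Path(config_dir) / name
        if not source.is_file():
            continue  # not every node has every ini file
        target = source.with_name(f"{source.name}.{suffix}.bak")
        shutil.copy2(source, target)  # copy2 keeps timestamps and permissions
        copies.append(target)
    return copies


if __name__ == "__main__":
    # Directory taken from the cp commands above; the file list is an assumption.
    files = ["dm.ini", "dmmal.ini", "dmwatcher.ini", "dmarch.ini", "dmtimer.ini"]
    for path in backup_configs("/opt/dmdbms/data/DMSVR3", files):
        print("backed up:", path)
```

Run it as the same OS user that owns the configuration files so the copies keep usable ownership.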
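📝 Maintenance advice
Regularly monitor the lag of the asynchronous standby.
Use the management tools shipped with DM (such as Manager and Console) for graphical monitoring.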

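📌 Summary
With the steps above, a real-time standby can be changed into an asynchronous standby, giving a more flexible disaster-recovery architecture and better resource usage. The whole process consists of:

Stopping the target standby
Editing the configuration files (dmarch.ini, dmwatcher.ini, dmtimer.ini, dm.ini)
Restarting and verifying sync status
Testing data sync and failover capability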

 

posted @ 2025-07-08 15:46  feiyun8616