KingbaseES V8R6 Cluster Operations Case: sys_backup.sh init Fails with a 'pg_replslot' Link Error

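Case description:
On a one-primary, one-standby cluster, running sys_backup.sh init fails with the error "link 'pg_replslot' destination ..." and the backup does not complete. The failure is shown in the figure below: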

Applicable version:
KingbaseES V8R6

I. Problem Analysis

Inspecting the files in the data directory shows that pg_replslot exists there as a symbolic link:

For comparison, the file listing of a normal data directory is shown in the figure below:
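The same inspection can be scripted. The following is a minimal sketch, not part of the original case: the function name find_inner_links is hypothetical, it assumes `readlink -f` (GNU coreutils) is available, and it only looks at entries directly under the data directory. It prints every symbolic link whose resolved target lies inside the data directory itself, which is the condition the backup log complains about.

```shell
# Sketch (hypothetical helper): list top-level symbolic links in a data
# directory whose targets resolve to a path inside that same directory.
# Assumption: `readlink -f` is available (GNU coreutils).
find_inner_links() {
    fil_datadir=$(readlink -f "$1") || return 1
    # -maxdepth 1: only entries directly under the data directory
    find "$fil_datadir" -maxdepth 1 -type l | while IFS= read -r fil_link; do
        fil_target=$(readlink -f "$fil_link")
        case "$fil_target" in
            # Target lies inside the data directory: report it
            "$fil_datadir"/*) printf '%s -> %s\n' "${fil_link##*/}" "$fil_target" ;;
        esac
    done
}

# Usage (the path is the data directory from this case):
#   find_inner_links /home/kingbase/cluster/R6C8/HAC8/kingbase/data
```

Links that point outside the data directory are not reported by this sketch, since the error in this case concerns only a destination inside the data directory.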

II. Reproducing the Case
1. Create the pg_replslot link

# Create the pg_replslot link
[kingbase@node1 data]$ mkdir sys_replslot
[kingbase@node1 data]$ ln -s sys_replslot pg_replslot
[kingbase@node1 data]$ ls -lh
total 88K
drwx------ 9 kingbase kingbase   86 Oct 13 14:41 base
-rw------- 1 kingbase kingbase   46 Nov 20 17:17 current_logfiles
-rw-rw-r-- 1 kingbase kingbase  933 Oct 27 10:49 es_rep.conf
drwx------ 2 kingbase kingbase 4.0K Nov 20 17:18 global
........
lrwxrwxrwx 1 kingbase kingbase   12 Nov 20 17:54 pg_replslot -> sys_replslot

# Contents of pg_replslot
[kingbase@node1 data]$ ls -lh pg_replslot/
total 0
[kingbase@node1 data]$ cp -r sys_replslot.bk/repmgr_slot_2 pg_replslot/

[kingbase@node1 data]$ ls -lh pg_replslot/
total 0
drwx------ 2 kingbase kingbase 18 Nov 20 17:54 repmgr_slot_2

2. Run backup initialization

[kingbase@node1 bin]$  ./sys_backup.sh init
# pre-condition: check the non-archived WAL files
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
# create stanza and check...DONE
# initial first full backup...(maybe several minutes)
ERROR: full backup failed, check log file /home/kingbase/cluster/R6C8/HAC8/kingbase/log/sys_rman_backup.log

# Backup log:
2023-11-20 17:55:48.562 P00   INFO: backup command begin 2.27: --archive-copy --no-archive-statistics --archive-timeout=600 --band-width=0 --cmd-ssh=/home/kingbase/cluster/R6C8/HAC8/kingbase/bin/sys_securecmd --compress-level=3 --compress-type=none --config=/home/kingbase/kbbr_repo/sys_rman.conf --exec-id=26829-8cc6447c --kb2-host=192.168.1.202 --kb2-host-user=kingbase --kb1-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --kb2-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/data --kb1-port=54321 --kb2-port=54321 --kb1-user=esrep --kb2-user=esrep --log-level-console=info --log-level-file=info --log-path=/home/kingbase/cluster/R6C8/HAC8/kingbase/log --log-subprocess --non-archived-space=1024 --process-max=4 --repo1-path=/home/kingbase/kbbr_repo --repo1-retention-full=5 --stanza=kingbase --start-fast --type=full
WARN: set process-max 4 is too large, auto set to CPU core count 1
2023-11-20 17:55:49.298 P00   INFO: Get pageCheckSum flag from ControlFile is 1
2023-11-20 17:55:49.401 P00   INFO: Check the non archvied WAL space under the setting 1024 MB
2023-11-20 17:55:49.401 P00   INFO: Non archived WAL files have 0 MB.

2023-11-20 17:55:49.401 P00   INFO: execute non-exclusive sys_start_backup(): backup begins after the requested immediate checkpoint completes
2023-11-20 17:55:49.725 P00   INFO: backup start archive = 000000050000000000000034, lsn = 0/34000028
2023-11-20 17:55:49.725 P00   INFO: check archive for prior segment 000000050000000000000033
ERROR: [070]: link 'pg_replslot' destination '/home/kingbase/cluster/R6C8/HAC8/kingbase/data/sys_replslot' is in KBDATA
2023-11-20 17:55:49.928 P00   INFO: backup command end: aborted with exception [070]
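
When the log is long, the offending link can be pulled out directly. The sketch below is an illustration only: the function name link_errors is hypothetical, and the pattern is matched against the exact message format seen in the log above (error code 070, ending in "is in KBDATA"), which may differ in other versions.

```shell
# Sketch (hypothetical helper): print "link -> destination" for every
# link error of the form seen in this case's sys_rman_backup.log.
# Assumption: the message format matches the log line shown above.
link_errors() {
    sed -n "s/.*ERROR: \[070\]: link '\([^']*\)' destination '\([^']*\)' is in KBDATA\$/\1 -> \2/p" "$1"
}

# Usage (log path taken from the error message in this case):
#   link_errors /home/kingbase/cluster/R6C8/HAC8/kingbase/log/sys_rman_backup.log
```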

As the figure below shows, the failure is reproduced:

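III. Resolution
After the symbolic link is removed, so that the sys_replslot directory is used directly, the backup completes normally. In a production environment, first find out why the symbolic link was created and confirm that removing it will not affect the running database; only then remove it.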
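
The removal step can be sketched as follows. This is an assumption-laden illustration rather than the procedure used in the original case: the function name remove_link_if_symlink is hypothetical, and the source does not say whether the database should be stopped first, so verify that for your environment before running anything like this in production. The sketch refuses to touch anything that is not a symbolic link, so the real sys_replslot directory is never deleted by mistake.

```shell
# Sketch (hypothetical helper): remove NAME under DATADIR only if it is a
# symbolic link. A real file or directory is left untouched.
remove_link_if_symlink() {
    rl_path="$1/$2"
    if [ -L "$rl_path" ]; then
        echo "removing symlink: $rl_path -> $(readlink "$rl_path")"
        # rm on a symlink removes the link itself, not its target
        rm -- "$rl_path"
    elif [ -e "$rl_path" ]; then
        echo "not a symlink, left untouched: $rl_path" >&2
        return 1
    else
        echo "nothing to do: $rl_path does not exist" >&2
    fi
}

# Usage (run from the data directory, as in the reproduction above):
#   remove_link_if_symlink . pg_replslot
```

After the link is removed, rerunning ./sys_backup.sh init is how the original case confirmed that the backup works again.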

Posted 2023-11-22 11:17 by 天涯客1224