一、用keepalived进行脚本主从切换,当原主重新加入集群报错
[root@hlcc_slave data]# /usr/pgsql-13/bin/pg_rewind -D '/home/postgres/data' --source-server='host=172.28.17.142 port=5876 user=repmgr password=hl95repmgr dbname=repmgr connect_timeout=2' pg_rewind: error: cannot be executed by "root" You must run pg_rewind as the PostgreSQL superuser. [root@hlcc_slave data]# su - postgres Last login: Wed Aug 3 09:31:30 CST 2022 on pts/0 [postgres@hlcc_slave ~]$ /usr/pgsql-13/bin/pg_rewind -D '/home/postgres/data' --source-server='host=172.28.17.142 port=5876 user=repmgr password=hl95repmgr dbname=repmgr connect_timeout=2' pg_rewind: servers diverged at WAL location 0/140000A0 on timeline 9 pg_rewind: error: could not open file "/home/postgres/data/pg_wal/000000090000000000000014":
二、查看master以及slave节点数据库的data/pg_wal,均没有这个文件
(如果有,可以直接复制,再重新加入集群即可)
三、重新删除data目录,重新克隆主库,并注册从库
1、删除data目录
su - postgres mv data data.yyyymmdd mkdir data
2、克隆新主库
[postgres@hlcc_slave ~]$ repmgr -h hlcc_master -p 5876 -U repmgr -d repmgr -f /etc/repmgr/13/repmgr.conf standby clone -F NOTICE: destination directory "/home/postgres/data" provided INFO: connecting to source node DETAIL: connection string is: host=hlcc_master port=5876 user=repmgr dbname=repmgr DETAIL: current installation size is 158 MB NOTICE: checking for available walsenders on the source node (2 required) NOTICE: checking replication connections can be made to the source server (2 required) INFO: checking and correcting permissions on existing directory "/home/postgres/data" NOTICE: starting backup (using pg_basebackup)... HINT: this may take some time; consider using the -c/--fast-checkpoint option INFO: executing: /usr/pgsql-13/bin/pg_basebackup -l "repmgr base backup" -D /home/postgres/data -h hlcc_master -p 5876 -U repmgr -X stream -S repmgr_slot_2 NOTICE: standby clone (using pg_basebackup) complete NOTICE: you can now start your PostgreSQL server HINT: for example: /home/postgres/bin/postgresql start HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
3、启动当前数据库
[postgres@hlcc_slave home]$ postgresql start Starting PostgreSQL: OK [postgres@hlcc_slave home]$ postgresql status 0
4、注册为从库
[postgres@hlcc_slave home]$ repmgr standby register -f /etc/repmgr/13/repmgr.conf --force INFO: connecting to local node "hlcc_slave" (ID: 2) INFO: connecting to primary database INFO: standby registration complete NOTICE: standby node "hlcc_slave" (ID: 2) successfully registered
5、查看主从状态
[postgres@hlcc_slave home]$ repmgr -f /etc/repmgr/13/repmgr.conf cluster show ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string ----+-------------+---------+-----------+-------------+----------+----------+----------+---------------------------------------------------------------------------------------------- 1 | hlcc_master | primary | * running | | default | 100 | 11 | host=172.28.17.142 port=5876 user=repmgr password=hl95repmgr dbname=repmgr connect_timeout=2 2 | hlcc_slave | standby | running | hlcc_master | default | 100 | 11 | host=172.28.17.143 port=5876 user=repmgr password=hl95repmgr dbname=repmgr connect_timeout=2 [postgres@hlcc_slave home]$