PG高可用之Keepalived安装示例及注意事项
适用范围
pg12+
方案概述
通过Keepalived 实现 PG HA的自主切换,以达到高可用的目的,由于Keepalived 是在PG流复制之上,所以在安装Keepalived 之前,我们需要先搭建好PG流复制,这里我们使用PG12的版本,如果是用PG12以前的版本,需要修改failover.sh的切换内容。
实施步骤
1. 资源规划:

2. 安装准备:
安装前我们需要创建数据库Keepalived,并且创建表探测表sr_delay,后续Keepalived探测,刷新sr_delay表的last_alive字段为当前探测时间。
这张表用来判断主备延迟情况,数据库故障切换时会查询用到这张表。
postgres=# create user keepalived password 'keepalived' CONNECTION LIMIT 4 ; postgres=# create database keepalived owner keepalived; postgres=# \c keepalived keepalived keepalived=> create table sr_delay(id int4, last_alive timestamp(0) without time zone); #表sr_delay只允许写入一条记录,并且不允许删除此表数据,通过触发器实现。创建触发器函数,如下所示: CREATE FUNCTION cannt_delete () RETURNS trigger LANGUAGE plpgsql AS $$ BEGIN RAISE EXCEPTION 'Table sr_delay can not delete !'; END; $$; #创建触发器: CREATE TRIGGER trigger_sr_delay_del BEFORE DELETE ON sr_delay FOR EACH ROW EXECUTE PROCEDURE cannt_delete() ; CREATE TRIGGER trigger_sr_delay_tru BEFORE TRUNCATE ON sr_delay FOR STATEMENT EXECUTE PROCEDURE cannt_delete() ; #插入数据: INSERT INTO sr_delay VALUES(1,now()) ;
后续Keepalived会每隔指定时间探测PostgreSQL数据库存活,并且以Keepalived用户登录Keepalived数据库刷新这张表, 配置主备库pg_hba.conf,增加如下内容:
echo " host keepalived keepalived 10.85.10.0/24 md5 host keepalived keepalived 10.85.10.0/24 md5" >> pg_hba.conf
3.安装keepalived
#两个节点 $ wget https://www.keepalived.org/software/keepalived-2.2.7.tar.gz tar xf keepalived-2.2.7.tar.gz cd keepalived-2.2.7 ./configure --prefix=/usr/local/keepalived/ # ./configure 或 #./configure --prefix=/postgres/keepalived-2.2.7 make make install
3.1 配置 keepalived.conf
vi /etc/keepalived/keepalived.conf ! Configuration File for keepalived global_defs { # 邮件通知信息 notification_email { # 定义收件人 446464087@qq.com } # 定义发件人 notification_email_from test@cqdba.cn # SMTP服务器地址 smtp_server 10.85.10.51 smtp_connect_timeout 30 # 路由器标识,一般不用改,也可以写成每个主机自己的主机名 router_id huyidb03 vrrp_skip_check_adv_addr vrrp_strict vrrp_garp_interval 0 vrrp_gna_interval 0 } # 定义用于实例执行的脚本内容,比如可以在线降低优先级,用于强制切换 vrrp_script check_pg_alived { script "/etc/keepalived/scirpts/check_pg.sh" interval 5 fall 3 # require 3 failures for KO } # 一个vrrp_instance就是定义一个虚拟路由器的,实例名称 vrrp_instance VI_1 { # 定义初始状态,可以是MASTER或者BACKUP state MASTER #非抢占模式 nopreempt # 工作接口,通告选举使用哪个接口进行 interface ens38 # 虚拟路由ID,如果是一组虚拟路由就定义一个ID,如果是多组就要定义多个,而且这个虚拟 # ID还是虚拟MAC最后一段地址的信息,取值范围0-255 virtual_router_id 51 #权重 如果你上面定义了MASTER,这里的优先级就需要定义的比其他的高 priority 100 #通告频率 单位s advert_int 1 #通信认证机制,这里是明文认证还有一种是加密认证 authentication { auth_type PASS auth_pass abcdefgh } # 设置虚拟VIP地址 virtual_ipaddress { 10.85.10.77 } # 追踪脚本,通常用于去执行上面的vrrp_script定义的脚本内容 track_script { check_pg_alived } # 如果主机状态变成Master|Backup|Fault之后会去执行的通知脚本,脚本要自己写 smtp_alert notify_master "/etc/keepalived/scirpts/failover.sh" notify_fault "/etc/keepalived/scirpts/fault.sh" } }
以上是Keepalived主节点的配置,Keepalived备节点的priority参数改成90 ,state改为BACKUP, 其余参数配置一样。
3.2 配置check_pg.sh
vi /etc/keepalived/scirpts/check_pg.sh #!/bin/bash # 配置环境变量 pgport=5432 pguser=keepalived pgdb=keepalived pgpwd='keepalived' LANG=en_US.utf8 PGHOME=/postgresql/pg12 LD_LIBRARY_PATH=$PGHOME/lib:/lib64:/usr/lib64:/usr/local/lib64:/lib:/usr/lib:/usr/local/lib PATH=$PGHOME/bin:$PATH:. MONITOR_LOG="/etc/keepalived/log/pg_monitor.log" SQL1="UPDATE sr_delay SET last_alive= now();" SQL2='SELECT 1;' keeplognums=30000 #此脚本不检查备库存活状态,如果是备库则退出 standby_flg=`psql -p$pgport -Upostgres -At -c "SELECT pg_is_in_recovery();"` if [ ${standby_flg} == 't' ]; then echo -e "`date +%F\ %T`:This is a standby database, exit!\n" > $MONITOR_LOG exit 0 fi export PGPASSWORD=$pgpwd #主库更新sr_delay 表 echo $SQL1 | psql -At -p $pgport -U $pguser -d $pgdb >> $MONITOR_LOG #判断主库是否可用 echo $SQL2 | psql -At -p $pgport -U $pguser -d $pgdb if [ $? -eq 0 ] ; then echo -e "`date +%F\ %T`:Primary db is health." >> $MONITOR_LOG exit 0 else echo -e "`date +%F\ %T`:Attention: Primary db is not health!" >> $MONITOR_LOG exit 1 fi #日志保留 keeplognums 行 if [ ! -f ${MONITOR_LOG} ] ;then touch ${MONITOR_LOG};fi lognums=`cat ${MONITOR_LOG} |wc -l` catnum=$((${lognums} -${keeplognums})) if [[ $lognums -gt ${keeplognums} ]] ; then sed -i "1,${catnum}d" ${MONITOR_LOG}; fi
3.3 配置failover.sh
vi /etc/keepalived/scirpts/failover.sh #!/bin/bash export PGPORT=5432 export PGUSER=keepalived export PG_OS_USER=postgres export PGDBNAME=keepalived export LANG=en_US.utf8 export PGPATH=/postgresql/pg12 export PGDATA=/postgresql/data export PATH=$PATH:$PGPATH/bin LOGFILE='/etc/keepalived/log/failover.log' # 主备数据库同步时延,单位为秒 sr_allowed_delay_time=100 SQL1='select pg_is_in_recovery from pg_is_in_recovery();' SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '${sr_allowed_delay_time} seconds');" #SQL2="select count(1) as delay_time from sr_delay where now()<(last_alive + interval '100 seconds');" sleep $sr_allowed_delay_time db_role=`echo $SQL1 | psql -At -p $PGPORT -U $PGUSER -d $PGDBNAME -w` db_sr_delaytime=`echo $SQL2 | psql -p $PGPORT -d $PGDBNAME -U $PGUSER -At -w` SWITCH_COMMAND='pg_ctl promote -D $PGDATA' # 如果为备库,且延迟大于指定时间则切换为主库 # 判断 是否为备库,为主库退出 if [ ${db_role} == 'f' ]; then echo -e `date +"%F %T"` "Attention:The current postgreSQL DB is master database,cannot switched!" >> $LOGFILE exit 0 fi # 判断延迟 时间, 小于延迟时间退出 if [ $db_sr_delaytime -gt 0 ]; then echo -e `date +"%F %T"` "Attention:The current master database is health,the standby DB cannot switched!" >> $LOGFILE exit 0 fi #大于延迟 时间切换 if [ !$db_sr_delaytime ]; then echo -e `date +"%F %T"` "Attention:The current database is statndby,ready to switch master database!" >> $LOGFILE su - $PG_OS_USER -c "$SWITCH_COMMAND" elif [ $? -eq 0 ]; then echo -e `date +"%F %T"` "success:The current standby database successed to switched the primary PG database !" >> $LOGFILE exit 0 else echo -e `date +"%F %T"` "Error: the standby database failed to switch the primary PG database !,pelease checked it!" >> $LOGFILE exit 1 fi
3.4 配置fault.sh
vi /etc/keepalived/scirpts/fault.sh #!/bin/bash LOGFILE=/etc/keepalived/log/pg_db_fault.log PGDATA=/postgresql/data PGPORT=5432 echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE PGPID="`head -n1 $PGDATA/postmaster.pid`" systemctl stop keepalived kill -9 $PGPID if [ $? -eq 0 ] ; then echo -e `date +"%F %T"` "Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!" >> $LOGFILE systemctl stop keepalived exit 1 fi
4.启动keepalived
chmod -R 755 /etc/keepalived/scirpts/* systemctl daemon-reload systemctl enable keepalived systemctl start keepalived #状态查询 : [root@huyidb03 ~]# systemctl status keepalived ● keepalived.service - LVS and VRRP High Availability Monitor Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled) Active: active (running) since Sat 2022-06-11 11:49:14 CST; 5s ago Docs: man:keepalived(8) man:keepalived.conf(5) man:genhash(1) https://keepalived.org Process: 97742 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 97743 (keepalived) CGroup: /system.slice/keepalived.service ├─97612 sleep 100 ├─97743 /usr/local/sbin/keepalived -D ├─97744 /usr/local/sbin/keepalived -D ├─97757 /bin/bash /etc/keepalived/scirpts/failover.sh └─97758 sleep 100 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: (VI_1) Receive advertisement timeout Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: (VI_1) Entering MASTER STATE # 主库上 MASTER 角色 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: (VI_1) setting VIPs. Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: (VI_1) Sending/queueing gratuitous ARPs on ens38 for 10.85.10.77 # 启动了 vip Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 11:49:18 huyidb03 Keepalived_vrrp[97744]: Sending gratuitous ARP on ens38 for 10.85.10.77
5.切换演示
5.1 主库日志
#主库停库 postgres@huyidb03-> pg_ctl stop waiting for server to shut down....2022-06-11 12:03:43.219 CST [98168] LOG: received fast shutdown request 2022-06-11 12:03:43.220 CST [98168] LOG: aborting any active transactions 2022-06-11 12:03:43.221 CST [98168] LOG: background worker "logical replication launcher" (PID 98175) exited with exit code 1 2022-06-11 12:03:43.221 CST [98170] LOG: shutting down 2022-06-11 12:03:43.239 CST [98168] LOG: database system is shut down done server stopped # 查看 keepalived [root@huyidb03 ~]# systemctl status keepalived ● keepalived.service - LVS and VRRP High Availability Monitor Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled) Active: inactive (dead) since Sat 2022-06-11 12:03:58 CST; 6s ago Docs: man:keepalived(8) man:keepalived.conf(5) man:genhash(1) https://keepalived.org Process: 98183 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 98184 (code=exited, status=0/SUCCESS) CGroup: /system.slice/keepalived.service Jun 11 11:55:56 huyidb03 Keepalived_vrrp[98185]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 11:55:56 huyidb03 Keepalived_vrrp[98185]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:03:47 huyidb03 Keepalived_vrrp[98185]: Script `check_pg_alived` now returning 1 Jun 11 12:03:57 huyidb03 Keepalived_vrrp[98185]: VRRP_Script(check_pg_alived) failed (exited with status 1) Jun 11 12:03:57 huyidb03 Keepalived_vrrp[98185]: (VI_1) Entering FAULT STATE Jun 11 12:03:57 huyidb03 Keepalived_vrrp[98185]: (VI_1) sent 0 priority Jun 11 12:03:57 huyidb03 Keepalived_vrrp[98185]: (VI_1) removing VIPs. Jun 11 12:03:57 huyidb03 Keepalived[98184]: Stopping Jun 11 12:03:57 huyidb03 systemd[1]: Stopping LVS and VRRP High Availability Monitor... Jun 11 12:03:58 huyidb03 systemd[1]: Stopped LVS and VRRP High Availability Monitor. #查看日志 tail -100f /etc/keepalived/log/pg_monitor.log 2022-06-11 12:03:42: Primary db is health. 2022-06-11 12:03:47: Attention: Primary db is not health! 2022-06-11 12:03:52: Attention: Primary db is not health! 2022-06-11 12:03:57: Attention: Primary db is not health! [root@huyidb03 log]# tail -100f /etc/keepalived/log/pg_db_fault.log 2022-06-11 11:52:59 Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database! 2022-06-11 12:03:57 Error:Because of the priamry DB happend some unknown problem,So turn off the PostgreSQL Database!
5.2 备库日志
tail -100f /etc/keepalived/log/pg_monitor.log [root@huyidb04 log]# cat failover.log 2022-06-11 12:05:38 Attention:The current database is statndby,ready to switch master database! [root@huyidb04 log]# tail -100f pg_monitor.log 2022-06-11 12:04:38: This is a standby database, exit!\n UPDATE 1 2022-06-11 12:05:43: Primary db is health. UPDATE 1 2022-06-11 12:05:48: Primary db is health. UPDATE 1 [root@huyidb04 ~]# systemctl status keepalived ● keepalived.service - LVS and VRRP High Availability Monitor Loaded: loaded (/usr/lib/systemd/system/keepalived.service; disabled; vendor preset: disabled) Active: active (running) since Sat 2022-06-11 12:01:38 CST; 9min ago Docs: man:keepalived(8) man:keepalived.conf(5) man:genhash(1) https://keepalived.org Process: 79603 ExecStart=/usr/local/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS) Main PID: 79604 (keepalived) CGroup: /system.slice/keepalived.service ├─79604 /usr/local/sbin/keepalived -D └─79605 /usr/local/sbin/keepalived -D Jun 11 12:03:58 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:03:58 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:04:01 huyidb04 Keepalived_vrrp[79605]: smtp fd 14 returned write error Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: (VI_1) Sending/queueing gratuitous ARPs on ens38 for 10.85.10.77 Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77 Jun 11 12:04:03 huyidb04 Keepalived_vrrp[79605]: Sending gratuitous ARP on ens38 for 10.85.10.77
6. keepalived使用注意事项
- 启动keepalived 前, 一定要先启动PG, 且同步状态正常
- 先库动PG主库的keepalived ,在启动PG备库的keepalive
- keepalive master 机 down掉, VIP会自动切换到 backup 上, 即使主数据库正常 , 这时keepalive 也会将备库切换为新的主库 (如果不是用VIP连接数据库的,就会出现双主,数据不一致的情况)
- 正常维护的时候 , 先停备库的keepalived ; 再停主库的 keepalived 。 然后在停数据库
- 避免开机启动keepalived ,当PG还在启动中时,不确认谁是主时,keepaliveed 谁先启动,可能会先成为MSTER 状态。VIP 将在MASTER 上启动.
- 数据库DOWN掉时,keepalived 发现没有进程, 也会DOWN掉
当keepalived 自己 down掉,在另一台机器上切换为master 后, 数据库不会立即切换为主, 时间由 failover.sh 脚本中的,sr_allowed_delay_time 参数控制 - keepalived 只能 ,建议和DB 安装在同一台机器, 原因是在切换的时候需要技持OS 脚本,如:pg_ctl promote -D $PGDATA ,如果在不同机器,需要编写脚本,能远程连过来执行
- 如何回切呢?-- 原主库启动后,先配置与新从库之间的同步,数据达到一致后,再找个时间进行切换
参考文档


浙公网安备 33010602011771号