PostgreSQL high availability with repmgr, part 9: auto failover with 1 Primary + 2 Standby

os: ubuntu 16.04
postgresql: 9.6.8
repmgr: 4.1.1

192.168.56.101 node1
192.168.56.102 node2
192.168.56.103 node3

Set up 1 Primary + 2 Standby.

The detailed steps are omitted here. See the earlier posts in this series.
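This post does not show the repmgr.conf used for the test. The sketch below is one plausible node3 configuration for automatic failover. It is reconstructed from values that appear in the output and logs further down: the conninfo, location01, 10 reconnect attempts 5 seconds apart, the promote_command, the "sudo pg_ctlcluster 9.6 main promote" call, the 60-second promote_check_timeout and the repmgrd log path. The data_directory (the Ubuntu default) and the follow_command are assumptions, not taken from the original setup. node1 and node2 would differ only in node_id, node_name and the host in conninfo.

```ini
# /etc/repmgr.conf on node3 (sketch, reconstructed from the logs in this post)
node_id=3
node_name='node3'
conninfo='host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2'
# assumed: Ubuntu default data directory for the 9.6/main cluster
data_directory='/var/lib/postgresql/9.6/main'
location='location01'

# let repmgrd promote or re-point standbys without manual intervention
failover='automatic'
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
# assumed: follow_command as given in the repmgr 4 documentation
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'

# the log shows 10 attempts with 5 seconds of sleep between them
reconnect_attempts=10
reconnect_interval=5
# the log shows "waiting up to 60 seconds" during promotion
promote_check_timeout=60

# the log shows promotion via pg_ctlcluster on Debian/Ubuntu
service_promote_command='sudo pg_ctlcluster 9.6 main promote'
log_file='/var/log/postgresql/repmgrd.log'
```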

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
 1  | node1 | primary | * running |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | standby |   running | node1    | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | standby |   running | node1    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2
 

Stop the primary on node1 manually to simulate a failure

On node1:

$ sudo pg_ctlcluster 9.6 main stop

Check on node2:

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
 1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | standby |   running | node2    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
"could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?"
	

PostgreSQL on node2 has been promoted to the new primary.
The upstream of node3 has also changed from node1 to node2.
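You can also check these two facts from a script by parsing the table. The following is a minimal sketch that assumes the column layout repmgr 4.1.1 prints above: ID | Name | Role | Status | Upstream | .... The function names current_primary and upstream_of are hypothetical helpers, not part of repmgr. Other repmgr versions may format the table differently.

```shell
# Print the name of the node whose Role is "primary" and Status is "* running".
# Reads "repmgr cluster show" text output on stdin (layout of repmgr 4.1.1).
current_primary() {
    awk -F'|' '
        {
            # fields: 1=ID 2=Name 3=Role 4=Status 5=Upstream ...
            name = $2; role = $3; status = $4
            gsub(/ /, "", name); gsub(/ /, "", role); gsub(/ /, "", status)
            if (role == "primary" && status == "*running") print name
        }
    '
}

# Print the Upstream column for the node named in $1.
upstream_of() {
    awk -F'|' -v node="$1" '
        {
            name = $2; up = $5
            gsub(/ /, "", name); gsub(/ /, "", up)
            if (name == node) print up
        }
    '
}
```

On this cluster, `repmgr -f /etc/repmgr.conf cluster show | current_primary` would be expected to print node2, and `... | upstream_of node3` to print node2. Lines without a "|" (the separator row and the WARNING text) never match, so they are ignored.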

Check on node3:

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
 1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | * running |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | standby |   running | node2    | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
"could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?"
	

Power off the node2 VM

node2 is now running the new primary. To continue the HA test, cut the power to the node2 virtual machine.

Check on node3:

$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location   | Connection string                                            
----+-------+---------+-----------+----------+------------+-----------------------------------------------------------------
 1  | node1 | primary | - failed  |          | location01 | host=192.168.56.101 user=repmgr dbname=repmgr connect_timeout=2
 2  | node2 | primary | - failed  |          | location01 | host=192.168.56.102 user=repmgr dbname=repmgr connect_timeout=2
 3  | node3 | primary | * running |          | location01 | host=192.168.56.103 user=repmgr dbname=repmgr connect_timeout=2

WARNING: following issues were detected
  - when attempting to connect to node "node1" (ID: 1), following error encountered :
"could not connect to server: Connection refused
	Is the server running on host "192.168.56.101" and accepting
	TCP/IP connections on port 5432?"
  - when attempting to connect to node "node2" (ID: 2), following error encountered :
"timeout expired"

The repmgrd log on node3:

$ tail -f /var/log/postgresql/repmgrd.log
[2018-09-26 10:54:59] [INFO] node "node3" (node ID: 3) monitoring upstream node "node2" (node ID: 2) in normal state
[2018-09-26 10:54:59] [DETAIL] last monitoring statistics update was 5 seconds ago
[2018-09-26 10:55:11] [WARNING] unable to connect to upstream node "node2" (node ID: 2)
[2018-09-26 10:55:11] [INFO] checking state of node 2, 1 of 10 attempts
[2018-09-26 10:55:13] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:18] [INFO] checking state of node 2, 2 of 10 attempts
[2018-09-26 10:55:20] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:25] [INFO] checking state of node 2, 3 of 10 attempts
[2018-09-26 10:55:27] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:32] [INFO] checking state of node 2, 4 of 10 attempts
[2018-09-26 10:55:34] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:39] [INFO] checking state of node 2, 5 of 10 attempts
[2018-09-26 10:55:41] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:46] [INFO] checking state of node 2, 6 of 10 attempts
[2018-09-26 10:55:48] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:55:53] [INFO] checking state of node 2, 7 of 10 attempts
[2018-09-26 10:55:55] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:56:00] [INFO] checking state of node 2, 8 of 10 attempts
[2018-09-26 10:56:02] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:56:07] [INFO] checking state of node 2, 9 of 10 attempts
[2018-09-26 10:56:09] [INFO] sleeping 5 seconds until next reconnection attempt
[2018-09-26 10:56:14] [INFO] checking state of node 2, 10 of 10 attempts
[2018-09-26 10:56:16] [WARNING] unable to reconnect to node 2 after 10 attempts
[2018-09-26 10:56:16] [NOTICE] this node is the only available candidate and will now promote itself
[2018-09-26 10:56:16] [INFO] promote_command is:
  "/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file"
[2018-09-26 10:56:16] [NOTICE] redirecting logging output to "/var/log/postgresql/repmgrd.log"

[2018-09-26 10:56:18] [NOTICE] promoting standby to primary
[2018-09-26 10:56:18] [DETAIL] promoting server "node3" (ID: 3) using "sudo pg_ctlcluster 9.6 main promote"
[2018-09-26 10:56:18] [DETAIL] waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
[2018-09-26 10:56:19] [NOTICE] STANDBY PROMOTE successful
[2018-09-26 10:56:19] [DETAIL] server "node3" (ID: 3) was successfully promoted to primary
[2018-09-26 10:56:19] [INFO] switching to primary monitoring mode
[2018-09-26 10:56:19] [NOTICE] monitoring cluster primary "node3" (node ID: 3)
[2018-09-26 10:56:29] [INFO] monitoring primary node "node3" (node ID: 3) in normal state
[2018-09-26 10:56:39] [INFO] monitoring primary node "node3" (node ID: 3) in normal state
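The log shows how long repmgrd waits before promoting. The upstream was first found unreachable at 10:55:11, and the promotion decision came at 10:56:16, which is 65 seconds. Each failed check takes about 2 seconds, which appears to match connect_timeout=2 in the conninfo. The checks are separated by 5-second sleeps, and there are 10 checks. The function below does that arithmetic. Its name, failover_wait, is hypothetical. Mapping its arguments to reconnect_attempts and reconnect_interval is an assumption based on the repmgrd parameters that control these retries.

```shell
# Estimate seconds from first detected failure to the promotion decision.
# $1 = number of checks (reconnect_attempts, assumed)
# $2 = sleep between checks in seconds (reconnect_interval, assumed)
# $3 = time one failed check takes in seconds (about connect_timeout here)
failover_wait() {
    attempts=$1
    interval=$2
    check_time=$3
    # N checks, with a sleep only between consecutive checks (N - 1 sleeps)
    echo $(( attempts * check_time + (attempts - 1) * interval ))
}
```

`failover_wait 10 5 2` gives 65, which matches the timestamps above. Promotion itself then took about 3 more seconds (10:56:16 to 10:56:19). Lowering reconnect_attempts or reconnect_interval shortens the outage, but it also makes a brief network glitch more likely to trigger an unnecessary failover.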

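Auto failover with 1 Primary + 2 Standby works essentially the same way as with 1 Primary + 1 Standby. The extra standby adds one more level of protection. After the first failover, the remaining standby follows the new primary. The cluster can therefore survive a second failure, as it did here when node3 took over from node2.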

posted @ 2018-09-26 11:05 peiyb