PostgreSQL集群运维案例之---创建流复制集群

案例说明:
通过手工方式创建流复制集群,详细记录流复制创建过程。

集群节点信息:

[root@node101 ~]# cat /etc/hosts
192.168.1.101   node101    ;primary节点
192.168.1.102   node102    ; standby节点

数据库版本:

postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

PostgreSQL主备流复制原理:

流复制的3个主要进程:

  • 主库的walsender进程:负责发送WAL日志给备库。
  • 备库的walreceiver进程:负责主库发送的WAL日志。
  • 备库的startup进程:重放接收的WAL日志。

主从之间是怎么通信的?

(1)primary端 后端进程通过执行函数XLogInsert()和XLogFlush(),将WAL数据写入并刷新到WAL段文件中。
(2)primary端 WAL发送器进程将写入WAL段文件的WAL数据发送到WAL接收器进程。
(3)primary端 在发送WAL数据之后,后端进程继续等待来自备库的ACK响应。更确切地说,后端进程通过执行内部函数SyncRepWaitForLSN()来获取锁存器,并等待它被释放。
(4)standby端 上的WAL接收器通过write()系统函数调用,将接收到的WAL数据写入备库的WAL段,并向WAL发送器返回ACK响应。
(5)standby端 WAL接收器通过调用(如fsync()函数)将WAL数据刷新到WAL段中,向WAL发送器返回另一个ACK响应,并通知启动进程相关WAL数据已更新。
(6)standby端 启动进程重放已写入WAL段的WAL数据。
(7)primary端 WAL发送器在收到来自WAL接收器的ACK响应后释放后端进程的锁存器,然后后端进程完成commit或abort动作。锁存器释放的时间取决于参数synchronous_commit。如果它是‘on’(默认),那么当接收到步骤(5)的ACK时,锁存器被释放;如果它是'remote_write',当接收到步骤(4)的ACK时,其被释放。

每个ACK响应将备库的内部信息通知给主库,包含以下4个项目:

  • 已写入最新WAL数据的LSN位置。
  • 已刷新最新WAL数据的LSN位置。
  • 启动进程已经重放最新的WAL数据的LSN。
  • 发送此响应的时间戳。

一、部署postgresql数据库软件

Tips:
在主库节点,安装数据库软件并创建数据库,备库只需要安装数据库软件即可。

1、下载安装包
wget https://repo.huaweicloud.com/postgresql/source/v14.2/postgresql-14.2.tar.gz

2、部署数据库软件(主备库节点)

1)创建postgres系统用户

[root@node101 ~]# id postgres
uid=26(postgres) gid=26(postgres) groups=26(postgres)

2)源码安装数据库软件(主备库)

[root@node102 postgresql-14.2]# ./configure --prefix=/usr/local/pg14
[root@node102 postgresql-14.2]# make;make install

3)主库创建数据库
[postgres@node101 bin]$./initdb -U postgre -W -E utf8 -D /data/pg14/data

4)连接数据库访问

[postgres@node101 bin]$ ./psql -U postgre postgres
psql (14.2)
Type "help" for help.

postgres=# select version();
                                                 version
---------------------------------------------------------------------------------------------------------
 PostgreSQL 14.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit
(1 row)

二、创建和配置流复制集群

1、配置主库归档

# 创建归档目录(主备库)
[postgres@node101 bin]$ mkdir -p /data/pg14/arch

# 配置主库归档
[postgres@node101 ~]$ cat /data/pg14/data/postgresql.conf |grep archive_
archive_mode = on               # enables archiving; off, on, or always
archive_command = 'test ! -f /data/pg14/arch/%f && cp %p /data/pg14/arch/%f'  

注意:
archive_mode = on 默认备库不执行归档,需要配置archive_mode = always,备库才执行
归档。

# 查看归档配置
postgres=# show archive_mode;
 archive_mode
--------------
 on
(1 row)

postgres=# show archive_command;
                     archive_command
----------------------------------------------------------
 test ! -f /data/pg14/arch/%f && cp %p /data/pg14/arch/%f
(1 row)

# 执行归档切换
postgres=# select pg_switch_wal();
 pg_switch_wal
---------------
 0/16FFEF0
(1 row)

[postgres@node101 bin]$ ls -lh /data/pg14/arch/
total 16M
-rw------- 1 postgres postgres 16M May 16 17:36 000000010000000000000001

2、wal日志及流复制相关配置

[postgres@node101 ~]$ cat /data/pg14/data/postgresql.conf |grep wal_level
wal_level = replica                     # minimal, replica, or logical

[postgres@node101 ~]$ cat /data/pg14/data/postgresql.conf |grep wal_sender
max_wal_senders = 10            # max number of walsender processes

[postgres@node101 ~]$ cat /data/pg14/data/postgresql.conf |grep slot
max_replication_slots = 10      # max number of replication slots

3、数据库远程连接配置

Tips:
默认postgresql启动后,只监听127.0.0.1的数据库服务连接,如果配置流复制,需要允许客户端远程访问数据库。

1)配置listener参数

[postgres@node101 bin]$ cat /data/pg14/data/postgresql.conf |grep -i listen
listen_addresses = '*'          # what IP address(es) to listen on;

[postgres@node101 bin]$ netstat -antlp |grep 5432
tcp        0      0 0.0.0.0:5432            0.0.0.0:*               LISTEN      20714/postgres

2)配置sys_hba.conf允许远程连接数据库

[postgres@node101 bin]$ cat /data/pg14/data/pg_hba.conf

# TYPE  DATABASE        USER            ADDRESS                 METHOD

# "local" is for Unix domain socket connections only
local   all             all                                     trust
# IPv4 local connections:
host    all             all             127.0.0.1/32            trust
host    all             all             0.0.0.0/0                 trust
# IPv6 local connections:
host    all             all             ::1/128                 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     trust
host    replication     all             127.0.0.1/32            trust
host    replication     all             ::1/128                 trust
host    replication     all             0.0.0.0/0                 trust

4、通过pg_basebackup克隆备库

[postgres@node102 bin]$ ./pg_basebackup -h 192.168.1.101 -U postgre -W -Fp -X stream -v -P -D /data/pg14/data
Password:
    
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/5000028 on timeline 1
pg_basebackup: starting background WAL receiver
pg_basebackup: created temporary replication slot "pg_basebackup_21796"
34789/34789 kB (100%), 1/1 tablespace
pg_basebackup: write-ahead log end point: 0/5000138
pg_basebackup: waiting for background process to finish streaming ...
pg_basebackup: syncing data to disk ...
pg_basebackup: renaming backup_manifest.tmp to backup_manifest
pg_basebackup: base backup completed

5、配置备库标识文件standby.signal

[postgres@node102 bin]$ cat /data/pg14/data/standby.signal
standby_mode = on

6、配置备库到主库连接参数

[postgres@node102 data]$ tail /data/pg14/data/postgresql.conf
primary_conninfo = 'application_name=node102 host=192.168.1.101 port=5432 user=postgre password=123456'
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on

# 或者将配置写入postgresql.auto.conf

[postgres@node102 bin]$ cat /data/pg14/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'application_name=node102 host=192.168.1.101 port=5432 user=postgre password=123456'
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on

三、流复制集群验证
1、启动备库数据库服务

[postgres@node102 bin]$ chmod 700 /data/pg14/data
[postgres@node102 bin]$ ./pg_ctl start -D /data/pg14/data/
........
2023-05-16 18:12:33.130 CST [20004] LOG:  entering standby mode
2023-05-16 18:12:33.132 CST [20004] LOG:  redo starts at 0/5000028
2023-05-16 18:12:33.132 CST [20004] LOG:  consistent recovery state reached at 0/6000000
2023-05-16 18:12:33.132 CST [20003] LOG:  database system is ready to accept read-only connections
2023-05-16 18:12:33.137 CST [20008] LOG:  started streaming WAL from primary at 0/6000000 on timeline 1
 done
server started.

# 查看备库数据库进程
[postgres@node102 bin]$ ps -ef |grep postgre
postgres 17670     1  0 18:08 ?        00:00:00 /usr/local/pg14/bin/postgres -D /data/pg14/data
postgres 17671 17670  0 18:08 ?        00:00:00 postgres: startup waiting for 000000010000000000000006
postgres 17672 17670  0 18:08 ?        00:00:00 postgres: checkpointer
postgres 17673 17670  0 18:08 ?        00:00:00 postgres: background writer
postgres 17674 17670  0 18:08 ?        00:00:00 postgres: stats collector
kingbase 21005     1  0 14:46 ?        00:00:00 /home/kingbase/cluster/R6HA/ha7/kingbase/kingbase/bin/../share/postgres_exporter

2、查看主库流复制

postgres=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend
_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn |    write_lag    |
  flush_lag    |   replay_lag    | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+-----------+-----------+-----------+------------+-----------------+--
---------------+-----------------+---------------+------------+-------------------------------
 23927 |       10 | postgre | node102          | 192.168.1.102 |                 |       59303 | 2023-05-16 18:1
2:34.408837+08 |          737 | streaming | 0/6000148 | 0/6000148 | 0/6000148 | 0/6000148  | 00:00:00.041589 | 0
0:00:00.041805 | 00:00:00.041809 |             0 | async      | 2023-05-16 18:12:33.178872+08
(1 row)

3、主备数据同步

# 主库事务操作
prod=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 f
(1 row)

prod=# insert into t1 values(40,'d');
INSERT 0 1
prod=# select count(*) from t1;
 count
-------
     4
(1 row)

# 备库查询
prod=# select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

prod=# select count(*) from t1;
 count
-------
     4
(1 row)

四、创建物理复制槽
1、备库postgresql.auto.conf中配置复制槽信息

[postgres@node102 bin]$ cat /data/pg14/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'application_name=node102 host=192.168.1.101 port=5432 user=postgre password=123456'
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
primary_slot_name='slot_node102'

2、主库端创建复制槽

postgres=# select * from pg_replication_slots;
 slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | re
start_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
-----------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+---
----------+---------------------+------------+---------------+-----------
(0 rows)

postgres=# select pg_create_physical_replication_slot('slot_node101');
 pg_create_physical_replication_slot
-------------------------------------
 (slot_node101,)
(1 row)

postgres=# select pg_create_physical_replication_slot('slot_node102');
 pg_create_physical_replication_slot
-------------------------------------
 (slot_node102,)
(1 row)

3、备库重启后复制槽状态

postgres=# select * from pg_replication_slots;
  slot_name   | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin |
 restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase
--------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+
-------------+---------------------+------------+---------------+-----------
 slot_node101 |        | physical  |        |          | f         | f      |            |      |              |
             |                     |            |               | f
 slot_node102 |        | physical  |        |          | f         | t      |      24360 |  738 |              |
 0/90001C0   |                     | reserved   |               | f
(2 rows)

4、配置主库postgresql.auto.conf

[postgres@node101 bin]$ cat /data/pg14/data/postgresql.auto.conf
# Do not edit this file manually!
# It will be overwritten by the ALTER SYSTEM command.
primary_conninfo = 'application_name=node102 host=192.168.1.102 port=5432 user=postgre password=123456'
max_standby_streaming_delay = 30s
wal_receiver_status_interval = 10s
hot_standby_feedback = on
primary_slot_name='slot_node101'

五、配置流复制同步(sync)模式

Tips:
默认流复制同步模式为async,异步模式。

1、查看配置参数

prod=# show synchronous_commit ;
 synchronous_commit
--------------------
 on
(1 row)

prod=# show synchronous_standby_names;
 synchronous_standby_names
---------------------------

(1 row)

2、配置同步节点

prod=#  alter system set synchronous_standby_names='1(node101,node102)';
ALTER SYSTEM

prod=# select pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)

# 查看流复制同步模式
prod=# select * from pg_stat_replication;
  pid  | usesysid | usename | application_name |  client_addr  | client_hostname | client_port |         backend
_start         | backend_xmin |   state   | sent_lsn  | write_lsn | flush_lsn | replay_lsn | write_lag | flush_l
ag | replay_lag | sync_priority | sync_state |          reply_time
-------+----------+---------+------------------+---------------+-----------------+-------------+----------------
---------------+--------------+-----------+-----------+-----------+-----------+------------+-----------+--------
---+------------+---------------+------------+-------------------------------
 24360 |       10 | postgre | node102          | 192.168.1.102 |                 |       23511 | 2023-05-17 11:4
9:19.859742+08 |              | streaming | 0/90003E8 | 0/90003E8 | 0/90003E8 | 0/90003E8  |           |
   |            |             2 | sync       | 2023-05-17 14:04:19.630882+08
(1 row)

如下图所示,流复制同步模式为sync:

posted @ 2023-05-19 16:17  天涯客1224  阅读(51)  评论(0编辑  收藏  举报