pg流复制_同步

同步流复制和异步流复制的搭建方法基本相同,但需要配置一些其它的参数来控制流复制。

  • 参数:

  • synchronous_commit
    当synchronous_standby_names为空时,它为单实时状态
    off:表示提交时不需要等待wal Buffer写入到WAL日志磁盘后才向客户端返回成功
    off外的其它值:等于on,均需要wal Buffer写入到磁盘后才返回成功
     
    当synchronous_standby_names不为空时,为流复制状态
    on:主库事务提交时,需要等待备库接收到wal日志并将内容写到磁盘后,主库才返回成功,
    remote_write:主库事务提交时,需要等待备库接收到wal日志并将内容写到操作系统缓存后,主库才返回成功,与on的区别是不要求落盘
    remote_apply:表示主库事务提交时,需要等待备库接收到wal日志并写到磁盘,并且已经完成了apply重做

  • 配置同步复制:

相比异步流复制,同步流复制新增以下点:

--备库postgresql.conf(12之前为recovery.conf)中primary_conninfo参数配置中添加application_name
primary_conninfo = 'host=192.168.150.130 port=6000 user=repuser application_name=pg93std'  
重启生效后在主库即可查询到该application_name
postgres@postgres:select pid,usename,application_name,client_addr from pg_stat_replication;
 pid  | usename | application_name |   client_addr   
------+---------+------------------+-----------------
 4506 | repuser | pg93std          | 192.168.150.132
(1 row)
--主库配置参数
synchronous_commit = on
synchronous_standby_names = 'pg93std'
--reload生效
[postgres@pg93 pgdata]$ pg_ctl reload -D /pgdata
server signaled
postgres@postgres:select name,setting from pg_settings where name like 'synchronous%';
           name            | setting 
---------------------------+---------
 synchronous_commit        | on
 synchronous_standby_names | pg93std
(2 rows)
--查询同步状态
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
 pid  | usename | application_name |   client_addr   | sync_state |   state   
------+---------+------------------+-----------------+------------+-----------
 4506 | repuser | pg93std          | 192.168.150.132 | sync       | streaming
(1 row)
--可见已经是同步状态了

 

  • 同步流复制测试

--同步测试
--主
postgres@findb:select count(*) from tbl_json;
 count 
-------
     5
(1 row)

postgres@findb:truncate table tbl_json;
TRUNCATE TABLE
--备
[postgres@pg93s pgdata]$ psql findb
psql (12.3)
Type "help" for help.
findb=# select count(*) from tbl_json;
 count 
-------
     0
(1 row)
--备库中断
--停备
[postgres@pg93s pgdata]$ pg_ctl stop -D /pgdata/
waiting for server to shut down.... done
server stopped
--主库
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
 pid | usename | application_name | client_addr | sync_state | state 
-----+---------+------------------+-------------+------------+-------
(0 rows)
postgres@findb:drop table t1;
--卡住了......
--启动备库
[postgres@pg93s pgdata]$ pg_ctl start -D /pgdata
waiting for server to start....2020-10-06 20:26:04.072 CST [6944] LOG:  starting PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
2020-10-06 20:26:04.073 CST [6944] LOG:  listening on IPv4 address "192.168.150.132", port 6000
2020-10-06 20:26:04.088 CST [6944] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.6000"
2020-10-06 20:26:04.158 CST [6944] LOG:  redirecting log output to logging collector process
2020-10-06 20:26:04.158 CST [6944] HINT:  Future log output will appear in directory "log".
 done
--源库的操作恢复正常完成,日志也提示恢复同步
postgres@findb:drop table t1;
DROP TABLE

2020-12-20 12:16:55.401 CST [4588] LOG:  standby "pg93std" is now a synchronous standby with priority 1

--修改主库参数 synchronous_commit为local
postgres@postgres:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
 pid  | usename | application_name |   client_addr   | sync_state |   state   
------+---------+------------------+-----------------+------------+-----------
 4588 | repuser | pg93std          | 192.168.150.132 | sync       | streaming
(1 row)

postgres@postgres:select name,setting from pg_settings where name like 'synchronous%';
           name            | setting 
---------------------------+---------
 synchronous_commit        | local
 synchronous_standby_names | pg93std
--再停止备库,主库是可以正常执行操作的
[postgres@pg93s pgdata]$ pg_ctl stop -D /pgdata
waiting for server to shut down.... done
--主
postgres@findb:drop table t2 ;
DROP TABLE
(2 rows)
--此时主查询不到同步
postgres@findb:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
 pid | usename | application_name | client_addr | sync_state | state 
-----+---------+------------------+-------------+------------+-------
(0 rows)
--再启动备
[postgres@pg93s pgdata]$ pg_ctl start -D /pgdata
waiting for server to start....2020-10-06 20:32:39.214 CST [6966] LOG:  starting PostgreSQL 12.3 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 64-bit
2020-10-06 20:32:39.214 CST [6966] LOG:  listening on IPv4 address "192.168.150.132", port 6000
2020-10-06 20:32:39.226 CST [6966] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.6000"
2020-10-06 20:32:39.295 CST [6966] LOG:  redirecting log output to logging collector process
2020-10-06 20:32:39.295 CST [6966] HINT:  Future log output will appear in directory "log".
 done
server started
--再次查询主的同步状态
postgres@findb:select pid,usename,application_name,client_addr,sync_state,state from pg_stat_replication;
 pid  | usename | application_name |   client_addr   | sync_state |   state   
------+---------+------------------+-----------------+------------+-----------
 4621 | repuser | pg93std          | 192.168.150.132 | sync       | streaming
(1 row)
--此时可见又恢复到了sync状态
--这种配置和所谓“可退化的同步复制”吗?待研究
--生产上通常需要有两个或以上的备库时,才会选择使用同步复制方式,这样当有备库故障时,至少有一个备库可以继续接收主发送过来的日志

 

  • 流复制监控

--监控主库pg_stat_replication上的状态和延迟(10或以后版本&主挂了就查不了了)
postgres@postgres:select pid,usename,client_addr,application_name,state,sync_state,write_lag,flush_lag,replay_lag,reply_time from pg_stat_replication;
 pid  | usename |   client_addr   | application_name |   state   | sync_state |    write_lag    |    flush_lag    |   replay_lag    |          reply_time           
------+---------+-----------------+------------------+-----------+------------+-----------------+-----------------+-----------------+-------------------------------
 4621 | repuser | 192.168.150.132 | pg93std          | streaming | sync       | 00:00:00.017919 | 00:00:00.019003 | 00:00:00.021647 | 2020-10-06 21:24:43.725341+08
(1 row)

--在备上对比当前时间和最后应用时间(10以前版本可用)
postgres=# select now(),pg_last_xact_replay_timestamp();
              now              | pg_last_xact_replay_timestamp 
-------------------------------+-------------------------------
 2020-10-06 21:30:12.983656+08 | 2020-12-20 13:20:24.564295+08
(1 row)

postgres=# select now()-pg_last_xact_replay_timestamp();
         ?column?          
---------------------------
 -74 days -15:49:18.659168
(1 row)
--可见由于两边时间不同步导致此方式无效,所以此方法需要两边的时间是同步的(使用ntp),另外一个问题是如果主库没有事务,则备端最近应用的时间就会隔得比较长,导致显示的差异时间比较大而误报

--对比日志延迟量(主挂则不可用)
postgres@postgres:select pid,usename,client_addr,state,application_name,
pg_wal_lsn_diff(pg_current_wal_lsn(),write_lsn) write_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),flush_lsn) flush_delay,
pg_wal_lsn_diff(pg_current_wal_lsn(),replay_lsn) replay_delay from pg_stat_replication;
 pid  | usename |   client_addr   |   state   | application_name | write_delay | flush_delay | replay_delay 
------+---------+-----------------+-----------+------------------+-------------+-------------+--------------
 4621 | repuser | 192.168.150.132 | streaming | pg93std          |     6684672 |     6684672 |      6684720
(1 row)
--
--还有一个方法是创建一张,每隔一段时间(如1秒)往里面插入一条数据,包含当前时间戳,lsn位置等,通过此方法并结合备的时间,也能大概知道主备相差多少日志
--扩展一下还可以将主的当前lsn和时间戳同步到第三方如redis里面(如0.1秒每次),备通过查询redis也能在主宕机的情况下大概知道和主的lsn差距

postgres=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid                   | 6972
status                | streaming                                --和pg_stat_replication的state一样
receive_start_lsn     | 1/9A000000                               --启动时的lsn
receive_start_tli     | 1
received_lsn          | 1/FF2CB870
received_tli          | 1
last_msg_send_time    | 2020-12-20 16:09:16.100045+08            --收到的主wal sender最后一次发送消息的时间(主时间)
last_msg_receipt_time | 2020-10-07 00:18:13.558129+08            --收到主wal sender最后一次发送的消息的时间(备时间)
latest_end_lsn        | 1/FF2CB870                                          --主发送的消息中lsn的最后位置
latest_end_time       | 2020-12-20 13:35:58.721361+08            --主发送的消息中的lsn最后位置的时间
slot_name             | 
sender_host           | 192.168.150.130
sender_port           | 6000
conninfo              | user=repuser passfile=/home/postgres/.pgpass dbname=replication host=192.168.150.130 port=6000 application_name=pg93std fallback_application_name=walreceiver sslmode=prefer sslcompression=0 gssencmode=prefer krbsrvname=postgres target_session_attrs=any

 

  • 相关函数

1、判断数据库是否处于恢复模式(常用于判断主库还是备库)

--主
postgres@postgres:select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 f
(1 row)
--备
postgres=# select pg_is_in_recovery();
 pg_is_in_recovery 
-------------------
 t
(1 row)

2、查看当前在写的WAL lsn

--主
postgres@postgres:select pg_current_wal_lsn();
 pg_current_wal_lsn 
--------------------
 2/10000A0
(1 row)
--备
postgres=# select pg_current_wal_lsn();
ERROR:  recovery is in progress
HINT:  WAL control functions cannot be executed during recovery.

3、查看备最近接收到的WAL日志位置

--主
postgres@postgres:select pg_last_wal_receive_lsn();
 pg_last_wal_receive_lsn 
-------------------------
 
(1 row)

--备
postgres=# select pg_last_wal_receive_lsn();
 pg_last_wal_receive_lsn 
-------------------------
 2/10000D8
(1 row)

4、查看备最近应用的WAL日志位置

--主
postgres@postgres:select pg_last_wal_replay_lsn();
 pg_last_wal_replay_lsn 
------------------------
 
(1 row)
--备
postgres=# select pg_last_wal_replay_lsn();
 pg_last_wal_replay_lsn 
------------------------
 2/10001C0
(1 row)

5、其它函数

查看备最近事务应用时间:
postgres=# select pg_last_xact_replay_timestamp();
 pg_last_xact_replay_timestamp 
-------------------------------
 2020-12-20 13:34:46.657621+08
(1 row)

查看LSN之间的偏移量差异
postgres=# select pg_wal_lsn_diff('2/10001C0','2/10000D8');
 pg_wal_lsn_diff 
-----------------
             232
(1 row)
posted @ 2021-02-05 23:51  Leadx·Wu  阅读(316)  评论(0编辑  收藏  举报