Redis Sentinel

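Redis Sentinel is the high-availability (HA) solution officially recommended by Redis. In a master-slave setup, if the master goes down, neither Redis itself nor many of its clients switch to a slave automatically. Redis Sentinel is a separate, independently running process that can monitor multiple master-slave groups and perform the failover automatically once it detects that a master is down.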

Environment:

ip | hostname | server
--- | --- | ---
192.168.20.3 | node2003 | redis-master,sentinel
192.168.20.4 | node2004 | redis-slave,sentinel
192.168.20.5 | node2005 | redis-slave,sentinel

Redis is installed with yum here.


node2003:

Redis configuration file:

~]# grep "^[^#]" /etc/redis.conf
bind 192.168.20.3
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised systemd
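slave-priority 100
//Sets the promotion priority. When the master is lost for any reason, the new master is chosen from node2004 and node2005 by priority: the slave with the lower slave-priority value is preferred, and a value of 0 means the slave is never promoted. If priorities are equal, the replication offset decides: the slave that has received more data from the master ranks higher. If the offsets are also equal, the slave with the lexicographically smaller run ID is chosen. In addition, a slave that has been disconnected from the master for too long is considered unsuitable and is skipped.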
masterauth foo
requirepass foo
//With Sentinel, a master may become a slave and a slave may become a master, so both `masterauth` and `requirepass` must be set on every node

...

All other settings keep their default values; only the settings relevant to failover are shown here.
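The selection order described in the slave-priority comment above can be illustrated with a small sketch. It is a simplified model, not Sentinel's actual code: the `Replica` class, the `pick_new_master` function, and the run IDs are hypothetical names and values, the offsets are taken from the INFO replication output shown later in this post, and the check that skips slaves disconnected from the master for too long is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    name: str
    priority: int      # slave-priority; 0 means "never promote"
    repl_offset: int   # how much of the master's stream was received
    run_id: str        # ID generated by the server at startup (hypothetical here)


def pick_new_master(replicas: List[Replica]) -> Optional[Replica]:
    """Pick a promotion candidate: lower priority value first, then the
    larger replication offset, then the lexicographically smaller run ID."""
    candidates = [r for r in replicas if r.priority > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("node2004", priority=100, repl_offset=391855, run_id="aaa"),
    Replica("node2005", priority=100, repl_offset=391988, run_id="bbb"),
]
chosen = pick_new_master(replicas)
print(chosen.name)
```

With equal priorities, node2005 wins on its larger offset, which matches the failover result in the test below.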

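Sentinel configuration file:
Sentinel automatically discovers the other Sentinels through the master and forms a cluster with them; it also learns about the slaves from the master.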

~]# grep "^[^#]" /etc/redis-sentinel.conf 
bind node2003
port 26379
dir /tmp
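sentinel monitor R1 node2003 6379 2
//The monitored master is named R1 and lives at node2003:6379. The 2 is the quorum: the master is only really treated as unavailable once 2 Sentinels (out of 3 in total, i.e. more than half) consider it unavailable.

sentinel auth-pass R1 foo
//Password used when connecting to the master and the slaves. Sentinel cannot use separate passwords for the master and the slaves, so they should all be set to the same password.

sentinel down-after-milliseconds R1 30000
//How long the master must be unreachable before this Sentinel marks it as SDOWN (subjectively down). Unit: milliseconds.

sentinel parallel-syncs R1 1
//The maximum number of slaves that may resynchronize with the new master at the same time during a failover. Choose it to fit your situation: if it is lower than the number of slaves, the slaves resync in turn and the whole process takes longer; if it equals the number of slaves, all slaves resync at once and cannot serve requests during that window.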

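sentinel failover-timeout R1 25000
//failover-timeout is used in the following ways:
1. The time a Sentinel waits before retrying a failover against the same master after a previous attempt is twice the failover-timeout.
2. The time a slave may keep replicating from the wrong master before it is forced to replicate from the correct one, counted from the moment the Sentinel detects the misconfiguration.
3. The time after which a failover that is in progress but has not yet changed any configuration is cancelled.
4. The maximum time a failover waits for all slaves to be reconfigured to point at the new master. Even after this timeout the slaves are still reconfigured to point at the new master, but no longer following the parallel-syncs rule.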

logfile /var/log/redis/sentinel.log
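The role of the quorum value in `sentinel monitor` can be sketched as follows. This is a simplified model under stated assumptions, not Sentinel's implementation, and all function names are made up for illustration. It reflects two rules: the master becomes ODOWN once at least quorum Sentinels report it down, and starting a failover additionally needs votes from a majority of all known Sentinels (or from quorum Sentinels, when the quorum is larger than the majority).

```python
def majority(total_sentinels: int) -> int:
    """Smallest number of Sentinels that is more than half of the total."""
    return total_sentinels // 2 + 1


def is_objectively_down(down_reports: int, quorum: int) -> bool:
    """ODOWN: at least `quorum` Sentinels agree that the master is down."""
    return down_reports >= quorum


def can_authorize_failover(votes: int, total_sentinels: int, quorum: int) -> bool:
    """A leader needs both a majority of all Sentinels and at least `quorum` votes."""
    return votes >= max(quorum, majority(total_sentinels))


# The setup in this post: 3 Sentinels, quorum 2.
print(majority(3))
print(is_objectively_down(down_reports=2, quorum=2))
print(can_authorize_failover(votes=1, total_sentinels=3, quorum=2))
```

With 3 Sentinels and a quorum of 2, one surviving Sentinel on its own can neither declare ODOWN nor run a failover, which is why an odd number of at least three Sentinels is the usual recommendation.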

node2004 and node2005:

Redis configuration file:

~]# grep "^[^#]" /etc/redis.conf
bind node2004      //node2004 and node2005 differ only in this bind address; all other settings are identical
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised systemd
slaveof 192.168.20.3 6379
masterauth foo
requirepass foo
slave-priority 100
...

Sentinel configuration file:

~]# grep "^[^#]" /etc/redis-sentinel.conf 
bind node2004
port 26379
dir /tmp
sentinel monitor R1 node2003 6379 2
sentinel auth-pass R1 foo
sentinel down-after-milliseconds R1 30000
sentinel parallel-syncs R1 1
sentinel failover-timeout R1 20000
logfile /var/log/redis/sentinel.log

Testing:

Start redis and redis-sentinel on node2003, node2004, and node2005:

~]# systemctl start redis
~]# systemctl start redis-sentinel

Check the logs:

node2003:

~]# tail -f /var/log/redis/sentinel.log
6719:X 28 Dec 10:45:44.973 * supervised by systemd, will signal readiness
(Redis ASCII-art startup banner: Redis 3.2.12 (00000000/0) 64 bit, Running in sentinel mode, Port: 26379, PID: 6719, http://redis.io)

6719:X 28 Dec 10:45:44.975 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128. //a backlog warning: the system value is too small; raise it manually
6719:X 28 Dec 10:45:44.975 # Sentinel ID is e698fe03128cb9460cea882d5144387c142a45f3
6719:X 28 Dec 10:45:44.975 # +monitor master R1 192.168.20.3 6379 quorum 2
6719:X 28 Dec 10:45:44.976 * +slave slave 192.168.20.4:6379 192.168.20.4 6379 @ R1 192.168.20.3 6379
6719:X 28 Dec 10:45:44.976 * +slave slave 192.168.20.5:6379 192.168.20.5 6379 @ R1 192.168.20.3 6379 //both slaves, node2004 and node2005, have been discovered
6719:X 28 Dec 10:46:05.142 * +fix-slave-config slave 192.168.20.5:6379 192.168.20.5 6379 @ R1 192.168.20.3 6379

<br />

node2004:

~]# tail -f /var/log/redis/sentinel.log
...
20779:X 28 Dec 10:48:29.663 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
20779:X 28 Dec 10:48:29.664 # Sentinel ID is d1705f00f601d82871bde7a2719cb39e8e880984
20779:X 28 Dec 10:48:29.664 # +monitor master R1 192.168.20.3 6379 quorum 2
20779:X 28 Dec 10:48:30.501 * +sentinel sentinel e698fe03128cb9460cea882d5144387c142a45f3 192.168.20.3 26379 @ R1 192.168.20.3 6379 //the sentinel on node2003 has been discovered
...

<br />

node2005:

~]# tail -f /var/log/redis/sentinel.log
...
8475:X 28 Dec 10:49:44.089 # Sentinel ID is e21ab29980073fbed07e0b1719a6c6f270ebc10a
8475:X 28 Dec 10:49:44.089 # +monitor master R1 192.168.20.3 6379 quorum 2
8475:X 28 Dec 10:49:44.575 * +sentinel sentinel e698fe03128cb9460cea882d5144387c142a45f3 192.168.20.3 26379 @ R1 192.168.20.3 6379
8475:X 28 Dec 10:49:46.133 * +sentinel sentinel d1705f00f601d82871bde7a2719cb39e8e880984 192.168.20.4 26379 @ R1 192.168.20.3 6379 //the sentinels on node2003 and node2004 have both been discovered
...

At this point Redis and Sentinel are fully up on all nodes. Next, test the failover.
<br />


Check the master/slave information:

~]# redis-cli -h node2003 -p 6379 -a foo
node2003:6379> INFO replication

Replication

role:master
connected_slaves:2
slave0:ip=192.168.20.4,port=6379,state=online,offset=391855,lag=1
slave1:ip=192.168.20.5,port=6379,state=online,offset=391988,lag=0 //replication info for the two slaves
master_repl_offset:392254
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:392253
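The INFO replication output above is plain `key:value` text, so it is easy to check from a script. The following is a minimal sketch, assuming the text has already been captured (for example from redis-cli) without the prompt line or the `//` annotations; `parse_info_replication` is a hypothetical helper, not part of any Redis client library.

```python
def parse_info_replication(text: str) -> dict:
    """Parse INFO replication output into a dict.

    slaveN lines become nested dicts; everything else stays a string.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        if key.startswith("slave") and key[5:].isdigit():
            # e.g. slave0:ip=...,port=...,state=...,offset=...,lag=...
            info[key] = dict(pair.split("=", 1) for pair in value.split(","))
        else:
            info[key] = value
    return info


SAMPLE = """\
role:master
connected_slaves:2
slave0:ip=192.168.20.4,port=6379,state=online,offset=391855,lag=1
slave1:ip=192.168.20.5,port=6379,state=online,offset=391988,lag=0
master_repl_offset:392254
"""

info = parse_info_replication(SAMPLE)
# How many bytes each slave is behind the master's replication offset.
behind = {
    key: int(info["master_repl_offset"]) - int(value["offset"])
    for key, value in info.items()
    if isinstance(value, dict)
}
print(info["role"], behind)
```

The per-slave gap to `master_repl_offset` gives a rough idea of which slave is most up to date, which is the same quantity Sentinel compares when priorities are equal.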

<br />


Stop redis on node2003 and watch the state change:

~]# systemctl stop redis

//check the sentinel log on node2005
~]# tail -f /var/log/redis/sentinel.log
20200:X 28 Dec 14:28:01.640 # Sentinel ID is e21ab29980073fbed07e0b1719a6c6f270ebc10a
20200:X 28 Dec 14:28:01.640 # +monitor master R1 192.168.20.3 6379 quorum 2
20200:X 28 Dec 14:28:01.641 * +slave slave 192.168.20.4:6379 192.168.20.4 6379 @ R1 192.168.20.3 6379
20200:X 28 Dec 14:28:01.641 * +slave slave 192.168.20.5:6379 192.168.20.5 6379 @ R1 192.168.20.3 6379
20200:X 28 Dec 14:28:09.782 * +sentinel sentinel d1705f00f601d82871bde7a2719cb39e8e880984 192.168.20.4 26379 @ R1 192.168.20.3 6379
20200:X 28 Dec 14:28:11.778 * +sentinel sentinel e698fe03128cb9460cea882d5144387c142a45f3 192.168.20.3 26379 @ R1 192.168.20.3 6379 //normal startup log entries while node2003, node2004, and node2005 are all online
20200:X 28 Dec 14:28:11.779 # +new-epoch 8
20200:X 28 Dec 14:29:43.387 # +sdown master R1 192.168.20.3 6379 //a problem with node2003 is detected: subjectively down (SDOWN)
20200:X 28 Dec 14:29:43.399 # +new-epoch 9
20200:X 28 Dec 14:29:43.400 # +vote-for-leader d1705f00f601d82871bde7a2719cb39e8e880984 9
20200:X 28 Dec 14:29:43.470 # +odown master R1 192.168.20.3 6379 #quorum 3/2 //the quorum is reached: objectively down (ODOWN)
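20200:X 28 Dec 14:29:43.470 # Next failover delay: I will not start a failover before Fri Dec 28 14:30:24 2018 //this sentinel has already voted for another leader, so it will not start a failover of its own before this time (about twice the failover-timeout)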
20200:X 28 Dec 14:29:44.140 # +config-update-from sentinel d1705f00f601d82871bde7a2719cb39e8e880984 192.168.20.4 26379 @ R1 192.168.20.3 6379
20200:X 28 Dec 14:29:44.140 # +switch-master R1 192.168.20.3 6379 192.168.20.5 6379 //the master is switched to node2005
20200:X 28 Dec 14:29:44.141 * +slave slave 192.168.20.4:6379 192.168.20.4 6379 @ R1 192.168.20.5 6379
20200:X 28 Dec 14:29:44.141 * +slave slave 192.168.20.3:6379 192.168.20.3 6379 @ R1 192.168.20.5 6379
20200:X 28 Dec 14:30:14.146 # +sdown slave 192.168.20.3:6379 192.168.20.3 6379 @ R1 192.168.20.5 6379
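The `+switch-master` event in the log above has the layout `<master name> <old ip> <old port> <new ip> <new port>`, so a monitoring script can pick the new master address out of the log. A minimal sketch, assuming the log line format shown above; `parse_switch_master` and `SWITCH_RE` are hypothetical names.

```python
import re
from typing import Optional, Tuple

Address = Tuple[str, int]

SWITCH_RE = re.compile(
    r"\+switch-master (?P<name>\S+) "
    r"(?P<old_ip>\S+) (?P<old_port>\d+) "
    r"(?P<new_ip>\S+) (?P<new_port>\d+)"
)


def parse_switch_master(line: str) -> Optional[Tuple[str, Address, Address]]:
    """Return (master name, old address, new address), or None if the
    line is not a +switch-master event."""
    m = SWITCH_RE.search(line)
    if m is None:
        return None
    old = (m["old_ip"], int(m["old_port"]))
    new = (m["new_ip"], int(m["new_port"]))
    return m["name"], old, new


line = "20200:X 28 Dec 14:29:44.140 # +switch-master R1 192.168.20.3 6379 192.168.20.5 6379"
event = parse_switch_master(line)
print(event)
```

Lines for other events, such as `+sdown` or `+odown`, simply return `None`.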

<br />

Check the replication information:

node2004:
node2004:6379> info replication

Replication

role:slave
master_host:192.168.20.5 //the master has switched to node2005
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:128128
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
node2004:6379>

node2005:
node2005:6379> INFO replication

Replication

role:master
connected_slaves:1
slave0:ip=192.168.20.4,port=6379,state=online,offset=19183,lag=1
master_repl_offset:19183
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:19182
node2005:6379>

<br />

**Restore redis on node2003:**

node2005:6379> INFO replication

Replication

role:master
connected_slaves:2
slave0:ip=192.168.20.4,port=6379,state=online,offset=146099,lag=1
slave1:ip=192.168.20.3,port=6379,state=online,offset=146232,lag=1
master_repl_offset:146232
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:146231
node2005:6379>

node2003 has been added automatically as a slave of node2005.


Finally, look at the sentinel and redis configuration files on node2003:

~]# vim /etc/redis-sentinel.conf
...
sentinel monitor R1 192.168.20.5 6379 2

~]# vim /etc/redis.conf
...
slaveof 192.168.20.5 6379

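These settings were not in node2003's configuration files originally: redis.conf had no slaveof line, and sentinel.conf monitored node2003 as the master. Sentinel rewrites the configuration by itself, so the current state is not lost when Sentinel is restarted.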



For how Sentinel works internally, see this article: https://segmentfault.com/a/1190000002680804
posted @ 2018-12-28 15:01  dance_man