Redis Sentinel High Availability
How Sentinel works:
Sentinel runs on a separate host. A Sentinel node both monitors the replication setup and serves configuration information to clients. You only point Sentinel at the Redis master: it retrieves the replication topology from the master and discovers the slaves by itself, so it ends up monitoring the working state of the entire master/slave architecture. Once Sentinel detects that the master is offline, it selects one of the slaves and promotes it to master. After the promotion the master's IP address has changed, so clients that were connected to the old master can no longer reach it; instead they can query Sentinel, which tells them the IP of the new master. Sentinel is therefore Redis's high-availability solution for master/slave deployments.
To avoid false positives, and to avoid being a single point of failure itself, Sentinel should also be deployed as a cluster: several Sentinel nodes monitor the same Redis replication setup. When one Sentinel node finds the master offline, it asks the other Sentinel nodes whether they observe the same thing; only when enough Sentinel nodes agree that the master is unreachable is the master declared down. This consensus step avoids both misjudgment and a single point of failure.
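The client-side lookup described above can be sketched with redis-cli (assuming a Sentinel listening on 192.168.1.5:26379 and a monitored group named qunzu, as configured in the experiment below):

```
# Ask Sentinel which node is currently the master of group "qunzu".
# The reply is the master's IP on one line and its port on the next,
# and it keeps pointing at the right node across failovers.
redis-cli -h 192.168.1.5 -p 26379 SENTINEL get-master-addr-by-name qunzu
```

A client library would run this lookup once on startup (and again after a connection error) instead of hard-coding the master's address.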
Summary: 1. manages multiple Redis servers to provide HA; 2. monitors multiple Redis service nodes; 3. performs automatic failover.
Common configuration parameters in redis-sentinel.conf:
(1) sentinel monitor <master-name> <ip> <redis-port> <quorum> //Defines a master to monitor. This directive may appear multiple times, so one Sentinel deployment can watch several independent replication groups. <master-name> is a custom name for the group, <ip> and <redis-port> are the address of the master, and <quorum> is the number of Sentinel nodes that must agree the master is unreachable before it is considered down. Use an odd number of Sentinel nodes, and set <quorum> to half the number of Sentinels plus one: with 3 Sentinels specify 2; with 4 Sentinels do not specify 2, or the cluster can split. Note that <master-name> can be chosen freely, but when several replication groups are monitored at the same time, each <master-name> must be unique.
(2) sentinel down-after-milliseconds <master-name> <milliseconds> //How long an instance must be unreachable (no valid reply) before this Sentinel considers it down, in milliseconds (default 30 seconds).
(3) sentinel parallel-syncs <master-name> <numslaves> //After a promotion, how many slaves may send sync requests to the new master at the same time.
(4) sentinel failover-timeout <master-name> <milliseconds> //Failover timeout; if a failover does not complete within this time it is considered failed, in milliseconds (default 180 seconds).
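Putting the four directives together, a minimal redis-sentinel.conf for one monitored group might look like the following sketch (the group name mygroup and the addresses are placeholders):

```
port 26379
bind 192.168.1.5
# Watch the master at 192.168.1.6:6379 under the name "mygroup".
# quorum 2 assumes a deployment of 3 Sentinel nodes (3 / 2 + 1 = 2).
sentinel monitor mygroup 192.168.1.6 6379 2
# Consider an instance down after 30 s without a valid reply.
sentinel down-after-milliseconds mygroup 30000
# Let only one slave resynchronize with the new master at a time.
sentinel parallel-syncs mygroup 1
# Treat a failover as failed if it has not finished within 3 minutes.
sentinel failover-timeout mygroup 180000
```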
Lab walkthrough:
Prepare three machines:
192.168.1.5 runs the sentinel
192.168.1.6 is the master
192.168.1.7 is the slave
Redis is already installed on all three hosts (with the EPEL repository configured it can be installed directly).
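On a CentOS 7 host the installation mentioned above can be sketched as (assuming network access to the EPEL repository):

```
# On each of the three hosts: enable EPEL, then install Redis.
# The package provides both /etc/redis.conf and /etc/redis-sentinel.conf.
yum install -y epel-release
yum install -y redis
```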
Configuration on 192.168.1.6 (master):
[root@6 ~]# vim /etc/redis.conf
bind 192.168.1.6
Configuration on 192.168.1.7 (slave):
[root@7 ~]# vim /etc/redis.conf
bind 192.168.1.7
......
slaveof 192.168.1.6 6379
Configuration on 192.168.1.5 (sentinel):
[root@ml ~]# vim /etc/redis-sentinel.conf
bind 192.168.1.5
sentinel monitor qunzu 192.168.1.6 6379 1
[root@ml ~]# systemctl restart redis-sentinel
[root@ml ~]# redis-cli -h 192.168.1.5 -p 26379
192.168.1.5:26379> info sentinel
# Sentinel
sentinel_masters:2
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=sdown,address=127.0.0.1:6379,slaves=0,sentinels=1
master1:name=qunzu,status=ok,address=192.168.1.6:6379,slaves=1,sentinels=1
192.168.1.5:26379> sentinel masters //list the monitored master groups and their state
1)  1) "name"        # first master group
     2) "mymaster"
     3) "ip"
     4) "127.0.0.1"
     5) "port"
     6) "6379"
     7) "runid"
     8) ""
     9) "flags"
    10) "s_down,master,disconnected"
    11) "link-pending-commands"
    12) "2"
    13) "link-refcount"
    14) "1"
    15) "last-ping-sent"
    16) "636869"
    17) "last-ok-ping-reply"
    18) "636869"
    19) "last-ping-reply"
    20) "636869"
    21) "s-down-time"
    22) "606856"
    23) "down-after-milliseconds"
    24) "30000"
    25) "info-refresh"
    26) "1586413602435"
    27) "role-reported"
    28) "master"
    29) "role-reported-time"
    30) "636869"
    31) "config-epoch"
    32) "0"
    33) "num-slaves"
    34) "0"
    35) "num-other-sentinels"
    36) "0"
    37) "quorum"
    38) "2"
    39) "failover-timeout"
    40) "180000"
    41) "parallel-syncs"
    42) "1"
2)  1) "name"        # second master group
     2) "qunzu"
     3) "ip"
     4) "192.168.1.6"
     5) "port"
     6) "6379"
     7) "runid"
     8) "c8a9345764e1a44e56b62547efb3107b7b24fcdf"
     9) "flags"
    10) "master"
    11) "link-pending-commands"
    12) "0"
    13) "link-refcount"
    14) "1"
    15) "last-ping-sent"
    16) "0"
    17) "last-ok-ping-reply"
    18) "312"
    19) "last-ping-reply"
    20) "312"
    21) "down-after-milliseconds"
    22) "30000"
    23) "info-refresh"
    24) "4274"
    25) "role-reported"
    26) "master"
    27) "role-reported-time"
    28) "636869"
    29) "config-epoch"
    30) "0"
    31) "num-slaves"
    32) "1"
    33) "num-other-sentinels"
    34) "0"
    35) "quorum"
    36) "1"
    37) "failover-timeout"
    38) "180000"
    39) "parallel-syncs"
    40) "1"
192.168.1.5:26379> sentinel slaves qunzu //list the slaves of the qunzu group (qunzu is the group name)
1)  1) "name"
     2) "192.168.1.7:6379"
     3) "ip"
     4) "192.168.1.7"
     5) "port"
     6) "6379"
     7) "runid"
     8) "9df62fb2d5252c62206dafc8554e630e87726cc2"
     9) "flags"
    10) "slave"
    11) "link-pending-commands"
    12) "0"
    13) "link-refcount"
    14) "1"
    15) "last-ping-sent"
    16) "0"
    17) "last-ok-ping-reply"
    18) "192"
    19) "last-ping-reply"
    20) "192"
    21) "down-after-milliseconds"
    22) "30000"
    23) "info-refresh"
    24) "9991"
    25) "role-reported"
    26) "slave"
    27) "role-reported-time"
    28) "893674"
    29) "master-link-down-time"
    30) "0"
    31) "master-link-status"
    32) "ok"
    33) "master-host"
    34) "192.168.1.6"
    35) "master-port"
    36) "6379"
    37) "slave-priority"
    38) "100"
    39) "slave-repl-offset"
    40) "58854"
Now try taking the master down.
Current master:
192.168.1.5:26379> info sentinel
# Sentinel
sentinel_masters:2
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=sdown,address=127.0.0.1:6379,slaves=0,sentinels=1
master1:name=qunzu,status=ok,address=192.168.1.6:6379,slaves=1,sentinels=1
Stop the master:
[root@6 ~]# systemctl stop redis
Query again: the master has become 192.168.1.7.
192.168.1.5:26379> info sentinel
# Sentinel
sentinel_masters:2
sentinel_tilt:0
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=sdown,address=127.0.0.1:6379,slaves=0,sentinels=1
master1:name=qunzu,status=ok,address=192.168.1.7:6379,slaves=1,sentinels=1
Start the old master again:
[root@6 ~]# systemctl restart redis
192.168.1.6:6379> info replication #the former master has become a slave
# Replication
role:slave
master_host:192.168.1.7
master_port:6379
master_link_status:up
master_last_io_seconds_ago:2
master_sync_in_progress:0
slave_repl_offset:2626
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
Conclusion:
With Sentinel deployed, when the master goes down Sentinel picks one of the slaves and promotes it to master. When the old master comes back, it rejoins as a slave of the new master; it does not become the master again.
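For testing, a failover like the one above can also be triggered by hand, without stopping the master, using the SENTINEL FAILOVER command (same Sentinel address and group name as in the experiment):

```
# Force Sentinel to start a failover of group "qunzu" right away;
# a slave is promoted and the old master is reconfigured as its slave.
redis-cli -h 192.168.1.5 -p 26379 SENTINEL failover qunzu
```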