Redis Sentinel auto-failover: a complete test with one master and two slaves
Environment: Red Hat Enterprise Linux Server release 6.5 (Santiago)
Master port: 38001
Slave ports: 38002, 38003
Sentinel ports: 39001, 39002, 39003
1. Starting the instances
Initial sentinel configuration file:
### 10.10.100.76 38001
sentinel monitor master1 10.10.100.76 38001 2
sentinel down-after-milliseconds master1 30000
sentinel failover-timeout master1 60000
sentinel parallel-syncs master1 1
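Since the three sentinel files differ only in their port, they can be generated from one template. The sketch below assumes that each file also carries a `port` line, which the excerpt above does not show. `sentinel_conf` is a hypothetical helper. The file must stay writable because Sentinel rewrites it at runtime.

```python
# Hypothetical helper: renders the sentinel config used in this test.
# The "port" line is an assumption (the excerpt above omits it).
def sentinel_conf(port, master_name="master1", host="10.10.100.76",
                  master_port=38001, quorum=2, down_after_ms=30000,
                  failover_timeout_ms=60000, parallel_syncs=1):
    lines = [
        f"port {port}",
        f"### {host} {master_port}",
        # master name, address, and how many sentinels must agree on ODOWN
        f"sentinel monitor {master_name} {host} {master_port} {quorum}",
        # how long the master may stay unreachable before it is marked SDOWN
        f"sentinel down-after-milliseconds {master_name} {down_after_ms}",
        f"sentinel failover-timeout {master_name} {failover_timeout_ms}",
        # how many slaves resync with the new master at the same time
        f"sentinel parallel-syncs {master_name} {parallel_syncs}",
    ]
    return "\n".join(lines) + "\n"

configs = {p: sentinel_conf(p) for p in (39001, 39002, 39003)}
```

Each string can then be written to `../conf/sentinel<port>.conf` before starting `redis-sentinel`.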
Start the Redis master and slave instances:
[boss@localhost src]$ ./redis-server ../conf/redis38001.conf >> ../conf/logs/redis38001.log &
[1] 8820
[boss@localhost src]$ ./redis-server ../conf/redis38002.conf >> ../conf/logs/redis38002.log &
[2] 8825
[boss@localhost src]$ ./redis-server ../conf/redis38003.conf >> ../conf/logs/redis38003.log &
[3] 8829
[boss@localhost src]$
Start the sentinel instances:
[boss@localhost src]$ ./redis-sentinel ../conf/sentinel39001.conf >> ../conf/logs/sentinel39001.log &
[1] 8869
[boss@localhost src]$ ./redis-sentinel ../conf/sentinel39002.conf >> ../conf/logs/sentinel39002.log &
[2] 8872
[boss@localhost src]$ ./redis-sentinel ../conf/sentinel39003.conf >> ../conf/logs/sentinel39003.log &
[3] 8875
[boss@localhost src]$
Check the running Redis processes:
[boss@localhost src]$ ps -ef | grep redis
boss 8820 8795 0 17:42 pts/2 00:00:00 ./redis-server *:38001
boss 8825 8795 0 17:43 pts/2 00:00:00 ./redis-server *:38002
boss 8829 8795 0 17:43 pts/2 00:00:00 ./redis-server *:38003
boss 8869 8842 0 17:46 pts/3 00:00:00 ./redis-sentinel *:39001 [sentinel]
boss 8872 8842 0 17:46 pts/3 00:00:00 ./redis-sentinel *:39002 [sentinel]
boss 8875 8842 0 17:46 pts/3 00:00:00 ./redis-sentinel *:39003 [sentinel]
boss 8879 8683 0 17:46 pts/1 00:00:00 grep redis
[boss@localhost src]$
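When many instances are started by hand, a short script can confirm that every expected port has a running process. This is a minimal sketch. It assumes the `ps -ef` format shown above, with command lines of the form `./redis-server *:<port>` and `./redis-sentinel *:<port>`. `redis_processes` and `missing_ports` are hypothetical names.

```python
import re

# Matches e.g. "./redis-server *:38001" or "./redis-sentinel *:39001 [sentinel]"
PROC_RE = re.compile(r"\./redis-(server|sentinel) \*:(\d+)")

def redis_processes(ps_output):
    """Map port -> 'server' or 'sentinel' from `ps -ef | grep redis` output."""
    found = {}
    for line in ps_output.splitlines():
        m = PROC_RE.search(line)
        if m:  # the "grep redis" line itself does not match
            found[int(m.group(2))] = m.group(1)
    return found

def missing_ports(ps_output, expected):
    """Ports from `expected` that have no running redis process."""
    return sorted(set(expected) - set(redis_processes(ps_output)))
```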
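Sentinel configuration file after startup. Sentinel rewrites its own file and records the known slaves, the other sentinels and the epochs. down-after-milliseconds and parallel-syncs no longer appear, most likely because 30000 and 1 are their default values and the rewrite only keeps non-default settings. (This file lists 39001 and 39002 as known sentinels, so it appears to belong to sentinel 39003.)
### 10.10.100.76 38001
sentinel monitor master1 10.10.100.76 38001 2
sentinel failover-timeout master1 60000
sentinel config-epoch master1 0
sentinel leader-epoch master1 0
# Generated by CONFIG REWRITE
maxclients 4064
sentinel known-slave master1 10.10.100.76 38003
sentinel known-slave master1 10.10.100.76 38002
sentinel known-sentinel master1 10.10.100.76 39002 78596f9d15311475e841904788784851c961e145
sentinel known-sentinel master1 10.10.100.76 39001 20363efd6f67c1e51364205884e8d6fcdb1cc96d
sentinel current-epoch 0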
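To see exactly what Sentinel persisted, a small parser can read the rewritten file back. This sketch assumes the Redis 3.0 directive layout shown above. `parse_sentinel_state` is a hypothetical helper, and it ignores directives it does not know, such as failover-timeout.

```python
def parse_sentinel_state(conf_text):
    """Collect master address, epochs, known slaves and known sentinels
    from a sentinel config file rewritten by CONFIG REWRITE."""
    state = {"master": None, "known_slaves": [],
             "known_sentinels": {}, "epochs": {}}
    for raw in conf_text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0] != "sentinel":
            continue  # comments, maxclients, blank lines
        directive = parts[1]
        if directive == "monitor":
            _, _, name, host, port, quorum = parts
            state["master"] = (name, host, int(port), int(quorum))
        elif directive == "known-slave":
            state["known_slaves"].append((parts[3], int(parts[4])))
        elif directive == "known-sentinel":
            # (host, port) -> run ID of the other sentinel
            state["known_sentinels"][(parts[3], int(parts[4]))] = parts[5]
        elif directive in ("config-epoch", "leader-epoch"):
            state["epochs"][directive] = int(parts[3])
        elif directive == "current-epoch":
            state["epochs"][directive] = int(parts[2])
    return state
```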
Sentinel startup log (all three are nearly identical):
On startup, each sentinel generates its run ID and starts monitoring the master. It then learns the master's slaves from the master's INFO output, and it discovers the other sentinels monitoring the same master through the __sentinel__:hello Pub/Sub channel.
8869:X 26 Jul 17:46:10.854 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
8869:X 26 Jul 17:46:10.854 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
8869:X 26 Jul 17:46:10.854 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
(Redis ASCII-art banner: Redis 3.0.7 (00000000/0) 64 bit, Running in sentinel mode, Port: 39001, PID: 8869, http://redis.io)
8869:X 26 Jul 17:46:10.857 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8869:X 26 Jul 17:46:10.857 # Sentinel runid is 20363efd6f67c1e51364205884e8d6fcdb1cc96d
8869:X 26 Jul 17:46:10.857 # +monitor master master1 10.10.100.76 38001 quorum 2
8869:X 26 Jul 17:46:11.858 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:11.868 * +slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:19.608 * +sentinel sentinel 10.10.100.76:39002 10.10.100.76 39002 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:24.791 * +sentinel sentinel 10.10.100.76:39003 10.10.100.76 39003 @ master1 10.10.100.76 38001
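Every log line in this test has the same Redis 3.x prefix: `pid:role day month time level message`. The role is `X` for a sentinel, `M` for a master, `S` for a slave and `C` for a forked child. The level character is `.`, `-`, `*` or `#` (debug, verbose, notice, warning). The parser below is a minimal sketch built on that layout. The log omits the year, so `year` is an assumed parameter; 2016 matches the "Tue Jul 26 ... 2016" line in sentinel 39002's log. `parse_line` is a hypothetical name.

```python
import re
from datetime import datetime

# Redis 3.x log prefix: "<pid>:<role> <day> <Mon> <HH:MM:SS.mmm> <level> <message>"
LOG_RE = re.compile(
    r"^(?P<pid>\d+):(?P<role>[XMSC]) "
    r"(?P<ts>\d{1,2} [A-Z][a-z]{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<level>[.\-*#]) (?P<msg>.*)$"
)

def parse_line(line, year=2016):
    """Split one log line into fields; `year` is assumed (logs omit it)."""
    m = LOG_RE.match(line)
    if m is None:
        return None  # banner art, shell output, etc.
    when = datetime.strptime(f"{year} {m.group('ts')}", "%Y %d %b %H:%M:%S.%f")
    msg = m.group("msg")
    # Sentinel events start with "+" or "-", e.g. "+monitor", "-sdown"
    event = msg.split(" ", 1)[0] if msg.startswith(("+", "-")) else None
    return {"pid": int(m.group("pid")), "role": m.group("role"),
            "time": when, "level": m.group("level"),
            "event": event, "msg": msg}
```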
Redis master log (port 38001):
Startup, followed by a full sync with each slave.
...... (earlier output omitted)
8820:M 26 Jul 17:42:55.889 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8820:M 26 Jul 17:42:55.889 # Server started, Redis version 3.0.7
8820:M 26 Jul 17:42:55.890 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
8820:M 26 Jul 17:42:55.890 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
8820:M 26 Jul 17:42:55.890 * DB loaded from disk: 0.000 seconds
8820:M 26 Jul 17:42:55.890 * The server is now ready to accept connections on port 38001
8820:M 26 Jul 17:43:08.360 * Slave 10.10.100.76:38002 asks for synchronization
8820:M 26 Jul 17:43:08.360 * Full resync requested by slave 10.10.100.76:38002
8820:M 26 Jul 17:43:08.360 * Starting BGSAVE for SYNC with target: disk
8820:M 26 Jul 17:43:08.361 * Background saving started by pid 8828
8828:C 26 Jul 17:43:08.371 * DB saved on disk
8828:C 26 Jul 17:43:08.372 * RDB: 4 MB of memory used by copy-on-write
8820:M 26 Jul 17:43:08.418 * Background saving terminated with success
8820:M 26 Jul 17:43:08.418 * Synchronization with slave 10.10.100.76:38002 succeeded
8820:M 26 Jul 17:43:13.867 * Slave 10.10.100.76:38003 asks for synchronization
8820:M 26 Jul 17:43:13.867 * Full resync requested by slave 10.10.100.76:38003
8820:M 26 Jul 17:43:13.867 * Starting BGSAVE for SYNC with target: disk
8820:M 26 Jul 17:43:13.868 * Background saving started by pid 8832
8832:C 26 Jul 17:43:13.878 * DB saved on disk
8832:C 26 Jul 17:43:13.878 * RDB: 4 MB of memory used by copy-on-write
8820:M 26 Jul 17:43:13.930 * Background saving terminated with success
8820:M 26 Jul 17:43:13.930 * Synchronization with slave 10.10.100.76:38003 succeeded
Redis slave log (port 38002; the log of 38003 is the same and is not shown):
Startup, followed by one full sync with the master.
...... (earlier output omitted)
8825:S 26 Jul 17:43:08.359 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8825:S 26 Jul 17:43:08.359 # Server started, Redis version 3.0.7
8825:S 26 Jul 17:43:08.359 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
8825:S 26 Jul 17:43:08.359 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
8825:S 26 Jul 17:43:08.359 * DB loaded from disk: 0.000 seconds
8825:S 26 Jul 17:43:08.359 * The server is now ready to accept connections on port 38002
8825:S 26 Jul 17:43:08.359 * Connecting to MASTER 10.10.100.76:38001
8825:S 26 Jul 17:43:08.360 * MASTER <-> SLAVE sync started
8825:S 26 Jul 17:43:08.360 * Non blocking connect for SYNC fired the event.
8825:S 26 Jul 17:43:08.360 * Master replied to PING, replication can continue...
8825:S 26 Jul 17:43:08.360 * Partial resynchronization not possible (no cached master)
8825:S 26 Jul 17:43:08.361 * Full resync from master: 49d5d828d5c8f87a3d5ee910e6b92a271398f368:1
8825:S 26 Jul 17:43:08.418 * MASTER <-> SLAVE sync: receiving 40 bytes from master
8825:S 26 Jul 17:43:08.418 * MASTER <-> SLAVE sync: Flushing old data
8825:S 26 Jul 17:43:08.418 * MASTER <-> SLAVE sync: Loading DB in memory
8825:S 26 Jul 17:43:08.419 * MASTER <-> SLAVE sync: Finished with success
2. Simulating a master failure and automatic failover
Take down the master (port 38001) with SHUTDOWN (kill would also work):
[boss@localhost src]$ ./redis-cli -p 38001 -a ai2016boss shutdown
[boss@localhost src]$ 
Sentinel log (port 39001):
...... (earlier output omitted)
8869:X 26 Jul 17:46:10.857 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8869:X 26 Jul 17:46:10.857 # Sentinel runid is 20363efd6f67c1e51364205884e8d6fcdb1cc96d
8869:X 26 Jul 17:46:10.857 # +monitor master master1 10.10.100.76 38001 quorum 2
8869:X 26 Jul 17:46:11.858 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:11.868 * +slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:19.608 * +sentinel sentinel 10.10.100.76:39002 10.10.100.76 39002 @ master1 10.10.100.76 38001
8869:X 26 Jul 17:46:24.791 * +sentinel sentinel 10.10.100.76:39003 10.10.100.76 39003 @ master1 10.10.100.76 38001
8869:X 26 Jul 18:00:47.667 # +sdown master master1 10.10.100.76 38001
8869:X 26 Jul 18:00:47.759 # +new-epoch 1
8869:X 26 Jul 18:00:47.761 # +vote-for-leader 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
8869:X 26 Jul 18:00:48.110 # +config-update-from sentinel 10.10.100.76:39003 10.10.100.76 39003 @ master1 10.10.100.76 38001
8869:X 26 Jul 18:00:48.110 # +switch-master master1 10.10.100.76 38001 10.10.100.76 38003
8869:X 26 Jul 18:00:48.110 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38003
8869:X 26 Jul 18:00:48.110 * +slave slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
8869:X 26 Jul 18:01:18.145 # +sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
Sentinel log (port 39002):
...... (earlier output omitted)
8872:X 26 Jul 17:46:17.568 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8872:X 26 Jul 17:46:17.568 # Sentinel runid is 78596f9d15311475e841904788784851c961e145
8872:X 26 Jul 17:46:17.568 # +monitor master master1 10.10.100.76 38001 quorum 2
8872:X 26 Jul 17:46:17.568 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8872:X 26 Jul 17:46:17.570 * +slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8872:X 26 Jul 17:46:17.957 * +sentinel sentinel 10.10.100.76:39001 10.10.100.76 39001 @ master1 10.10.100.76 38001
8872:X 26 Jul 17:46:24.791 * +sentinel sentinel 10.10.100.76:39003 10.10.100.76 39003 @ master1 10.10.100.76 38001
8872:X 26 Jul 18:00:47.715 # +sdown master master1 10.10.100.76 38001
8872:X 26 Jul 18:00:47.760 # +new-epoch 1
8872:X 26 Jul 18:00:47.761 # +vote-for-leader 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
8872:X 26 Jul 18:00:47.773 # +odown master master1 10.10.100.76 38001 #quorum 3/2
8872:X 26 Jul 18:00:47.774 # Next failover delay: I will not start a failover before Tue Jul 26 18:02:47 2016
8872:X 26 Jul 18:00:48.111 # +config-update-from sentinel 10.10.100.76:39003 10.10.100.76 39003 @ master1 10.10.100.76 38001
8872:X 26 Jul 18:00:48.111 # +switch-master master1 10.10.100.76 38001 10.10.100.76 38003
8872:X 26 Jul 18:00:48.111 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38003
8872:X 26 Jul 18:00:48.111 * +slave slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
8872:X 26 Jul 18:01:18.160 # +sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
Sentinel log (port 39003):
...... (earlier output omitted)
8875:X 26 Jul 17:46:22.721 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
8875:X 26 Jul 17:46:22.721 # Sentinel runid is 5c24343d83dd1e0da6e1e511dc5dd690ee804065
8875:X 26 Jul 17:46:22.721 # +monitor master master1 10.10.100.76 38001 quorum 2
8875:X 26 Jul 17:46:23.722 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8875:X 26 Jul 17:46:23.731 * +slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8875:X 26 Jul 17:46:24.110 * +sentinel sentinel 10.10.100.76:39001 10.10.100.76 39001 @ master1 10.10.100.76 38001
8875:X 26 Jul 17:46:25.754 * +sentinel sentinel 10.10.100.76:39002 10.10.100.76 39002 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.684 # +sdown master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.755 # +odown master master1 10.10.100.76 38001 #quorum 2/2
8875:X 26 Jul 18:00:47.755 # +new-epoch 1
8875:X 26 Jul 18:00:47.755 # +try-failover master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.758 # +vote-for-leader 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
8875:X 26 Jul 18:00:47.761 # 10.10.100.76:39001 voted for 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
8875:X 26 Jul 18:00:47.761 # 10.10.100.76:39002 voted for 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
8875:X 26 Jul 18:00:47.859 # +elected-leader master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.859 # +failover-state-select-slave master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.911 # +selected-slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.911 * +failover-state-send-slaveof-noone slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:47.995 * +failover-state-wait-promotion slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:48.053 # +promoted-slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:48.053 # +failover-state-reconf-slaves master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:48.108 * +slave-reconf-sent slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:48.881 # -odown master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:49.143 * +slave-reconf-inprog slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:49.143 * +slave-reconf-done slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8875:X 26 Jul 18:00:49.208 # +failover-end master master1 10.10.100.76 38001
8875:X 26 Jul 18:00:49.208 # +switch-master master1 10.10.100.76 38001 10.10.100.76 38003
8875:X 26 Jul 18:00:49.209 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38003
8875:X 26 Jul 18:00:49.209 * +slave slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
8875:X 26 Jul 18:01:19.294 # +sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
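The three logs show two different thresholds at work. The configured quorum (2 here) only decides when the master becomes ODOWN ("#quorum 2/2" on 39003, "#quorum 3/2" on 39002). The sentinel that actually runs the failover must also win votes from a majority of all known sentinels, and it needs at least `quorum` votes if the quorum is larger than that majority. The sketch below encodes that rule as the Redis Sentinel documentation describes it. The function names are hypothetical.

```python
def is_odown(sdown_reports, quorum):
    """ODOWN: at least `quorum` sentinels (this one included) report SDOWN."""
    return sdown_reports >= quorum

def wins_election(votes, voters, quorum):
    """A sentinel may lead the failover only with a majority of all known
    sentinels (itself included), and with at least `quorum` votes."""
    majority = voters // 2 + 1
    return votes >= max(majority, quorum)
```

In this test, 39003 voted for itself and received the votes of 39001 and 39002, so it had 3 of 3 votes.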
Sentinel configuration file after the failover (this one lists 39002 and 39003 as known sentinels, so it appears to belong to sentinel 39001):
### 10.10.100.76 38001
sentinel monitor master1 10.10.100.76 38003 2
sentinel failover-timeout master1 60000
sentinel config-epoch master1 1
sentinel leader-epoch master1 1
# Generated by CONFIG REWRITE
maxclients 4064
sentinel known-slave master1 10.10.100.76 38002
sentinel known-slave master1 10.10.100.76 38001
sentinel known-sentinel master1 10.10.100.76 39003 5c24343d83dd1e0da6e1e511dc5dd690ee804065
sentinel known-sentinel master1 10.10.100.76 39002 78596f9d15311475e841904788784851c961e145
sentinel current-epoch 1
Log of the original master (port 38001):
...... (earlier output omitted)
8820:M 26 Jul 17:58:14.080 * 1 changes in 900 seconds. Saving...
8820:M 26 Jul 17:58:14.081 * Background saving started by pid 8971
8971:C 26 Jul 17:58:14.083 * DB saved on disk
8971:C 26 Jul 17:58:14.084 * RDB: 4 MB of memory used by copy-on-write
8820:M 26 Jul 17:58:14.182 * Background saving terminated with success
8820:M 26 Jul 18:00:17.627 # User requested shutdown...
8820:M 26 Jul 18:00:17.627 * Saving the final RDB snapshot before exiting.
8820:M 26 Jul 18:00:17.629 * DB saved on disk
8820:M 26 Jul 18:00:17.629 # Redis is now ready to exit, bye bye...
Log of the original slave (port 38002):
8825:S 26 Jul 17:58:09.087 * 1 changes in 900 seconds. Saving...
8825:S 26 Jul 17:58:09.088 * Background saving started by pid 8968
8968:C 26 Jul 17:58:09.093 * DB saved on disk
8968:C 26 Jul 17:58:09.093 * RDB: 4 MB of memory used by copy-on-write
8825:S 26 Jul 17:58:09.188 * Background saving terminated with success
8825:S 26 Jul 18:00:17.629 # Connection with master lost.
8825:S 26 Jul 18:00:17.629 * Caching the disconnected master state.
8825:S 26 Jul 18:00:18.331 * Connecting to MASTER 10.10.100.76:38001
8825:S 26 Jul 18:00:18.331 * MASTER <-> SLAVE sync started
8825:S 26 Jul 18:00:18.331 # Error condition on socket for SYNC: Connection refused
8825:S 26 Jul 18:00:19.332 * Connecting to MASTER 10.10.100.76:38001
8825:S 26 Jul 18:00:19.333 * MASTER <-> SLAVE sync started
8825:S 26 Jul 18:00:19.333 # Error condition on socket for SYNC: Connection refused
8825:S 26 Jul 18:00:20.334 * Connecting to MASTER 10.10.100.76:38001
8825:S 26 Jul 18:00:20.334 * MASTER <-> SLAVE sync started
8825:S 26 Jul 18:00:20.334 # Error condition on socket for SYNC: Connection refused
........ (many more attempts to reconnect to the master omitted)
8825:S 26 Jul 18:00:48.108 * Discarding previously cached master state.
8825:S 26 Jul 18:00:48.108 * SLAVE OF 10.10.100.76:38003 enabled (user request from 'id=8 addr=10.10.100.76:59327 fd=11 name=sentinel-5c24343d-cmd age=865 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=139 qbuf-free=32629 obl=36 oll=0 omem=0 events=rw cmd=exec')
8825:S 26 Jul 18:00:48.110 # CONFIG REWRITE executed with success.
8825:S 26 Jul 18:00:48.392 * Connecting to MASTER 10.10.100.76:38003
8825:S 26 Jul 18:00:48.392 * MASTER <-> SLAVE sync started
8825:S 26 Jul 18:00:48.392 * Non blocking connect for SYNC fired the event.
8825:S 26 Jul 18:00:48.392 * Master replied to PING, replication can continue...
8825:S 26 Jul 18:00:48.392 * Partial resynchronization not possible (no cached master)
8825:S 26 Jul 18:00:48.394 * Full resync from master: 0ca88bef97ff1f9dddb3985fb31db97b77f70ad0:1
8825:S 26 Jul 18:00:48.492 * MASTER <-> SLAVE sync: receiving 51 bytes from master
8825:S 26 Jul 18:00:48.492 * MASTER <-> SLAVE sync: Flushing old data
8825:S 26 Jul 18:00:48.492 * MASTER <-> SLAVE sync: Loading DB in memory
8825:S 26 Jul 18:00:48.492 * MASTER <-> SLAVE sync: Finished with success
8825:S 26 Jul 18:13:10.079 * 1 changes in 900 seconds. Saving...
8825:S 26 Jul 18:13:10.080 * Background saving started by pid 9035
9035:C 26 Jul 18:13:10.083 * DB saved on disk
9035:C 26 Jul 18:13:10.084 * RDB: 4 MB of memory used by copy-on-write
8825:S 26 Jul 18:13:10.180 * Background saving terminated with success
Log of the original slave (port 38003):
8829:S 26 Jul 17:58:14.001 * 1 changes in 900 seconds. Saving...
8829:S 26 Jul 17:58:14.002 * Background saving started by pid 8970
8970:C 26 Jul 17:58:14.005 * DB saved on disk
8970:C 26 Jul 17:58:14.006 * RDB: 4 MB of memory used by copy-on-write
8829:S 26 Jul 17:58:14.103 * Background saving terminated with success
8829:S 26 Jul 18:00:17.629 # Connection with master lost.
8829:S 26 Jul 18:00:17.629 * Caching the disconnected master state.
8829:S 26 Jul 18:00:17.838 * Connecting to MASTER 10.10.100.76:38001
8829:S 26 Jul 18:00:17.838 * MASTER <-> SLAVE sync started
8829:S 26 Jul 18:00:17.838 # Error condition on socket for SYNC: Connection refused
8829:S 26 Jul 18:00:18.840 * Connecting to MASTER 10.10.100.76:38001
8829:S 26 Jul 18:00:18.840 * MASTER <-> SLAVE sync started
8829:S 26 Jul 18:00:18.840 # Error condition on socket for SYNC: Connection refused
........ (many more attempts to reconnect to the master omitted)
8829:M 26 Jul 18:00:47.995 * Discarding previously cached master state.
8829:M 26 Jul 18:00:47.995 * MASTER MODE enabled (user request from 'id=8 addr=10.10.100.76:59874 fd=11 name=sentinel-5c24343d-cmd age=864 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
8829:M 26 Jul 18:00:47.997 # CONFIG REWRITE executed with success.
8829:M 26 Jul 18:00:48.392 * Slave 10.10.100.76:38002 asks for synchronization
8829:M 26 Jul 18:00:48.392 * Full resync requested by slave 10.10.100.76:38002
8829:M 26 Jul 18:00:48.392 * Starting BGSAVE for SYNC with target: disk
8829:M 26 Jul 18:00:48.393 * Background saving started by pid 8984
8984:C 26 Jul 18:00:48.404 * DB saved on disk
8984:C 26 Jul 18:00:48.405 * RDB: 4 MB of memory used by copy-on-write
8829:M 26 Jul 18:00:48.492 * Background saving terminated with success
8829:M 26 Jul 18:00:48.492 * Synchronization with slave 10.10.100.76:38002 succeeded
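Configuration file changes on the Redis instances (the full files are too long to post here):
port 38001: unchanged
port 38002: the master address in slaveof was changed, and a generated section was appended:
slaveof 10.10.100.76 38003
......
# Generated by CONFIG REWRITE
maxclients 4064
port 38003: the slaveof line was removed, and a generated section was appended:
# Generated by CONFIG REWRITE
maxclients 4064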
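The same changes can be shown as unified diffs with Python's standard `difflib`. This sketch works on excerpts only. The earlier `slaveof 10.10.100.76 38001` line is an assumption, since both instances were slaves of 38001 before the failover. The `maxclients 4064` line presumably records the value Redis lowered maxclients to because of the 4096 open-file limit reported at startup. `config_diff` is a hypothetical helper.

```python
import difflib

def config_diff(before, after, name):
    """Unified diff of two config excerpts given as lists of lines."""
    return list(difflib.unified_diff(before, after, f"{name} (before)",
                                     f"{name} (after)", lineterm=""))

# Assumed excerpts: only the lines mentioned above; the rest of each
# file is unchanged and left out.
diff_38002 = config_diff(
    ["slaveof 10.10.100.76 38001"],
    ["slaveof 10.10.100.76 38003", "# Generated by CONFIG REWRITE",
     "maxclients 4064"],
    "redis38002.conf")
diff_38003 = config_diff(
    ["slaveof 10.10.100.76 38001"],
    ["# Generated by CONFIG REWRITE", "maxclients 4064"],
    "redis38003.conf")
print("\n".join(diff_38002 + diff_38003))
```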
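Conclusions:
1. After the original master went down, the sentinels confirmed the failure after about 30 seconds (down-after-milliseconds is 30000) and failed over automatically. The original slave 38003 was promoted to master.
2. The logs show that 39001 was the first sentinel to mark 38001 as sdown, but 39003 performed the failover.
3. After the failover, 38003 is the master, and 38001 and 38002 are registered as its slaves. Because 38001 is still down, the sentinels finally log: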
+sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
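The timing in conclusion 1 can be checked with simple timestamp arithmetic. The timestamps below are copied from the logs above, and the sketch only subtracts them. The first +sdown (sentinel 39001) comes about 30.04 s after the SHUTDOWN of 38001, which matches down-after-milliseconds 30000. The whole failover ends about 31.6 s after the shutdown.

```python
from datetime import datetime

def t(s):
    # All events happen on the same day, so only the time of day matters.
    return datetime.strptime(s, "%H:%M:%S.%f")

shutdown     = t("18:00:17.627")  # 38001: "User requested shutdown..."
first_sdown  = t("18:00:47.667")  # 39001: +sdown master
odown        = t("18:00:47.755")  # 39003: +odown master ... #quorum 2/2
promoted     = t("18:00:48.053")  # 39003: +promoted-slave 38003
failover_end = t("18:00:49.208")  # 39003: +failover-end

for label, ts in [("first +sdown", first_sdown), ("+odown", odown),
                  ("+promoted-slave", promoted), ("+failover-end", failover_end)]:
    print(f"{label:16s} {(ts - shutdown).total_seconds():7.3f} s after shutdown")
```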
3. Restarting the failed instance 38001
Start 38001:
[boss@localhost src]$ ./redis-server ../conf/redis38001.conf >> ../conf/logs/redis38001.log &
[4] 9086
[1] Done ./redis-server ../conf/redis38001.conf >> ../conf/logs/redis38001.log
[boss@localhost src]$
The following lines were added to the config file of 38001:
# Generated by CONFIG REWRITE
slaveof 10.10.100.76 38003
maxclients 4064
Sentinel log changes (the same on all three):
The earlier sdown state of the instance is cleared:
8869:X 26 Jul 18:38:37.414 # -sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
Redis instance 38001 (now a slave):
...... (earlier output omitted)
9086:M 26 Jul 18:38:37.256 # You requested maxclients of 10000 requiring at least 10032 max file descriptors.
9086:M 26 Jul 18:38:37.256 # Redis can't set maximum open files to 10032 because of OS error: Operation not permitted.
9086:M 26 Jul 18:38:37.256 # Current maximum open files is 4096. maxclients has been reduced to 4064 to compensate for low ulimit. If you need higher maxclients increase 'ulimit -n'.
(Redis ASCII-art banner: Redis 3.0.7 (00000000/0) 64 bit, Running in standalone mode, Port: 38001, PID: 9086, http://redis.io)
9086:M 26 Jul 18:38:37.257 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
9086:M 26 Jul 18:38:37.257 # Server started, Redis version 3.0.7
9086:M 26 Jul 18:38:37.258 # WARNING overcommit_memory is set to 0! Background save may fail under low memory condition. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
9086:M 26 Jul 18:38:37.258 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
9086:M 26 Jul 18:38:37.258 * DB loaded from disk: 0.000 seconds
9086:M 26 Jul 18:38:37.258 * The server is now ready to accept connections on port 38001
9086:S 26 Jul 18:38:47.334 * SLAVE OF 10.10.100.76:38003 enabled (user request from 'id=2 addr=10.10.100.76:54160 fd=6 name=sentinel-5c24343d-cmd age=10 idle=0 flags=x db=0 sub=0 psub=0 multi=3 qbuf=0 qbuf-free=32768 obl=36 oll=0 omem=0 events=rw cmd=exec')
9086:S 26 Jul 18:38:47.335 # CONFIG REWRITE executed with success.
9086:S 26 Jul 18:38:48.309 * Connecting to MASTER 10.10.100.76:38003
9086:S 26 Jul 18:38:48.309 * MASTER <-> SLAVE sync started
9086:S 26 Jul 18:38:48.309 * Non blocking connect for SYNC fired the event.
9086:S 26 Jul 18:38:48.309 * Master replied to PING, replication can continue...
9086:S 26 Jul 18:38:48.310 * Partial resynchronization not possible (no cached master)
9086:S 26 Jul 18:38:48.311 * Full resync from master: 0ca88bef97ff1f9dddb3985fb31db97b77f70ad0:464126
9086:S 26 Jul 18:38:48.398 * MASTER <-> SLAVE sync: receiving 51 bytes from master
9086:S 26 Jul 18:38:48.398 * MASTER <-> SLAVE sync: Flushing old data
9086:S 26 Jul 18:38:48.398 * MASTER <-> SLAVE sync: Loading DB in memory
9086:S 26 Jul 18:38:48.398 * MASTER <-> SLAVE sync: Finished with success
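In Redis 3.0, each "Full resync from master" line identifies the replication stream by the master's run ID and an offset. The sketch below uses a hypothetical helper, `full_resync`, to extract both. Before the failover, 38002 synced from run ID 49d5d828... (38001). After it, both 38002 and the restarted 38001 sync from 0ca88bef..., which is 38003, the master they connected to. The offset grows by 464125 between the two syncs, roughly the amount of replication stream 38003 produced in between.

```python
import re

# "<...> Full resync from master: <40-hex run id>:<offset>"
FULLRESYNC_RE = re.compile(r"Full resync from master: ([0-9a-f]{40}):(\d+)")

def full_resync(line):
    """Return (master_run_id, offset) from a 'Full resync' log line, or None."""
    m = FULLRESYNC_RE.search(line)
    return (m.group(1), int(m.group(2))) if m else None

before = full_resync("8825:S 26 Jul 17:43:08.361 * Full resync from master: "
                     "49d5d828d5c8f87a3d5ee910e6b92a271398f368:1")
after_failover = full_resync("8825:S 26 Jul 18:00:48.394 * Full resync from master: "
                             "0ca88bef97ff1f9dddb3985fb31db97b77f70ad0:1")
rejoin = full_resync("9086:S 26 Jul 18:38:48.311 * Full resync from master: "
                     "0ca88bef97ff1f9dddb3985fb31db97b77f70ad0:464126")
```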
Redis instance 38002 (now a slave):
No notable changes.
Redis instance 38003 (now the master):
...... (earlier output omitted)
8829:M 26 Jul 18:38:48.310 * Slave 10.10.100.76:38001 asks for synchronization
8829:M 26 Jul 18:38:48.310 * Full resync requested by slave 10.10.100.76:38001
8829:M 26 Jul 18:38:48.310 * Starting BGSAVE for SYNC with target: disk
8829:M 26 Jul 18:38:48.311 * Background saving started by pid 9090
9090:C 26 Jul 18:38:48.314 * DB saved on disk
9090:C 26 Jul 18:38:48.314 * RDB: 4 MB of memory used by copy-on-write
8829:M 26 Jul 18:38:48.398 * Background saving terminated with success
8829:M 26 Jul 18:38:48.398 * Synchronization with slave 10.10.100.76:38001 succeeded
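Summary:
1. After the master and slave instances start, each slave requests one full sync from the master.
2. On startup, a sentinel generates a unique run ID and starts monitoring the master. It learns the master's slaves from the master's INFO output, and it discovers the other sentinels monitoring the same master through the __sentinel__:hello Pub/Sub channel.
3. When the master goes down, each sentinel marks it sdown (subjectively down) once it has not replied for down-after-milliseconds. When at least quorum sentinels agree, the master becomes odown (objectively down), and the sentinels elect a leader to perform the failover.
(Note: the first sentinel to detect sdown is not necessarily the one that performs the failover.)
4. The leader selects one slave and promotes it to master.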
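The Redis Sentinel documentation describes how the leader picks the slave in step 4. First it discards slaves whose link to the master has been down longer than (down-after-milliseconds * 10) plus the time since the master entered SDOWN. Slaves with slave-priority 0 are never promoted. Among the rest, lower slave-priority wins, then the larger replication offset, then the lexicographically smaller run ID. The sketch below is a simplified model of that ordering, not Sentinel's code. `Replica` and `select_replica` are hypothetical names, and the sample values in the usage are invented.

```python
from dataclasses import dataclass

@dataclass
class Replica:
    addr: str
    priority: int          # slave-priority from redis.conf; 0 = never promote
    repl_offset: int       # replication offset the slave has processed
    run_id: str
    link_down_ms: int = 0  # how long its link to the failed master has been down

def select_replica(replicas, down_after_ms, ms_since_sdown):
    """Pick the slave to promote, following the documented order (simplified)."""
    # Slaves disconnected for too long are considered to hold stale data.
    max_link_down = down_after_ms * 10 + ms_since_sdown
    eligible = [r for r in replicas
                if r.priority != 0 and r.link_down_ms <= max_link_down]
    if not eligible:
        return None
    # Lower priority wins; then the larger offset; then the smaller run ID.
    return min(eligible, key=lambda r: (r.priority, -r.repl_offset, r.run_id))
```

With two slaves that tie on priority and offset, the smaller run ID decides. The invented numbers do not explain why 38003 won in this particular test.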
Below is the log of sentinel 39003, annotated line by line:
8875:X 26 Jul 17:46:22.721 # WARNING: The TCP backlog setting of 511 cannot be enforced because /proc/sys/net/core/somaxconn is set to the lower value of 128.
--Generate the sentinel run ID
8875:X 26 Jul 17:46:22.721 # Sentinel runid is 5c24343d83dd1e0da6e1e511dc5dd690ee804065
--Start monitoring the master named master1; host, port and quorum all come from the config file
8875:X 26 Jul 17:46:22.721 # +monitor master master1 10.10.100.76 38001 quorum 2
--Discover the master's slaves (from the master's INFO output)
8875:X 26 Jul 17:46:23.722 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
8875:X 26 Jul 17:46:23.731 * +slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
--Discover the other sentinels monitoring this master (through the __sentinel__:hello Pub/Sub channel)
8875:X 26 Jul 17:46:24.110 * +sentinel sentinel 10.10.100.76:39001 10.10.100.76 39001 @ master1 10.10.100.76 38001
8875:X 26 Jul 17:46:25.754 * +sentinel sentinel 10.10.100.76:39002 10.10.100.76 39002 @ master1 10.10.100.76 38001
--Enter the subjectively down (sdown) state: this sentinel considers the master down
8875:X 26 Jul 18:00:47.684 # +sdown master master1 10.10.100.76 38001
--Enter the objectively down (odown) state: enough sentinels (2, with quorum 2) agree the master is down
8875:X 26 Jul 18:00:47.755 # +odown master master1 10.10.100.76 38001 #quorum 2/2
--Start a new epoch
8875:X 26 Jul 18:00:47.755 # +new-epoch 1
--Try to fail over this master
8875:X 26 Jul 18:00:47.755 # +try-failover master master1 10.10.100.76 38001
--Request votes for the failover leader; this sentinel votes for itself (run ID 5c24343d..., epoch 1)
8875:X 26 Jul 18:00:47.758 # +vote-for-leader 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
--Vote from 39001
8875:X 26 Jul 18:00:47.761 # 10.10.100.76:39001 voted for 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
--Vote from 39002
8875:X 26 Jul 18:00:47.761 # 10.10.100.76:39002 voted for 5c24343d83dd1e0da6e1e511dc5dd690ee804065 1
--Won the election for this epoch; it may now perform the failover
8875:X 26 Jul 18:00:47.859 # +elected-leader master master1 10.10.100.76 38001
--The failover is in the select-slave state: Sentinel is looking for a slave to promote
8875:X 26 Jul 18:00:47.859 # +failover-state-select-slave master master1 10.10.100.76 38001
--Sentinel found a suitable slave to promote
8875:X 26 Jul 18:00:47.911 # +selected-slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
--Send SLAVEOF NO ONE to the selected slave so that it becomes a master
8875:X 26 Jul 18:00:47.911 * +failover-state-send-slaveof-noone slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
--Wait for the selected slave to be promoted
8875:X 26 Jul 18:00:47.995 * +failover-state-wait-promotion slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
--The slave has been promoted to master
8875:X 26 Jul 18:00:48.053 # +promoted-slave slave 10.10.100.76:38003 10.10.100.76 38003 @ master1 10.10.100.76 38001
--The failover enters the reconf-slaves state: the remaining slaves are reconfigured to replicate the new master
8875:X 26 Jul 18:00:48.053 # +failover-state-reconf-slaves master master1 10.10.100.76 38001
--SLAVEOF was sent to 38002 to set its new master
8875:X 26 Jul 18:00:48.108 * +slave-reconf-sent slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
--The objectively down state of the old master is cleared
8875:X 26 Jul 18:00:48.881 # -odown master master1 10.10.100.76 38001
--Reconfiguration in progress: 38002 has not finished syncing with the new master yet
8875:X 26 Jul 18:00:49.143 * +slave-reconf-inprog slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
--38002 has finished syncing with the new master
8875:X 26 Jul 18:00:49.143 * +slave-reconf-done slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38001
--The failover completed successfully; all slaves now replicate the new master
8875:X 26 Jul 18:00:49.208 # +failover-end master master1 10.10.100.76 38001
--Switch master1 from 38001 to 38003
8875:X 26 Jul 18:00:49.208 # +switch-master master1 10.10.100.76 38001 10.10.100.76 38003
--Register 38002 as a slave of the new master
8875:X 26 Jul 18:00:49.209 * +slave slave 10.10.100.76:38002 10.10.100.76 38002 @ master1 10.10.100.76 38003
--Register 38001 as a slave of the new master
8875:X 26 Jul 18:00:49.209 * +slave slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
--38001, now registered as a slave, is still down, so it is marked subjectively down
8875:X 26 Jul 18:01:19.294 # +sdown slave 10.10.100.76:38001 10.10.100.76 38001 @ master1 10.10.100.76 38003
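The annotated log reads as a fixed sequence of failover steps. The sketch below checks that a leader's log contains those steps in order, while allowing other events such as -odown to appear in between. `FAILOVER_SEQUENCE` is the order observed in this one run and is not an exhaustive model of Sentinel's state machine. The helper names are hypothetical.

```python
# Failover events in the order the leader (39003) logged them in this test.
FAILOVER_SEQUENCE = [
    "+sdown", "+odown", "+new-epoch", "+try-failover", "+vote-for-leader",
    "+elected-leader", "+failover-state-select-slave", "+selected-slave",
    "+failover-state-send-slaveof-noone", "+failover-state-wait-promotion",
    "+promoted-slave", "+failover-state-reconf-slaves", "+slave-reconf-sent",
    "+slave-reconf-inprog", "+slave-reconf-done", "+failover-end",
    "+switch-master",
]

def events_in_log(log_text):
    """Extract event tokens such as '+sdown' from sentinel log lines."""
    events = []
    for line in log_text.splitlines():
        parts = line.split()
        # layout: pid:role day month time level event ...
        if len(parts) >= 6 and parts[5][:1] in ("+", "-"):
            events.append(parts[5])
    return events

def follows_sequence(events, sequence=FAILOVER_SEQUENCE):
    """True if `sequence` occurs in `events` in order (gaps allowed)."""
    it = iter(events)
    return all(step in it for step in sequence)  # `in` consumes the iterator
```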
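Attachments
The configuration files and log files after the automatic failover:
Note:
This test did not enable password authentication. If you need it, configure the password in both Redis and Sentinel.
In each redis.conf:
masterauth <master-password>
requirepass foobared
In each sentinel.conf:
sentinel auth-pass <master-name> <master-password>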
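After a failover, any instance can end up as a slave, as 38001 did when it was restarted. For that reason `masterauth` also belongs in the current master's redis.conf, and every instance should share the same password. The sketch below emits the three directives for one shared password. `auth_lines` is a hypothetical helper, and "foobared" is only the placeholder from the note above.

```python
def auth_lines(password, master_name="master1"):
    """Extra config lines for a password-protected master/slave set.
    One shared password on every instance is assumed."""
    redis_conf = [
        f"requirepass {password}",  # clients, including sentinels, must AUTH
        f"masterauth {password}",   # used when this instance replicates from a master
    ]
    sentinel_conf = [
        # lets the sentinel AUTH against the master and slaves of master1
        f"sentinel auth-pass {master_name} {password}",
    ]
    return redis_conf, sentinel_conf

redis_conf, sentinel_conf = auth_lines("foobared")
print("\n".join(redis_conf + sentinel_conf))
```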
         浙公网安备 33010602011771号