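MHA: Manual Failover
An earlier post recorded the process of automatic failover with MHA; this one records a manual failover.
Manual failover applies when MHA's automatic failover is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the switch.
The current architecture is one master and two slaves: 204 is the master, 179 is the candidate master, and 221 is a plain slave. Because a candidate master is configured (and 221 is marked no_master), the switch can only go from 204 to 179.
Now run the manual switch: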
[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec 9 15:41:47 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec 9 15:41:47 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:41:47 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:41:47 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec 9 15:41:47 2018 - [info] Starting master failover.
Sun Dec 9 15:41:47 2018 - [info]
Sun Dec 9 15:41:47 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec 9 15:41:47 2018 - [info]
Sun Dec 9 15:41:48 2018 - [info] GTID failover mode = 1
Sun Dec 9 15:41:48 2018 - [info] Dead Servers:
Sun Dec 9 15:41:48 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Sun Dec 9 15:41:48 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
The command fails because all three servers in the MySQL cluster are still healthy, so MHA finds no dead master.
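Before invoking a dead-master failover, it can help to confirm from the manager host that the master really does not answer. The following is only a minimal sketch under assumptions: the function name `master_is_dead` and the `PING_CMD` override are hypothetical (they are not part of MHA), and it relies on `mysqladmin ping` exiting with 0 when the server is running.

```shell
#!/bin/sh
# Hypothetical pre-check (not part of MHA): succeed only when the master
# does NOT answer a ping, i.e. when a dead-master failover makes sense.
# PING_CMD is an assumption added here so the check can be dry-run;
# it defaults to mysqladmin.
master_is_dead() {
    host=$1
    port=$2
    if "${PING_CMD:-mysqladmin}" --host="$host" --port="$port" \
        --connect-timeout=3 ping >/dev/null 2>&1; then
        return 1    # server answered: master is alive
    fi
    return 0        # no answer: treat the master as dead
}

# Example with the addresses used in this post:
# master_is_dead 10.0.102.204 3306 && echo "master is down, failover can proceed"
```

MHA performs its own reachability check in Phase 1, so this is a convenience for the operator, not a replacement for that check.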
Shut down the master server of the MySQL cluster:
[root@test2 ~]# service mysqld stop
Shutting down MySQL............ SUCCESS!
Then run the manual switch again:
[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec 9 15:44:33 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec 9 15:44:33 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:44:33 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:44:33 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec 9 15:44:33 2018 - [info] Starting master failover.
Sun Dec 9 15:44:33 2018 - [info]
Sun Dec 9 15:44:33 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec 9 15:44:33 2018 - [info]
Sun Dec 9 15:44:34 2018 - [info] GTID failover mode = 1
Sun Dec 9 15:44:34 2018 - [info] Dead Servers:
Sun Dec 9 15:44:34 2018 - [info] 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:44:34 2018 - [info] Checking master reachability via MySQL(double check)...
Sun Dec 9 15:44:34 2018 - [info] ok.
Sun Dec 9 15:44:34 2018 - [info] Alive Servers:
Sun Dec 9 15:44:34 2018 - [info] 10.0.102.179(10.0.102.179:3306)
Sun Dec 9 15:44:34 2018 - [info] 10.0.102.221(10.0.102.221:3306)
Sun Dec 9 15:44:34 2018 - [info] Alive Slaves:
Sun Dec 9 15:44:34 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:44:34 2018 - [info] GTID ON
Sun Dec 9 15:44:34 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:44:34 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Dec 9 15:44:34 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:44:34 2018 - [info] GTID ON
Sun Dec 9 15:44:34 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:44:34 2018 - [info] Not candidate for the new Master (no_master is set)
Master 10.0.102.204(10.0.102.204:3306) is dead. Proceed? (yes/NO): yes
Sun Dec 9 15:44:36 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln309] Last failover was done at 2018/12/09 13:11:06. Current time is too early to do failover again. If you want to do failover, manually remove /data/log/app1/app1.failover.complete and run this script again.
Sun Dec 9 15:44:36 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR: at /usr/local/bin/masterha_master_switch line 53
# The error message states both the cause and the fix.
# Fix:
[root@test3 ~]# rm -f /data/log/app1/app1.failover.complete
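The cleanup step can be wrapped in a small helper that reports whether a marker was actually present. This is a sketch only: the function name `clear_failover_marker` is hypothetical, and the marker path is assumed to follow the `<workdir>/<app>.failover.complete` pattern seen in the error message.

```shell
#!/bin/sh
# Hypothetical helper: remove MHA's "last failover" marker if it exists.
# Usage: clear_failover_marker <manager_workdir> <app_name>
clear_failover_marker() {
    marker="$1/$2.failover.complete"
    if [ -f "$marker" ]; then
        rm -f -- "$marker" && echo "removed $marker"
    else
        echo "no marker at $marker"
    fi
}

# Example, using the path from the error message:
# clear_failover_marker /data/log/app1 app1
```

The MHA documentation also describes an `--ignore_last_failover` option that skips this time check without deleting the file; it was not tried in this walkthrough, so treat it as something to verify against your MHA version.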
Run the manual switch once more. You need to answer "yes" at the interactive prompts during the switch.
[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec 9 15:45:10 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec 9 15:45:10 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:45:10 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec 9 15:45:10 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec 9 15:45:10 2018 - [info] Starting master failover.
Sun Dec 9 15:45:10 2018 - [info]
Sun Dec 9 15:45:10 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec 9 15:45:10 2018 - [info]
Sun Dec 9 15:45:10 2018 - [info] GTID failover mode = 1
Sun Dec 9 15:45:10 2018 - [info] Dead Servers:
Sun Dec 9 15:45:10 2018 - [info] 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:10 2018 - [info] Checking master reachability via MySQL(double check)...
Sun Dec 9 15:45:10 2018 - [info] ok.
Sun Dec 9 15:45:10 2018 - [info] Alive Servers:
Sun Dec 9 15:45:10 2018 - [info] 10.0.102.179(10.0.102.179:3306)
Sun Dec 9 15:45:10 2018 - [info] 10.0.102.221(10.0.102.221:3306)
Sun Dec 9 15:45:10 2018 - [info] Alive Slaves:
Sun Dec 9 15:45:10 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:10 2018 - [info] GTID ON
Sun Dec 9 15:45:10 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:10 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Dec 9 15:45:10 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:10 2018 - [info] GTID ON
Sun Dec 9 15:45:10 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:10 2018 - [info] Not candidate for the new Master (no_master is set)
Master 10.0.102.204(10.0.102.204:3306) is dead. Proceed? (yes/NO): yes
Sun Dec 9 15:45:12 2018 - [info] Starting GTID based failover.
Sun Dec 9 15:45:12 2018 - [info]
Sun Dec 9 15:45:12 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Dec 9 15:45:12 2018 - [info]
Sun Dec 9 15:45:12 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Dec 9 15:45:12 2018 - [info]
Sun Dec 9 15:45:12 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
Sun Dec 9 15:45:13 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Dec 9 15:45:13 2018 - [info] Executing master IP deactivation script:
Sun Dec 9 15:45:13 2018 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.0.102.204 --orig_master_ip=10.0.102.204 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.0.102.110/22===
Disabling the VIP on old master: 10.0.102.204
Sun Dec 9 15:45:13 2018 - [info] done.
Sun Dec 9 15:45:13 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Dec 9 15:45:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Dec 9 15:45:13 2018 - [info]
Sun Dec 9 15:45:13 2018 - [info] * Phase 3: Master Recovery Phase..
Sun Dec 9 15:45:13 2018 - [info]
Sun Dec 9 15:45:13 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Dec 9 15:45:13 2018 - [info]
Sun Dec 9 15:45:13 2018 - [info] The latest binary log file/position on all slaves is master_bin.000004:194
Sun Dec 9 15:45:13 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Dec 9 15:45:13 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:13 2018 - [info] GTID ON
Sun Dec 9 15:45:13 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Dec 9 15:45:13 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:13 2018 - [info] GTID ON
Sun Dec 9 15:45:13 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:13 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Dec 9 15:45:13 2018 - [info] The oldest binary log file/position on all slaves is master_bin.000004:194
Sun Dec 9 15:45:13 2018 - [info] Oldest slaves:
Sun Dec 9 15:45:13 2018 - [info] 10.0.102.179(10.0.102.179:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:13 2018 - [info] GTID ON
Sun Dec 9 15:45:13 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:13 2018 - [info] Primary candidate for the new Master (candidate_master is set)
Sun Dec 9 15:45:13 2018 - [info] 10.0.102.221(10.0.102.221:3306) Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec 9 15:45:13 2018 - [info] GTID ON
Sun Dec 9 15:45:13 2018 - [info] Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec 9 15:45:13 2018 - [info] Not candidate for the new Master (no_master is set)
Sun Dec 9 15:45:13 2018 - [info]
Sun Dec 9 15:45:13 2018 - [info] * Phase 3.3: Determining New Master Phase..
Sun Dec 9 15:45:13 2018 - [info]
Sun Dec 9 15:45:13 2018 - [info] 10.0.102.179 can be new master.
Sun Dec 9 15:45:13 2018 - [info] New master is 10.0.102.179(10.0.102.179:3306)
Sun Dec 9 15:45:13 2018 - [info] Starting master failover..
Sun Dec 9 15:45:13 2018 - [info]
From:
10.0.102.204(10.0.102.204:3306) (current master)
 +--10.0.102.179(10.0.102.179:3306)
 +--10.0.102.221(10.0.102.221:3306)

To:
10.0.102.179(10.0.102.179:3306) (new master)
 +--10.0.102.221(10.0.102.221:3306)

Starting master switch from 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306)? (yes/NO): yes
Sun Dec 9 15:45:18 2018 - [info] New master decided manually is 10.0.102.179(10.0.102.179:3306)
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] Waiting all logs to be applied..
Sun Dec 9 15:45:18 2018 - [info] done.
Sun Dec 9 15:45:18 2018 - [info] Getting new master's binlog name and position..
Sun Dec 9 15:45:18 2018 - [info] test1-bin.000006:234
Sun Dec 9 15:45:18 2018 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.102.179', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sun Dec 9 15:45:18 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: test1-bin.000006, 234, 768604c5-f82f-11e8-85b7-fabc668d2e00:1-3, d7f72aad-f82e-11e8-a06c-fa1dae125200:1-5
Sun Dec 9 15:45:18 2018 - [info] Executing master IP activate script:
Sun Dec 9 15:45:18 2018 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.102.204 --orig_master_ip=10.0.102.204 --orig_master_port=3306 --new_master_host=10.0.102.179 --new_master_ip=10.0.102.179 --new_master_port=3306 --new_master_user='root' --new_master_password='123456'
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.0.102.110/22===
Enabling the VIP - 10.0.102.110/22 on the new master - 10.0.102.179
Sun Dec 9 15:45:18 2018 - [info] OK.
Sun Dec 9 15:45:18 2018 - [info] ** Finished master recovery successfully.
Sun Dec 9 15:45:18 2018 - [info] * Phase 3: Master Recovery Phase completed.
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] * Phase 4: Slaves Recovery Phase..
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Sun Dec 9 15:45:18 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) started, pid: 2732. Check tmp log /data/log/app1/10.0.102.221_3306_20181209154510.log if it takes time..
Sun Dec 9 15:45:19 2018 - [info]
Sun Dec 9 15:45:19 2018 - [info] Log messages from 10.0.102.221 ...
Sun Dec 9 15:45:19 2018 - [info]
Sun Dec 9 15:45:18 2018 - [info] Resetting slave 10.0.102.221(10.0.102.221:3306) and starting replication from the new master 10.0.102.179(10.0.102.179:3306)..
Sun Dec 9 15:45:18 2018 - [info] Executed CHANGE MASTER.
Sun Dec 9 15:45:19 2018 - [info] Slave started.
Sun Dec 9 15:45:19 2018 - [info] gtid_wait(768604c5-f82f-11e8-85b7-fabc668d2e00:1-3, d7f72aad-f82e-11e8-a06c-fa1dae125200:1-5) completed on 10.0.102.221(10.0.102.221:3306). Executed 0 events.
Sun Dec 9 15:45:19 2018 - [info] End of log messages from 10.0.102.221.
Sun Dec 9 15:45:19 2018 - [info] -- Slave on host 10.0.102.221(10.0.102.221:3306) started.
Sun Dec 9 15:45:19 2018 - [info] All new slave servers recovered successfully.
Sun Dec 9 15:45:19 2018 - [info]
Sun Dec 9 15:45:19 2018 - [info] * Phase 5: New master cleanup phase..
Sun Dec 9 15:45:19 2018 - [info]
Sun Dec 9 15:45:19 2018 - [info] Resetting slave info on the new master..
Sun Dec 9 15:45:19 2018 - [info] 10.0.102.179: Resetting slave info succeeded.
Sun Dec 9 15:45:19 2018 - [info] Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sun Dec 9 15:45:19 2018 - [info]

----- Failover Report -----

app1: MySQL Master failover 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306) succeeded

Master 10.0.102.204(10.0.102.204:3306) is down!

Check MHA Manager logs at test3 for details.

Started manual(interactive) failover.
Invalidated master IP address on 10.0.102.204(10.0.102.204:3306)
Selected 10.0.102.179(10.0.102.179:3306) as a new master.
10.0.102.179(10.0.102.179:3306): OK: Applying all logs succeeded.
10.0.102.179(10.0.102.179:3306): OK: Activated master IP address.
10.0.102.221(10.0.102.221:3306): OK: Slave started, replicating from 10.0.102.179(10.0.102.179:3306)
10.0.102.179(10.0.102.179:3306): Resetting slave info succeeded.
Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sun Dec 9 15:45:19 2018 - [info] Sending mail..
[root@test3 ~]#
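This process is essentially the same as the automatic failover.
Here the original master and the two slaves all held identical data, so no relay-log completion took place.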
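To confirm the result of the switch, one option is to inspect the remaining slave and check that it now replicates from the new master. The sketch below is an assumption, not something run in this walkthrough: the function name `slave_follows` is hypothetical, and it only parses the standard `Master_Host`, `Slave_IO_Running` and `Slave_SQL_Running` fields of `SHOW SLAVE STATUS\G` output read from stdin.

```shell
#!/bin/sh
# Hypothetical check: reads `SHOW SLAVE STATUS\G` output on stdin and
# succeeds only when the slave points at the expected master and both
# replication threads report "Yes".
slave_follows() {
    awk -v want="$1" '
        $1 == "Master_Host:"       { host = $2 }
        $1 == "Slave_IO_Running:"  { io = $2 }
        $1 == "Slave_SQL_Running:" { sql = $2 }
        END { exit !(host == want && io == "Yes" && sql == "Yes") }
    '
}

# Example with the addresses used in this post (credentials are placeholders):
# mysql -h 10.0.102.221 -u root -p -e 'SHOW SLAVE STATUS\G' | slave_follows 10.0.102.179
```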
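In another scenario, the master we choose may not have the latest data. The switch then has to fetch the differential logs and apply them to fill the gap. This needs no manual work, but it can be very slow when the logs are large. For that reason, add check_repl_delay=0 to each server section of the configuration file, which tells MHA to ignore replication delay when choosing the new master.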
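A quick way to see which server sections still lack the setting is to scan the configuration file. This is a rough sketch under assumptions: the function name `servers_missing_check_repl_delay` is hypothetical, and it assumes per-host sections are named `[server1]`, `[server2]`, and so on, with one `key=value` pair per line.

```shell
#!/bin/sh
# Hypothetical check: print every [serverN] section of an MHA config
# file that does not contain the line check_repl_delay=0.
servers_missing_check_repl_delay() {
    awk '
        function report() { if (sect != "" && !ok) print sect }
        /^\[/ {
            report()                      # close the previous section
            sect = ""; ok = 0
            if ($0 ~ /^\[server[0-9]+\]/) sect = $0
            next
        }
        sect != "" && /^[ \t]*check_repl_delay[ \t]*=[ \t]*0[ \t]*$/ { ok = 1 }
        END { report() }
    ' "$1"
}

# Example with the config file used in this post:
# servers_missing_check_repl_delay /etc/masterha/app1.cnf
```

An empty result means every numbered server section already carries the setting; the `[server default]` section is deliberately skipped.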
http://www.ywnds.com/?p=8249