MHA: Manual Failover

In an earlier post I recorded MHA's automatic failover; this post records a manual failover.

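Manual failover applies when MHA's automatic switching is not enabled for the service: when the master fails, an operator invokes MHA by hand to perform the failover.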

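The current topology is one master and two slaves: 204 is the master, 179 is the candidate (standby) master, and 221 is a plain slave. Because a candidate master is configured (and 221 is marked no_master, as the log output below shows), the failover can only go from 204 to 179.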
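
The roles described above are normally declared in the MHA configuration file. The real /etc/masterha/app1.cnf is not shown in this post, so the following is only a minimal sketch: the section names ([server1] to [server3]) are assumptions, and manager_workdir is inferred from the path of the failover.complete file that appears later in the log. candidate_master=1 and no_master=1 are the MHA parameters that the log reports for 179 and 221.

```shell
# Sketch only: writes a sample MHA config to a temp file for inspection.
# Section names are assumed, not copied from the author's real app1.cnf.
sample_cnf="$(mktemp)"
cat > "$sample_cnf" <<'EOF'
[server default]
manager_workdir=/data/log/app1

[server1]
hostname=10.0.102.204
port=3306

[server2]
hostname=10.0.102.179
port=3306
candidate_master=1

[server3]
hostname=10.0.102.221
port=3306
no_master=1
EOF

# Show which hosts are flagged as candidate master or excluded from promotion.
grep -E '^(hostname|candidate_master|no_master)=' "$sample_cnf"
```

With a layout like this, 179 is preferred as the new master and 221 is never promoted, which matches the "Primary candidate" and "Not candidate" lines in the log.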

Now run the manual switch:

[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec  9 15:41:47 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec  9 15:41:47 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:41:47 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:41:47 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec  9 15:41:47 2018 - [info] Starting master failover.
Sun Dec  9 15:41:47 2018 - [info] 
Sun Dec  9 15:41:47 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec  9 15:41:47 2018 - [info] 
Sun Dec  9 15:41:48 2018 - [info] GTID failover mode = 1
Sun Dec  9 15:41:48 2018 - [info] Dead Servers:
Sun Dec  9 15:41:48 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln187] None of server is dead. Stop failover.
Sun Dec  9 15:41:48 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53

Because all three servers in the MySQL cluster are healthy, MHA finds no dead server and the command aborts with an error.

Stop the master server of the MySQL cluster:

[root@test2 ~]# service mysqld stop
Shutting down MySQL............ SUCCESS! 

Then run the manual switch again:

[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec  9 15:44:33 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec  9 15:44:33 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:44:33 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:44:33 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec  9 15:44:33 2018 - [info] Starting master failover.
Sun Dec  9 15:44:33 2018 - [info] 
Sun Dec  9 15:44:33 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec  9 15:44:33 2018 - [info] 
Sun Dec  9 15:44:34 2018 - [info] GTID failover mode = 1
Sun Dec  9 15:44:34 2018 - [info] Dead Servers:
Sun Dec  9 15:44:34 2018 - [info]   10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:44:34 2018 - [info] Checking master reachability via MySQL(double check)...
Sun Dec  9 15:44:34 2018 - [info]  ok.
Sun Dec  9 15:44:34 2018 - [info] Alive Servers:
Sun Dec  9 15:44:34 2018 - [info]   10.0.102.179(10.0.102.179:3306)
Sun Dec  9 15:44:34 2018 - [info]   10.0.102.221(10.0.102.221:3306)
Sun Dec  9 15:44:34 2018 - [info] Alive Slaves:
Sun Dec  9 15:44:34 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:44:34 2018 - [info]     GTID ON
Sun Dec  9 15:44:34 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:44:34 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Dec  9 15:44:34 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:44:34 2018 - [info]     GTID ON
Sun Dec  9 15:44:34 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:44:34 2018 - [info]     Not candidate for the new Master (no_master is set)
Master 10.0.102.204(10.0.102.204:3306) is dead. Proceed? (yes/NO): yes
Sun Dec  9 15:44:36 2018 - [error][/usr/local/share/perl5/MHA/MasterFailover.pm, ln309] Last failover was done at 2018/12/09 13:11:06. Current time is too early to do failover again. If you want to do failover, manually remove /data/log/app1/app1.failover.complete and run this script again.
Sun Dec  9 15:44:36 2018 - [error][/usr/local/share/perl5/MHA/ManagerUtil.pm, ln177] Got ERROR:  at /usr/local/bin/masterha_master_switch line 53

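# The error message states both the cause and the fix: the previous failover left a marker file behind.
# Fix: remove the marker file.
[root@test3 ~]# rm -f /data/log/app1/app1.failover.complete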
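
According to the MHA documentation, a new failover is refused while the failover.complete marker from the previous one is still recent (8 hours by default), and masterha_master_switch also accepts --ignore_last_failover as an alternative to deleting the file. The sketch below is a hypothetical helper, recent_failover_marker, that reports whether the marker is still inside that window. The 480-minute default mirrors the documented 8 hours, and the check relies on find's -mmin option (GNU and BSD find).

```shell
# Hypothetical helper: succeeds (exit 0) when the marker file exists and was
# modified within the last $2 minutes (default 480 = 8 hours).
recent_failover_marker() {
    local marker="$1"
    local window_min="${2:-480}"
    [ -f "$marker" ] || return 1
    # find prints the path only if its mtime is newer than window_min minutes.
    [ -n "$(find "$marker" -mmin "-$window_min" 2>/dev/null)" ]
}

# Usage sketch (path taken from the error message above):
# if recent_failover_marker /data/log/app1/app1.failover.complete; then
#     echo "marker is recent: remove it or pass --ignore_last_failover"
# fi
```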

Run the manual switch once more. The process asks for confirmation twice; answer "yes" each time.

[root@test3 ~]# masterha_master_switch --master_state=dead --conf=/etc/masterha/app1.cnf --dead_master_host=10.0.102.204 --dead_master_port=3306 --interactive=1 --new_master_host=10.0.102.179
--dead_master_ip=<dead_master_ip> is not set. Using 10.0.102.204.
Sun Dec  9 15:45:10 2018 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Sun Dec  9 15:45:10 2018 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:45:10 2018 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Sun Dec  9 15:45:10 2018 - [info] MHA::MasterFailover version 0.56.
Sun Dec  9 15:45:10 2018 - [info] Starting master failover.
Sun Dec  9 15:45:10 2018 - [info] 
Sun Dec  9 15:45:10 2018 - [info] * Phase 1: Configuration Check Phase..
Sun Dec  9 15:45:10 2018 - [info] 
Sun Dec  9 15:45:10 2018 - [info] GTID failover mode = 1
Sun Dec  9 15:45:10 2018 - [info] Dead Servers:
Sun Dec  9 15:45:10 2018 - [info]   10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:10 2018 - [info] Checking master reachability via MySQL(double check)...
Sun Dec  9 15:45:10 2018 - [info]  ok.
Sun Dec  9 15:45:10 2018 - [info] Alive Servers:
Sun Dec  9 15:45:10 2018 - [info]   10.0.102.179(10.0.102.179:3306)
Sun Dec  9 15:45:10 2018 - [info]   10.0.102.221(10.0.102.221:3306)
Sun Dec  9 15:45:10 2018 - [info] Alive Slaves:
Sun Dec  9 15:45:10 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:10 2018 - [info]     GTID ON
Sun Dec  9 15:45:10 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:10 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Dec  9 15:45:10 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:10 2018 - [info]     GTID ON
Sun Dec  9 15:45:10 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:10 2018 - [info]     Not candidate for the new Master (no_master is set)
Master 10.0.102.204(10.0.102.204:3306) is dead. Proceed? (yes/NO): yes
Sun Dec  9 15:45:12 2018 - [info] Starting GTID based failover.
Sun Dec  9 15:45:12 2018 - [info] 
Sun Dec  9 15:45:12 2018 - [info] ** Phase 1: Configuration Check Phase completed.
Sun Dec  9 15:45:12 2018 - [info] 
Sun Dec  9 15:45:12 2018 - [info] * Phase 2: Dead Master Shutdown Phase..
Sun Dec  9 15:45:12 2018 - [info] 
Sun Dec  9 15:45:12 2018 - [info] HealthCheck: SSH to 10.0.102.204 is reachable.
Sun Dec  9 15:45:13 2018 - [info] Forcing shutdown so that applications never connect to the current master..
Sun Dec  9 15:45:13 2018 - [info] Executing master IP deactivation script:
Sun Dec  9 15:45:13 2018 - [info]   /usr/local/bin/master_ip_failover --orig_master_host=10.0.102.204 --orig_master_ip=10.0.102.204 --orig_master_port=3306 --command=stopssh --ssh_user=root  


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.0.102.110/22===

Disabling the VIP on old master: 10.0.102.204 
Sun Dec  9 15:45:13 2018 - [info]  done.
Sun Dec  9 15:45:13 2018 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Sun Dec  9 15:45:13 2018 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Sun Dec  9 15:45:13 2018 - [info] 
Sun Dec  9 15:45:13 2018 - [info] * Phase 3: Master Recovery Phase..
Sun Dec  9 15:45:13 2018 - [info] 
Sun Dec  9 15:45:13 2018 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Sun Dec  9 15:45:13 2018 - [info] 
Sun Dec  9 15:45:13 2018 - [info] The latest binary log file/position on all slaves is master_bin.000004:194
Sun Dec  9 15:45:13 2018 - [info] Latest slaves (Slaves that received relay log files to the latest):
Sun Dec  9 15:45:13 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:13 2018 - [info]     GTID ON
Sun Dec  9 15:45:13 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:13 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Dec  9 15:45:13 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:13 2018 - [info]     GTID ON
Sun Dec  9 15:45:13 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:13 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Dec  9 15:45:13 2018 - [info] The oldest binary log file/position on all slaves is master_bin.000004:194
Sun Dec  9 15:45:13 2018 - [info] Oldest slaves:
Sun Dec  9 15:45:13 2018 - [info]   10.0.102.179(10.0.102.179:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:13 2018 - [info]     GTID ON
Sun Dec  9 15:45:13 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:13 2018 - [info]     Primary candidate for the new Master (candidate_master is set)
Sun Dec  9 15:45:13 2018 - [info]   10.0.102.221(10.0.102.221:3306)  Version=5.7.22-log (oldest major version between slaves) log-bin:enabled
Sun Dec  9 15:45:13 2018 - [info]     GTID ON
Sun Dec  9 15:45:13 2018 - [info]     Replicating from 10.0.102.204(10.0.102.204:3306)
Sun Dec  9 15:45:13 2018 - [info]     Not candidate for the new Master (no_master is set)
Sun Dec  9 15:45:13 2018 - [info] 
Sun Dec  9 15:45:13 2018 - [info] * Phase 3.3: Determining New Master Phase..
Sun Dec  9 15:45:13 2018 - [info] 
Sun Dec  9 15:45:13 2018 - [info] 10.0.102.179 can be new master.
Sun Dec  9 15:45:13 2018 - [info] New master is 10.0.102.179(10.0.102.179:3306)
Sun Dec  9 15:45:13 2018 - [info] Starting master failover..
Sun Dec  9 15:45:13 2018 - [info] 
From:
10.0.102.204(10.0.102.204:3306) (current master)
 +--10.0.102.179(10.0.102.179:3306)
 +--10.0.102.221(10.0.102.221:3306)

To:
10.0.102.179(10.0.102.179:3306) (new master)
 +--10.0.102.221(10.0.102.221:3306)

Starting master switch from 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306)? (yes/NO): yes
Sun Dec  9 15:45:18 2018 - [info] New master decided manually is 10.0.102.179(10.0.102.179:3306)
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info] * Phase 3.3: New Master Recovery Phase..
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info]  Waiting all logs to be applied.. 
Sun Dec  9 15:45:18 2018 - [info]   done.
Sun Dec  9 15:45:18 2018 - [info] Getting new master's binlog name and position..
Sun Dec  9 15:45:18 2018 - [info]  test1-bin.000006:234
Sun Dec  9 15:45:18 2018 - [info]  All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.102.179', MASTER_PORT=3306, MASTER_AUTO_POSITION=1, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Sun Dec  9 15:45:18 2018 - [info] Master Recovery succeeded. File:Pos:Exec_Gtid_Set: test1-bin.000006, 234, 768604c5-f82f-11e8-85b7-fabc668d2e00:1-3,
d7f72aad-f82e-11e8-a06c-fa1dae125200:1-5
Sun Dec  9 15:45:18 2018 - [info] Executing master IP activate script:
Sun Dec  9 15:45:18 2018 - [info]   /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.102.204 --orig_master_ip=10.0.102.204 --orig_master_port=3306 --new_master_host=10.0.102.179 --new_master_ip=10.0.102.179 --new_master_port=3306 --new_master_user='root' --new_master_password='123456'  
Unknown option: new_master_user
Unknown option: new_master_password


IN SCRIPT TEST====/sbin/ifconfig eth0:0 down==/sbin/ifconfig eth0:0 10.0.102.110/22===

Enabling the VIP - 10.0.102.110/22 on the new master - 10.0.102.179 
Sun Dec  9 15:45:18 2018 - [info]  OK.
Sun Dec  9 15:45:18 2018 - [info] ** Finished master recovery successfully.
Sun Dec  9 15:45:18 2018 - [info] * Phase 3: Master Recovery Phase completed.
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info] * Phase 4: Slaves Recovery Phase..
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info] * Phase 4.1: Starting Slaves in parallel..
Sun Dec  9 15:45:18 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info] -- Slave recovery on host 10.0.102.221(10.0.102.221:3306) started, pid: 2732. Check tmp log /data/log/app1/10.0.102.221_3306_20181209154510.log if it takes time..
Sun Dec  9 15:45:19 2018 - [info] 
Sun Dec  9 15:45:19 2018 - [info] Log messages from 10.0.102.221 ...
Sun Dec  9 15:45:19 2018 - [info] 
Sun Dec  9 15:45:18 2018 - [info]  Resetting slave 10.0.102.221(10.0.102.221:3306) and starting replication from the new master 10.0.102.179(10.0.102.179:3306)..
Sun Dec  9 15:45:18 2018 - [info]  Executed CHANGE MASTER.
Sun Dec  9 15:45:19 2018 - [info]  Slave started.
Sun Dec  9 15:45:19 2018 - [info]  gtid_wait(768604c5-f82f-11e8-85b7-fabc668d2e00:1-3,
d7f72aad-f82e-11e8-a06c-fa1dae125200:1-5) completed on 10.0.102.221(10.0.102.221:3306). Executed 0 events.
Sun Dec  9 15:45:19 2018 - [info] End of log messages from 10.0.102.221.
Sun Dec  9 15:45:19 2018 - [info] -- Slave on host 10.0.102.221(10.0.102.221:3306) started.
Sun Dec  9 15:45:19 2018 - [info] All new slave servers recovered successfully.
Sun Dec  9 15:45:19 2018 - [info] 
Sun Dec  9 15:45:19 2018 - [info] * Phase 5: New master cleanup phase..
Sun Dec  9 15:45:19 2018 - [info] 
Sun Dec  9 15:45:19 2018 - [info] Resetting slave info on the new master..
Sun Dec  9 15:45:19 2018 - [info]  10.0.102.179: Resetting slave info succeeded.
Sun Dec  9 15:45:19 2018 - [info] Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sun Dec  9 15:45:19 2018 - [info] 

----- Failover Report -----

app1: MySQL Master failover 10.0.102.204(10.0.102.204:3306) to 10.0.102.179(10.0.102.179:3306) succeeded

Master 10.0.102.204(10.0.102.204:3306) is down!

Check MHA Manager logs at test3 for details.

Started manual(interactive) failover.
Invalidated master IP address on 10.0.102.204(10.0.102.204:3306)
Selected 10.0.102.179(10.0.102.179:3306) as a new master.
10.0.102.179(10.0.102.179:3306): OK: Applying all logs succeeded.
10.0.102.179(10.0.102.179:3306): OK: Activated master IP address.
10.0.102.221(10.0.102.221:3306): OK: Slave started, replicating from 10.0.102.179(10.0.102.179:3306)
10.0.102.179(10.0.102.179:3306): Resetting slave info succeeded.
Master failover to 10.0.102.179(10.0.102.179:3306) completed successfully.
Sun Dec  9 15:45:19 2018 - [info] Sending mail..
[root@test3 ~]# 
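
After the failover report it is worth confirming on the remaining slave that replication really points at the new master. The sketch below is a hypothetical function, check_slave_status, that reads the text output of SHOW SLAVE STATUS\G on stdin and checks Master_Host plus the two replication thread states. How to connect to MySQL (user, password) is not covered in this post, so the credentials in the usage comment are assumptions.

```shell
# Hypothetical helper: reads `SHOW SLAVE STATUS\G` text on stdin and succeeds
# only if the slave replicates from the expected master with both threads running.
check_slave_status() {
    local expected_master="$1"
    awk -v want="$expected_master" '
        $1 == "Master_Host:"       { host = $2 }
        $1 == "Slave_IO_Running:"  { io   = $2 }
        $1 == "Slave_SQL_Running:" { sql  = $2 }
        END { exit !(host == want && io == "Yes" && sql == "Yes") }
    '
}

# Usage sketch on 10.0.102.221 (credentials are assumptions):
# mysql -uroot -p -e 'SHOW SLAVE STATUS\G' | check_slave_status 10.0.102.179 \
#     && echo "replicating from the new master"
```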

This process is essentially the same as the automatic failover.
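
One practical difference is the pair of yes/NO prompts. Per the MHA documentation, passing --interactive=0 to masterha_master_switch runs the failover without prompting. The sketch below only prints such a command (a dry run) so it can be reviewed before anyone executes it; the function name build_failover_cmd is hypothetical, and the port is fixed at 3306 as in this post.

```shell
# Hypothetical helper: prints (does not run) a non-interactive failover command.
# $1 = config file, $2 = dead master host, $3 = new master host
build_failover_cmd() {
    local conf="$1" dead_host="$2" new_host="$3"
    printf '%s' "masterha_master_switch --master_state=dead"
    printf ' --conf=%s' "$conf"
    printf ' --dead_master_host=%s --dead_master_port=3306' "$dead_host"
    printf ' --new_master_host=%s --interactive=0\n' "$new_host"
}

build_failover_cmd /etc/masterha/app1.cnf 10.0.102.204 10.0.102.179
```

Printing instead of executing is a deliberate choice here: a non-interactive failover has no confirmation step, so the command deserves a second look first.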

Here the original master and both slaves held identical data, so no relay-log catch-up (applying differential logs) took place.

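There is another case to consider: the server chosen as the new master may not hold the latest data. The failover then has to fetch the differential logs and apply them to catch up. No manual work is needed, but with large logs this step can be very slow. For that reason, add check_repl_delay=0 to each server section in the configuration file, which tells MHA to ignore replication delay when choosing the new master.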
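
According to the MHA documentation, a slave that is more than 100MB of relay logs behind is by default not chosen as the new master, and check_repl_delay=0 disables that check, which is mainly useful together with candidate_master=1. The sketch below is a hypothetical helper, candidates_without_delay_override, that lists config sections which set candidate_master=1 but not check_repl_delay=0. It assumes the plain key=value layout MHA config files use.

```shell
# Hypothetical helper: prints each config section that sets candidate_master=1
# but does not also set check_repl_delay=0.
candidates_without_delay_override() {
    awk '
        function report() {
            if (section != "" && cand && !nodelay) print section
        }
        /^\[.*\]/ { report(); section = $0; cand = 0; nodelay = 0; next }
        /^[ \t]*candidate_master[ \t]*=[ \t]*1/ { cand = 1 }
        /^[ \t]*check_repl_delay[ \t]*=[ \t]*0/ { nodelay = 1 }
        END { report() }
    ' "$1"
}

# Usage sketch:
# candidates_without_delay_override /etc/masterha/app1.cnf
```

An empty result means every candidate master already carries the override.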

 

http://www.ywnds.com/?p=8249

 

Posted 2018-12-09 17:10 by 夜间独行的浪子