day10-05-MHA高可用功能应用
MHA FailOver 过程详解
什么是FailOver?
故障转移
MHA只是针对物理故障做转移处理,不监测逻辑故障。
主库宕机一直到业务恢复正常的处理过程。
FailOver 让你实现怎么做?
1)快速监控到主库宕机
2)选择新主
3)数据补偿
4)接触从库身份(reset slave all;)
5)剩余从库和新主构建主从关系。
6)应用透明(vip 漂移)
7)故障节点自愈(待开发.....)
8)故障提醒(mail)
MHA的FailOver如何实现?
从启动---》 故障 ---》 转移 ---》业务恢复
1) MHA启动masterha_manager脚本启动MHA功能
2)在manager启动之前,会自动检查SSH(masterha_check_ssh)互信和主从(masterha_check_repl)状态脚本
3)MHA-manager 通过masterha_master_monitor脚本(每隔ping_interval秒)
4)masterha_master_monitor脚本探测主库3次无心跳之后,就认为主库宕机。
5)进行选主的过程
选主算法有三种:
算法一:首先读取配置文件中是否有强制选主参数 `candidate_master=1` ,如果有,则不进行选主算法,直接由指定节点获选。
`check_repl_delay=0` 这个参数是检查日志差异量是否很多,如果很多,则进行选主算法。0为不检查
算法二:自动判断所有从库日质量。降最接近主库数据的从库作为新主。
算法三:按照配置文件先后顺序进行选新主。
扩展一下:
`candidate_master=1` 应用场景?
1)MHA + keepAlived(早起MHA架构)
2)多地多中心
6)数据补偿
1.判断主库ssh连通性
情况一:SSH能连
调用 save_binary_logs脚本,立即保存缺失部分的binlog到各个从节点,并恢复。
情况二:ssh无法链接
调用apply_diff_relay_logs脚本,计算从库relay-log差异,恢复到2号从库。
提供额外数据补偿(没实现)
7)解除从库身份
8)剩余从库和新主库构建主从关系
9)应用透明(vip漂移)
MHA 应用透明(VIP)
db03操作:
#VIP failvoer 脚本
[root@mysql-node03 ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
use MHA::DBHelper;
my (
$command, $ssh_user, $orig_master_host,
$orig_master_ip, $orig_master_port, $new_master_host,
$new_master_ip, $new_master_port, $new_master_user,
$new_master_password
);
my $vip = '10.0.50.69/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens3:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens3:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
'new_master_user=s' => \$new_master_user,
'new_master_password=s' => \$new_master_password,
);
exit &main();
sub main {
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
# updating global catalog, etc
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@mysql-node03 ~]#
#############################
cp /root/master_ip_failover.txt /usr/local/bin/master_ip_failover
vi /usr/local/bin/master_ip_failover
my $vip = '10.0.50.69/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens3:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens3:$key down";
# $key = "1" 就是 ens3:1
在配置文件中添加 指定加载脚本
[root@mysql-node03 ~]# cat /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager
manager_workdir=/var/log/mha/app1
master_binlog_dir=/data/mysql/binlog
master_ip_failover_script=/usr/loca/bin/master_ip_failover
user=mha
password=mha
ping_interval=2
repl_password=repl
repl_user=repl
ssh_user=root
[server1]
hostname=10.0.50.61
port=3306
[server2]
hostname=10.0.50.62
port=3306
[server3]
hostname=10.0.50.63
port=3306
[root@mysql-node03 ~]#
这个脚本,只有在主库宕机的时候,才会被调用。
所以第一次的话,需要在主库手工把VIP地址 开起来
确定主库ip:
[root@mysql-node03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf
app1 (pid:14369) is running(0:PING_OK), master:10.0.50.61
[root@mysql-node03 ~]#
db01:
手工添加VIP:
[root@mysql-node01 ~]# ifconfig ens3:1 10.0.50.69/24
[root@mysql-node01 ~]#
[root@mysql-node01 ~]# ifconfig ens3:1
ens3:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 10.0.50.69 netmask 255.255.255.0 broadcast 10.0.50.255
ether 52:54:00:c5:82:26 txqueuelen 1000 (Ethernet)
[root@mysql-node01 ~]#
db03:
重启MHA
[root@mysql-node03 ~]# masterha_stop --conf=/etc/mha/app1.cnf
Stopped app1 successfully.
[1]+ Exit 1 nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1
[root@mysql-node03 ~]#
[root@mysql-node03 ~]#
[root@mysql-node03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[1] 17413
[root@mysql-node03 ~]#
查看MHA状态:
[root@mysql-node03 ~]# tail -n 20 /var/log/mha/app1/manager
Mon Oct 19 14:37:17 2020 - [info] Checking replication filtering settings..
Mon Oct 19 14:37:17 2020 - [info] binlog_do_db= , binlog_ignore_db=
Mon Oct 19 14:37:17 2020 - [info] Replication filtering check ok.
Mon Oct 19 14:37:17 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct 19 14:37:17 2020 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct 19 14:37:17 2020 - [info] HealthCheck: SSH to 10.0.50.61 is reachable.
Mon Oct 19 14:37:17 2020 - [info]
10.0.50.61(10.0.50.61:3306) (current master)
+--10.0.50.62(10.0.50.62:3306)
+--10.0.50.63(10.0.50.63:3306)
Mon Oct 19 14:37:17 2020 - [info] Checking master_ip_failover_script status:
Mon Oct 19 14:37:17 2020 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.50.61 --orig_master_ip=10.0.50.61 --orig_master_port=3306
Checking the Status of the script.. OK
Mon Oct 19 14:37:17 2020 - [info] OK.
Mon Oct 19 14:37:17 2020 - [warning] shutdown_script is not defined.
Mon Oct 19 14:37:17 2020 - [info] Set master ping interval 2 seconds.
Mon Oct 19 14:37:17 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Oct 19 14:37:17 2020 - [info] Starting ping health check on 10.0.50.61(10.0.50.61:3306)..
Mon Oct 19 14:37:17 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@mysql-node03 ~]#
邮件通知
cat /usr/local/bin/send_report
#!/usr/bin/perl
# Copyright (C) 2011 DeNA Co.,Ltd.
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc.,
# 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
## Note: This is a sample script and is not complete. Modify the script based on your environment.
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
my $smtp='smtp.163.com';
my $mail_from='from@163.com';
my $mail_user='from@163.com';
my $mail_pass='password';
#my $mail_to=['to1@qq.com','to2@qq.com'];
my $mail_to='to@qq.com';
GetOptions(
'orig_master_host=s' => \$dead_master_host,
'new_master_host=s' => \$new_master_host,
'new_slave_hosts=s' => \$new_slave_hosts,
'subject=s' => \$subject,
'body=s' => \$body,
);
# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
sub mailToContacts {
my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
open my $DEBUG, ">/var/log/masterha/app1/mail.log"
or die "Can't open the debug file:$!\n";
my $sender = new Mail::Sender {
ctype => 'text/plain;charset=utf-8',
encoding => 'utf-8',
smtp => $smtp,
from => $mail_from,
auth => 'LOGIN',
TLS_allowed => '0',
authid => $mail_user,
authpwd => $mail_pass,
to => $mail_to,
subject => $subject,
debug => $DEBUG
};
$sender->MailMsg(
{
msg => $msg,
debug => $DEBUG
}
) or print $Mail::Sender::Error;
return 1;
}
exit 0;
然后修改配置文件,只需添加report_script即可
[server default]
report_script=/usr/local/bin/send_report
最后开启mha监控,停止master来触发failover,在最后可以看到生成了failover报告并send_report。
打开邮箱查看邮件。
额外的数据补偿(binlog server)
找一台额外的机器,必须要5.6以上版本,支持GTID并开启,在这里直接使用第二个slave,也就是db03节点。
vi /etc/mha/app1.cnf
[binlog1]
no_master=1 #不参与选主
hostname=10.0.50.63
master_binlog_dir=/data/mysql/mha/binlog_server #不能和主节点binlog目录一样。
创建必要目录
[root@mysql-node03 ~]# mkdir -p /data/mysql/mha/binlog_server
[root@mysql-node03 ~]# chown -R mysql. /data/mysql/mha
拉取主库binlog日志:
注意:拉取日志的起点,需要按照目前主库当前正在使用的binlog为起点。
## db01 查看主库 初始binlog 文件号
mysql> show binary logs;
+------------------+-----------+
| Log_name | File_size |
+------------------+-----------+
| mysql-bin.000018 | 177 |
| mysql-bin.000019 | 1188 |
+------------------+-----------+
2 rows in set (0.00 sec)
mysql>
mysql> show master status\G
*************************** 1. row ***************************
File: mysql-bin.000019
Position: 1188
Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set: b4dfda8a-0f89-11eb-a610-525400c58226:1-5
1 row in set (0.00 sec)
mysql>
## db03 拉取binlog
[root@mysql-node03 binlog_server]# mysqlbinlog -R --host=10.0.50.61 --user=mha --password=mha --raw --stop-never mysql-bin.000019 &
[2] 18789
[root@mysql-node03 binlog_server]# mysqlbinlog: [Warning] Using a password on the command line interface can be insecure.
[root@mysql-node03 binlog_server]#
[root@mysql-node03 binlog_server]#
[root@mysql-node03 binlog_server]#
[root@mysql-node03 binlog_server]# ls
mysql-bin.000019
[root@mysql-node03 binlog_server]#
重启MHA-manager。
重启后,MHA具备一旦SSH链接失败,就能通过读取binlog-server内的日志进行数据补偿

浙公网安备 33010602011771号