day10-05-MHA高可用功能应用

MHA FailOver 过程详解

什么是FailOver?

故障转移

MHA只是针对物理故障做转移处理,不监测逻辑故障。

主库宕机一直到业务恢复正常的处理过程。

FailOver 让你实现怎么做?

1)快速监控到主库宕机

2)选择新主

3)数据补偿

4)接触从库身份(reset slave all;)

5)剩余从库和新主构建主从关系。

6)应用透明(vip 漂移)

7)故障节点自愈(待开发.....)

8)故障提醒(mail)

MHA的FailOver如何实现?

从启动---》 故障 ---》 转移 ---》业务恢复

1) MHA启动masterha_manager脚本启动MHA功能

2)在manager启动之前,会自动检查SSH(masterha_check_ssh)互信和主从(masterha_check_repl)状态脚本

3)MHA-manager 通过masterha_master_monitor脚本(每隔ping_interval秒)

4)masterha_master_monitor脚本探测主库3次无心跳之后,就认为主库宕机。

5)进行选主的过程

	选主算法有三种:

	算法一:首先读取配置文件中是否有强制选主参数  `candidate_master=1`  ,如果有,则不进行选主算法,直接由指定节点获选。

	`check_repl_delay=0`  这个参数是检查日志差异量是否很多,如果很多,则进行选主算法。0为不检查



	算法二:自动判断所有从库日质量。降最接近主库数据的从库作为新主。

	算法三:按照配置文件先后顺序进行选新主。

扩展一下:

		`candidate_master=1`   应用场景?

		1)MHA + keepAlived(早起MHA架构)

		2)多地多中心

6)数据补偿

 1.判断主库ssh连通性

	情况一:SSH能连

	调用 save_binary_logs脚本,立即保存缺失部分的binlog到各个从节点,并恢复。

	情况二:ssh无法链接

	调用apply_diff_relay_logs脚本,计算从库relay-log差异,恢复到2号从库。

提供额外数据补偿(没实现)

7)解除从库身份

8)剩余从库和新主库构建主从关系

9)应用透明(vip漂移)


MHA 应用透明(VIP)

db03操作:

#VIP failvoer 脚本

[root@mysql-node03 ~]# cat /usr/local/bin/master_ip_failover
#!/usr/bin/env perl

#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA

## Note: This is a sample script and is not complete. Modify the script based on your environment.

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
my $vip = '10.0.50.69/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens3:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens3:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {

    # $orig_master_host, $orig_master_ip, $orig_master_port are passed.
    # If you manage master ip address at global catalog database,
    # invalidate orig_master_ip here.
    my $exit_code = 1;
    eval {

      # updating global catalog, etc
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
    elsif ( $command eq "start" ) {

        # all arguments are passed.
        # If you manage master ip address at global catalog database,
        # activate new_master_ip here.
        # You can also grant write access (create user, set read_only=0, etc) here.
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}


sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master 
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}


sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[root@mysql-node03 ~]#

#############################

cp /root/master_ip_failover.txt /usr/local/bin/master_ip_failover

vi /usr/local/bin/master_ip_failover

my $vip = '10.0.50.69/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig ens3:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig ens3:$key down";

# $key = "1"  就是 ens3:1

在配置文件中添加 指定加载脚本

[root@mysql-node03 ~]# cat /etc/mha/app1.cnf
[server default]
manager_log=/var/log/mha/app1/manager        
manager_workdir=/var/log/mha/app1            
master_binlog_dir=/data/mysql/binlog
master_ip_failover_script=/usr/loca/bin/master_ip_failover
user=mha                                   
password=mha                               
ping_interval=2
repl_password=repl
repl_user=repl
ssh_user=root                               
[server1]                                   
hostname=10.0.50.61
port=3306                                  
[server2]            
hostname=10.0.50.62
port=3306
[server3]
hostname=10.0.50.63
port=3306
[root@mysql-node03 ~]#

这个脚本,只有在主库宕机的时候,才会被调用。

所以第一次的话,需要在主库手工把VIP地址 开起来

确定主库ip:

[root@mysql-node03 ~]# masterha_check_status --conf=/etc/mha/app1.cnf 
app1 (pid:14369) is running(0:PING_OK), master:10.0.50.61
[root@mysql-node03 ~]#

db01:

手工添加VIP:

[root@mysql-node01 ~]# ifconfig ens3:1 10.0.50.69/24
[root@mysql-node01 ~]# 
[root@mysql-node01 ~]# ifconfig ens3:1
ens3:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 10.0.50.69  netmask 255.255.255.0  broadcast 10.0.50.255
        ether 52:54:00:c5:82:26  txqueuelen 1000  (Ethernet)

[root@mysql-node01 ~]#

db03:

重启MHA

[root@mysql-node03 ~]# masterha_stop --conf=/etc/mha/app1.cnf 
Stopped app1 successfully.
[1]+  Exit 1                  nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover < /dev/null > /var/log/mha/app1/manager.log 2>&1
[root@mysql-node03 ~]# 
[root@mysql-node03 ~]# 
[root@mysql-node03 ~]# nohup masterha_manager --conf=/etc/mha/app1.cnf --remove_dead_master_conf --ignore_last_failover  < /dev/null> /var/log/mha/app1/manager.log 2>&1 &
[1] 17413
[root@mysql-node03 ~]#

查看MHA状态:

[root@mysql-node03 ~]# tail -n 20 /var/log/mha/app1/manager
Mon Oct 19 14:37:17 2020 - [info] Checking replication filtering settings..
Mon Oct 19 14:37:17 2020 - [info]  binlog_do_db= , binlog_ignore_db= 
Mon Oct 19 14:37:17 2020 - [info]  Replication filtering check ok.
Mon Oct 19 14:37:17 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking.
Mon Oct 19 14:37:17 2020 - [info] Checking SSH publickey authentication settings on the current master..
Mon Oct 19 14:37:17 2020 - [info] HealthCheck: SSH to 10.0.50.61 is reachable.
Mon Oct 19 14:37:17 2020 - [info] 
10.0.50.61(10.0.50.61:3306) (current master)
 +--10.0.50.62(10.0.50.62:3306)
 +--10.0.50.63(10.0.50.63:3306)

Mon Oct 19 14:37:17 2020 - [info] Checking master_ip_failover_script status:
Mon Oct 19 14:37:17 2020 - [info]   /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.50.61 --orig_master_ip=10.0.50.61 --orig_master_port=3306 
Checking the Status of the script.. OK 
Mon Oct 19 14:37:17 2020 - [info]  OK.
Mon Oct 19 14:37:17 2020 - [warning] shutdown_script is not defined.
Mon Oct 19 14:37:17 2020 - [info] Set master ping interval 2 seconds.
Mon Oct 19 14:37:17 2020 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Mon Oct 19 14:37:17 2020 - [info] Starting ping health check on 10.0.50.61(10.0.50.61:3306)..
Mon Oct 19 14:37:17 2020 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
[root@mysql-node03 ~]# 

邮件通知

cat /usr/local/bin/send_report
#!/usr/bin/perl
 
#  Copyright (C) 2011 DeNA Co.,Ltd.
#
#  This program is free software; you can redistribute it and/or modify
#  it under the terms of the GNU General Public License as published by
#  the Free Software Foundation; either version 2 of the License, or
#  (at your option) any later version.
#
#  This program is distributed in the hope that it will be useful,
#  but WITHOUT ANY WARRANTY; without even the implied warranty of
#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#  GNU General Public License for more details.
#
#  You should have received a copy of the GNU General Public License
#   along with this program; if not, write to the Free Software
#  Foundation, Inc.,
#  51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
 
## Note: This is a sample script and is not complete. Modify the script based on your environment.
 
use strict;
use warnings FATAL => 'all';
use Mail::Sender;
use Getopt::Long;
 
#new_master_host and new_slave_hosts are set only when recovering master succeeded
my ( $dead_master_host, $new_master_host, $new_slave_hosts, $subject, $body );
 
my $smtp='smtp.163.com';
my $mail_from='from@163.com';
my $mail_user='from@163.com';
my $mail_pass='password';
#my $mail_to=['to1@qq.com','to2@qq.com'];
my $mail_to='to@qq.com';
 
GetOptions(
  'orig_master_host=s' => \$dead_master_host,
  'new_master_host=s'  => \$new_master_host,
  'new_slave_hosts=s'  => \$new_slave_hosts,
  'subject=s'          => \$subject,
  'body=s'             => \$body,
);
 
# Do whatever you want here
mailToContacts($smtp,$mail_from,$mail_user,$mail_pass,$mail_to,$subject,$body);
 
sub mailToContacts {
	my ($smtp, $mail_from, $mail_user, $mail_pass, $mail_to, $subject, $msg ) = @_;
	open my $DEBUG, ">/var/log/masterha/app1/mail.log"
		or die "Can't open the debug	file:$!\n";
	my $sender = new Mail::Sender {
		ctype		=> 'text/plain;charset=utf-8',
		encoding	=> 'utf-8',
		smtp		=> $smtp,
		from		=> $mail_from,
		auth		=> 'LOGIN',
		TLS_allowed	=> '0',
		authid		=> $mail_user,
		authpwd		=> $mail_pass,
		to		=> $mail_to,
		subject		=> $subject,
		debug		=> $DEBUG
	};
	$sender->MailMsg(
		{
			msg => $msg,
			debug => $DEBUG
		}
	) or print $Mail::Sender::Error;
	return 1;
}
 
exit 0;

然后修改配置文件,只需添加report_script即可

[server default]
report_script=/usr/local/bin/send_report

最后开启mha监控,停止master来触发failover,在最后可以看到生成了failover报告并send_report。

打开邮箱查看邮件。


额外的数据补偿(binlog server)

找一台额外的机器,必须要5.6以上版本,支持GTID并开启,在这里直接使用第二个slave,也就是db03节点。

vi /etc/mha/app1.cnf
[binlog1]
no_master=1	#不参与选主
hostname=10.0.50.63
master_binlog_dir=/data/mysql/mha/binlog_server	#不能和主节点binlog目录一样。

创建必要目录

[root@mysql-node03 ~]# mkdir -p /data/mysql/mha/binlog_server
[root@mysql-node03 ~]# chown -R mysql. /data/mysql/mha

拉取主库binlog日志:

注意:拉取日志的起点,需要按照目前主库当前正在使用的binlog为起点。

## db01 查看主库 初始binlog 文件号
mysql> show binary logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000018 |       177 |
| mysql-bin.000019 |      1188 |
+------------------+-----------+
2 rows in set (0.00 sec)

mysql>
mysql> show master status\G
*************************** 1. row ***************************
             File: mysql-bin.000019
         Position: 1188
     Binlog_Do_DB: 
 Binlog_Ignore_DB: 
Executed_Gtid_Set: b4dfda8a-0f89-11eb-a610-525400c58226:1-5
1 row in set (0.00 sec)

mysql>


## db03 拉取binlog
[root@mysql-node03 binlog_server]# mysqlbinlog -R --host=10.0.50.61 --user=mha --password=mha --raw --stop-never mysql-bin.000019 &
[2] 18789
[root@mysql-node03 binlog_server]# mysqlbinlog: [Warning] Using a password on the command line interface can be insecure.

[root@mysql-node03 binlog_server]# 
[root@mysql-node03 binlog_server]# 
[root@mysql-node03 binlog_server]# 
[root@mysql-node03 binlog_server]# ls
mysql-bin.000019
[root@mysql-node03 binlog_server]#

重启MHA-manager。

重启后,MHA具备一旦SSH链接失败,就能通过读取binlog-server内的日志进行数据补偿

posted @ 2022-11-24 20:28  oldSimon  阅读(26)  评论(0)    收藏  举报