
Introduction to Database High Availability

High-availability features in a database exist to guard against loss of the database service or of the data itself. Data damage falls into two types:

  • Physical damage: disk, host, or instance failure, or accidental deletion of data files
  • Logical damage: erroneous statements such as drop, update, ...

Master-slave replication is what protects against physical damage, and high-availability technology builds on it to restore service automatically; logical damage replicates to the slaves as well, so it must be handled by backups or delayed replicas instead.

Choosing a High-Availability Solution

Availability   Annual downtime       Solution
99.9%          0.1% (525.6 min)      keepalived + dual-master architecture; requires manual intervention
99.99%         0.01% (52.56 min)     MHA / ORCH / TMHA: automatic monitoring, failover, and data compensation, but still semi-automated;
                                     suits non-financial internet companies (e.g. Facebook; Taobao front end -TMHA-->PolarDB)
99.999%        0.001% (5.256 min)    PXC / MGR / MGC: strong data consistency;
                                     suits financial internet companies
99.9999%       0.0001% (0.5256 min)  fully automated, cloud-based, platform-based; still conceptual

Introduction to MHA

MHA (Master High Availability) is a mature high-availability solution for MySQL that handles failover and master promotion.
During a failover, MHA completes the entire switch automatically within 0-30 seconds, and while doing so
it preserves data consistency as far as possible, achieving high availability in the true sense.

MHA consists of two components:
MHA Manager (management node)
Can be deployed on a dedicated machine to manage several master-slave clusters, or on one of the slaves; it should not be deployed on the master.
MHA Node (data node)
Deployed on every MySQL server; provides the scripts that save binary logs and apply relay-log diffs.
The Manager polls the master periodically; when the master fails, it automatically promotes the slave with the newest data to be the new master,
then re-points all other slaves at it. The whole failover is completely transparent to the application.

 

MHA software structure (every MHA component is a Perl script):

Node                            Component                Purpose
MHA Manager (management node)   masterha_manager         starts the MHA monitoring process
                                masterha_check_ssh       checks SSH key-based mutual trust between all nodes in the MHA cluster
                                masterha_check_repl      checks MySQL replication status and configuration
                                masterha_master_monitor  detects whether the master is down
                                masterha_check_status    reports the current MHA running state
                                masterha_master_switch   switches the master (automatic failover or manual online switch)
                                masterha_conf_host       dynamically adds or removes server entries in the configuration file
MHA Node (data node)            save_binary_logs         saves and copies the master's binary logs
                                apply_diff_relay_logs    identifies differential relay-log events and applies them to the other slaves
                                purge_relay_logs         purges relay logs (without blocking the SQL thread)
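The Node-side tools are ordinary command-line scripts. As an illustration, purge_relay_logs is typically run from cron on each slave; a minimal sketch (the mha credentials follow this document, the scheduling is an assumption):

# run periodically on each slave: removes already-applied relay logs
# without blocking the SQL thread
purge_relay_logs --user=mha --password=mha --host=127.0.0.1 \
  --workdir=/tmp --disable_relay_log_purge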

  Building the High-Availability Environment

 MHA base environment (three database nodes + GTID replication)

Role                       Host       Port
master                     10.0.0.51  3306
slave                      10.0.0.52  3306
slave (also MHA manager)   10.0.0.53  3306
  • Wipe any existing data and logs (all nodes)
pkill mysqld
rm -rf /data/3306/data/*
rm -rf /data/3306/binlog/*
  • Configuration file (all nodes)
cat /data/3306/my.cnf
[mysql]
prompt=db01 [\\d]>
socket=/tmp/mysql.sock
[mysqld]
user=mysql
basedir=/usr/local/mysql
datadir=/data/3306/data
socket=/tmp/mysql.sock
server_id=51  # must be unique per node: 51 on db01, 52 on db02, 53 on db03
port=3306
default_authentication_plugin=mysql_native_password
log_bin=/data/3306/binlog/mysql-bin
log_bin_index=/data/3306/binlog/mysql-bin.index
binlog_format=row
gtid-mode=on
enforce-gtid-consistency=true
log-slave-updates=1
autocommit=0
  • Initialize and start (all nodes)
mysqld --initialize-insecure --user=mysql --basedir=/usr/local/mysql \
--datadir=/data/3306/data
/usr/local/mysql/bin/mysqld_safe --defaults-file=/data/3306/my.cnf &
  • Rebuild replication
# on the master (db01)
db01 [(none)]>create user repl@'10.0.0.%' identified with mysql_native_password by '123456';
db01 [(none)]>grant replication slave on *.* to repl@'10.0.0.%';

# on the slaves (db02 and db03)
stop slave;
reset slave all;
change master to
master_host='10.0.0.51',
master_user='repl',
master_password='123456',
master_auto_position=1;
start slave;
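A quick sanity check after starting replication (run on each slave):

# both threads should say Yes and the host should be 10.0.0.51
mysql -e "show slave status\G" | grep -E "Master_Host|Slave_IO_Running|Slave_SQL_Running"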

 Installing and deploying the MHA software

  • Create command symlinks (all nodes)
ln -s /usr/local/mysql/bin/mysqlbinlog /usr/bin/mysqlbinlog
ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
    -- MHA invokes the mysql/mysqlbinlog binaries from /usr/bin by default, hence the symlinks
  • Configure SSH mutual trust between all nodes (all nodes)
ssh-keygen -t rsa -f /root/.ssh/id_rsa -P ''  # generate a key pair
ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.0.x  # distribute the public key
ssh 10.0.0.x date  # verify passwordless login
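To avoid repeating ssh-copy-id by hand, a small loop over this document's three nodes works (run on every node, including once for the node itself):

for ip in 10.0.0.51 10.0.0.52 10.0.0.53; do
  ssh-copy-id -i /root/.ssh/id_rsa.pub root@$ip
  ssh root@$ip date  # confirm passwordless login works
done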
  • Install the MHA packages and their dependencies
# on every node: install mha4mysql-node and its dependency
  yum install perl-DBD-MySQL -y
    # the driver that lets Perl scripts connect to MySQL (without it, Perl programs cannot reach MySQL)
  yum install -y mha4mysql-node-0.58-0.el7.centos.noarch.rpm

# on the management node only (db03): packages required by mha4mysql-manager
  [root@db03 ~]# yum install -y perl-Config-Tiny epel-release perl-Log-Dispatch \
perl-Parallel-ForkManager perl-Time-HiRes
  [root@db03 ~]# yum install -y mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
  • Create the MHA monitoring user on the master (db01)
db01 [(none)]>create user mha@'10.0.0.%' identified with mysql_native_password by 'mha';
db01 [(none)]>grant all privileges on *.* to mha@'10.0.0.%';
  • Manager configuration file (db03)
[root@db03 ~]# mkdir -p /etc/mha  # configuration directory
[root@db03 ~]# mkdir -p /var/log/mha/app01  # log directory (the manager must be able to write here)
[root@db03 ~]# vim /etc/mha/app01.cnf
[server default]    # global defaults applied to every node
manager_log=/var/log/mha/app01/manager  # MHA Manager log file
manager_workdir=/var/log/app01  # MHA Manager working directory
master_binlog_dir=/data/3306/binlog  # where the master's binlogs actually live (when the master is down but SSH still works, MHA rescues binlogs from this directory)
user=mha  # monitoring user
password=mha  # monitoring user password
ping_interval=2  # heartbeat interval in seconds (the Manager connects to the master every 2 seconds to check it is alive)
repl_user=repl  # replication user
repl_password=123456  # replication user password
ssh_user=root  # SSH user (the mutual-trust account; used to scp binlogs from the master so the slaves can be compensated)

[server1]    # one block per MySQL server
hostname=10.0.0.51  # IP address or hostname of the MySQL server
port=3306  # MySQL port

[server2]
hostname=10.0.0.52
port=3306

[server3]
hostname=10.0.0.53
port=3306
  • Check SSH mutual trust
# on the MHA management node; the message below means the check passed
[root@db03 ~]# masterha_check_ssh --conf=/etc/mha/app01.cnf
All SSH connection tests passed successfully.
  • Check that the replication cluster in the MHA configuration is healthy
# on the MHA management node; full check of the MySQL replication topology under MHA's configuration
[root@db03 ~]# masterha_check_repl --conf=/etc/mha/app01.cnf
MySQL Replication Health is OK.
  • Start MHA monitoring (management node, db03)

After completing a failover, MHA stops its own monitoring process; it must be restarted by hand before it will monitor the new topology (hence --ignore_last_failover).

nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover  < /dev/null> /var/log/mha/app01/manager.log 2>&1 &
  nohup: ignore the hangup signal, so the process survives the terminal closing
  masterha_manager: starts the MHA monitoring process
  --conf=/etc/mha/app01.cnf: path of the MHA configuration file
  --remove_dead_master_conf: after a successful failover, remove the dead master's block from the configuration file
  --ignore_last_failover: ignore the record of the previous failover, so a stale record does not prevent startup
  < /dev/null: redirect stdin to the null device, so the process never blocks waiting for input
  > /var/log/mha/app01/manager.log: redirect stdout to the log file
  2>&1: redirect stderr to stdout, so error messages land in the same log file
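Since this long command line reappears at every restart, it can help to wrap it in a small helper (a sketch; the script name and path are this document's assumption, not part of MHA):

#!/bin/bash
# /usr/local/bin/mha_start.sh -- hypothetical restart helper
CONF=/etc/mha/app01.cnf
LOG=/var/log/mha/app01/manager.log
masterha_stop --conf=$CONF 2>/dev/null  # stop any previous manager first
nohup masterha_manager --conf=$CONF --remove_dead_master_conf \
  --ignore_last_failover < /dev/null > $LOG 2>&1 &
sleep 2
masterha_check_status --conf=$CONF  # confirm the monitor came up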
  • Check MHA status
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf
app01 (pid:5551) is running(0:PING_OK), master:10.0.0.51
  # this output means the MHA base environment works; the configuration below is still needed before production use

 Master crash test and recovery

# simulate a master failure
[root@db01 ~]# pkill mysqld

# check MHA status from the management node
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf
app01 is stopped(2:NOT_RUNNING).

# on db02
show master status;  # now shows master information
show slave status\G  # now empty

# on db03
show slave status\G
Master_Host: 10.0.0.52
Slave_IO_Running: Yes
Slave_SQL_Running: Yes

Observation: db02 has taken over as master, but the cluster now runs unsupervised and db01 is out of the topology.
Reason: MHA stops its own monitoring process after completing a failover; it must be restarted by hand (with --ignore_last_failover) before it will monitor the new topology.

# recover db01 and re-join it to the new topology (then restart the monitor)
vim /etc/mha/app01.cnf  # re-add the [server1] block that --remove_dead_master_conf deleted
[server1]
hostname=10.0.0.51
port=3306

# on db01, after starting mysqld again, point it at the new master (db02)
stop slave;
reset slave all;
change master to
master_host='10.0.0.52',
master_user='repl',
master_password='123456',
master_auto_position=1;
start slave;
# restart the MHA monitor
nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/mha/app01/manager.log 2>&1 &

  How the High-Availability Service Works


Pain points a high-availability service addresses:
① When the master crashes, how do we detect promptly that the master's service is misbehaving?
   This requires monitoring.
② When the master crashes, how do we find a server that can take over, and switch to it?
   This requires master election (choosing the slave whose data is closest to the master's).
③ When the master crashes, how does the new master guarantee its data matches the original master's?
   This requires data compensation.
④ When the master crashes, how are the application's read and write requests redirected to the new master?
   This requires application transparency (a VIP).
⑤ When the master crashes, how is the administrator alerted promptly so the node gets repaired? (an MHA switch is one-shot)
   This requires alerting.
⑥ When the whole master host is unreachable, how is the data gap still filled?
   This requires extra compensation (a binlog server).
⑦ When the master crashes, how is the original master repaired automatically?
   This requires self-healing (not yet available; only cloud RDS platforms offer it).

 Starting the MHA software

nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover  < /dev/null> /var/log/mha/app01/manager.log 2>&1 &

This command first invokes the MHA startup script masterha_manager, which then loads the configuration file named by --conf=/etc/mha/app01.cnf; by loading a different configuration file per cluster, one manager host can run several high-availability environments side by side and keep them apart.
--remove_dead_master_conf means that when the master dies it is expelled from the cluster, i.e. the failed node's block is deleted from the configuration file.
--ignore_last_failover ignores the record of the previous failover, so a stale record does not prevent startup. (MHA has a safety mechanism: after a successful failover it creates a state file in its working directory. When the Manager starts again and finds that file, it assumes a failover has only just completed and that resuming monitoring immediately would be unsafe, so it refuses to start and reports an error.)
Finally, all runtime output is written to the log file /var/log/mha/app01/manager.log (stderr included via 2>&1).

 How MHA monitors

The startup script masterha_manager automatically invokes the monitoring script masterha_master_monitor, which checks the master once per configured interval (ping_interval=2) to decide whether it is still alive (several consecutive failed checks within a short time confirm the master is down and trigger the failover).

[root@db03 ~]# mysql -umha -pmha -h10.0.0.53 -e "select user();"
mysql: [Warning] Using a password on the command line interface can be insecure.
+----------+
| user()   |
+----------+
| mha@db03 |
+----------+
"Alive" here means: the MHA Manager can connect to the master with the monitoring account and execute a trivial query.
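Conceptually, the health check behaves like the loop below (an illustrative shell sketch of the behaviour, not MHA's actual Perl implementation; four failures matches the failover log later in this document):

# illustrative only -- MHA implements this inside masterha_master_monitor
fails=0
while true; do
  if mysql -umha -pmha -h10.0.0.51 -e "select 1" >/dev/null 2>&1; then
    fails=0                     # master answered: reset the failure counter
  else
    fails=$((fails+1))          # master unreachable
    [ $fails -ge 4 ] && { echo "master down, starting failover"; break; }
  fi
  sleep 2                       # ping_interval=2
done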

 How MHA elects a new master

The election mainly looks at how much log each candidate holds, and at whether candidate_master is configured:

Array    Meaning          Role
alive    live nodes       nodes detected as still reachable; after the master dies, this is the two surviving slaves
latest   freshest nodes   the slave(s) with the newest logs, i.e. data closest to the master's (by GTID or position)
pref     preferred nodes  nodes whose config block sets candidate_master, making them preferred as the new master
bad      excluded nodes   a node is excluded if no_master=1 is set; if log_bin is off (no binary logging);
                          or if delay checking (check_slave_delay) is in effect and the node lags the master by more than 100MB of logs

Election strategy summary:

Priority  alive  latest  pref  bad  Strategy             If several match
01        yes    yes     yes   no   first choice         lowest server number wins
02        yes    yes     no    no   next choice          lowest server number wins
03        yes    no      yes   no   next choice          lowest server number wins
04        yes    no      no    no   any surviving node   lowest server number wins

Note: a node designated explicitly as the new master (e.g. via the new_master_host parameter during a manual switch) is chosen with the highest priority of all.

 MHA data compensation

Before compensating, the new master is compared with the crashed master to determine how much data is missing, i.e. the difference in log volume;

the missing events can then be extracted from the binlog and replayed. The special case is when the original master is no longer reachable at all;

# Case: the original master is still reachable over SSH (the ideal case)
During failover MHA automatically runs:
save_binary_logs --command=save --start_file=mysql-bin.000123 --start_pos=456 --ssh_user=root
What happens: the newest binlog is rescued straight from the original master's master_binlog_dir
Benefit: zero data loss, the most complete compensation

# Case: the original master is not reachable over SSH (the server is down)
The relay logs of the surviving slaves are compared:
apply_diff_relay_logs --server=<new master> --other_servers=<other slaves>
What happens: a differential patch is generated from the relay logs of the most up-to-date slave
Risk: transactions committed on the original master but never shipped to any slave are lost

# Case: a Binlog Server was configured in advance (the best option)
A pre-deployed Binlog Server mirrors the master's binlogs in real time:
vim /etc/mha/app01.cnf
[binlog1]  # a Binlog Server block (the section name must start with "binlog" followed by a number)
no_master=1  # never elect this node as master
hostname=10.0.0.53  # a separate server that mirrors the master's binlogs in real time; even if the master is destroyed and SSH is impossible, the binlogs can be fetched from here
master_binlog_dir=/data/binlog_backup  # where the mirrored binlogs are stored (the directory must be writable)
Benefit: zero data loss even if the original master is completely destroyed
Principle: the Binlog Server pulls the master's binlog in real time as a redundant backup

 MHA topology switch (rebuilding replication)

The old replication relationships are dissolved automatically and new ones are built:

# every slave automatically drops its old replication setup
stop slave;
reset slave;

# every slave is automatically re-pointed at the new master
change master to ...

 MHA application transparency (VIP)

A virtual IP (VIP) gives the application a stable address that always reaches the current master.

Note: MHA's VIP can only float within a single LAN segment; it cannot move across network segments.
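The failover script below automates what one would otherwise do by hand with ip and arping. A minimal manual sketch of the same idea (interface eth0 and the VIP follow this document's values):

# bring the VIP up on the new master and announce it on the LAN
ip addr add 10.0.0.50/24 dev eth0 label eth0:1
arping -q -A -c 1 -I eth0 10.0.0.50  # gratuitous ARP so neighbours refresh their caches
# remove it from the old master, if that host is still reachable
ip addr del 10.0.0.50/24 dev eth0 label eth0:1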

#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;

my (
  $command,        $ssh_user,         $orig_master_host,
  $orig_master_ip, $orig_master_port, $new_master_host,
  $new_master_ip,  $new_master_port,  $new_master_user,
  $new_master_password
);
my $vip = '10.0.0.50/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

GetOptions(
  'command=s'             => \$command,
  'ssh_user=s'            => \$ssh_user,
  'orig_master_host=s'    => \$orig_master_host,
  'orig_master_ip=s'      => \$orig_master_ip,
  'orig_master_port=i'    => \$orig_master_port,
  'new_master_host=s'     => \$new_master_host,
  'new_master_ip=s'       => \$new_master_ip,
  'new_master_port=i'     => \$new_master_port,
  'new_master_user=s'     => \$new_master_user,
  'new_master_password=s' => \$new_master_password,
);

exit &main();

sub main {
  if ( $command eq "stop" || $command eq "stopssh" ) {
    my $exit_code = 1;
    eval {
      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
    elsif ( $command eq "start" ) {
        my $exit_code = 10;
        eval {
            print "Enabling the VIP - $vip on the new master - $new_master_host \n";
            &start_vip();
            &stop_vip();
            $exit_code = 0;
        };
        if ($@) {
            warn $@;
            exit $exit_code;
        }
        exit $exit_code;
    }
    elsif ( $command eq "status" ) {
        print "Checking the Status of the script.. OK \n";
        `ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
        exit 0;
    }
    else {
        &usage();
        exit 1;
    }
}

sub start_vip() {
    `ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disables the VIP on the old master
sub stop_vip() {
   `ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}

sub usage {
  print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
# upload the script MHA needs
[root@db03 ~]# cd /usr/local/bin/
[root@db03 bin]# chmod +x /usr/local/bin/master_ip_failover
# adjust the VIP settings in the script
[root@db03 bin]# vim master_ip_failover
my $vip = '10.0.0.50/24';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig eth0:$key $vip";
my $ssh_stop_vip = "/sbin/ifconfig eth0:$key down";

# reference the script from the configuration file
[root@db03 ~]# vim /etc/mha/app01.cnf
master_ip_failover_script=/usr/local/bin/master_ip_failover

# add the VIP on the master by hand the first time
# make sure every node has the ifconfig command (yum install net-tools -y)
ifconfig eth0:1 10.0.0.50/24  # add the VIP on the master node (verify with: ip address)

# restart the MHA service
[root@db03 ~]# masterha_stop --conf=/etc/mha/app01.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/mha/app01/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf

# test connections through the VIP
-- connect to the VIP with a client such as Navicat, confirm the connected host is the master, and after a failover confirm the VIP still answers

 MHA failure alerting

Mail-sending script /usr/local/bin/mha_notification:

chmod +x /usr/local/bin/mha_notification

#!/usr/bin/env perl
use strict;
use warnings;

my $smtp_server   = 'smtp.qq.com';
my $smtp_user     = '2573013863@qq.com';
my $smtp_password = 'qfpoqradxtvlecgf';
my $to_email      = '2573013863@qq.com';

my ($event, $msg) = @ARGV;
my $time = scalar(localtime);

my %events = (
    'master_failover'      => '[MHA告警] 主库切换完成',
    'master_failover_start'=> '[MHA紧急] 开始切换',
    'master_online'        => '[MHA信息] 在线切换',
    'error'                => '[MHA错误] 监控异常',
);

my $subject = $events{$event} || '[MHA通知] 未知事件';
my $body = "事件: $event\n时间: $time\n详情: $msg";

eval {
    require Mail::Sender;
    my $sender = Mail::Sender->new({
        smtp => $smtp_server, port => 587, auth => 'LOGIN',
        authid => $smtp_user, authpwd => $smtp_password,
        from => $smtp_user, tls => 1
    });
    $sender->MailMsg({to => $to_email, subject => $subject, msg => $body});
    print "邮件发送成功: $subject\n";
};

if ($@) { print "发送失败: $@\n" }

open my $log, '>>', '/var/log/mha/notification.log';
print $log "$time - $event - $msg\n";
close $log;
# upload the script MHA needs
[root@db03 ~]# cd /usr/local/bin/
[root@db03 ~]# chmod +x /usr/local/bin/mha_notification
[root@db03 ~]# vim /usr/local/bin/mha_notification
my $smtp_server   = 'smtp.qq.com';       # SMTP server (QQ mail here)
my $smtp_user     = '2573013863@qq.com'; # mailbox account (also used as the sender)
my $smtp_password = 'qfpoqradxtvlecgf';  # mailbox authorization code (not the login password!)
my $to_email      = '2573013863@qq.com'; # recipient

# reference the script from the configuration file
[root@db03 ~]# vim /etc/mha/app01.cnf
report_script=/usr/local/bin/mha_notification

# restart the MHA service
[root@db03 ~]# masterha_stop --conf=/etc/mha/app01.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/mha/app01/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf
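The script takes the event name and a message on the command line (my ($event, $msg) = @ARGV), so it can be tested by hand before relying on it:

# send a test alert through the same code path MHA will use
/usr/local/bin/mha_notification error "manual test from db03"
tail -1 /var/log/mha/notification.log  # the script also appends every event here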

 MHA extra compensation

A binlog server provides an extra, redundant copy of the logs: the master's binlog files are streamed in real time into a directory on a designated node;

# create the directory for the logs
[root@db03 ~]# mkdir -p /data/binlog_server/
[root@db03 ~]# chown -R mysql.mysql /data/binlog_server/
[root@db03 ~]# mysql -e "show slave status\G"|grep 'Master_Log'
Master_Log_File: mysql-bin.000001
Read_Master_Log_Pos: 1564
Relay_Master_Log_File: mysql-bin.000001
Exec_Master_Log_Pos: 1564

# pull the logs, starting from the master's current binlog file
[root@db03 data]# cd /data/binlog_server/
[root@db03 binlog_server]# mysqlbinlog -R --host=10.0.0.51 --user=mha \
--password=mha --raw --stop-never mysql-bin.000001 &
[root@db03 ~]# ps -ef |grep mysqlbinlog

# add the binlog server block to the configuration file
[root@db03 ~]# vim /etc/mha/app01.cnf
[binlog1]
no_master=1
hostname=10.0.0.53
master_binlog_dir=/data/binlog_server/

# restart the MHA service
[root@db03 ~]# masterha_stop --conf=/etc/mha/app01.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover < /dev/null > /var/log/mha/app01/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf
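The background mysqlbinlog pull does not restart itself if the connection drops; a small watchdog run from cron can resume it from the newest file already on disk (a sketch under that assumption; not part of MHA):

#!/bin/bash
# hypothetical watchdog: restart the binlog pull if it has died
cd /data/binlog_server/ || exit 1
if ! pgrep -f "mysqlbinlog.*--stop-never" >/dev/null; then
  last=$(ls -1 mysql-bin.* 2>/dev/null | tail -1)  # resume from the newest saved file
  [ -z "$last" ] && last=mysql-bin.000001
  mysqlbinlog -R --host=10.0.0.51 --user=mha --password=mha \
    --raw --stop-never "$last" &
fi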


 Failover log analysis (the six phases of an MHA switch)

[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf
app01 (pid:8296) is running(0:PING_OK), master:10.0.0.52
[root@db02 ~]# pkill mysqld
[root@db03 ~]# tail -f /var/log/mha/app01/manager
  • 01 The health check fails: the master database can no longer be reached
[warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.52' (111))
[warning] Connection failed 2 time(s)..
[warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.52' (111))
[warning] Connection failed 3 time(s)..
[warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.52' (111))
[warning] Connection failed 4 time(s)..
[warning] Master is not reachable from health checker!
[warning] Master 10.0.0.52(10.0.0.52:3306) is not reachable!
# after four consecutive failed health checks, the master is declared down
  • 02 MHA confirms the master is down, re-reads the configuration, and elects the slave with the newest data (by GTID progress) as the new master
[info] Connecting to a master server failed. Reading configuration \
file /etc/masterha_default.cnf and /etc/mha/app01.cnf again, and trying to connect \
to all servers to check server status..
...
[info] * Phase 1: Configuration Check Phase..
...
[info] ** Phase 1: Configuration Check Phase completed.

 

  • 03 Dead-master shutdown: once the new master is chosen, the VIP is removed from the old master
[info] * Phase 2: Dead Master Shutdown Phase..
[info] Forcing shutdown so that applications never connect to the current master..
[info] Executing master IP deactivation script:
[info]   /usr/local/bin/master_ip_failover --orig_master_host=10.0.0.52 --orig_master_ip=10.0.0.52 --orig_master_port=3306 --command=stopssh --ssh_user=root  
[warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
[info] * Phase 2: Dead Master Shutdown Phase completed.
  • 04 Master recovery: missing data is compensated on the new master
[info] * Phase 3: Master Recovery Phase..
[info] * Phase 3.1: Getting Latest Slaves Phase..
....
[info] * Phase 3.3: Determining New Master Phase..
[info] New master is 10.0.0.51(10.0.0.51:3306)
[info] Starting master failover..
[info] * Phase 3.3: New Master Recovery Phase..
[info] Executing binlog save command: save_binary_logs --command=save --start_file=mysql-bin.000002  --start_pos=1201 --output_file=/var/tmp/saved_binlog_binlog1_20230102153233.binlog --handle_raw_binlog=0 --skip_filter=1 --disable_log_bin=0 --manager_version=0.58 --oldest_version=8.0.26  --binlog_dir=/data/binlog_server/
[info] Additional events were not found from the binlog server. No need to save.
Enabling the VIP - 10.0.0.50/24 on the new master - 10.0.0.51
[info]  OK.
[info] ** Finished master recovery successfully.
[info] * Phase 3: Master Recovery Phase completed.

 

  • 05 Slave recovery: the remaining slaves are re-pointed at the new master
[info] * Phase 4: Slaves Recovery Phase..
[info] * Phase 4.1: Starting Slaves in parallel..
[info] -- Slave recovery on host 10.0.0.53(10.0.0.53:3306) started, pid: 8659. Check tmp log /var/log/app01/10.0.0.53_3306_20250926182323.log if it takes time..
[info] Log messages from 10.0.0.53 ...
[info]  Resetting slave 10.0.0.53(10.0.0.53:3306) and starting replication from the new master 10.0.0.51(10.0.0.51:3306)..
[info]  Executed CHANGE MASTER.
[info]  Slave started.
[info]  gtid_wait(a9977de8-9a01-11f0-8f55-000c29e3dc70:1-5,
adfada3d-9a01-11f0-9024-000c290a3b5f:1-5) completed on 10.0.0.53(10.0.0.53:3306). Executed 0 events.
[info] End of log messages from 10.0.0.53.
[info] -- Slave on host 10.0.0.53(10.0.0.53:3306) started.
[info] All new slave servers recovered successfully.
  • 06 Cleanup: after the failover completes, MHA cleans up and issues the final report
[info] * Phase 5: New master cleanup phase..
----- Failover Report -----
app01: MySQL Master failover 10.0.0.52(10.0.0.52:3306) to 10.0.0.51(10.0.0.51:3306) succeeded
Master 10.0.0.52(10.0.0.52:3306) is down!
Check MHA Manager logs at db03:/var/log/mha/app01/manager for details.
[info] Sending mail..
邮件发送成功: [MHA通知] 未知事件

 

  High-Availability Maintenance Operations

 Online master switch (manual)

While the master is still healthy, the master role can be handed to another node manually, freeing the original master for maintenance work;


 

# stop the MHA manager
masterha_stop --conf=/etc/mha/app01.cnf
-- stopping MHA first guarantees the manual switch cannot collide with an automatic failover

# run the manual switch
masterha_master_switch --conf=/etc/mha/app01.cnf --master_state=alive \
--new_master_host=10.0.0.52 --orig_master_is_new_slave --running_updates_limit=10000

  masterha_master_switch: MHA Manager's master-switch tool
  --master_state=alive: declares that the current master is alive and healthy (online switch)
  --new_master_host=10.0.0.52: the node to promote to master
  --orig_master_is_new_slave: after the switch, reconfigure the original master as a slave of the new master
  --running_updates_limit=10000: safety threshold; if updates on the master keep running beyond this limit, the switch is paused or aborted

FLUSH NO_WRITE_TO_BINLOG TABLES
Flushes all tables without writing the statement to the binary log, so the flush does not replicate to the slaves.
Why flush at all: it forces cached table data to disk and closes all open tables, keeping the switch consistent and safe.


It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. \
Is it ok to execute on 10.0.0.51(10.0.0.51:3306)? (YES/no):
-- asks whether to run the table flush on the current master (10.0.0.51)
Starting master switch from 10.0.0.51(10.0.0.51:3306) to 10.0.0.52(10.0.0.52:3306)? (yes/NO): 
-- asks whether to begin the switch from 10.0.0.51 to 10.0.0.52
master_ip_online_change_script is not defined. If you do not disable writes on the current \
master manually, applications keep writing on the current master. Is it ok to proceed? (yes/NO): 
-- problem: the master_ip_online_change_script parameter is not defined, or the script file is missing
-- risk: without a VIP-switch script, MHA cannot move the virtual IP automatically
-- consequence: applications may keep writing to the old master, causing inconsistent data

# problems observed after this switch
the MHA manager process is stopped
the VIP did not move to the new master (it must be migrated by hand, and the manager must be restarted afterwards)

# restart the MHA service
[root@db03 ~]# masterha_stop --conf=/etc/mha/app01.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover  < /dev/null> /var/log/mha/app01/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf

 

 

 Deploying the master_ip_online_change_script

MHA在线切换脚本master_ip_online_change结合VIP-阿里云开发者社区

#!/usr/bin/env perl

use strict;
use warnings FATAL => 'all';

use Getopt::Long;
use MHA::DBHelper;
use MHA::NodeUtil;
use Time::HiRes qw( sleep gettimeofday tv_interval );
use Data::Dumper;

my $_tstart;
my $_running_interval = 0.1;
my (
  $command,          
  $orig_master_is_new_slave, $orig_master_host, $orig_master_ip,  $orig_master_port, $orig_master_user,    $orig_master_password, $orig_master_ssh_user,
  $new_master_host,          $new_master_ip,    $new_master_port, $new_master_user,  $new_master_password, $new_master_ssh_user,
);

# VIP settings - adjust to your network environment
my $vip = '10.0.0.50';
my $brdc = '10.0.0.255';
my $ifdev = 'eth0';
my $key = '1';
my $ssh_start_vip = "/usr/sbin/ip addr add $vip/24 brd $brdc dev $ifdev label $ifdev:$key;/usr/sbin/arping -q -A -c 1 -I $ifdev $vip;iptables -F;";
my $ssh_stop_vip = "/usr/sbin/ip addr del $vip/24 dev $ifdev label $ifdev:$key";
my $ssh_check_vip = "/usr/sbin/ip addr show dev $ifdev | grep -w $vip";

GetOptions(
  'command=s'                => \$command,
  'orig_master_is_new_slave' => \$orig_master_is_new_slave,
  'orig_master_host=s'       => \$orig_master_host,
  'orig_master_ip=s'         => \$orig_master_ip,
  'orig_master_port=i'       => \$orig_master_port,
  'orig_master_user=s'       => \$orig_master_user,
  'orig_master_password=s'   => \$orig_master_password,
  'orig_master_ssh_user=s'   => \$orig_master_ssh_user,
  'new_master_host=s'        => \$new_master_host,
  'new_master_ip=s'          => \$new_master_ip,
  'new_master_port=i'        => \$new_master_port,
  'new_master_user=s'        => \$new_master_user,
  'new_master_password=s'    => \$new_master_password,
  'new_master_ssh_user=s'    => \$new_master_ssh_user,
);

exit &main();

sub current_time_us {
  my ( $sec, $microsec ) = gettimeofday();
  my $curdate = localtime($sec);
  return $curdate . " " . sprintf( "%06d", $microsec );
}

sub sleep_until {
  my $elapsed = tv_interval($_tstart);
  if ( $_running_interval > $elapsed ) {
    sleep( $_running_interval - $elapsed );
  }
}

sub get_threads_util {
  my $dbh                    = shift;
  my $my_connection_id       = shift;
  my $running_time_threshold = shift;
  my $type                   = shift;
  $running_time_threshold = 0 unless ($running_time_threshold);
  $type                   = 0 unless ($type);
  my @threads;

  my $sth = $dbh->prepare("SHOW PROCESSLIST");
  $sth->execute();

  while ( my $ref = $sth->fetchrow_hashref() ) {
    my $id         = $ref->{Id};
    my $user       = $ref->{User};
    my $host       = $ref->{Host};
    my $command    = $ref->{Command};
    my $state      = $ref->{State};
    my $query_time = $ref->{Time};
    my $info       = $ref->{Info};
    $info =~ s/^\s*(.*?)\s*$/$1/ if defined($info);
    next if ( $my_connection_id == $id );
    next if ( defined($query_time) && $query_time < $running_time_threshold );
    next if ( defined($command)    && $command eq "Binlog Dump" );
    next if ( defined($user)       && $user eq "system user" );
    next
      if ( defined($command)
      && $command eq "Sleep"
      && defined($query_time)
      && $query_time >= 1 );

    if ( $type >= 1 ) {
      next if ( defined($command) && $command eq "Sleep" );
      next if ( defined($command) && $command eq "Connect" );
    }

    if ( $type >= 2 ) {
      next if ( defined($info) && $info =~ m/^select/i );
      next if ( defined($info) && $info =~ m/^show/i );
    }

    push @threads, $ref;
  }
  return @threads;
}

# check whether the VIP already exists
sub check_vip_exists {
    my $host = shift;
    my $ssh_user = shift;
    my $result = `ssh $ssh_user\@$host \" $ssh_check_vip \"`;
    return $result ? 1 : 0;
}

# bring up the VIP
sub start_vip {
    print current_time_us() . " Checking if VIP $vip already exists on $new_master_host...\n";
    if (check_vip_exists($new_master_host, $new_master_ssh_user)) {
        print current_time_us() . " VIP $vip already exists on $new_master_host, skipping.\n";
        return 1;
    }
    
    print current_time_us() . " Enabling VIP $vip on new master: $new_master_host \n";
    my $result = `ssh $new_master_ssh_user\@$new_master_host \" $ssh_start_vip \" 2>&1`;
    if ($? == 0) {
        print current_time_us() . " VIP enabled successfully.\n";
        
        # verify the VIP really came up
        sleep(1);
        if (check_vip_exists($new_master_host, $new_master_ssh_user)) {
            print current_time_us() . " VIP verification: VIP $vip is now active on $new_master_host.\n";
            return 1;
        } else {
            print current_time_us() . " WARNING: VIP command succeeded but VIP is not visible on interface.\n";
            return 0;
        }
    } else {
        print current_time_us() . " ERROR: Failed to enable VIP: $result\n";
        return 0;
    }
}

# take down the VIP
sub stop_vip {
    print current_time_us() . " Checking if VIP $vip exists on $orig_master_host...\n";
    unless (check_vip_exists($orig_master_host, $orig_master_ssh_user)) {
        print current_time_us() . " VIP $vip not found on $orig_master_host, skipping.\n";
        return 1;
    }
    
    print current_time_us() . " Disabling VIP $vip on old master: $orig_master_host \n";
    my $result = `ssh $orig_master_ssh_user\@$orig_master_host \" $ssh_stop_vip \" 2>&1`;
    if ($? == 0) {
        print current_time_us() . " VIP disabled successfully.\n";
        
        # verify the VIP is really gone
        sleep(1);
        unless (check_vip_exists($orig_master_host, $orig_master_ssh_user)) {
            print current_time_us() . " VIP verification: VIP $vip is now removed from $orig_master_host.\n";
            return 1;
        } else {
            print current_time_us() . " WARNING: VIP stop command succeeded but VIP is still visible on interface.\n";
            return 0;
        }
    } else {
        print current_time_us() . " ERROR: Failed to disable VIP: $result\n";
        return 0;
    }
}

sub main {
  if ( $command eq "stop" ) {
    ## Gracefully killing connections on the current master
    my $exit_code = 1;
    eval {
      ## Setting read_only=1 on the new master (to avoid accident)
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error(die_on_error)_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );
      print current_time_us() . " Set read_only on the new master.. ";
      $new_master_handler->enable_read_only();
      if ( $new_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }
      $new_master_handler->disconnect();

      # Connecting to the orig master, die if any database error happens
      my $orig_master_handler = new MHA::DBHelper();
      $orig_master_handler->connect( $orig_master_ip, $orig_master_port,
        $orig_master_user, $orig_master_password, 1 );

      ## Waiting for N * 100 milliseconds so that current connections can exit
      my $time_until_read_only = 15;
      $_tstart = [gettimeofday];
      my @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_read_only > 0 && $#threads >= 0 ) {
        if ( $time_until_read_only % 5 == 0 ) {
          printf
"%s Waiting all running %d threads are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_read_only * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_read_only--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## Setting read_only=1 on the current master so that nobody(except SUPER) can write
      print current_time_us() . " Set read_only=1 on the orig master.. ";
      $orig_master_handler->enable_read_only();
      if ( $orig_master_handler->is_read_only() ) {
        print "ok.\n";
      }
      else {
        die "Failed!\n";
      }

      ## Waiting for M * 100 milliseconds so that current update queries can complete
      my $time_until_kill_threads = 5;
      @threads = get_threads_util( $orig_master_handler->{dbh},
        $orig_master_handler->{connection_id} );
      while ( $time_until_kill_threads > 0 && $#threads >= 0 ) {
        if ( $time_until_kill_threads % 5 == 0 ) {
          printf
"%s Waiting all running %d queries are disconnected.. (max %d milliseconds)\n",
            current_time_us(), $#threads + 1, $time_until_kill_threads * 100;
          if ( $#threads < 5 ) {
            print Data::Dumper->new( [$_] )->Indent(0)->Terse(1)->Dump . "\n"
              foreach (@threads);
          }
        }
        sleep_until();
        $_tstart = [gettimeofday];
        $time_until_kill_threads--;
        @threads = get_threads_util( $orig_master_handler->{dbh},
          $orig_master_handler->{connection_id} );
      }

      ## remove the VIP from the old master
      &stop_vip();

      ## Terminating all threads
      print current_time_us() . " Killing all application threads..\n";
      $orig_master_handler->kill_threads(@threads) if ( $#threads >= 0 );
      print current_time_us() . " done.\n";
      $orig_master_handler->disconnect();

      $exit_code = 0;
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "start" ) {
    ## Activating master ip on the new master
    my $exit_code = 10;
    eval {
      my $new_master_handler = new MHA::DBHelper();

      # args: hostname, port, user, password, raise_error_or_not
      $new_master_handler->connect( $new_master_ip, $new_master_port,
        $new_master_user, $new_master_password, 1 );

      ## Set read_only=0 on the new master
      print current_time_us() . " Set read_only=0 on the new master.\n";
      $new_master_handler->disable_read_only();

      $new_master_handler->disconnect();

      ## bring up the VIP on the new master
      if (&start_vip()) {
          $exit_code = 0;
      } else {
          $exit_code = 10; # VIP failed to come up, but continue
      }
    };
    if ($@) {
      warn "Got Error: $@\n";
      exit $exit_code;
    }
    exit $exit_code;
  }
  elsif ( $command eq "status" ) {
    # check the VIP status
    print "VIP Status:\n";
    print "Old Master ($orig_master_host): " . (check_vip_exists($orig_master_host, $orig_master_ssh_user) ? "VIP EXISTS" : "VIP NOT FOUND") . "\n";
    print "New Master ($new_master_host): " . (check_vip_exists($new_master_host, $new_master_ssh_user) ? "VIP EXISTS" : "VIP NOT FOUND") . "\n";
    exit 0;
  }
  else {
    &usage();
    exit 1;
  }
}

sub usage {
  print
"Usage: master_ip_online_change --command=start|stop|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --orig_master_user=user --orig_master_password=password --orig_master_ssh_user=sshuser --new_master_host=host --new_master_ip=ip --new_master_port=port --new_master_user=user --new_master_password=password --new_master_ssh_user=sshuser \n";
  die;
}

 

[root@db03 ~]# vim /usr/local/bin/mha_vip_online_change_script

[root@db03 ~]# chmod +x /usr/local/bin/mha_vip_online_change_script



[root@db03 ~]# vim /etc/mha/app01.cnf
master_ip_online_change_script=/usr/local/bin/mha_vip_online_change_script

master_ip_failover_script: moves the VIP from the failed master to the new master during an automatic failover.

master_ip_online_change_script: moves the VIP from the old master to the new master during a planned online switch.

# restart the MHA service
[root@db03 ~]# masterha_stop --conf=/etc/mha/app01.cnf
[root@db03 ~]# nohup masterha_manager --conf=/etc/mha/app01.cnf --remove_dead_master_conf \
--ignore_last_failover  < /dev/null> /var/log/mha/app01/manager.log 2>&1 &
[root@db03 ~]# masterha_check_status --conf=/etc/mha/app01.cnf

# then re-run the corresponding online switch

 

 

 

 

 

Error records

MHA高可用:masterha_check_repl error 汇总_mha repl不过-CSDN博客

[root@db03 bin]# masterha_check_repl --conf=/etc/mha/app01.cnf
Thu Sep 25 15:36:51 2025 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln208] Slaves settings check failed!
Thu Sep 25 15:36:51 2025 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln416] Slave configuration failed.
Thu Sep 25 15:36:51 2025 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations.  at /usr/bin/masterha_check_repl line 48.
Thu Sep 25 15:36:51 2025 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.

Database Read/Write Splitting

How read/write splitting works

The end goal of a read/write-splitting architecture is that write requests reach the master while read requests reach the slaves, relieving pressure on the master by spreading the different kinds of requests;

middleware such as Atlas (from Qihoo 360) or ProxySQL implements exactly this:

when a request is a select query it is sent to a slave, and when it is an update, insert, or other modifying statement it is sent to the master.


ProxySQL is an open-source middleware for MySQL, a flexible MySQL proxy layer that can implement read/write splitting:

  • ProxySQL supports query routing;
  • ProxySQL can dynamically mark individual SQL statements for caching;
  • ProxySQL loads configuration changes dynamically (no service restart needed);
  • ProxySQL supports failover and SQL filtering (security).

ProxySQL reference links:

https://www.proxysql.com/

https://github.com/sysown/proxysql/releases

Building the read/write-splitting architecture

Step 1: plan the environment

Read/write splitting builds on the three-node database + GTID replication + MHA environment prepared above (a plain master-slave setup also works);

Role                       Host       Port
master                     10.0.0.51  3306
slave                      10.0.0.52  3306
slave (also MHA manager)   10.0.0.53  3306

Step 2: download and install the software

[root@db03 ~]# rpm -ivh proxysql-2.7.3-1-centos7.x86_64.rpm
-i installs the package, -v prints verbose information, -h shows a hash-mark progress bar

[root@db03 ~]# ll /etc/systemd/system/proxysql*  # unit files are installed under the system directory automatically
-rw-r--r-- 1 root root 392 Mar 9 2025 /etc/systemd/system/proxysql-initial.service
-rw-r--r-- 1 root root 826 Mar 9 2025 /etc/systemd/system/proxysql.service

[root@db03 ~]# systemctl start proxysql

[root@db03 ~]# netstat -lntup |grep proxysql
tcp 0 0 0.0.0.0:6032 0.0.0.0:* LISTEN 10829/proxysql
tcp 0 0 0.0.0.0:6033 0.0.0.0:* LISTEN 10829/proxysql

# port 6032 is the admin port: connect here to configure the middleware
# port 6033 is the application port: business traffic connects here

 

Step 3: admin configuration

Connecting to port 6032 opens ProxySQL's admin terminal, which loads five important databases:

Database        Key tables                     Purpose
main            mysql_servers                  list of backend MySQL servers ProxySQL can connect to
                mysql_users                    backend connection accounts and the monitoring account
                mysql_query_rules              rules that route queries to different backend servers
                mysql_replication_hostgroups   hostgroup definitions; several write or read nodes can be grouped together
disk                                           configuration persisted on disk
stats                                          aggregated statistics
monitor                                        data collected by the monitor, e.g. backend health states
stats_history                                  historical metrics about ProxySQL internals

Note: most services keep their configuration in files; ProxySQL stores its configuration in database tables.

 

[root@db03 ~]# mysql -uadmin -padmin -h127.0.0.1 -P6032
mysql> show databases;
+-----+---------------+-------------------------------------+
| seq | name          | file                                |
+-----+---------------+-------------------------------------+
| 0   | main          |                                     |
| 2   | disk          | /var/lib/proxysql/proxysql.db       |
| 3   | stats         |                                     |
| 4   | monitor       |                                     |
| 5   | stats_history | /var/lib/proxysql/proxysql_stats.db |
+-----+---------------+-------------------------------------+

mysql> show tables;
+----------------------------------------------------+
| tables                                             |
+----------------------------------------------------+
| coredump_filters                                   |
| global_variables                                   |
| mysql_aws_aurora_hostgroups                        |
| mysql_collations                                   |
| mysql_firewall_whitelist_rules                     |
| mysql_firewall_whitelist_sqli_fingerprints         |
| mysql_firewall_whitelist_users                     |
| mysql_galera_hostgroups                            |
| mysql_group_replication_hostgroups                 |
| mysql_hostgroup_attributes                         |
| mysql_query_rules                                  |
| mysql_query_rules_fast_routing                     |
| mysql_replication_hostgroups                       |
| mysql_servers                                      |
| mysql_servers_ssl_params                           |
| mysql_users                                        |
| proxysql_servers                                   |
| restapi_routes                                     |
| runtime_checksums_values                           |
| runtime_coredump_filters                           |
| runtime_global_variables                           |
| runtime_mysql_aws_aurora_hostgroups                |
| runtime_mysql_firewall_whitelist_rules             |
| runtime_mysql_firewall_whitelist_sqli_fingerprints |
| runtime_mysql_firewall_whitelist_users             |
| runtime_mysql_galera_hostgroups                    |
| runtime_mysql_group_replication_hostgroups         |
| runtime_mysql_hostgroup_attributes                 |
| runtime_mysql_query_rules                          |
| runtime_mysql_query_rules_fast_routing             |
| runtime_mysql_replication_hostgroups               |
| runtime_mysql_servers                              |
| runtime_mysql_servers_ssl_params                   |
| runtime_mysql_users                                |
| runtime_proxysql_servers                           |
| runtime_restapi_routes                             |
| runtime_scheduler                                  |
| scheduler                                          |
+----------------------------------------------------+
-- tables whose names start with runtime_ hold the configuration ProxySQL is currently running and cannot be modified directly; the tables without the runtime_ prefix belong to the MEMORY layer described below

 

  The layered admin configuration model

ProxySQL's three-layer configuration system

# Layer 1: RUNTIME
The configuration ProxySQL is currently using; it cannot be edited directly and must be LOADed up from the layer below (MEMORY);

# Layer 2: MEMORY (the layer where edits are made)
MEMORY sits below RUNTIME and above the DISK persistence layer;
configuration can be edited here online and at will, without affecting production, then loaded into RUNTIME and persisted to disk once verified.
Edits are ordinary INSERT / UPDATE / DELETE / SELECT statements;

# Layer 3: DISK / CFG FILE
Persistent configuration, reloaded quickly from disk at restart;

 

Moving configuration between ProxySQL layers

User-related configuration

LOAD MYSQL USERS TO RUNTIME;   -- load MEM into RUNTIME
SAVE MYSQL USERS TO MEMORY;    -- save RUNTIME back to MEM
LOAD MYSQL USERS FROM DISK;    -- load DISK into MEM
SAVE MYSQL USERS TO DISK;      -- save MEM to DISK
LOAD MYSQL USERS FROM CONFIG;  -- load the config file into MEM

 

Server-related configuration

LOAD MYSQL SERVERS TO RUNTIME;   -- load MEM into RUNTIME
SAVE MYSQL SERVERS TO MEMORY;    -- save RUNTIME back to MEM
LOAD MYSQL SERVERS FROM DISK;    -- load DISK into MEM
SAVE MYSQL SERVERS TO DISK;      -- save MEM to DISK
LOAD MYSQL SERVERS FROM CONFIG;  -- load the config file into MEM

 

Query-rule configuration

LOAD MYSQL QUERY RULES TO RUNTIME;   -- load MEM into RUNTIME
SAVE MYSQL QUERY RULES TO MEMORY;    -- save RUNTIME back to MEM
LOAD MYSQL QUERY RULES FROM DISK;    -- load DISK into MEM
SAVE MYSQL QUERY RULES TO DISK;      -- save MEM to DISK
LOAD MYSQL QUERY RULES FROM CONFIG;  -- load the config file into MEM

 

Variable configuration

LOAD MYSQL VARIABLES TO RUNTIME;   -- load MEM into RUNTIME
SAVE MYSQL VARIABLES TO MEMORY;    -- save RUNTIME back to MEM
LOAD MYSQL VARIABLES FROM DISK;    -- load DISK into MEM
SAVE MYSQL VARIABLES TO DISK;      -- save MEM to DISK
LOAD MYSQL VARIABLES FROM CONFIG;  -- load the config file into MEM

 

Note: configuration is validated only when it is loaded to RUNTIME; saving to MEM or DISK raises no warning or error whatsoever;

if a load to RUNTIME fails, the previously saved state is restored, and the error log shows what to check;

Summary: day-to-day work happens mostly in MEM, followed by a load to RUNTIME and a save to DISK; the config file (CFG) is rarely used;
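A compact end-to-end example of that cycle, run from the shell against the 6032 admin interface (a sketch; the variable shown is the one configured later in this document):

# edit in MEMORY, activate in RUNTIME, persist to DISK, then verify
mysql -uadmin -padmin -h127.0.0.1 -P6032 -e "
set mysql-monitor_username='monitor';
load mysql variables to runtime;
save mysql variables to disk;
select variable_name, variable_value from runtime_global_variables
 where variable_name='mysql-monitor_username';"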

ProxySQL statement-based read/write splitting: configuration walkthrough

Step                                          Table(s) involved              Operation
set the slaves read-only                                                     set global read_only=1
add the hostgroup definition                  mysql_replication_hostgroups
add nodes to the hostgroups                   mysql_servers
add users (monitor user, application user)    global_variables, mysql_users
add the read/write routing rules              mysql_query_rules

Configure the writer/reader hostgroup numbers in mysql_replication_hostgroups

ProxySQL assigns servers to hostgroups according to their read_only value:

  • servers with read_only=0 (the master) go into writer hostgroup 10
  • servers with read_only=1 (the slaves) go into reader hostgroup 20 (hence the slaves must run: set global read_only=1)
mysql> insert into mysql_replication_hostgroups(writer_hostgroup,reader_hostgroup,comment) values(10,20,'proxy');
mysql> select * from mysql_replication_hostgroups;
+------------------+------------------+------------+---------+
| writer_hostgroup | reader_hostgroup | check_type | comment |
+------------------+------------------+------------+---------+
| 10               | 20               | read_only  | proxy   |
+------------------+------------------+------------+---------+
    writer_hostgroup  hostgroup number used for write traffic;
    reader_hostgroup  hostgroup number used for read traffic;
    check_type        which check determines a server's read/write status;
    comment           free-form note for this entry

mysql> save mysql servers to disk;
mysql> load mysql servers to runtime;

 

Add the hosts to ProxySQL

mysql> insert into mysql_servers(hostgroup_id,hostname,port) values(10,'10.0.0.50',3306);
-- 10.0.0.50 is the MHA VIP, so write traffic keeps following the master after a failover
mysql> insert into mysql_servers(hostgroup_id,hostname,port) values(20,'10.0.0.52',3306);
mysql> insert into mysql_servers(hostgroup_id,hostname,port) values(20,'10.0.0.53',3306);
mysql> select * from mysql_servers;
+--------------+-----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| hostgroup_id | hostname  | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment |
+--------------+-----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
| 10           | 10.0.0.50 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.0.0.52 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
| 20           | 10.0.0.53 | 3306 | 0         | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              |         |
+--------------+-----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------+
mysql> save mysql servers to disk;
mysql> load mysql servers to runtime;

 

Create the monitoring user and enable monitoring

The monitor user watches replication health on the backend nodes; as soon as a backend falls out of sync, ProxySQL stops sending business requests to the failed node;

# create the monitor user on the master (alternatively, adjust the variables in ProxySQL's global_variables table)
  db01 [(none)]> create user monitor@'%' identified with mysql_native_password by 'monitor';
  db01 [(none)]> grant replication client on *.* to monitor@'%';
  # way 1: set the variables directly
  db03 [(none)]> set mysql-monitor_username='monitor';
  db03 [(none)]> set mysql-monitor_password='monitor';
  # way 2: update the variables table
  db03 [(none)]>update global_variables set variable_value='monitor' where variable_name='mysql-monitor_username';
  db03 [(none)]>update global_variables set variable_value='monitor' where variable_name='mysql-monitor_password';
  
db03 [(none)]>load mysql variables to runtime;
db03 [(none)]>save mysql variables to disk;

 

 

 

db03 [(none)]>select * from mysql_server_connect_log;
  -- 这张表记录了所有尝试连接MySQL服务器的成功和失败信息
+-----------+------+------------------+-------------------------+---------------+
| hostname | port | time_start_us | connect_success_time_us | connect_error |
+-----------+------+------------------+-------------------------+---------------+
| 10.0.0.50 | 3306 | 1759114335172474 | 2147 | NULL |
| 10.0.0.52 | 3306 | 1759114335911205 | 2047 | NULL |
| 10.0.0.53 | 3306 | 1759114336649926 | 773 | NULL |
+-----------+------+------------------+-------------------------+---------------+

db03 [(none)]> select * from mysql_server_ping_log;
-- 这张表专门用于记录对集群中每个成员服务器的连通性检查结果
+-----------+------+------------------+----------------------+----------------------------------------------+
| hostname | port | time_start_us | ping_success_time_us | ping_error |
+-----------+------+------------------+----------------------+----------------------------------------------+
| 10.0.0.53 | 3306 | 1759114514920244 | 438 | NULL |
| 10.0.0.52 | 3306 | 1759114514919868 | 916 | NULL |
| 10.0.0.50 | 3306 | 1759114514920164 | 651 | NULL |
+-----------+------+------------------+----------------------+----------------------------------------------+

db03 [(none)]> select * from mysql_server_read_only_log;
-- 这张表记录只复制集群中每个成员read_only状态变化(获取主库或从库主机信息)
+-----------+------+------------------+-----------------+-----------+-------+
| hostname  | port | time_start_us    | success_time_us | read_only | error |
+-----------+------+------------------+-----------------+-----------+-------+
| 10.0.0.53 | 3306 | 1759114650009963 | 874             | 1         | NULL  |
| 10.0.0.52 | 3306 | 1759114650010060 | 839             | 1         | NULL  |
| 10.0.0.50 | 3306 | 1759114650009842 | 1091            | 0         | NULL  |
+-----------+------+------------------+-----------------+-----------+-------+

db03 [(none)]> select * from mysql_server_replication_lag_log;
Empty set (0.00 sec)
-- This table records the replication lag observed on each member of the replication cluster
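
The monitor log tables grow continuously, so in practice it helps to look only at the most recent samples; a small sketch against the ping log:

db03 [(none)]> select hostname,port,ping_success_time_us,ping_error from mysql_server_ping_log order by time_start_us desc limit 3;
  -- the newest ping per node; a non-NULL ping_error here is the first sign of a failing backend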

 Create the application user

Create a database application user; ProxySQL uses this account to execute client requests on the backend database nodes.

mysql_users: this table manages the credentials and basic routing information for users that connect to the backend MySQL servers through ProxySQL

# Create the application user on the primary
db01 [(none)]> create user root@'%' identified with mysql_native_password by 'root';
db01 [(none)]> grant all on *.* to root@'%';

# Register the application user in proxysql
db03 [(none)]>insert into mysql_users(username,password,default_hostgroup) values('root','root',10);
db03 [(none)]>load mysql users to runtime;
Query OK, 0 rows affected (0.00 sec)
db03 [(none)]>save mysql users to disk;
Query OK, 0 rows affected (0.00 sec)

# Early versions also required enabling transaction persistence (can be skipped on current versions)
update mysql_users set transaction_persistent=1 where username='root';
load mysql users to runtime;
save mysql users to disk;
-- Transaction routing persistence: statements of one transaction are never split across different hostgroups
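
A quick way to confirm the account registered correctly is to read the table back (on recent ProxySQL versions transaction_persistent already defaults to 1):

db03 [(none)]> select username,default_hostgroup,transaction_persistent,active from mysql_users;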

Practical read/write routing rules

> insert into mysql_query_rules(rule_id,active,match_pattern,destination_hostgroup,apply) values (1,1,'^select.*for update$',10,1);
> insert into mysql_query_rules(rule_id,active,match_pattern,destination_hostgroup,apply) values (2,1,'^select',20,1);
-- Every other statement falls through to the user's default hostgroup (the primary node, hostgroup 10)
> load mysql query rules to runtime;
> save mysql query rules to disk;

The `select ... for update` rule must have a smaller rule_id than the generic `select` rule, because ProxySQL evaluates rules in ascending rule_id order; otherwise locking reads would be routed to the read-only replicas.
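
To verify which rule each statement actually matched, the per-rule hit counters can be consulted; a sketch using ProxySQL's stats tables:

db03 [(none)]> select rule_id,hits,match_pattern,destination_hostgroup from stats_mysql_query_rules join mysql_query_rules using(rule_id);
  -- hits on rule 1 come from locking reads, hits on rule 2 from plain selects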

Test the read/write split

[root@db03 ~]# mysql -uroot -proot -P6033 -h127.0.0.1 -e "begin;select @@server_id;commit"
+-------------+
| @@server_id |
+-------------+
|          51 |
+-------------+
-- Writes and statements inside an explicit transaction are routed to the primary node (server_id 51)
[root@db03 ~]# mysql -uroot -proot -P6033 -h127.0.0.1 -e "select @@server_id;"
+-------------+
| @@server_id |
+-------------+
|          52 |
+-------------+
[root@db03 ~]# mysql -uroot -proot -P6033 -h127.0.0.1 -e "select @@server_id;"
+-------------+
| @@server_id |
+-------------+
|          53 |
+-------------+
-- Plain queries are routed to the replica nodes (server_id 52/53)
>select * from stats_mysql_query_digest\G
-- This table is essential for SQL analysis: it is the main source for statement performance metrics and for designing routing rules
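
Because the full digest output is verbose, a focused query is usually more practical; a sketch that surfaces the most expensive statement patterns and where they ran:

db03 [(none)]> select hostgroup,digest_text,count_star,sum_time from stats_mysql_query_digest order by sum_time desc limit 5;
  -- sum_time is in microseconds, accumulated over all executions of the pattern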

  Read/write split configuration extensions

 Port-based read/write routing

# Change the ports proxySQL listens on for SQL traffic (multiple listeners are separated by semicolons)
db03 [(none)]> set mysql-interfaces='0.0.0.0:6033;0.0.0.0:6034';

# Persist the listener configuration and restart (mysql-interfaces cannot be changed at runtime)
db03 [(none)]> save mysql variables to disk;
[root@db03 ~]# systemctl restart proxysql

# Define the port-based routing rules
db03 [(none)]> delete from mysql_query_rules;  -- clear existing rules first so the test result is unambiguous
db03 [(none)]> insert into mysql_query_rules(rule_id,active,proxy_port,\
destination_hostgroup,apply) values(1,1,6033,10,1),(2,1,6034,20,1);
db03 [(none)]> load mysql query rules to runtime;
db03 [(none)]> save mysql query rules to disk;
-- Besides the port, routing can also match on the listening address (proxy_addr field) or on the client address (client_addr field)
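
With these rules loaded, routing can be verified simply by comparing the two ports (using the root application user created earlier):

mysql -uroot -proot -h127.0.0.1 -P6033 -e "select @@server_id;"  # expected: 51 (primary, hostgroup 10)
mysql -uroot -proot -h127.0.0.1 -P6034 -e "select @@server_id;"  # expected: 52 or 53 (replicas, hostgroup 20)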

 User-based read/write routing

db03 [(none)]> insert into mysql_users(username,password,default_hostgroup) \
values ('write','123',10),('reader','123',20);
db03 [(none)]> load mysql users to runtime;
db03 [(none)]> save mysql users to disk;

db03 [(none)]> delete from mysql_query_rules;  -- clear existing rules first so the test result is unambiguous
db03 [(none)]> insert into mysql_query_rules(rule_id,active,username,\
destination_hostgroup,apply) values (1,1,'write',10,1),(2,1,'reader',20,1);
db03 [(none)]> load mysql query rules to runtime;
db03 [(none)]> save mysql query rules to disk;
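
Note that ProxySQL forwards these credentials to the backends, so the same accounts must also exist in MySQL itself; a sketch of the backend grants plus a routing test (the account names and passwords follow the example above):

# On the primary (replicates to the other nodes):
db01 [(none)]> create user write@'%' identified with mysql_native_password by '123';
db01 [(none)]> create user reader@'%' identified with mysql_native_password by '123';
db01 [(none)]> grant all on *.* to write@'%';
db01 [(none)]> grant select on *.* to reader@'%';

# Routing test:
mysql -uwrite -p123 -h127.0.0.1 -P6033 -e "select @@server_id;"   # routed to hostgroup 10 (primary)
mysql -ureader -p123 -h127.0.0.1 -P6033 -e "select @@server_id;"  # routed to hostgroup 20 (replicas)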

MIC (MySQL InnoDB Cluster: high-availability / read-write-splitting cluster)

MySQL's native solution (8.0 and later) for high availability, replication, and read/write splitting

Cluster architecture overview


MySQL InnoDB Cluster provides an integrated, native HA solution for MySQL;
it relies on Group Replication and its Paxos-based protocol to guarantee data consistency, and Group Replication supports both single-primary and multi-primary modes;
An InnoDB Cluster consists of the following components:
MySQL Servers with Group Replication:
replicate data to all members of the cluster while providing fault tolerance, automated failover, and elasticity (MySQL Server 5.7.17 or later)
MySQL Router:
ensures client requests are load-balanced and routed to the correct server whenever a database fails (MySQL Router 2.1.3 or later)
MySQL Shell:
creates and manages InnoDB clusters through the built-in AdminAPI, i.e. a single management point for all MGR nodes (MySQL Shell 1.0.9 or later)

  Building the InnoDB Cluster

 Host planning

IP          Hostname   Host role     Install software
10.0.0.51   db01       mic-master    mysql, mysqlsh
10.0.0.52   db02       mic-slave01   mysql, mysqlsh
10.0.0.53   db03       mic-slave02   mysql, mysqlsh
10.0.0.54   db04       mic-manager   mysql, mysqlsh, mysql-router

 Install the software packages


Configure environment variables (all nodes)

vim /etc/profile
export PATH=/usr/local/mysql/bin:/usr/local/mysqlsh/bin:/usr/local/mysql-router/bin:$PATH

source /etc/profile

Configure hostname resolution (all nodes)

vim /etc/hosts

10.0.0.51 db01
10.0.0.52 db02
10.0.0.53 db03
10.0.0.54 db04

Initialize the databases

pkill mysqld
rm -rf /data/3306/data/*

Initialize all three database hosts (db04 is not configured at this point)

Register the instances with mysqlsh so that all database nodes can be managed centrally

# Set the local root password (run on all data nodes)
db01 [(none)]>alter user root@'localhost' identified with mysql_native_password by 'root';
db02 [(none)]>alter user root@'localhost' identified with mysql_native_password by 'root';
db03 [(none)]>alter user root@'localhost' identified with mysql_native_password by 'root';

# Configure each cluster node instance (run on db01, db02, and db03)
mysqlsh
> shell.connect('root@localhost:3306');
  -- MySQL Shell command that connects to the local MySQL instance on port 3306 as root
> dba.configureLocalInstance();
  -- adjusts the local instance's configuration so that it meets the requirements of MySQL Group Replication / InnoDB Cluster
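
If a dedicated administration account is preferred over root, configureLocalInstance() can create one during configuration; a sketch, where the account name 'icadmin' and its password are illustrative assumptions:

> dba.configureLocalInstance('root@localhost:3306', {clusterAdmin: "'icadmin'@'%'", clusterAdminPassword: 'icadmin123'});
  -- creates the icadmin account with the privileges the AdminAPI needs, so later cluster operations need not run as root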


> dba.checkInstanceConfiguration("root@localhost:3306");

Checks whether the given MySQL instance's configuration meets the requirements for joining an InnoDB Cluster (an "ok" status means the steps above succeeded)

Build the MGR topology

Create the cluster and add the nodes on the existing instances (MGR is configured and operated centrally from the management session)

mysqlsh
> shell.connect('root@10.0.0.51:3306')
MySQL master:3306 ssl JS>  -- this prompt indicates the connection to the primary node is established
> var cluster=dba.createCluster('<cluster_name>')  -- create the MGR group
  Roughly equivalent to configuring MGR by hand on the node:
  change master to master_user='repl',master_password='123' for channel 'group_replication_recovery';
  set global group_replication_bootstrap_group=ON;
  start group_replication;
  set global group_replication_bootstrap_group=OFF;
  The server the cluster is created on automatically becomes the bootstrap (seed) node

cluster.status()  -- show cluster status
  Equivalent in plain MGR to: select * from performance_schema.replication_group_members;


# Join the replica nodes to the MGR cluster (if the session was closed, re-enter mysqlsh and reconnect to the primary)
> var cluster=dba.getCluster('<cluster_name>');   -- fetch the cluster object
> cluster.addInstance('root@10.0.0.52:3306');  -- add the given node to the cluster
Please select a recovery method [C]lone/[A]bort(default Abort):C
  -- choose whether to clone the primary's data to the new replica
> cluster.addInstance('root@10.0.0.53:3306');  -- add the given node to the cluster
Please select a recovery method [C]lone/[A]bort(default Abort):C
  -- choose whether to clone the primary's data to the new replica
cluster.status()  -- show cluster status
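
The same membership information can be cross-checked in plain SQL on any member, independent of mysqlsh:

db01 [(none)]>select member_host,member_port,member_state,member_role from performance_schema.replication_group_members;
  -- all three members should report ONLINE, with exactly one PRIMARY in single-primary mode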


Deploy the mysql-router middleware for the MIC cluster

# Register the router with the cluster; this generates the myrouter directory with its startup scripts and configuration file
[root@db04 ~]# cd /usr/local
[root@db04 local]# mysqlrouter --bootstrap root@db01:3306 -d myrouter --user=root
The myrouter directory is created under the current directory
Read/Write Connections: localhost:6446
Read/Only Connections: localhost:6447

# Start the mysql-router service
cd /usr/local/myrouter/
./start.sh
netstat -lntup | grep mysqlrouter

# Test connections through the router proxy
mysqlsh --uri root@localhost:6446
mysql -uroot -proot -h10.0.0.54 -P 6446  -- read/write port (routes to the primary)
mysql -uroot -proot -h10.0.0.54 -P 6447  -- read-only port (routes to the replicas)
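
The ports and destinations the router uses are recorded in the configuration generated by the bootstrap step; inspecting it is a good habit (the path follows the -d myrouter choice above):

[root@db04 ~]# cat /usr/local/myrouter/mysqlrouter.conf
  -- the [routing:...] sections list the classic ports 6446/6447 and, typically, the X-protocol ports 64460/64470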


Cluster management commands

 Commands for querying cluster node information

# Connect to the primary node and get the cluster object
mysqlsh
> shell.connect('root@10.0.0.51:3306');
> var cluster=dba.getCluster('<cluster_name>');

# Check whether a node meets the cluster requirements (i.e. whether it can join)
dba.checkInstanceConfiguration('root@master:3306');

# Describe the cluster structure
cluster.describe();

# Cluster status overview
cluster.status();

# Show the cluster and its name
dba.getCluster();

# List the registered routers
cluster.listRouters();


Cluster creation and management commands

# Create a cluster  dba.createCluster()
> var cluster=dba.createCluster('<cluster_name>');

# Add a cluster node  Cluster.addInstance()
> cluster.addInstance('root@slave01:3306');

# Remove a cluster node  Cluster.removeInstance()
> cluster.removeInstance('root@slave01:3306');

# Rejoin a cluster node  Cluster.rejoinInstance()
> cluster.rejoinInstance('root@slave01:3306');
-- If an instance leaves the cluster (loses its connection and does not rejoin automatically), Cluster.rejoinInstance() adds it back

# Switch the primary online  Cluster.setPrimaryInstance(instance)
> cluster.setPrimaryInstance('root@slave01:3306');
> cluster.status();
> cluster.setPrimaryInstance('root@master:3306');

# Switch the topology mode online
> cluster.switchToMultiPrimaryMode();  -- switch to multi-primary mode
> cluster.switchToSinglePrimaryMode('root@master:3306');  -- switch back to single-primary mode
-- After switching to multi-primary mode:
port 6447 (read-only by default) accepts both reads and writes and can reach every cluster member;
port 6446 (read-write by default) still connects to only one member (the former primary)


Handling cluster node failures

dba.rebootClusterFromCompleteOutage();
Reconfigures the cluster after a complete outage; if this fails, the cluster metadata can be removed entirely as follows

dba.dropMetadataSchema();
Deletes the cluster metadata; afterwards rebuild the cluster with dba.createCluster()

cluster.removeRouterMetadata(router);
Unregisters a previously registered router from the cluster metadata
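
A minimal recovery sketch after a full outage, assuming all members are back up and reachable:

mysqlsh
> shell.connect('root@10.0.0.51:3306');
> var cluster=dba.rebootClusterFromCompleteOutage('<cluster_name>');
  -- re-bootstraps the group from this member and rejoins the others
> cluster.status();  -- confirm every member reports ONLINE again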

