配置主从同步:
创建用于同步的用户:
create user repl@'%' identified by 'Admin@123';
grant REPLICATION CLIENT,REPLICATION SLAVE on *.* to rep@'%';
创建用户mha连接的用户:
CREATE USER 'mha'@'%' IDENTIFIED WITH mysql_native_password BY 'Admin@123';
GRANT REPLICATION SLAVE ON *.* TO 'mha'@'%';
grant all on *.* to mha;
主库全备恢复到备库环境:
mysqldump -uroot -p"Admin@123" --single-transaction --master-data=2 testdb --flush-logs -R -E | sed -e 's/DEFINE.*PROCEDURE/PROCEDURE/;s/DEFINE.*FUNCTION/FUNCTION/;s/DEFINER.*DEFINER//' |gzip> /tmp/testdb.sql.gz
恢复后同步:
CHANGE MASTER TO MASTER_HOST='10.43.20.100', MASTER_USER='repl', MASTER_PASSWORD='Admin@123', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000003', MASTER_LOG_POS=157;
主备同步创建完成后,进行mha的安装部署:
1.配置两条服务器免密:
可使用文末免密脚本进行配置。
2.第一步安装依赖:(以下步骤两节点均需操作,特定情况会说明)
将mysql环境变量写入profile文件
export PATH=/usr/local/mysql/bin:$PATH
配置yum源:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
yum install -y cpan perl-Config-Tiny epel-release perl-Log-Dispatch perl-Parallel-ForkManager perl-Time-HiRes perl-DBD-MySQL python3-devel gcc
查看依赖安装情况:
rpm -qa cpan \
perl-Config-Tiny \
epel-release \
perl-Log-Dispatch \
perl-Parallel-ForkManager \
perl-Time-HiRes\
perl-DBD-MySQL\
python3-devel\
3.安装python3依赖包
pip3 install psutil requests pymysql -i https://pypi.tuna.tsinghua.edu.cn/simple
4.解压软件到/data目录下(两节点均需操作)
tar -zxvf mha20250113.tar.gz -C /data/
软件下载地址:
https://elanjie.lanzout.com/iB7dA2kvpqvg
5.修改配置文件
目录格式如下:
[root@node02 mha]# ls -rtl
total 4
drwxr-xr-x 8 root root 4096 Jan 13 19:00 mha_soft
drwxr-xr-x 2 root root 41 Jan 13 19:04 conf
drwxr-xr-x 3 root root 230 Jan 13 19:43 ha_monitor
drwxr-xr-x 3 root root 252 Jan 13 23:25 scripts
drwxr-xr-x 2 root root 6 Jan 13 23:28 masterha
drwxr-xr-x 2 root root 54 Jan 14 10:03 manager
5.1 conf文件夹保持为空,利用命令生成配置文件
进入scripts目录下,执行如下命令:
python3 setup_mysql_mhavip.py -V 10.43.20.102 -I 10.43.20.100,10.43.20.101 -P 3306 -S 22 -N 'ens33'
说明:-V 为虚拟ip地址,-I分别为两台服务器地址,主库写前边,备库写后边,-P为数据库端口,-S为ssh端口,-N为网卡名称
脚本生成依赖于,/data/mha/mha_soft下的masterha.cnf和app.cnf文件,如果ssh端口有改变,需修改masterha.cnf文件,将虚拟ip地址,主备服务器地址,ssh端口,网卡名称修改为对应值.
状态输出:
[root@node02 scripts]# python3 setup_mysql_mhavip.py -V 10.43.20.102 -I 10.43.20.100,10.43.20.101 -P 3306 -S 22 -N 'ens33'
2025-01-14 10:15:51 - install mha - INFO - yum install Perl dependencies, perl-Module-Install ...loaded.
2025-01-14 10:15:51 - install mha - INFO - yum install Perl dependencies, perl-Module-Build ...loaded.
2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Log-Dispatch ...loaded.
2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Parallel-ForkManager ...loaded.
2025-01-14 10:15:53 - install mha - INFO - yum install mysql drive, perl-Config-Tiny ...loaded.
2025-01-14 10:16:03 - install mha - INFO - yum install mysql driver,perl-DBD-MySQL ...loaded.
2025-01-14 10:16:03 - system check - WARNING - /data/mha/masterha tmp already exists
2025-01-14 10:16:03 - install mha - WARNING - mha app node already exists.
2025-01-14 10:16:03 - system check - WARNING - /data/mha/scripts tmp already exists
2025-01-14 10:16:03 - system check - WARNING - /data/mha/conf tmp already exists
2025-01-14 10:16:04 - system check - WARNING - /data/mha/manager tmp already exists
2025-01-14 10:16:04 - install mha - WARNING - mha manager already exists.
['[server default]\n', '#workdir on the management server\n', 'manager_workdir=/data/mha/manager/\n', 'manager_log=/data/mha/manager/manager.log\n', '\n', '#workdir on the node for mysql server\n', 'master_binlog_dir=/data/mysqldata/data\n', '\n', '#检测master可用性\n', 'secondary_check_script=/data/mha/scripts/masterha_secondary_check -s 10.43.20.100 -s 10.43.20.101 --ssh_port=22\n', '#自动故障切换master脚本\n', 'master_ip_failover_script=/data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22\n', '#手动切换master脚本\n', 'master_ip_online_change_script=/data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22\n', '\n', '[server1]\n', 'hostname=10.43.20.100\n', 'port=3306\n', 'candidate_master=1\n', 'check_repl_delay=0\n', '\n', '[server2]\n', 'hostname=10.43.20.101\n', 'port=3306\n', 'candidate_master=1\n', 'check_repl_delay=0\n']
2025-01-14 10:16:04 - install mha - INFO - complete ok!
提示配置完成,查看conf文件夹下的app.cnf文件,内容如下:
[root@node02 conf]# ls -rtl
total 8
-rw-r--r-- 1 root root 383 Jan 14 10:16 masterha.cnf
-rw-r--r-- 1 root root 772 Jan 14 10:16 app.cnf
检查文件是否正确。
4.2测试联通性:
[root@node02 mha]# masterha_check_ssh --conf=/data/mha/conf/app.cnf
Tue Jan 14 10:45:28 2025 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Tue Jan 14 10:45:28 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:45:28 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:45:28 2025 - [info] Starting SSH connection tests..
Tue Jan 14 10:45:30 2025 - [debug]
Tue Jan 14 10:45:29 2025 - [debug] Connecting via SSH from root@10.43.20.101(10.43.20.101:22) to root@10.43.20.100(10.43.20.100:22)..
Tue Jan 14 10:45:30 2025 - [debug] ok.
Tue Jan 14 10:45:30 2025 - [debug]
Tue Jan 14 10:45:28 2025 - [debug] Connecting via SSH from root@10.43.20.100(10.43.20.100:22) to root@10.43.20.101(10.43.20.101:22)..
Tue Jan 14 10:45:30 2025 - [debug] ok.
Tue Jan 14 10:45:30 2025 - [info] All SSH connection tests passed successfully.
[root@node02 mha]#
连通性测试通过。
4.3修改ha_monitor配置
修改:parameter.py,配合HA_Monitor.py,进行mha的监控
[root@node02 mha]# cat ha_monitor/parameter.py
par_var = {"vip": "10.43.20.102", "local_ip1": "10.43.20.100", "local_ip2": "10.43.20.101", "mysql_port": "3306", "netcard": "ens33", "sleeptime": 60}
4.4【备节点】启动manager和monitor
[root@node02 scripts]# cd /data/mha/scripts/
[root@node02 scripts]# ./start.sh
[root@node02 scripts]#
[root@node02 scripts]#
查看日志输出:
[root@node02 manager]# tail -10f manager.log
Tue Jan 14 10:51:59 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf..
Tue Jan 14 10:51:59 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:51:59 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:51:59 2025 - [info] MHA::MasterMonitor version 0.58.
Tue Jan 14 10:52:01 2025 - [info] GTID failover mode = 0
Tue Jan 14 10:52:01 2025 - [info] Dead Servers:
Tue Jan 14 10:52:01 2025 - [info] Alive Servers:
Tue Jan 14 10:52:01 2025 - [info] 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 10:52:01 2025 - [info] 10.43.20.101(10.43.20.101:3306)
Tue Jan 14 10:52:01 2025 - [info] Alive Slaves:
Tue Jan 14 10:52:01 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 10:52:01 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 10:52:01 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 10:52:01 2025 - [info] Current Alive Master: 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 10:52:01 2025 - [info] Checking slave configurations..
Tue Jan 14 10:52:01 2025 - [info] Checking replication filtering settings..
Tue Jan 14 10:52:01 2025 - [info] binlog_do_db= , binlog_ignore_db=
Tue Jan 14 10:52:01 2025 - [info] Replication filtering check ok.
Tue Jan 14 10:52:01 2025 - [info] GTID (with auto-pos) is not supported
Tue Jan 14 10:52:01 2025 - [info] Starting SSH connection tests..
Tue Jan 14 10:52:03 2025 - [info] All SSH connection tests passed successfully.
Tue Jan 14 10:52:03 2025 - [info] Checking MHA Node version..
Tue Jan 14 10:52:03 2025 - [info] Version check ok.
Tue Jan 14 10:52:03 2025 - [info] Checking SSH publickey authentication settings on the current master..
Tue Jan 14 10:52:04 2025 - [info] HealthCheck: SSH to 10.43.20.100 is reachable.
Tue Jan 14 10:52:04 2025 - [info] Master MHA Node version is 0.58.
Tue Jan 14 10:52:04 2025 - [info] Checking recovery script configurations on 10.43.20.100(10.43.20.100:3306)..
Tue Jan 14 10:52:04 2025 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysqldata/data --output_file=/data/mha/masterha/save_binary_logs_test --manager_version=0.58 --start_file=mysqld-bin.000005
Tue Jan 14 10:52:04 2025 - [info] Connecting to root@10.43.20.100(10.43.20.100:22)..
Creating /data/mha/masterha if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysqldata/data, up to mysqld-bin.000005
Tue Jan 14 10:52:05 2025 - [info] Binlog setting check done.
Tue Jan 14 10:52:05 2025 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Tue Jan 14 10:52:05 2025 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mha' --slave_host=10.43.20.101 --slave_ip=10.43.20.101 --slave_port=3306 --workdir=/data/mha/masterha --target_version=8.0.28 --manager_version=0.58 --relay_dir=/data/mysqldata/data --current_relay_log=relay.000002 --slave_pass=xxx
Tue Jan 14 10:52:05 2025 - [info] Connecting to root@10.43.20.101(10.43.20.101:22)..
Checking slave recovery environment settings..
Relay log found at /data/mysqldata/data, up to relay.000002
Temporary relay log file is /data/mysqldata/data/relay.000002
Checking if super_read_only is defined and turned on..Disabling super_read_only, enabling read_only, so that the applying can be done on the slave
Testing mysql connection and privileges..
done.
Enabling super_read_only again..Enabling super_read_only again after applying
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Tue Jan 14 10:52:09 2025 - [info] Slaves settings check done.
Tue Jan 14 10:52:09 2025 - [info]
10.43.20.100(10.43.20.100:3306) (current master)
+--10.43.20.101(10.43.20.101:3306)
Tue Jan 14 10:52:09 2025 - [info] Checking master_ip_failover_script status:
Tue Jan 14 10:52:09 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --command=status --ssh_user=root --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306
Tue Jan 14 10:52:09 2025 - [info] OK.
Tue Jan 14 10:52:09 2025 - [warning] shutdown_script is not defined.
Tue Jan 14 10:52:09 2025 - [info] Set master ping interval 2 seconds.
Tue Jan 14 10:52:09 2025 - [info] Set secondary check script: /data/mha/scripts/masterha_secondary_check -s 10.43.20.100 -s 10.43.20.101 --ssh_port=22
Tue Jan 14 10:52:09 2025 - [info] Starting ping health check on 10.43.20.100(10.43.20.100:3306)..
Tue Jan 14 10:52:09 2025 - [info] Ping(CONNECT) succeeded, waiting until MySQL doesn't respond..
[root@node02 ha_monitor]# tail -10f monitor.log
2025-01-14 10:13:15 check manager WARNING 从库的manager进程不存在,正在启动...
2025-01-14 10:13:15 - check manager - WARNING - 从库的manager进程不存在,正在启动...
2025-01-14 10:13:15 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:13:15 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:14:15 check_mysql INFO mysqld is run.
2025-01-14 10:14:15 - check_mysql - INFO - mysqld is run.
2025-01-14 10:14:15 read only INFO it's a slave,read only mode is on, keep it.
2025-01-14 10:14:15 - read only - INFO - it's a slave,read only mode is on, keep it.
2025-01-14 10:14:15 ha_monitor INFO manager is not run.
2025-01-14 10:14:15 - ha_monitor - INFO - manager is not run.
2025-01-14 10:14:16 check manager WARNING 从库的manager进程不存在,正在启动...
2025-01-14 10:14:16 - check manager - WARNING - 从库的manager进程不存在,正在启动...
2025-01-14 10:14:16 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:14:16 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:51:59 check_mysql INFO mysqld is run.
2025-01-14 10:51:59 - check_mysql - INFO - mysqld is run.
2025-01-14 10:51:59 read only INFO it's a slave,read only mode is on, keep it.
2025-01-14 10:51:59 - read only - INFO - it's a slave,read only mode is on, keep it.
2025-01-14 10:51:59 ha_monitor INFO manager is run.
2025-01-14 10:51:59 - ha_monitor - INFO - manager is run.
2025-01-14 10:51:59 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:51:59 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:52:59 check_mysql INFO mysqld is run.
2025-01-14 10:52:59 - check_mysql - INFO - mysqld is run.
2025-01-14 10:53:00 read only INFO it's a slave,read only mode is on, keep it.
2025-01-14 10:53:00 - read only - INFO - it's a slave,read only mode is on, keep it.
2025-01-14 10:53:00 ha_monitor INFO manager is run.
2025-01-14 10:53:00 - ha_monitor - INFO - manager is run.
2025-01-14 10:53:00 check vip INFO current host is slave,vip(10.43.20.102) does not exist,keep it.
2025-01-14 10:53:00 - check vip - INFO - current host is slave,vip(10.43.20.102) does not exist,keep it.
4.5【主节点】启动monitor
[root@node01 scripts]# cd /data/mha/scripts/
[root@node01 scripts]# ./start.sh
[root@node01 scripts]# ./start.sh
[root@node01 scripts]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:77:5d:9b brd ff:ff:ff:ff:ff:ff
inet 10.43.20.100/24 brd 10.43.20.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 10.43.20.102/24 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::6e7:5185:9e7b:c840/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::79c0:be13:f42:d85e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
主节点vip已自动绑定。
4.6两节点添加crontab监控
* * * * * if [[ ` ps -ef|grep -i ha_monitor.py|grep -v grep|wc -l ` -ne 1 ]];then /bin/nohup /bin/python3 /data/mha/ha_monitor/HA_Monitor.py >> /data/mha/ha_monitor/monitor.log 2>&1 & fi
4.7测试主备切换
备节点执行:
masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=10.43.20.101 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1
完整日志输出如下:
[root@node02 scripts]# masterha_master_switch --global_conf=/data/mha/conf/masterha.cnf --conf=/data/mha/conf/app.cnf --master_state=alive --new_master_host=10.43.20.101 --new_master_port=3306 --orig_master_is_new_slave --running_updates_limit=1
Tue Jan 14 10:57:18 2025 - [info] MHA::MasterRotate version 0.58.
Tue Jan 14 10:57:18 2025 - [info] Starting online master switch..
Tue Jan 14 10:57:18 2025 - [info]
Tue Jan 14 10:57:18 2025 - [info] * Phase 1: Configuration Check Phase..
Tue Jan 14 10:57:18 2025 - [info]
Tue Jan 14 10:57:18 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf..
Tue Jan 14 10:57:18 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:57:18 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Tue Jan 14 10:57:20 2025 - [info] GTID failover mode = 0
Tue Jan 14 10:57:20 2025 - [info] Current Alive Master: 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 10:57:20 2025 - [info] Alive Slaves:
Tue Jan 14 10:57:20 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 10:57:20 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 10:57:20 2025 - [info] Primary candidate for the new Master (candidate_master is set)
It is better to execute FLUSH NO_WRITE_TO_BINLOG TABLES on the master before switching. Is it ok to execute on 10.43.20.100(10.43.20.100:3306)? (YES/no): yes
Tue Jan 14 10:57:22 2025 - [info] Executing FLUSH NO_WRITE_TO_BINLOG TABLES. This may take long time..
Tue Jan 14 10:57:22 2025 - [info] ok.
Tue Jan 14 10:57:22 2025 - [info] Checking MHA is not monitoring or doing failover..
Tue Jan 14 10:57:22 2025 - [info] Checking replication health on 10.43.20.101..
Tue Jan 14 10:57:22 2025 - [info] ok.
Tue Jan 14 10:57:22 2025 - [info] 10.43.20.101 can be new master.
Tue Jan 14 10:57:22 2025 - [info]
From:
10.43.20.100(10.43.20.100:3306) (current master)
+--10.43.20.101(10.43.20.101:3306)
To:
10.43.20.101(10.43.20.101:3306) (new master)
+--10.43.20.100(10.43.20.100:3306)
Starting master switch from 10.43.20.100(10.43.20.100:3306) to 10.43.20.101(10.43.20.101:3306)? (yes/NO): yes
Tue Jan 14 10:57:23 2025 - [info] Checking whether 10.43.20.101(10.43.20.101:3306) is ok for the new master..
Tue Jan 14 10:57:23 2025 - [info] ok.
Tue Jan 14 10:57:23 2025 - [info] 10.43.20.100(10.43.20.100:3306): SHOW SLAVE STATUS returned empty result. To check replication filtering rules, temporarily executing CHANGE MASTER to a dummy host.
Tue Jan 14 10:57:24 2025 - [info] 10.43.20.100(10.43.20.100:3306): Resetting slave pointing to the dummy host.
Tue Jan 14 10:57:24 2025 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jan 14 10:57:24 2025 - [info]
Tue Jan 14 10:57:24 2025 - [info] * Phase 2: Rejecting updates Phase..
Tue Jan 14 10:57:24 2025 - [info]
Tue Jan 14 10:57:24 2025 - [info] Executing master ip online change script to disable write on the current master:
Tue Jan 14 10:57:24 2025 - [info] /data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22 --command=stop --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Tue Jan 14 10:57:24 2025 165131 Set read_only on the new master.. ok.
Tue Jan 14 10:57:24 2025 172860 Drpping app user on the orig master..
Tue Jan 14 10:57:24 2025 173679 Waiting all running 1 threads are disconnected.. (max 1500 milliseconds)
{'Time' => '55381','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Tue Jan 14 10:57:24 2025 676107 Waiting all running 1 threads are disconnected.. (max 1000 milliseconds)
{'Time' => '55381','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Tue Jan 14 10:57:25 2025 175901 Waiting all running 1 threads are disconnected.. (max 500 milliseconds)
{'Time' => '55382','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Tue Jan 14 10:57:25 2025 676904 Set read_only=1 on the orig master.. ok.
Tue Jan 14 10:57:25 2025 679971 Waiting all running 1 queries are disconnected.. (max 500 milliseconds)
{'Time' => '55382','db' => undef,'Id' => '10','User' => 'event_scheduler','State' => 'Waiting on empty queue','Command' => 'Daemon','Info' => undef,'Host' => 'localhost'}
Tue Jan 14 10:57:26 2025 178121 Killing all application threads..
Tue Jan 14 10:57:26 2025 179579 done.
Disabling the VIP on old master: 10.43.20.100
Tue Jan 14 10:57:26 2025 - [info] ok.
Tue Jan 14 10:57:26 2025 - [info] Locking all tables on the orig master to reject updates from everybody (including root):
Tue Jan 14 10:57:26 2025 - [info] Executing FLUSH TABLES WITH READ LOCK..
Tue Jan 14 10:57:26 2025 - [info] ok.
Tue Jan 14 10:57:26 2025 - [info] Orig master binlog:pos is mysqld-bin.000005:14752736.
Tue Jan 14 10:57:26 2025 - [info] Waiting to execute all relay logs on 10.43.20.101(10.43.20.101:3306)..
Tue Jan 14 10:57:26 2025 - [info] master_pos_wait(mysqld-bin.000005:14752736) completed on 10.43.20.101(10.43.20.101:3306). Executed 0 events.
Tue Jan 14 10:57:26 2025 - [info] done.
Tue Jan 14 10:57:26 2025 - [info] Getting new master's binlog name and position..
Tue Jan 14 10:57:26 2025 - [info] mysqld-bin.000003:14752781
Tue Jan 14 10:57:26 2025 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.43.20.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000003', MASTER_LOG_POS=14752781, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Jan 14 10:57:26 2025 - [info] Executing master ip online change script to allow write on the new master:
Tue Jan 14 10:57:26 2025 - [info] /data/mha/scripts/master_ip_online_change_vip --vip=10.43.20.102 --ssh_port=22 --command=start --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --orig_master_user='mha' --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --orig_master_ssh_user=root --new_master_ssh_user=root --orig_master_is_new_slave --orig_master_password=xxx --new_master_password=xxx
Enabling the VIP - 10.43.20.102 on the new master - 10.43.20.101
Tue Jan 14 10:57:27 2025 053683 Set read_only=0 on the new master.
Tue Jan 14 10:57:27 2025 054082 Creating app user on the new master..
Tue Jan 14 10:57:27 2025 - [info] ok.
Tue Jan 14 10:57:27 2025 - [info]
Tue Jan 14 10:57:27 2025 - [info] * Switching slaves in parallel..
Tue Jan 14 10:57:27 2025 - [info]
Tue Jan 14 10:57:27 2025 - [info] Unlocking all tables on the orig master:
Tue Jan 14 10:57:27 2025 - [info] Executing UNLOCK TABLES..
Tue Jan 14 10:57:27 2025 - [info] ok.
Tue Jan 14 10:57:27 2025 - [info] Starting orig master as a new slave..
Tue Jan 14 10:57:27 2025 - [info] Resetting slave 10.43.20.100(10.43.20.100:3306) and starting replication from the new master 10.43.20.101(10.43.20.101:3306)..
Tue Jan 14 10:57:27 2025 - [info] Executed CHANGE MASTER.
Tue Jan 14 10:57:28 2025 - [info] Slave started.
Tue Jan 14 10:57:28 2025 - [info] All new slave servers switched successfully.
Tue Jan 14 10:57:28 2025 - [info]
Tue Jan 14 10:57:28 2025 - [info] * Phase 5: New master cleanup phase..
Tue Jan 14 10:57:28 2025 - [info]
Tue Jan 14 10:57:28 2025 - [info] 10.43.20.101: Resetting slave info succeeded.
Tue Jan 14 10:57:28 2025 - [info] Switching master to 10.43.20.101(10.43.20.101:3306) completed successfully.
查看备节点ip:
[root@node02 scripts]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:eb:6b:8f brd ff:ff:ff:ff:ff:ff
inet 10.43.20.101/24 brd 10.43.20.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet 10.43.20.102/24 scope global secondary ens33
valid_lft forever preferred_lft forever
inet6 fe80::6e7:5185:9e7b:c840/64 scope link noprefixroute
valid_lft forever preferred_lft forever
VIP已绑定,主备切换成功。
[root@localhost][(none)]> show slave status\G
Empty set, 1 warning (0.00 sec)
备节点已没有slave状态。
[root@localhost][(none)]> show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID |
+-----------+------+------+-----------+--------------------------------------+
| 100 | | 3306 | 101 | 283a32ac-d194-11ef-91ab-000c29775d9b |
+-----------+------+------+-----------+--------------------------------------+
1 row in set, 1 warning (0.03 sec)
100原主库已成为备节点。
4.8模拟101主库宕机
[root@node01 scripts]# pkill -u mysql
[root@node01 scripts]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:77:5d:9b brd ff:ff:ff:ff:ff:ff
inet 10.43.20.100/24 brd 10.43.20.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::6e7:5185:9e7b:c840/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::79c0:be13:f42:d85e/64 scope link noprefixroute
valid_lft forever preferred_lft forever
manager日志输出:
Tue Jan 14 11:36:36 2025 - [info] Master is not reachable from all other monitoring servers. Failover should start.
Tue Jan 14 11:36:36 2025 - [info] HealthCheck: SSH to 10.43.20.100 is reachable.
Tue Jan 14 11:36:38 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111))
Tue Jan 14 11:36:38 2025 - [warning] Connection failed 2 time(s)..
Tue Jan 14 11:36:40 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111))
Tue Jan 14 11:36:40 2025 - [warning] Connection failed 3 time(s)..
Tue Jan 14 11:36:42 2025 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.43.20.100' (111))
Tue Jan 14 11:36:42 2025 - [warning] Connection failed 4 time(s)..
Tue Jan 14 11:36:42 2025 - [warning] Master is not reachable from health checker!
Tue Jan 14 11:36:42 2025 - [warning] Master 10.43.20.100(10.43.20.100:3306) is not reachable!
Tue Jan 14 11:36:42 2025 - [warning] SSH is reachable.
Tue Jan 14 11:36:42 2025 - [info] Connecting to a master server failed. Reading configuration file /data/mha/conf/masterha.cnf and /data/mha/conf/app.cnf again, and trying to connect to all servers to check server status..
Tue Jan 14 11:36:42 2025 - [info] Reading default configuration from /data/mha/conf/masterha.cnf..
Tue Jan 14 11:36:42 2025 - [info] Reading application default configuration from /data/mha/conf/app.cnf..
Tue Jan 14 11:36:42 2025 - [info] Reading server configuration from /data/mha/conf/app.cnf..
Tue Jan 14 11:36:43 2025 - [info] GTID failover mode = 0
Tue Jan 14 11:36:43 2025 - [info] Dead Servers:
Tue Jan 14 11:36:43 2025 - [info] 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:43 2025 - [info] Alive Servers:
Tue Jan 14 11:36:43 2025 - [info] 10.43.20.101(10.43.20.101:3306)
Tue Jan 14 11:36:43 2025 - [info] Alive Slaves:
Tue Jan 14 11:36:43 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 11:36:43 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:43 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 11:36:43 2025 - [info] Checking slave configurations..
Tue Jan 14 11:36:43 2025 - [info] Checking replication filtering settings..
Tue Jan 14 11:36:43 2025 - [info] Replication filtering check ok.
Tue Jan 14 11:36:43 2025 - [info] Master is down!
Tue Jan 14 11:36:43 2025 - [info] Terminating monitoring script.
Tue Jan 14 11:36:43 2025 - [info] Got exit code 20 (Master dead).
Tue Jan 14 11:36:43 2025 - [info] MHA::MasterFailover version 0.58.
Tue Jan 14 11:36:43 2025 - [info] Starting master failover.
Tue Jan 14 11:36:43 2025 - [info]
Tue Jan 14 11:36:43 2025 - [info] * Phase 1: Configuration Check Phase..
Tue Jan 14 11:36:43 2025 - [info]
Tue Jan 14 11:36:44 2025 - [info] GTID failover mode = 0
Tue Jan 14 11:36:44 2025 - [info] Dead Servers:
Tue Jan 14 11:36:44 2025 - [info] 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:44 2025 - [info] Checking master reachability via MySQL(double check)...
Tue Jan 14 11:36:44 2025 - [info] ok.
Tue Jan 14 11:36:44 2025 - [info] Alive Servers:
Tue Jan 14 11:36:44 2025 - [info] 10.43.20.101(10.43.20.101:3306)
Tue Jan 14 11:36:44 2025 - [info] Alive Slaves:
Tue Jan 14 11:36:44 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 11:36:44 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:44 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 11:36:44 2025 - [info] Starting Non-GTID based failover.
Tue Jan 14 11:36:44 2025 - [info]
Tue Jan 14 11:36:44 2025 - [info] ** Phase 1: Configuration Check Phase completed.
Tue Jan 14 11:36:44 2025 - [info]
Tue Jan 14 11:36:44 2025 - [info] * Phase 2: Dead Master Shutdown Phase..
Tue Jan 14 11:36:44 2025 - [info]
Tue Jan 14 11:36:44 2025 - [info] Forcing shutdown so that applications never connect to the current master..
Tue Jan 14 11:36:44 2025 - [info] Executing master IP deactivation script:
Tue Jan 14 11:36:44 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --command=stopssh --ssh_user=root
Disabling the VIP on old master: 10.43.20.100
Tue Jan 14 11:36:45 2025 - [info] done.
Tue Jan 14 11:36:45 2025 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Tue Jan 14 11:36:45 2025 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3: Master Recovery Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] The latest binary log file/position on all slaves is mysqld-bin.000005:14752736
Tue Jan 14 11:36:45 2025 - [info] Latest slaves (Slaves that received relay log files to the latest):
Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 11:36:45 2025 - [info] The oldest binary log file/position on all slaves is mysqld-bin.000005:14752736
Tue Jan 14 11:36:45 2025 - [info] Oldest slaves:
Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] Fetching dead master's binary logs..
Tue Jan 14 11:36:45 2025 - [info] Executing command on the dead master 10.43.20.100(10.43.20.100:3306): save_binary_logs --command=save --start_file=mysqld-bin.000005 --start_pos=14752736 --binlog_dir=/data/mysqldata/data --output_file=/data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /data/mha/masterha if not exists.. ok.
Concat binary/relay logs from mysqld-bin.000005 pos 14752736 to mysqld-bin.000005 EOF into /data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 157.. ok.
No need to dump effective binlog data from /data/mysqldata/data/mysqld-bin.000005 (pos starts 14752736, filesize 14752736). Skipping.
ERROR: Error in Log_event::read_log_event(): 'Found invalid event in binary log', data_len: 31, event_type: 35
Binlog Checksum enabled
/data/mha/masterha/saved_master_binlog_from_10.43.20.100_3306_20250114113643.binlog has no effective data events.
Event not exists.
Tue Jan 14 11:36:45 2025 - [info] Additional events were not found from the orig master. No need to save.
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3.3: Determining New Master Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Tue Jan 14 11:36:45 2025 - [info] All slaves received relay logs to the same position. No need to resync each other.
Tue Jan 14 11:36:45 2025 - [info] Searching new master from slaves..
Tue Jan 14 11:36:45 2025 - [info] Candidate masters from the configuration file:
Tue Jan 14 11:36:45 2025 - [info] 10.43.20.101(10.43.20.101:3306) Version=8.0.28 (oldest major version between slaves) log-bin:enabled
Tue Jan 14 11:36:45 2025 - [info] Replicating from 10.43.20.100(10.43.20.100:3306)
Tue Jan 14 11:36:45 2025 - [info] Primary candidate for the new Master (candidate_master is set)
Tue Jan 14 11:36:45 2025 - [info] Non-candidate masters:
Tue Jan 14 11:36:45 2025 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Tue Jan 14 11:36:45 2025 - [info] New master is 10.43.20.101(10.43.20.101:3306)
Tue Jan 14 11:36:45 2025 - [info] Starting master failover..
Tue Jan 14 11:36:45 2025 - [info]
From:
10.43.20.100(10.43.20.100:3306) (current master)
+--10.43.20.101(10.43.20.101:3306)
To:
10.43.20.101(10.43.20.101:3306) (new master)
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] * Phase 3.5: Master Log Apply Phase..
Tue Jan 14 11:36:45 2025 - [info]
Tue Jan 14 11:36:45 2025 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Tue Jan 14 11:36:45 2025 - [info] Starting recovery on 10.43.20.101(10.43.20.101:3306)..
Tue Jan 14 11:36:45 2025 - [info] This server has all relay logs. Waiting all logs to be applied..
Tue Jan 14 11:36:45 2025 - [info] done.
Tue Jan 14 11:36:45 2025 - [info] All relay logs were successfully applied.
Tue Jan 14 11:36:45 2025 - [info] Getting new master's binlog name and position..
Tue Jan 14 11:36:45 2025 - [info] mysqld-bin.000007:157
Tue Jan 14 11:36:45 2025 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.43.20.101', MASTER_PORT=3306, MASTER_LOG_FILE='mysqld-bin.000007', MASTER_LOG_POS=157, MASTER_USER='repl', MASTER_PASSWORD='xxx';
Tue Jan 14 11:36:45 2025 - [info] Executing master IP activate script:
Tue Jan 14 11:36:45 2025 - [info] /data/mha/scripts/master_ip_failover_vip --vip=10.43.20.102 --ssh_port=22 --command=start --ssh_user=root --orig_master_host=10.43.20.100 --orig_master_ip=10.43.20.100 --orig_master_port=3306 --new_master_host=10.43.20.101 --new_master_ip=10.43.20.101 --new_master_port=3306 --new_master_user='mha' --new_master_password=xxx
Enabling the VIP - 10.43.20.102 on the new master - 10.43.20.101
Set read_only=0 on the new master.
Creating app user on the new master..
Tue Jan 14 11:36:46 2025 - [info] OK.
Tue Jan 14 11:36:46 2025 - [info] ** Finished master recovery successfully.
Tue Jan 14 11:36:46 2025 - [info] * Phase 3: Master Recovery Phase completed.
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] * Phase 4: Slaves Recovery Phase..
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] Generating relay diff files from the latest slave succeeded.
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] All new slave servers recovered successfully.
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] * Phase 5: New master cleanup phase..
Tue Jan 14 11:36:46 2025 - [info]
Tue Jan 14 11:36:46 2025 - [info] Resetting slave info on the new master..
Tue Jan 14 11:36:46 2025 - [info] 10.43.20.101: Resetting slave info succeeded.
Tue Jan 14 11:36:46 2025 - [info] Master failover to 10.43.20.101(10.43.20.101:3306) completed successfully.
Tue Jan 14 11:36:46 2025 - [info]
----- Failover Report -----
app: MySQL Master failover 10.43.20.100(10.43.20.100:3306) to 10.43.20.101(10.43.20.101:3306) succeeded
Master 10.43.20.100(10.43.20.100:3306) is down!
Check MHA Manager logs at node02:/data/mha/manager/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.43.20.100(10.43.20.100:3306)
The latest slave 10.43.20.101(10.43.20.101:3306) has all relay logs for recovery.
Selected 10.43.20.101(10.43.20.101:3306) as a new master.
10.43.20.101(10.43.20.101:3306): OK: Applying all logs succeeded.
10.43.20.101(10.43.20.101:3306): OK: Activated master IP address.
Generating relay diff files from the latest slave succeeded.
10.43.20.101(10.43.20.101:3306): Resetting slave info succeeded.
Master failover to 10.43.20.101(10.43.20.101:3306) completed successfully.
免密脚本:
#!/usr/bin/env bash
## Config to do
nodes="node11,node12,node13,node14,node15" # 可以在这里添加其他节点,用逗号隔开
user=test
## Please Don't edit content below
ssh-keygen -q -P "" -f $HOME/.ssh/id_rsa > /dev/null
# 将节点列表转换为数组
nodes_array=(${nodes//,/ })
# 定义一个函数来处理每个节点
setup_ssh_trust() {
local node=$1
if [ "`hostname`" == "$node" ]; then
ssh-copy-id -o StrictHostKeyChecking=no $user@$node > /dev/null
else
ssh-copy-id -o StrictHostKeyChecking=no $user@$node > /dev/null
ssh $node 'ssh-keygen -q -P "" -f $HOME/.ssh/id_rsa' > /dev/null
scp -rp $node:$HOME/.ssh/id_rsa.pub ./auth.$node > /dev/null
fi
}
# 处理所有节点
for node in "${nodes_array[@]}"
do
setup_ssh_trust $node
done
# 合并所有公钥到 authorized_keys
cat ./auth.* >> $HOME/.ssh/authorized_keys
rm -rf ./auth.*
echo "Transfer authorized_keys"
# 将 authorized_keys 和 known_hosts 传输到所有节点
for node in "${nodes_array[@]}"
do
if [ "`hostname`" != "$node" ]; then
scp -rp $HOME/.ssh/authorized_keys $user@$node:$HOME/.ssh/authorized_keys
scp -rp $HOME/.ssh/known_hosts $user@$node:$HOME/.ssh/known_hosts
fi
done
exit 0