MySQL高可用
任务一 MHA高可用
1.1 MHA介绍
MHA(Master High Availability)目前在 MySQL 高可用方面是一个相对成熟的解决方案,它由日本 DeNA 公司的 youshimaton(现就职于 Facebook 公司)开发,是一套优秀的作为 MySQL 高可用性环境下故障切换和主从提升的高可用软件。MHA只负责MySQL主库的高可用。主库发生故障时,MHA会选择一个数据最接近原主库的候选节点作为新的主节点,并补齐和之前Dead Master差异的Binlog。数据补齐之后,即将写VIP漂移到新主库上对外提供服务。
1. 工作原理
从宕机奔溃的master保存二进制日志事件;
识别含有最新的slave;
应用差异的中继日志(relay log)到其他的slave;
提升一个slave为新的master;
使其他的slave连接新的master进行复制
mysql> create database cool; Query OK, 1 row affected (0.00 sec) mysql> use cool; Database changed mysql> create table cool (id int,name varchar(10)); Query OK, 0 rows affected (0.15 sec) mysql> insert into cool values(1,'a'),(2,'b'),(3,'c'); Query OK, 3 rows affected (0.00 sec) Records: 3 Duplicates: 0 Warnings: 0 mysql> commit; Query OK, 0 rows affected (0.00 sec)
2. MHA组成
Manager节点:可以单独部署在一台独立的机器上管理多个master-slave集群,也可以部署在一台slave节点上。
Node节点:运行在每节点上。
1.2 MHA环境部署
1. 环境规划
主机名 |
IP |
节点信息 |
数据版本 |
系统版本 |
db1 |
192.168.174.100 |
mysql主节点 |
5.7.32 |
CentOS7.9 |
db2 |
192.168.174.101 |
mysql从节点 |
5.7.32 |
CentOS7.9 |
db3 |
192.168.174.102 |
mysql从节点 |
5.7.32 |
CentOS7.9 |
MHA |
192.168.174.150 |
MHA manager节点 |
0.58 |
CentOS7.9 |
2. 部署mysql主从复制(一主两从)
2.1 db1、db2、db3三台均安装mysql
(1)卸载mariadb
[root@db1 ~]# rpm -qa | grep mariadb //查看有无安装mariadb mariadb-libs-5.5.44-2.el7.centos.x86_64 [root@db1 ~]# rpm -e --nodeps mariadb-libs-5.5.44-2.el7.centos.x86_64 //卸载mariadb
(2)四台主机均关闭防火墙,修改所有节点主机名
[root@db1 ~]# systemctl stop firewalld [root@db1 ~]# systemctl disable firewalld Removed symlink /etc/systemd/system/dbus-org.fedoraproject.FirewallD1.service. Removed symlink /etc/systemd/system/basic.target.wants/firewalld.service. [root@db1 ~]# setenforce 0 [root@db1 ~]# sed -i 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/selinux/config
[root@db1 ~]# vi /etc/hosts 127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4 ::1 localhost localhost.localdomain localhost6 localhost6.localdomain6 192.168.174.100 db1 192.168.174.101 db2 192.168.174.102 db3 192.168.174.150 mha
(3)下载及解压
[root@db1 ~]# cd /opt/ [root@db1 opt]# wget https://cdn.mysql.com//Downloads/MySQL-5.7/mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz --2021-06-17 12:57:37-- https://cdn.mysql.com//Downloads/MySQL-5.7/mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz Resolving cdn.mysql.com (cdn.mysql.com)... 184.86.92.235 Connecting to cdn.mysql.com (cdn.mysql.com)|184.86.92.235|:443... connected. HTTP request sent, awaiting response... 200 OK Length: 661214270 (631M) [application/x-tar-gz] Saving to: ‘mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz’ 100%[========================>] 661,214,270 22.7MB/s in 54s 2021-06-17 12:58:35 (11.6 MB/s) - ‘mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz’ saved [661214270/661214270]
[root@db1 opt]# tar zxvf mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz
(4)安装和配置
安装依赖
[root@db1 opt]# yum -y install ncurses-devel autoconf
[root@db1 opt]# groupadd mysql [root@db1 opt]# useradd -g mysql -s /sbin/nologin -d /usr/local/mysql -MN mysql [root@db1 opt]# mkdir /data/mysql/data/ -p [root@db1 opt]# chown -R mysql:mysql /data/mysql [root@db1 opt]# mv mysql-5.7.32-linux-glibc2.12-x86_64 /usr/local/ [root@db1 opt]# ln -s /usr/local/mysql-5.7.32-linux-glibc2.12-x86_64 /usr/local/mysql [root@db1 opt]# chown -R mysql.mysql /usr/local/mysql [root@db1 opt]# cd /usr/local/mysql
[root@db1 mysql]# vi /etc/my.cnf [client] port = 3306 socket = /tmp/mysql.sock [mysql] no-auto-rehash [mysqld] user = mysql basedir = /usr/local/mysql datadir = /data/mysql/data port = 3306 socket = /tmp/mysql.sock log-error = error.log slow_query_log_file = slow.log character-set-server = utf8 open_files_limit = 65535 max_connections = 100 max_connect_errors = 100000 lower_case_table_names =1 server_id=1 gtid_mode=on enforce_gtid_consistency=on #binlog log_bin=master-binlog log-slave-updates=1 binlog_format=row #relay log skip-slave-start=1 relay_log_purge=0
#关闭定期清除。中继日志的清除是由SQL线程定期清除,若使用MHA则不让定期清除,因为mha做数据补齐时可能会用到中继日志。
其中两台从服务器的 server_id分别设为server_id=2、server_id=3
(5)初始化mysql
[root@db1 mysql]# /usr/local/mysql/bin/mysqld --initialize [root@db1 mysql]# grep 'temporary password' /data/mysql/data/error.log 2021-06-17T17:26:50.182859Z 1 [Note] A temporary password is generated for root@localhost: !AeVR%g&E9Li // 查看mysql密码
(6)复制启动脚本文件
[root@db1 mysql]# cp /usr/local/mysql/support-files/mysql.server /etc/init.d/mysql
(7)添加环境变量
[root@db1 mysql]# echo "export PATH=/usr/local/mysql/bin:$PATH">>/etc/profile [root@db1 mysql]# source /etc/profile
(8)启动mysql
[root@db1 mysql]# service mysql start
Starting MySQL. SUCCESS!
(9)查看你mysql进程
[root@db1 mysql]# ps -ef | grep mysql root 12636 1 0 13:35 pts/0 00:00:00 /bin/sh /usr/local/mysql/bin/mysqld_safe --datadir=/data/mysql/data --pid-file=/data/mysql/data/db1.pid mysql 12974 12636 0 13:35 pts/0 00:00:00 /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/data/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql --log-error=error.log --open-files-limit=65535 --pid-file=/data/mysql/data/db1.pid --socket=/tmp/mysql.sock --port=3306 root 13004 3036 0 13:37 pts/0 00:00:00 grep --color=auto mysql
(10)登录mysql
[root@db1 mysql]# mysql -uroot -p Enter password: Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 3 Server version: 5.7.32-log Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
(11)修改mysql密码
mysql> alter user user() identified by 'test'; Query OK, 0 rows affected (0.00 sec)
(12)登录mysql
[root@db1 mysql]# mysql -uroot -ptest mysql: [Warning] Using a password on the command line interface can be insecure. Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 4 Server version: 5.7.32-log MySQL Community Server (GPL) Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql>
(13)主服务器上创建复制账户
mysql> grant replication slave on *.* to 'repl'@'192.168.174.%' identified by '123456'; Query OK, 0 rows affected, 1 warning (0.00 sec)
(14)在主服务器上备份全库
[root@db1 mysql]# mysqldump -uroot -ptest --set-gtid-purged=OFF --master-data=2 --single-transaction -A > /opt/mysqlbak_`date +%Y%m%d`.sql mysqldump: [Warning] Using a password on the command line interface can be insecure.
导入mysql主备份的mysql到从服务器上
[root@db1 mysql]# cd /opt/ [root@db1 opt]# ls mysql-5.7.32-linux-glibc2.12-x86_64.tar.gz mysqlbak_20210617.sql [root@db1 opt]# scp /opt/mysqlbak_20210617.sql root@192.168.174.101:/opt/ The authenticity of host '192.168.174.101 (192.168.174.101)' can't be established. ECDSA key fingerprint is af:b5:02:13:22:e6:f3:8e:75:7c:45:c5:dc:28:47:8e. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '192.168.174.101' (ECDSA) to the list of known hosts. root@192.168.174.101's password: mysqlbak_20210617.sql 100% 852KB 852.1KB/s 00:00 [root@db1 opt]# scp /opt/mysqlbak_20210617.sql root@192.168.174.102:/opt/ The authenticity of host '192.168.174.102 (192.168.174.102)' can't be established. ECDSA key fingerprint is af:b5:02:13:22:e6:f3:8e:75:7c:45:c5:dc:28:47:8e. Are you sure you want to continue connecting (yes/no)? yes Warning: Permanently added '192.168.174.102' (ECDSA) to the list of known hosts. root@192.168.174.102's password: mysqlbak_20210617.sql 100% 852KB 852.1KB/s 00:00
[root@db2 ~]# mysql -uroot -ptest
mysql> source /opt/mysqlbak_20210617.sql;
[root@db3 ~]# mysql -uroot -ptest
mysql> source /opt/mysqlbak_20210617.sql;
配置从服务器上的连接
[root@db1 ~]# mysql -uroot -ptest mysql> CHANGE MASTER TO -> MASTER_HOST='192.168.174.100', -> MASTER_USER='repl', -> MASTER_PASSWORD='123456', -> MASTER_PORT=3306, -> MASTER_AUTO_POSITION=1, -> MASTER_CONNECT_RETRY=10; Query OK, 0 rows affected, 2 warnings (0.10 sec)
启动slave,如果Slave_IO_Running与Slave_SQL_Running都是yes就代表从服务器配置成功。
mysql> start slave; Query OK, 0 rows affected (0.00 sec) mysql> show slave status\G;
....... Slave_IO_Running: Yes Slave_SQL_Running: Yes
.......
3. 授权MHA管理账户及配置节点间免密登录
mysql> grant all privileges on *.* to mha@'192.168.174.%' identified by '123456'; mysql> flush privileges;
配置节点间免密登录
[root@db1 opt]# ssh-keygen #一直回车 [root@db1 opt]# cd /root/.ssh/ [root@db1 .ssh]# mv id_rsa.pub authorized_keys [root@db1 .ssh]# scp -r /root/.ssh 192.168.174.101:/root [root@db1 .ssh]# scp -r /root/.ssh 192.168.174.102:/root [root@db1 .ssh]# scp -r /root/.ssh 192.168.174.150:/root
登录各节点验证
db1:
ssh 192.168.137.101
ssh 192.168.137.102
ssh 192.168.137.150
db2:
ssh 192.168.137.100
ssh 192.168.137.102
ssh 192.168.137.150
db3:
ssh 192.168.137.100
ssh 192.168.137.101
ssh 192.168.137.150
mha:
ssh 192.168.137.100
ssh 192.168.137.101
ssh 192.168.137.102
在mysql服务节点(db1,db2,db3)安装node:
在安装前先安装依赖软件
[root@db1 ~]# yum -y install perl-DBD-MySQL
下载安装
[root@db1 ~]# wget https://github.com/yoshinorim/mha4mysql-node/releases/download/v0.58/mha4mysql-node-0.58-0.el7.centos.noarch.rpm
[root@db1 ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm Preparing... ################################# [100%] Updating / installing... 1:mha4mysql-node-0.58-0.el7.centos ################################# [100%]
创建命令软链接(mha调用mysql命令默认在/usr/bin下面,不做此步mha会报错)
[root@db1 ~]# ln -s /usr/local/mysql/bin/mysql /usr/bin/mysql
[root@db1 ~]# ln -s /usr/local/mysql/bin/mysqlbinlog /usr/local/bin/mysqlbinlog
4. 在管理节点(mha)上安装manage
安装必须软件
[root@mha ~]# yum -y install epel-release
[root@mha ~]# yum install -y perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager
安装mha4mysql-node,mha4mysql-manager依赖于mha2mysql-node
[root@mha ~]# rpm -ivh mha4mysql-node-0.58-0.el7.centos.noarch.rpm
安装mha4mysql-manager
[root@mha ~] wget https://github.com/yoshinorim/mha4mysql-manager/releases/download/v0.58/mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
[root@mha ~]# rpm -ivh mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
5. 配置MHA
全局配置文件
[root@mha ~]# mkdir -p /masterha/manager
[root@mha ~]# vi /etc/masterha_default.cnf
manager_workdir=/masterha/manager
remote_workdir=/tmp
对主从复制的配置文件app1
[root@mha ~]# mkdir -p /masterha/app1 [root@mha ~]# vi /masterha/app1.cnf [server default] manager_log=/masterha/app1/manager.log master_binlog_dir=/data/mysql/data user=mha password=123456 repl_user=repl repl_password=123456 ssh_user=root ping_interval=2 [server1] hostname=192.168.174.100 port=3306 [server2] hostname=192.168.174.101 port=3306 [server3] hostname=192.168.174.102 port=3306
在mha上测试ssh连通性
[root@mha ~]# masterha_check_ssh --conf=/masterha/app1.cnf
....
....
Thu Jun 17 18:45:39 2021 - [debug] ok.
Thu Jun 17 18:45:40 2021 - [info] All SSH connection tests passed successfully.
在mha上检查主从复制状态:
[root@mha ~]# masterha_check_repl --conf=/masterha/app1.cnf
....
MySQL Replication Health is OK.
6. 启动MHA
[root@mha ~]# nohup masterha_manager --conf=/masterha/app1.cnf > /masterha/app1/manager.log 2>&1 & [1] 12991
[root@mha ~]# tail -f /masterha/app1/manager.log 192.168.174.100(192.168.174.100:3306) (current master) +--192.168.174.101(192.168.174.101:3306) +--192.168.174.102(192.168.174.102:3306) Thu Jun 17 18:43:48 2021 - [warning] master_ip_failover_script is not defined. Thu Jun 17 18:43:48 2021 - [warning] shutdown_script is not defined. Thu Jun 17 18:43:48 2021 - [info] Set master ping interval 2 seconds. Thu Jun 17 18:43:48 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes. Thu Jun 17 18:43:48 2021 - [info] Starting ping health check on 192.168.174.100(192.168.174.100:3306).. Thu Jun 17 18:43:48 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
7. MHA自动故障转移
指定自动转移节点,在app1.cnf添加如下参数(标红),当server1宕机,mha会根据candidate_master=1这个参数选择server2作为主节点,然后将server3挂到server2上进行重新架构,新主从。
[root@mha ~]# vi /masterha/app1.cnf [server default] manager_log=/masterha/app1/manager.log master_binlog_dir=/data/mysql/data user=mha password=123456 repl_user=repl repl_password=123456 ssh_user=root ping_interval=2 [server1] hostname=192.168.174.100 port=3306 [server2] hostname=192.168.174.101 port=3306 candidate_master=1 [server3] hostname=192.168.174.102
查看mha是否是启动状态
[root@mha ~]# masterha_check_status --conf=/masterha/app1.cnf app1 (pid:12991) is running(0:PING_OK), master:192.168.174.100
停止mha,重新启动mha,使新参数生效
[root@mha ~]# masterha_stop --global_conf=/etc/masterha_default.cnf --conf=/masterha/app1.cnf Stopped app1 successfully. [1]+ Exit 1 nohup masterha_manager --conf=/masterha/app1.cnf > /masterha/app1/manager.log 2>&1 [root@mha ~]# nohup masterha_manager --conf=/masterha/app1.cnf > /masterha/app1/manager.log 2>&1 & [1] 13324
查看server2、server3是否正常
mysql> show slave status\G;
......
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
......
模拟db1[mha(1)]关机,查看mha是否有故障转移到db2上,同时查看mha上的manager.log。发现主已经转移到db2上了。
[root@mha ~]# tail -f /masterha/app1/manager.log Master 192.168.174.100(192.168.174.100:3306) is down! Check MHA Manager logs at mha:/masterha/app1/manager.log for details. Started automated(non-interactive) failover. Selected 192.168.174.101(192.168.174.101:3306) as a new master. 192.168.174.101(192.168.174.101:3306): OK: Applying all logs succeeded. 192.168.174.102(192.168.174.102:3306): OK: Slave started, replicating from 192.168.174.101(192.168.174.101:3306) 192.168.174.101(192.168.174.101:3306): Resetting slave info succeeded. Master failover to 192.168.174.101(192.168.174.101:3306) completed successfully.
在db2上查看
mysql> show slave status\G; Empty set (0.00 sec) ERROR: No query specified
在db3上查看,Master_Host已变为192.168.137.101,说明主已转移成功。
mysql> show slave status\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 192.168.174.101 Master_User: repl Master_Port: 3306 Connect_Retry: 10 Master_Log_File: master-binlog.000002 Read_Master_Log_Pos: 853430 Relay_Log_File: db3-relay-bin.000002 Relay_Log_Pos: 852720 Relay_Master_Log_File: master-binlog.000002 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 853430 Relay_Log_Space: 852925 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 Master_UUID: 3b2dfcc2-cf91-11eb-a846-000c299de43d Master_Info_File: /data/mysql/data/master.info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 3b2dfcc2-cf91-11eb-a846-000c299de43d:1-125 Executed_Gtid_Set: 32be489b-cf91-11eb-9bd0-000c2943306a:1-4, 3b2dfcc2-cf91-11eb-a846-000c299de43d:1-125, 3d1a86c1-cf91-11eb-a768-000c2920da69:1-125 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.00 sec) ERROR: No query specified