PT-ONLINE-CHANGE-SCHEMA 无法连接从库

问题现状:

在执行ONLINE DDL时,PT工具报出异常,无法连接从库:
Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root
执行是完成了, 但这个警告忽略了从库,如果从库有异常,这个时候是不会管的。
 
分析:
环境ROOT密码不一致。
执行ONLINE DDL的用户为ROOT。
 
问题模拟:

1. 将主从库的ROOT密码设置为不一致:

mysql> set sql_log_bin=0;
Query OK, 0 rows affected (0.00 sec)

mysql> SET PASSWORD FOR 'root'@'localhost' = PASSWORD('xxxxx');
Query OK, 0 rows affected (0.01 sec)

mysql> grant all privileges on *.* to root@'localhost' identified by 'xxxxxx'; 
Query OK, 0 rows affected (0.00 sec)

mysql> commit;
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql> set sql_log_bin=1;
Query OK, 0 rows affected (0.00 sec)

mysql> 

2.  创建一张数据库,21855条测试数据:

mysql> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> create table test_row(id int(9) auto_increment primary key not null, name varchar(9));
Query OK, 0 rows affected (0.02 sec)
root@redis1:~# cat insert_row.sh 
#!/bin/bash
for i in `seq 1 21855`
do
	/usr/local/percona/bin/mysql -uroot -pxxxx -S /database/mysql/3306/run/mysql.sock -D test -e"insert into test_row(name) values('aa');"
done
exit 0

3. 用PT-ONLINE工具为此表增加一个AGE字段:

root@redis1:~# pt-online-schema-change --critical-load Threads_running=100 --user=root --host=localhost --ask-pass --port=3306 --charset=utf8 --nodrop-old-table --chunk-size=10000 --alter="add column age int(9)" D=test,t=test_row  --nocheck-replication-filters --exec
Enter MySQL password: 
Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root
No slaves found.  See --recursion-method if host redis1 has slaves.
Not checking slave lag because no slaves were found and --check-slave-lag was not specified.
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`test_row`...
Creating new table...
Created new table test._test_row_new OK.
Altering new table...
Altered `test`.`_test_row_new` OK.
2017-12-11T06:19:21 Creating triggers...
2017-12-11T06:19:21 Created triggers OK.
2017-12-11T06:19:21 Copying approximately 21787 rows...
Cannot connect to A=utf8,D=test,P=3306,h=192.168.252.100,p=...,u=root
2017-12-11T06:19:21 Copied rows OK.
2017-12-11T06:19:21 Analyzing new table...
2017-12-11T06:19:21 Swapping tables...
2017-12-11T06:19:21 Swapped original and new tables OK.
Not dropping old table because --no-drop-old-table was specified.
2017-12-11T06:19:21 Dropping triggers...
2017-12-11T06:19:21 Dropped triggers OK.
Successfully altered `test`.`test_row`.

4. 检查数据:

mysql> desc test_row;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    21866
Current database: test

+-------+------------+------+-----+---------+----------------+
| Field | Type       | Null | Key | Default | Extra          |
+-------+------------+------+-----+---------+----------------+
| id    | int(9)     | NO   | PRI | NULL    | auto_increment |
| name  | varchar(9) | YES  |     | NULL    |                |
| age   | int(9)     | YES  |     | NULL    |                |
+-------+------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> select count(*) from test_row;
+----------+
| count(*) |
+----------+
|    21855 |
+----------+
1 row in set (0.01 sec)
如真实情况一样, 字段已经添加上了,数据也正确。
解决问题的方案:
1. 将ROOT密码设置为一致。
经测试, 此方案依旧有警告信息,连接不上从库。
2. 设置一个新的账户,可以用来检测从库也可以执行DDL的用户。
2.1 授权账户:
mysql> GRANT RELOAD, PROCESS, SUPER, LOCK TABLES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'root'@'192.168.252.203' identified by  'xxxxx';
Query OK, 0 rows affected (0.00 sec)

mysql> GRANT ALL PRIVILEGES ON `test`.* TO 'root'@'192.168.252.203';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)

mysql>
2.2 执行ONLINE DDL操作:
root@redis1:~# pt-online-schema-change --critical-load Threads_running=100 --user=root --host=192.168.252.203 --ask-pass --port=3306 --charset=utf8 --nodrop-old-table --chunk-size=10000 --alter="drop column age" D=test,t=test_row  --nocheck-replication-filters --exec
Enter MySQL password: 
Found 2 slaves:
  redis1
  redis2
Will check slave lag on:
  redis1
  redis2
Operation, tries, wait:
  analyze_table, 10, 1
  copy_rows, 10, 0.25
  create_triggers, 10, 1
  drop_triggers, 10, 1
  swap_tables, 10, 1
  update_foreign_keys, 10, 1
Altering `test`.`test_row`...
Creating new table...
Created new table test._test_row_new OK.
Waiting forever for new table `test`.`_test_row_new` to replicate to redis2...
Altering new table...
Altered `test`.`_test_row_new` OK.
2017-12-11T07:04:29 Creating triggers...
2017-12-11T07:04:29 Created triggers OK.
2017-12-11T07:04:29 Copying approximately 21787 rows...
2017-12-11T07:04:30 Copied rows OK.
2017-12-11T07:04:30 Analyzing new table...
2017-12-11T07:04:30 Swapping tables...
2017-12-11T07:04:30 Swapped original and new tables OK.
Not dropping old table because --no-drop-old-table was specified.
2017-12-11T07:04:30 Dropping triggers...
2017-12-11T07:04:30 Dropped triggers OK.
Successfully altered `test`.`test_row`.
root@redis1:~# 
从库被正常连接,并被正常检测 .
posted @ 2017-12-11 15:21 SMALL-D 阅读(...) 评论(...)  编辑 收藏