pt-table-checksum 3.0.4检测不出主从差异数据

群里好几位同学问 pt-table-checksum 3.0.4, 主从两个表数据是不一致,为啥检测不出来?前段时间自己也测试过,只是没整理成随笔^_-

一、基本环境

VMware10.0+CentOS6.9+MySQL5.7.19

ROLE HOSTNAME BASEDIR DATADIR IP PORT
Master ZST1 /usr/local/mysql /data/mysql/mysql3306/data 192.168.85.132 3306
Slave ZST2 /usr/local/mysql /data/mysql/mysql3306/data 192.168.85.133 3306

基于Row+Gtid搭建的一主一从复制结构:Master->Slave

二、构造差异数据

借助样例数据库sakila做测试

# 主库flush logs
mydba@192.168.85.132,3306 [sakila]> flush logs;

# 主库开启general_log
[root@ZST1 ~]# rm -rf /data/mysql/mysql3306/data/mysql-general.log
mydba@192.168.85.132,3306 [sakila]> set global general_log_file='/data/mysql/mysql3306/data/mysql-general.log';
mydba@192.168.85.132,3306 [sakila]> set global general_log =1;
mydba@192.168.85.132,3306 [sakila]> show variables like 'general_log%';

# 从库修改部分数据,造成不一致
mydba@192.168.85.133,3306 [sakila]> delete from sakila.actor where actor_id<=3; # 外键约束删除失败
mydba@192.168.85.133,3306 [sakila]> update sakila.actor set last_name=first_name where actor_id<=3;
# 主库sakila.actor数据
mydba@192.168.85.132,3306 [sakila]> select * from sakila.actor limit 3;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | GUINESS   | 2006-02-15 04:34:33 |
|        2 | NICK       | WAHLBERG  | 2006-02-15 04:34:33 |
|        3 | ED         | CHASE     | 2006-02-15 04:34:33 |
+----------+------------+-----------+---------------------+
# 从库sakila.actor数据
mydba@192.168.85.133,3306 [sakila]> select * from sakila.actor limit 3;
+----------+------------+-----------+---------------------+
| actor_id | first_name | last_name | last_update         |
+----------+------------+-----------+---------------------+
|        1 | PENELOPE   | PENELOPE  | 2017-11-08 09:54:10 |
|        2 | NICK       | NICK      | 2017-11-08 09:54:10 |
|        3 | ED         | ED        | 2017-11-08 09:54:10 |
+----------+------------+-----------+---------------------+
View Code

从库修改部分数据,造成主从不一致

三、pt-table-checksum

3.1、检测数据是否一致

pt-table-checksum可以在任何机器上执行,只要它能连接到Master就行。我是在从库执行,最后的参数指定到主库就行

# 运行pt-table-checksum
[root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-08T09:57:01      0      0      200       1       0   0.075 sakila.actor
11-08T09:57:01      0      0      600       1       0   0.034 sakila.city
[root@ZST2 ~]# 
View Code

DIFFS=0表示没有差异数据。实际上主从数据不一致,这里却没有检测出来~
主库得到的general-log、binlog拷贝到其他文件夹,方便后续分析

# 拷贝general-log、binlog文件
[root@ZST1 ~]# cp /data/mysql/mysql3306/data/mysql-general.log /data/backup/mysql-general.log.ptchecksum3306
[root@ZST1 ~]# cp /data/mysql/mysql3306/logs/mysql-bin.000083 /data/backup/mysql-bin.000083.ptchecksum3306
View Code

3.2、查看general-log

[root@ZST1 ~]# cat /data/backup/mysql-general.log.ptchecksum3306
/usr/local/mysql/bin/mysqld, Version: 5.7.19-log (MySQL Community Server (GPL)). started with:
Tcp port: 3306  Unix socket: /tmp/mysql3306.sock
Time                 Id Command    Argument
2017-11-08T01:57:01.750917Z        20 Connect   mydba@192.168.85.133 on  using TCP/IP
2017-11-08T01:57:01.751564Z        20 Query     set autocommit=1
2017-11-08T01:57:01.752220Z        20 Query     SHOW VARIABLES LIKE 'innodb\_lock_wait_timeout'
2017-11-08T01:57:01.757028Z        20 Query     SET SESSION innodb_lock_wait_timeout=1
2017-11-08T01:57:01.757521Z        20 Query     SHOW VARIABLES LIKE 'wait\_timeout'
2017-11-08T01:57:01.760950Z        20 Query     SET SESSION wait_timeout=10000
2017-11-08T01:57:01.761400Z        20 Query     SELECT @@SQL_MODE
2017-11-08T01:57:01.761772Z        20 Query     SET @@SQL_QUOTE_SHOW_CREATE = 1/*!40101, @@SQL_MODE='NO_AUTO_VALUE_ON_ZERO,ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'*/
2017-11-08T01:57:01.762140Z        20 Query     SELECT @@server_id /*!50038 , @@hostname*/
2017-11-08T01:57:01.762475Z        20 Query     SELECT @@SQL_MODE
2017-11-08T01:57:01.762772Z        20 Query     SET SQL_MODE=',NO_AUTO_VALUE_ON_ZERO,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION'
2017-11-08T01:57:01.763098Z        20 Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
2017-11-08T01:57:01.763504Z        20 Query     SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.766949Z        20 Query     SELECT @@SERVER_ID
2017-11-08T01:57:01.767470Z        20 Query     SHOW SLAVE HOSTS
2017-11-08T01:57:01.787329Z        20 Query     SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.790712Z        20 Query     SELECT @@SERVER_ID
2017-11-08T01:57:01.794388Z        20 Query     SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.797637Z        20 Query     SELECT @@SERVER_ID
2017-11-08T01:57:01.801356Z        20 Query     SHOW DATABASES LIKE 'sakila'
2017-11-08T01:57:01.802164Z        20 Query     CREATE DATABASE IF NOT EXISTS `sakila` /* pt-table-checksum */
2017-11-08T01:57:01.802951Z        20 Query     USE `sakila`
2017-11-08T01:57:01.803300Z        20 Query     SHOW TABLES FROM `sakila` LIKE 'checksums'
2017-11-08T01:57:01.806111Z        20 Query     CREATE TABLE IF NOT EXISTS `sakila`.`checksums` (
     db             CHAR(64)     NOT NULL,
     tbl            CHAR(64)     NOT NULL,
     chunk          INT          NOT NULL,
     chunk_time     FLOAT            NULL,
     chunk_index    VARCHAR(200)     NULL,
     lower_boundary TEXT             NULL,
     upper_boundary TEXT             NULL,
     this_crc       CHAR(40)     NOT NULL,
     this_cnt       INT          NOT NULL,
     master_crc     CHAR(40)         NULL,
     master_cnt     INT              NULL,
     ts             TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
     PRIMARY KEY (db, tbl, chunk),
     INDEX ts_db_tbl (ts, db, tbl)
  ) ENGINE=InnoDB DEFAULT CHARSET=utf8
2017-11-08T01:57:01.825908Z        20 Query     SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.828793Z        20 Query     SELECT CONCAT(@@hostname, @@port)
2017-11-08T01:57:01.844001Z        20 Query     SELECT CRC32('test-string')
2017-11-08T01:57:01.844518Z        20 Query     SELECT CRC32('a')
2017-11-08T01:57:01.845025Z        20 Query     SELECT CRC32('a')
2017-11-08T01:57:01.845517Z        20 Query     SHOW VARIABLES LIKE 'wsrep_on'
2017-11-08T01:57:01.849157Z        20 Query     SHOW DATABASES
2017-11-08T01:57:01.850038Z        20 Query     SHOW /*!50002 FULL*/ TABLES FROM `sakila`
2017-11-08T01:57:01.851486Z        20 Query     /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2017-11-08T01:57:01.851943Z        20 Query     USE `sakila`
2017-11-08T01:57:01.852408Z        20 Query     SHOW CREATE TABLE `sakila`.`actor`
2017-11-08T01:57:01.853034Z        20 Query     /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2017-11-08T01:57:01.854092Z        20 Query     EXPLAIN SELECT * FROM `sakila`.`actor` WHERE 1=1
2017-11-08T01:57:01.857374Z        20 Query     USE `sakila`
2017-11-08T01:57:01.857990Z        20 Query     DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor'
2017-11-08T01:57:01.877626Z        20 Query     USE `sakila`
2017-11-08T01:57:01.878413Z        20 Query     EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*explain checksum table*/
2017-11-08T01:57:01.879347Z        20 Query     REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'actor', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `actor_id`, convert(`first_name` using utf8mb4), convert(`last_name` using utf8mb4), UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`actor` /*checksum table*/
2017-11-08T01:57:01.881166Z        20 Query     SHOW WARNINGS
2017-11-08T01:57:01.881764Z        20 Query     SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
2017-11-08T01:57:01.897051Z        20 Query     UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
2017-11-08T01:57:01.900914Z        20 Query     SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.930534Z        20 Query     /*!40101 SET @OLD_SQL_MODE := @@SQL_MODE, @@SQL_MODE := '', @OLD_QUOTE := @@SQL_QUOTE_SHOW_CREATE, @@SQL_QUOTE_SHOW_CREATE := 1 */
2017-11-08T01:57:01.931387Z        20 Query     USE `sakila`
2017-11-08T01:57:01.932194Z        20 Query     SHOW CREATE TABLE `sakila`.`city`
2017-11-08T01:57:01.933399Z        20 Query     /*!40101 SET @@SQL_MODE := @OLD_SQL_MODE, @@SQL_QUOTE_SHOW_CREATE := @OLD_QUOTE */
2017-11-08T01:57:01.935136Z        20 Query     EXPLAIN SELECT * FROM `sakila`.`city` WHERE 1=1
2017-11-08T01:57:01.940169Z        20 Query     USE `sakila`
2017-11-08T01:57:01.941026Z        20 Query     DELETE FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city'
2017-11-08T01:57:01.942010Z        20 Query     USE `sakila`
2017-11-08T01:57:01.943012Z        20 Query     EXPLAIN SELECT COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*explain checksum table*/
2017-11-08T01:57:01.945033Z        20 Query     REPLACE INTO `sakila`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'sakila', 'city', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `city_id`, convert(`city` using utf8mb4), `country_id`, UNIX_TIMESTAMP(`last_update`))) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `sakila`.`city` /*checksum table*/
2017-11-08T01:57:01.960088Z        20 Query     SHOW WARNINGS
2017-11-08T01:57:01.960938Z        20 Query     SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1'
2017-11-08T01:57:01.961674Z        20 Query     UPDATE `sakila`.`checksums` SET chunk_time = '0.015889', master_crc = '4d700c4', master_cnt = '600' WHERE db = 'sakila' AND tbl = 'city' AND chunk = '1'
2017-11-08T01:57:01.964712Z        20 Query     SHOW GLOBAL STATUS LIKE 'Threads_running'
2017-11-08T01:57:01.972503Z        20 Quit
[root@ZST1 ~]# 
View Code

general-log逻辑
• 设置SESSION选项
• 创建checksums数据表
• 针对每一张需要check的表执行下面操作
  • DELETE:从checksums表中删除sakila的记录
  • EXPLAIN:分析计算sakila的this_cnt,this_crc的执行计划
  • REPLACE INTO:计算sakila的this_cnt,this_crc
  • UPDATE:使用this_cnt,this_crc更新master_crc,master_cnt
在主库上这些以SQL语句的形式执行,且执行时没有设置SESSION的日志格式为STATEMENT,主库的binlog_format='ROW',所以binlog里记录的是语句的最终执行结果(具体的数值,而非SQL语句)

3.3、查看binlog

[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306
View Code

binlog逻辑是:首先创建checksums数据表,然后delete->insert->update checksums  具体数值
主库上最后一个update语句

SELECT this_crc, this_cnt FROM `sakila`.`checksums` WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
UPDATE `sakila`.`checksums` SET chunk_time = '0.001821', master_crc = '6816983c', master_cnt = '200' WHERE db = 'sakila' AND tbl = 'actor' AND chunk = '1'
View Code

在binlog体现为(原封不动应用到从库)

[root@ZST1 ~]# mysqlbinlog -v --base64-output=decode-rows /data/backup/mysql-bin.000083.ptchecksum3306
...
COMMIT/*!*/;
# at 1604
#171108  9:57:01 server id 1323306  end_log_pos 1669 CRC32 0x16bf0702   GTID    last_committed=3        sequence_number=4       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= '8ab82362-9c37-11e7-a858-000c29c1025c:575'/*!*/;
# at 1669
#171108  9:57:01 server id 1323306  end_log_pos 1743 CRC32 0xee6b7639   Query   thread_id=20    exec_time=0     error_code=0
SET TIMESTAMP=1510106221/*!*/;
BEGIN
/*!*/;
# at 1743
#171108  9:57:01 server id 1323306  end_log_pos 1823 CRC32 0x589cc01f   Table_map: `sakila`.`checksums` mapped to number 248
# at 1823
#171108  9:57:01 server id 1323306  end_log_pos 1950 CRC32 0xc0604f63   Update_rows: table id 248 flags: STMT_END_F
### UPDATE `sakila`.`checksums`
### WHERE
###   @1='sakila'
###   @2='actor'
###   @3=1
###   @4=NULL
###   @5=NULL
###   @6=NULL
###   @7=NULL
###   @8='6816983c'
###   @9=200
###   @10=NULL
###   @11=NULL
###   @12=1510106221
### SET
###   @1='sakila'
###   @2='actor'
###   @3=1
###   @4=0.001821            
###   @5=NULL
###   @6=NULL
###   @7=NULL
###   @8='6816983c'
###   @9=200
###   @10='6816983c'
###   @11=200
###   @12=1510106221
# at 1950
#171108  9:57:01 server id 1323306  end_log_pos 1981 CRC32 0x1f197fe6   Xid = 198
COMMIT/*!*/;
# at 1981
View Code

也就是说从库不会去计算所谓的CRC32,它直接完整copy主库的checksums的所有内容

3.4、如何解决

个人认为只有在statement格式下才能进行,因为两边要计算CRC32,计算完后再把主上的master_crc、master_cnt更新到从库,最后在从库对比master和this相关列。pt-table-checksum 3.0.4在执行时缺少SET @@binlog_format='STATEMENT',建议不要使用。
有一种很挫的方法,仅仅是为了看差异结果(生产环境勿用),执行pt-table-checksum前,在主上 set global binlog_format='STATEMENT';

# 主库修改binlog_format为statement
mydba@192.168.85.132,3306 [sakila]> set global binlog_format='STATEMENT';

# 从库运行pt-table-checksum
[root@ZST2 ~]# pt-table-checksum --nocheck-binlog-format --nocheck-replication-filters --recursion-method=hosts --replicate=sakila.checksums --databases=sakila --tables=actor,city --host=192.168.85.132 --port=3306 --user=mydba --password=mysql5719
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
11-08T12:40:27      0      1      200       1       0   0.015 sakila.actor
11-08T12:40:27      0      0      600       1       0   0.024 sakila.city
[root@ZST2 ~]# 
View Code

DIFFS=1,说明sakila.actor表存在差异

# 差异信息
mydba@192.168.85.133,3306 [sakila]> SELECT db,tbl,SUM(this_cnt) AS total_rows,COUNT(*) AS chunks
FROM sakila.checksums 
WHERE (master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc))
GROUP BY db,tbl;
+--------+-------+------------+--------+
| db     | tbl   | total_rows | chunks |
+--------+-------+------------+--------+
| sakila | actor |        200 |      1 |
+--------+-------+------------+--------+
1 row in set (0.00 sec)
View Code

主要就是查看master_cnt、this_cnt和master_crc、this_crc

四、pt-table-sync

4.1、修复数据不一致

前面已经检测出主从数据不一致,下面使用pt-table-sync修复数据

# 打印命令
[root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --print
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('1', 'PENELOPE', 'GUINESS', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('2', 'NICK', 'WAHLBERG', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
REPLACE INTO `sakila`.`actor`(`actor_id`, `first_name`, `last_name`, `last_update`) VALUES ('3', 'ED', 'CHASE', '2006-02-15 04:34:33') /*percona-toolkit src_db:sakila src_tbl:actor src_dsn:A=utf8,P=3306,h=192.168.85.132,p=...,u=mydba dst_db:sakila dst_tbl:actor dst_dsn:A=utf8,P=3306,h=192.168.85.133,p=...,u=mydba lock:1 transaction:1 changing_src:sakila.checksums replicate:sakila.checksums bidirectional:0 pid:3365 user:uest host:ZST2*/;
[root@ZST2 ~]# 

# 执行命令
[root@ZST2 ~]# pt-table-sync --replicate=sakila.checksums --sync-to-master h=192.168.85.133,u=mydba,p=mysql5719,P=3306 --databases=sakila --charset=utf8 --execute
REPLACE statements on sakila.actor can adversely affect child table `sakila`.`film_actor` because it has an ON UPDATE CASCADE foreign key constraint. See --[no]check-child-tables in the documentation for more information. --check-child-tables error  while doing sakila.actor on 192.168.85.133
[root@ZST2 ~]# 
View Code

--execute就是执行打印出来的命令,REPLACE INTO实际对应delete、insert操作,由于外键约束delete失败(构造差异数据时就尝试过delete),修复不成功。
pt-table-checksum及pt-table-sync详细说明请参考:pt-table-checksum解读使用pt-table-checksum及pt-table-sync校验复制一致性

posted @ 2017-11-08 15:04  Uest  阅读(1361)  评论(0编辑  收藏  举报