跳过复制错误——sql_slave_skip_counter

昨天不少同学讨论《小心,前方有雷 —— sql_slave_skip_counter》,有说作者在玩文字游戏,扯了那么多sql_slave_skip_counter=1不还是跳过一个事务嘛。自己看了几遍原文,好像是那么回事,但又没明白slave_exec_mode参数如何影响。只能说一百个读者有一百种见解,甚至随着读者的切入点、知识的变化而改变。
计划用两篇文章写写跳过复制错误相关的三个参数sql_slave_skip_counter、slave_skip_errors、slave_exec_mode

一、基本环境

VMware10.0+CentOS6.9+MySQL5.7.19

ROLE HOSTNAME BASEDIR DATADIR IP PORT
Master ZST1 /usr/local/mysql /data/mysql/mysql3306/data 192.168.85.132 3306
Slave ZST2 /usr/local/mysql /data/mysql/mysql3306/data 192.168.85.133 3306

基于Row+Position搭建的一主一从异步复制结构:Master->{Slave}

二、sql_slave_skip_counter官方解释

https://dev.mysql.com/doc/refman/5.7/en/set-global-sql-slave-skip-counter.html
SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.
• For transactional tables, an event group corresponds to a transaction.
• For nontransactional tables, an event group corresponds to a single SQL statement.
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group.

三、测试案例

从官方解释我们知道, sql_slave_skip_counter以event为单位skip,直到skip完第N个event所在的event group才停止。对于事务表,一个event group对应一个事务;对于非事务表,一个event group对应一条SQL语句。一个event group包含多个events。
这里我只针对显式事务模拟insert遇到Duplicate entry(1062错误),知道了问题本质,delete/update中的1032错误类似去分析

3.1、测试数据

主库创建一个事务表和一个非事务表,然后从库往各表写入id=1的记录

# 主库创建测试表
mydba@192.168.85.132,3306 [replcrash]> create table repl_innodb(id int primary key,name1 char(10),name2 char(10)) engine=innodb;
mydba@192.168.85.132,3306 [replcrash]> create table repl_myisam(id int primary key,name1 char(10),name2 char(10)) engine=myisam;

# 从库往测试表中添加数据,不记入binlog
mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=0;
mydba@192.168.85.133,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'s1062-1','s1062-1');
mydba@192.168.85.133,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'s1062-1','s1062-1');
mydba@192.168.85.133,3306 [replcrash]> set sql_log_bin=1;
View Code

3.2、transactional tables

主库往事务表中添加数据

# 主库往事务表中添加数据
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

很明显,从库先写入数据占用id=1,主库再写入数据,复制将主库id=1的写入记录传递到从库,造成从库key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)

# 从库跳过“1个”错误,并启动sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
View Code

从库不仅跳过了id=1的记录,还跳过了id=2的记录
分析:主库上的begin..commit之间对事务表的操作记录为一个事务,对应一个event group。id=1应用于从库遇到Duplicate entry错误,我们使用sql_slave_skip_counter跳过这个event之后,还在此group内,需要继续跳过此group中的后续events。因此在从库不会有id=2的记录~

[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=1869
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 1869
#171201 10:15:11 server id 1323306  end_log_pos 1934 CRC32 0x3a86cd44   Anonymous_GTID  last_committed=5        sequence_number=6       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 1934
#171201 10:14:43 server id 1323306  end_log_pos 2011 CRC32 0x83c239df   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512094483/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== repl_innodb表写入id=1、2的记录,在一个事务中Start ====================
BEGIN
/*!*/;
# at 2011
#171201 10:14:43 server id 1323306  end_log_pos 2076 CRC32 0x0f3612fe   Table_map: `replcrash`.`repl_innodb` mapped to number 263
# at 2076
#171201 10:14:43 server id 1323306  end_log_pos 2132 CRC32 0x01de5dbd   Write_rows: table id 263 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
###   @1=1
###   @2='m1062-1'
###   @3='m1062-1'
# at 2132
#171201 10:14:50 server id 1323306  end_log_pos 2197 CRC32 0xf838b054   Table_map: `replcrash`.`repl_innodb` mapped to number 263
# at 2197
#171201 10:14:50 server id 1323306  end_log_pos 2253 CRC32 0xbd9ae02a   Write_rows: table id 263 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
###   @1=2
###   @2='m1062-2'
###   @3='m1062-2'
# at 2253
#171201 10:15:11 server id 1323306  end_log_pos 2284 CRC32 0x0292df6a   Xid = 60
COMMIT/*!*/;
==================== repl_innodb表写入id=1、2的记录,在一个事务中End ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]#
View Binlog

3.3、nontransactional tables

主库往非事务表中添加数据

# 主库往非事务表中添加数据
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

同理,从库先写入数据占用id=1,主库再写入数据,复制将主库id=1的写入记录传递到从库,造成从库key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)

# 从库跳过“1个”错误,并启动sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

从库跳过了id=1的记录,但复制了id=2的记录
分析:主库上的begin..commit之间对非事务表的操作记录为多个事务,每一条SQL语句对应一个event group。id=1应用于从库遇到Duplicate entry错误,我们使用sql_slave_skip_counter跳过这个event之后,已经到了此group的末尾。SQL thread直接从下一个event group开始,这里就是repl_myisam.id=2的那条语句。因此在从库会有id=2的记录~
实际它在执行第一条insert语句后,从库就报1062错误;前面的transactional tables需要在事务commit后从库才报错

[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2284
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 2284
#171201 10:30:31 server id 1323306  end_log_pos 2349 CRC32 0x5d208979   Anonymous_GTID  last_committed=6        sequence_number=7       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 2349
#171201 10:30:31 server id 1323306  end_log_pos 2426 CRC32 0xe4ce4da8   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512095431/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== repl_myisam表写入id=1的记录Start ====================
BEGIN
/*!*/;
# at 2426
#171201 10:30:31 server id 1323306  end_log_pos 2491 CRC32 0x76a45e15   Table_map: `replcrash`.`repl_myisam` mapped to number 261
# at 2491
#171201 10:30:31 server id 1323306  end_log_pos 2547 CRC32 0xd187097a   Write_rows: table id 261 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
###   @1=1
###   @2='m1062-1'
###   @3='m1062-1'
# at 2547
#171201 10:30:31 server id 1323306  end_log_pos 2625 CRC32 0xc8210551   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512095431/*!*/;
COMMIT
/*!*/;
# at 2625
==================== repl_myisam表写入id=1的记录End ====================
#171201 10:30:44 server id 1323306  end_log_pos 2690 CRC32 0x22b268fd   Anonymous_GTID  last_committed=7        sequence_number=8       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 2690
#171201 10:30:44 server id 1323306  end_log_pos 2767 CRC32 0x43061ce5   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512095444/*!*/;
==================== repl_myisam表写入id=2的记录Start ====================
BEGIN
/*!*/;
# at 2767
#171201 10:30:44 server id 1323306  end_log_pos 2832 CRC32 0xe1c084b9   Table_map: `replcrash`.`repl_myisam` mapped to number 261
# at 2832
#171201 10:30:44 server id 1323306  end_log_pos 2888 CRC32 0x56bacb73   Write_rows: table id 261 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
###   @1=2
###   @2='m1062-2'
###   @3='m1062-2'
# at 2888
#171201 10:30:44 server id 1323306  end_log_pos 2966 CRC32 0x6527c3b6   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512095444/*!*/;
COMMIT
/*!*/;
==================== repl_myisam表写入id=2的记录End ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]# 
View Binlog

3.4、一个事务中包含事务表和非事务表操作

为了方便,我将表中数据置为初始状态,主库两表为空,从库两表各有id=1的记录
主库往事务表和非事务表中添加数据

# 主库往事务表、非事务表中添加数据
mydba@192.168.85.132,3306 [replcrash]> begin;
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_innodb(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(2,'m1062-2','m1062-2');
mydba@192.168.85.132,3306 [replcrash]> commit;
mydba@192.168.85.132,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
mydba@192.168.85.132,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | m1062-1 | m1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

根据前面的分析,我们知道从库上的repl_innodb、repl_myisam表都存在key冲突(1062错误)
我们尝试使用sql_slave_skip_counter跳过错误(实际遇到1062写入key冲突,我们应该根据 Duplicate entry 删除从库对应记录)

# 从库跳过“1个”错误,并启动sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

从库repl_innodb表暂时没有操作;从库repl_myisam表跳过id=1的记录,复制了id=2的记录
注意:此时跳过的是replcrash.repl_myisam上的Duplicate entry错误,对于非事务表一条SQL语句对应一个event group,SQL thread直接从下一个event group开始,这里就是repl_myisam.id=2的那条语句。因此从库repl_myisam表会有id=2的记录~
紧接着复制又会报key冲突(1062错误),因为还有repl_innodb.id=1这个key,我们继续跳过

# 从库跳过“1个”错误,并启动sql_thread
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=1;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> select * from repl_innodb;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
+----+---------+---------+
mydba@192.168.85.133,3306 [replcrash]> select * from repl_myisam;
+----+---------+---------+
| id | name1   | name2   |
+----+---------+---------+
|  1 | s1062-1 | s1062-1 |
|  2 | m1062-2 | m1062-2 |
+----+---------+---------+
View Code

从库repl_innodb表跳过id=1的记录,还跳过了id=2的记录;从库repl_myisam表暂时没有操作
主库上执行的语句明明是先insert repl_innodb,再insert repl_myisam,为什么sql_slave_skip_counter是先跳过repl_myisam表上的错误,再跳过repl_innodb上的错误?
这就要从事务表和非事务表的区别去分析,主库显式的在一个事务中操作事务表+非事务表,实际上所有对事务表的操作是在同一个显式事务中;所有对非事务表的操作,每条SQL语句单独对应一个事务。因此主库上的操作可理解成下面操作:
开启显式事务1,往repl_innodb表写入id=1、2两条记录-->开启事务2,往repl_myisam表写入id=1记录,提交事务2-->开启事务3,往repl_myisam表写入id=2记录,提交事务3-->提交显式事务1
当事务2提交后,从库报repl_myisam上的Duplicate entry错误;我们跳过这个错误,当事务3提交后,从库写入repl_myisam.id=2的记录;当事务1提交后,从库报repl_innodb上的Duplicate entry错误;我们再跳过这个错误,复制就正常了~
我们看下对应的binlog

[root@ZST1 logs]# mysqlbinlog -v --base64-output=decode-rows mysql-bin.000125 --start-position=2966
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 2966
#171201 10:54:02 server id 1323306  end_log_pos 3031 CRC32 0x9a009a72   Anonymous_GTID  last_committed=8        sequence_number=9       rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3031
#171201 10:54:02 server id 1323306  end_log_pos 3108 CRC32 0x9738837a   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512096842/*!*/;
SET @@session.pseudo_thread_id=4/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1436549152/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
==================== repl_myisam表写入id=1的记录Start ====================
BEGIN
/*!*/;
# at 3108
#171201 10:54:02 server id 1323306  end_log_pos 3173 CRC32 0x8c4283c5   Table_map: `replcrash`.`repl_myisam` mapped to number 265
# at 3173
#171201 10:54:02 server id 1323306  end_log_pos 3229 CRC32 0xd8953aae   Write_rows: table id 265 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
###   @1=1
###   @2='m1062-1'
###   @3='m1062-1'
# at 3229
#171201 10:54:02 server id 1323306  end_log_pos 3307 CRC32 0x218fdb23   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512096842/*!*/;
COMMIT
/*!*/;
# at 3307
==================== repl_myisam表写入id=1的记录End ====================
#171201 10:54:36 server id 1323306  end_log_pos 3372 CRC32 0x0ba119ad   Anonymous_GTID  last_committed=9        sequence_number=10      rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3372
#171201 10:54:36 server id 1323306  end_log_pos 3449 CRC32 0x9bcdeee5   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512096876/*!*/;
==================== repl_myisam表写入id=2的记录Start ====================
BEGIN
/*!*/;
# at 3449
#171201 10:54:36 server id 1323306  end_log_pos 3514 CRC32 0xd52491e6   Table_map: `replcrash`.`repl_myisam` mapped to number 265
# at 3514
#171201 10:54:36 server id 1323306  end_log_pos 3570 CRC32 0x23bcd75d   Write_rows: table id 265 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_myisam`
### SET
###   @1=2
###   @2='m1062-2'
###   @3='m1062-2'
# at 3570
#171201 10:54:36 server id 1323306  end_log_pos 3648 CRC32 0x3ba9a1a1   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512096876/*!*/;
COMMIT
/*!*/;
# at 3648
==================== repl_myisam表写入id=2的记录End ====================
#171201 10:54:41 server id 1323306  end_log_pos 3713 CRC32 0x122cdb79   Anonymous_GTID  last_committed=10       sequence_number=11      rbr_only=yes
/*!50718 SET TRANSACTION ISOLATION LEVEL READ COMMITTED*//*!*/;
SET @@SESSION.GTID_NEXT= 'ANONYMOUS'/*!*/;
# at 3713
#171201 10:53:22 server id 1323306  end_log_pos 3790 CRC32 0x68d45d7b   Query   thread_id=4     exec_time=0     error_code=0
SET TIMESTAMP=1512096802/*!*/;
==================== repl_innodb表写入id=1、2的记录,在一个事务中Start ====================
BEGIN
/*!*/;
# at 3790
#171201 10:53:22 server id 1323306  end_log_pos 3855 CRC32 0xf4359a8d   Table_map: `replcrash`.`repl_innodb` mapped to number 264
# at 3855
#171201 10:53:22 server id 1323306  end_log_pos 3911 CRC32 0x9975aac8   Write_rows: table id 264 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
###   @1=1
###   @2='m1062-1'
###   @3='m1062-1'
# at 3911
#171201 10:53:30 server id 1323306  end_log_pos 3976 CRC32 0xc5ac7f71   Table_map: `replcrash`.`repl_innodb` mapped to number 264
# at 3976
#171201 10:53:30 server id 1323306  end_log_pos 4032 CRC32 0x1ad72c78   Write_rows: table id 264 flags: STMT_END_F
### INSERT INTO `replcrash`.`repl_innodb`
### SET
###   @1=2
###   @2='m1062-2'
###   @3='m1062-2'
# at 4032
#171201 10:54:41 server id 1323306  end_log_pos 4063 CRC32 0x4f265b37   Xid = 83
COMMIT/*!*/;
==================== repl_innodb表写入id=1、2的记录,在一个事务中End ====================
SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/;
DELIMITER ;
# End of log file
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
[root@ZST1 logs]# 
View Binlog

可以看到binlog和我们的分析一致,也从侧面说明binlog是按事务提交顺序写入的(Redo按事务发生顺序写入)

3.5、N应该设多大

# 主库往非事务表中添加数据
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');

我们习惯性在skip错误的时候,将N设成1。潜移默化地认为1就是跳过一个错误,管它一个event、一个event group、一条SQL语句、一个事务,反正它就是跳过去了。
根据官方解释,这条对非事务表的insert操作语句,对应一个event group,它里面实际有多个events,而不是只有一个event!
使用set global sql_slave_skip_counter=1,跳过一个event,由于它还在event group中,它会继续跳过此group中的后续events!如果只看表象的话,还真以为它在binlog中只有一个event(⊙_⊙)
可以用下面方法验证,它不只对应一个event

# 从库设置sql_slave_skip_counter
mydba@192.168.85.133,3306 [replcrash]> stop slave sql_thread;
mydba@192.168.85.133,3306 [replcrash]> set global sql_slave_skip_counter=100;
mydba@192.168.85.133,3306 [replcrash]> start slave sql_thread;

# 主库往非事务表中添加数据(主库空表,从库存在id=1的记录)
mydba@192.168.85.132,3306 [replcrash]> insert into repl_myisam(id,name1,name2) values(1,'m1062-1','m1062-1');

# 从库查看Skip_Counter计数
mydba@192.168.85.133,3306 [replcrash]> pager grep Skip_Counter;
mydba@192.168.85.133,3306 [replcrash]> show slave status\G
                 Skip_Counter: 95
View Code

在从库设置跳过100个events,然后在主库执行这条语句,再到从库查看show slave status\G返回的Skip_Counter列,你会发现它并不是由100变成99,至于一条语句到底对应多少个events,得自行脑补●-●
不要以为在3.4中set global sql_slave_skip_counter=3;就能跳过repl_myisam.id=1、repl_myisam.id=2、repl_innodb.id=1对应的三条SQL语句

四、总结

写了那么多,感觉也在玩文字游戏。 sql_slave_skip_counter以event为单位skip,直到skip完第N个event所在的event group才停止。对于事务表,一个event group对应一个事务;对于非事务表,一个event group对应一条SQL语句。一个event group包含多个events。
delete在从库找不到对应行,sql_slave_skip_counter可能省事,但极有可能跳过其他events,导致主从数据不一致。对于1032、1062错误尽量修补数据,让复制进程在从库应用变更。

posted @ 2017-12-01 17:30  Uest  阅读(13269)  评论(5编辑  收藏  举报