主库断电重启-主从中断处理案例

故障场景:昨天自建机房突然断电,检查一下数据库状态以及主从复制状态


(crmDB)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: 
                  Master_Host: 172.16.117.247
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000262
          Read_Master_Log_Pos: 342439502
               Relay_Log_File: relay-bin.000042
                Relay_Log_Pos: 342439665
        Relay_Master_Log_File: mysql-bin.000262
             Slave_IO_Running: No
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 342439502
              Relay_Log_Space: 342439879
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 1236
                Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from position > file size; the first event 'mysql-bin.000262' at 342439502, the last event read from '/data/data/mysql-bin.000262' at 4, the last byte read from '/data/data/mysql-bin.000262' at 4.'
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 247
                  Master_UUID: db5c1497-9cf4-11e3-8259-77a6dd604747
             Master_Info_File: /data/3306/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 150717 09:59:39
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

(crmDB)root@localhost [(none)]> 
根据报错信息,看看主库binlog发生了啥

 mysqlbinlog --start-position=342439502 /data/data/mysql-bin.000262 > /tmp/binlog.tx
[root@crmdbM data]# cat /tmp/binlog.txt 
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150716  4:10:02 server id 247  end_log_pos 120 CRC32 0x0b64a00d     Start: binlog v 4, server v 5.6.14-log created 150716  4:10:02
BINLOG '
Gr6mVQ/3AAAAdAAAAHgAAAAAAAQANS42LjE0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAQ2g
ZAs=
'/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;

################################ 
没有看到这个位置,这是mysql-bin.000262最开头的位置,

[root@crmdbM data]# mysqlbinlog  /data/data/mysql-bin.000262 | head -n 10
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150716  4:10:02 server id 247  end_log_pos 120 CRC32 0x0b64a00d     Start: binlog v 4, server v 5.6.14-log created 150716  4:10:02
BINLOG '
Gr6mVQ/3AAAAdAAAAHgAAAAAAAQANS42LjE0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAQ2g
ZAs=
'/*!*/;
 
既然没有,那么看一下整个binlog最后部分pos点信息,(报错是说slave IO接受到的pos点信息比主库的还大)
那么我们看一下解析出来的binlog最后几行信息

mysqlbinlog  /data/data/mysql-bin.000262 > /tmp/binlog1.txt
[root@crmdbM data]# tail -n 20 /tmp/binlog1.txt
#150717  1:21:06 server id 247  end_log_pos 342403057 CRC32 0xbac8e194     Table_map: `crm`.`sky_consultant_statis` mapped to number 7724
# at 342403057
#150717  1:21:06 server id 247  end_log_pos 342403240 CRC32 0x38327494     Write_rows: table id 7724 flags: STMT_END_F

BINLOG '
AuinVRP3AAAAaQAAAPGnaBQAACweAAAAAAEAA2NybQAVc2t5X2NvbnN1bHRhbnRfc3RhdGlzACII
DwoRAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwP2AwMDBYcAAAsC8P///wOU4ci6
AuinVR73AAAAtwAAAKioaBQAACweAAAAAAEAAgAi//////8AAAAM/Eg+AAAAAAAACmNoZW5qdW5o
dWnwvg9Vp9UAAAAAABoAAAAAAAAAoyMAAI8AAAAKAAAABAAAAAYAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIUAAAAEAAAAgQAAAAAAAAAAAAAAgAAAAAAAAAAA
AAAAAAAAAACUdDI4
'/*!*/;
# at 342403240
#150717  1:21:06 server id 247  end_log_pos 342403271 CRC32 0xd0b514e1     Xid = 2027427035
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;


看到没有,主库binlog最大才342403240 ,报错信息是342439502 相差36262,既然从库说已经超过了主库的binlog,SQL线程已经执行到这个位置了
Exec_Master_Log_Pos: 342439502,而且主库断电重启,重新生成了binlog,那么我们重新指定下一个binlog文件开始
[root@crmdbM data]# mysqlbinlog  /data/data/mysql-bin.000263 | head -n 20
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150717  2:29:45 server id 247  end_log_pos 120 CRC32 0xa89426a5     Start: binlog v 4, server v 5.6.14-log created 150717  2:29:45 at startup
ROLLBACK/*!*/;
BINLOG '
GfinVQ/3AAAAdAAAAHgAAAAAAAQANS42LjE0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAZ+KdVEzgNAAgAEgAEBAQEEgAAXAAEGggAAAAICAgCAAAACgoKGRkAAaUm
lKg=
'/*!*/;
# at 120
#150717  2:40:45 server id 247  end_log_pos 199 CRC32 0xe654b6a6     Query    thread_id=35    exec_time=0    error_code=0
SET TIMESTAMP=1437072045/*!*/;
SET @@session.pseudo_thread_id=35/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1075838976/*!*/;
SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
[root@crmdbM data]# 
binlog第一个pos点是4,就可以指定pos点了

(crmDB)root@localhost [(none)]>  change master to master_host='172.16.117.247',master_user='slave',master_password='slave',master_log_file='mysql-bin.000263',master_log_pos=4;
Query OK, 0 rows affected, 2 warnings (0.05 sec)

(crmDB)root@localhost [(none)]> start slave;
Query OK, 0 rows affected (0.00 sec)

(crmDB)root@localhost [(none)]> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 172.16.117.247
                  Master_User: slave
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000265
          Read_Master_Log_Pos: 60234822
               Relay_Log_File: relay-bin.000005
                Relay_Log_Pos: 18238150
        Relay_Master_Log_File: mysql-bin.000265
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB: 
          Replicate_Ignore_DB: 
           Replicate_Do_Table: 
       Replicate_Ignore_Table: 
      Replicate_Wild_Do_Table: 
  Replicate_Wild_Ignore_Table: 
                   Last_Errno: 0
                   Last_Error: 
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 18237987
              Relay_Log_Space: 60235199
              Until_Condition: None
               Until_Log_File: 
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File: 
           Master_SSL_CA_Path: 
              Master_SSL_Cert: 
            Master_SSL_Cipher: 
               Master_SSL_Key: 
        Seconds_Behind_Master: 2584
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error: 
               Last_SQL_Errno: 0
               Last_SQL_Error: 
  Replicate_Ignore_Server_Ids: 
             Master_Server_Id: 247
                  Master_UUID: db5c1497-9cf4-11e3-8259-77a6dd604747
             Master_Info_File: /data/3306/data/master.info
                    SQL_Delay: 0
          SQL_Remaining_Delay: NULL
      Slave_SQL_Running_State: System lock
           Master_Retry_Count: 86400
                  Master_Bind: 
      Last_IO_Error_Timestamp: 
     Last_SQL_Error_Timestamp: 
               Master_SSL_Crl: 
           Master_SSL_Crlpath: 
           Retrieved_Gtid_Set: 
            Executed_Gtid_Set: 
                Auto_Position: 0
1 row in set (0.00 sec)

(crmDB)root@localhost [(none)]> 
没有主键冲突,

然后写个脚本,在凌晨,用pt-table-checksum检测一下库是否有主从差异
########

最终解决方法,主库这两个参数
(crmDB247)root@localhost [(none)]> show global variables like '%sync_binlog%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| sync_binlog   | 0     |
+---------------+-------+
1 row in set (0.00 sec)

(crmDB247)root@localhost [(none)]> 

mysql> show global variables like '%at_trx%';
+--------------------------------+-------+
| Variable_name                  | Value |
+--------------------------------+-------+
| innodb_flush_log_at_trx_commit | 2     |
+--------------------------------+-------+
1 row in set (0.00 sec)

mysql> 
主从复制必须为双1

sync_binlog = 1

innodb_flush_log_at_trx_commit = 1
从库必须配置为###
relay_log = /data/3306/logs/relay-bin
log_slave_updates = 1
skip_slave_start = 1
relay_log_recovery =1
####及时同步sync_master_info sync_relay_log_info 
master_info_repository = TABLE
relay-log-info-repository = TABLE

 

posted @ 2015-07-17 13:43  文采飞扬  阅读(2247)  评论(0编辑  收藏  举报