show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting to reconnect after a failed master event read
                  Master_Host:
                  Master_User: rep_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: centos-bin.000002
          Read_Master_Log_Pos: 107
               Relay_Log_File: relay-bin.000001
                Relay_Log_Pos: 4
        Relay_Master_Log_File: centos-bin.000002
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 107
              Relay_Log_Space: 107
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 2017140


160408 12:25:40 [Note] Slave I/O thread: connected to master rep_user@,replication started in log centos-bin.000002 at position 107
160408 12:25:40 [ERROR] Error reading packet from server: File /data2/mysql/centos-bin.000002 not found (Errcode: 2) ( server_errno=29)
160408 12:25:40 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log centos-bin.000002 at postion 107
160408 12:25:40 [ERROR] Error reading packet from server: File /data2/mysql/centos-bin.000002 not found (Errcode: 2) ( server_errno=29)
160408 12:26:40 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log centos-bin.000002 at postion 107
160408 12:26:40 [ERROR] Error reading packet from server: File /data2/mysql/centos-bin.000002 not found (Errcode: 2) ( server_errno=29)

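Checking on S1: show master status and show master logs both show that application data is being written and the position keeps advancing. The odd part is that centos-bin.000001 is reported with a size of 0.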

mysql> show master logs;
| Log_name          | File_size |
| centos-bin.000001 |         0 |
| centos-bin.000002 | 568661746 |
2 rows in set (0.00 sec)

mysql> show master logs;
| Log_name          | File_size |
| centos-bin.000001 |         0 |
| centos-bin.000002 | 568941034 |
2 rows in set (0.00 sec)

mysql> show master logs;
| Log_name          | File_size |
| centos-bin.000001 |         0 |
| centos-bin.000002 | 569017617 |
2 rows in set (0.00 sec)


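The strange symptom so far: the application writes to the database normally, and show master status shows the position changing, yet the binlog file does not exist on disk, so replication cannot be established.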

[root@GZ_NS_M5_SYNC_mysql_sync1-standby_171.40 ~]# find / -name centos-bin.000002
[root@GZ_NS_M5_SYNC_mysql_sync1-standby_171.40 ~]# 
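
One possible explanation, not confirmed in these notes, is that mysqld still holds the deleted binlog open. On Linux a file that has been unlinked stays writable through an already open descriptor, and its entry under /proc/<pid>/fd points to the old path with " (deleted)" appended. The sketch below lists such entries. The function name deleted_open_files is hypothetical, and the mysqld PID has to be looked up separately.

```python
import os


def deleted_open_files(fd_dir):
    """Return the original paths of open files in fd_dir that were deleted.

    fd_dir is expected to be a /proc/<pid>/fd directory, where each entry
    is a symlink to the file behind that descriptor.
    """
    suffix = " (deleted)"
    found = []
    for name in sorted(os.listdir(fd_dir)):
        path = os.path.join(fd_dir, name)
        try:
            target = os.readlink(path)
        except OSError:
            # Descriptor was closed in the meantime, or entry is not a symlink.
            continue
        if target.endswith(suffix):
            found.append(target[: -len(suffix)])
    return found
```

Called as deleted_open_files("/proc/<pid>/fd") with the mysqld PID filled in (usually as root), a result containing /data2/mysql/centos-bin.000002 would mean the server is still writing to a file that find can no longer see.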




2) Use rm to delete the master's binlog.index and binlog.0000X files





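Looking back, this failure most likely happened the same way as in the reproduction. The cluster was set up three or four months ago. After replication was working, the standby's binlog files were deleted. In fact the whole directory was removed, and that directory was dedicated to storing binlogs and relay logs. When replication was set up again afterwards, change master to recreated the relay logs, but the binlogs were not recreated. Today an MHA failover occurred: the standby became the master and started accepting writes. The filename and pos that MHA uses come from running show master status on the standby, but the files they refer to had already been deleted, so replication failed.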
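
A check along the following lines, run against a standby before it can be promoted, might catch this situation early. It is only a sketch: missing_binlogs is a hypothetical helper, the log names are assumed to have been collected from show master logs beforehand, and the directory is assumed to be the server's binlog directory (/data2/mysql according to the error log above).

```python
import os


def missing_binlogs(log_names, binlog_dir):
    """Return the binlog names that have no regular file in binlog_dir."""
    return [
        name
        for name in log_names
        if not os.path.isfile(os.path.join(binlog_dir, name))
    ]
```

For the standby in this incident, missing_binlogs(["centos-bin.000001", "centos-bin.000002"], "/data2/mysql") would be expected to return both names, since find located neither file on disk.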










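Testing shows that with statement-format binlogs, show binlog events in xxxx returns the SQL statements recorded in the binlog. With row-format binlogs, the actual SQL statements cannot be retrieved this way.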


