您的位置:首页 > 数据库 > MySQL

Mysql 主主复制失败恢复【转】

2017-09-30 15:41 363 查看

Mysql 主主复制失败

Mysql 主主复制失败

故障描述

架构信息

节点信息

故障分析

同步AIPPRD2的从环境

同步AIPPRD1的从环境

故障描述

原因描述因为机柜PDU老化, 导致整个机柜掉电.
故障时间20160923-10:09
发现时间20160929-13:56

架构信息

TomcatMemcacheKeepaliveMysql主主复制

节点信息

序号节点名称IP地址报错信息
1aipprd110.66.1.52Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000084’ at 91941417, the last event read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91941783, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000084’ at 91942912.’
2aipprd210.66.1.51Got fatal error 1236 from master when reading data from binary log: ‘binlog truncated in the middle of event; consider out of disk space on master; the first event ‘mysql-bin.000082’ at 6369026, the last event read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369026, the last byte read from ‘/aip/mysql/data/log/mysql-bin.000082’ at 6369280.’

故障分析

由于Zabbix的Mysql监控脚本的缘故, 没有触发事件, 所以直至20160929检查Zabbix日志的时候, 才发现该故障, 当时
Keepalive
VIP
aipprd1
上, 这个节点上, 数据库是对外服务的;

那么首先以
aipprd1
为主, 先将
aipprd2
的从环境同步起来;

aipprd2
从环境同步完成后, 再将
aipprd1
的从环境同步起来.

同步
AIPPRD2
的从环境

检查
aipprd2
的Slave状态
根据代码区的描述,
Last_Errno
的错误代码为
1062
, 需要手动修改
Position
.

[root@aipprd2 ~]# mysql -uzabbixmoniter -ppassw0rd -hlocalhost -e "show slave status\G;"
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.66.1.52
Master_User: root
Master_Port: 3388
Connect_Retry: 60
Master_Log_File: mysql-bin.000082
Read_Master_Log_Pos: 6369026
Relay_Log_File: mysql-relay-bin.000257
Relay_Log_Pos: 8999
Relay_Master_Log_File: mysql-bin.000082
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1062
Last_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
Skip_Counter: 0
Exec_Master_Log_Pos: 6367510
Relay_Log_Space: 11099
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000082' at 6369026, the last event read from '/aip/mysql/data/log/mysql-bin.000082' at 6369026, the last byte read from '/aip/mysql/data/log/mysql-bin.000082' at 6369280.'
Last_SQL_Errno: 1062
Last_SQL_Error: Error 'Duplicate entry '93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604' for key 'PRIMARY'' on query. Default database: 'aipprd'. Query: 'INSERT INTO eahttpsession (     sessionid, username, account,      createtime, loginip,userid,explorer,userDomain,computerName,computerUserName)   VALUES ('93FF91EF92866D23E80E4A57D55ED538-n1.tomcat604', '李花', 'XS003_4200',      '2016-09-23 12:10:33', , '10752920','MSIE 7.0','','','')'
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1


aipprd2
上按照
Last_IO_Error
中的
Posfile
Pos
修改.
按照报错给出的提示
Posfile
Pos
修改后, 报错依旧.

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root',     master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6369026;
mysql> slave start;


aipprd2
检查
Mysql
的日志
Mysql
的日志中记录了
Crash
开始的时间, 并给出了建议,
Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510


160923 12:35:22 [Note] Starting crash recovery...
160923 12:35:22 [Note] Crash recovery finished.
160923 12:36:27 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log 'mysql-bin.000082' position 6367510




aipprd2
上按照日志中建议的
Posfile
Pos
修改.
按照日志中建议的
Posfile
Pos
修改后, 报错依旧.

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000082', master_log_pos=6367510;
mysql> slave start;


aipprd1
上检查
posfile

首先检查
show slave status\G;
中给出的pos, 发现日志中根本不存在.

[root@aipprd1 log]# mysqlbinlog --no-defaults --start-position=6369026 mysql-bin.000083
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;

# at 4

#160923 12:38:59 server id 1 end_log_pos 107 Start: binlog v 4, server v 5.5.24-log created 160923 12:38:59 at startup

ROLLBACK/*!*/;
BINLOG '
47HkVw8BAAAAZwAAAGsAAAAAAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAADjseRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
ERROR: Error in Log_event::read_log_event(): 'read error', data_len: 912208946, event_type: 71
ERROR: Could not read entry at offset 6369026: Error in log format or read error.
DELIMITER

# End of log file

ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;


此后检查
Mysql日志
中建议的pos, 发现
posfile
中是存在此记录的, 而此
posfile
的最后一个
pos
6368660
, 而
show slave status\G;
pos
6369026
, 显然不存在日志文件中.

[root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6367510 mysql-bin.000082
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;

# at 4

#160923 10:39:15 server id 1  end_log_pos 107  Start: binlog v 4, server v 5.5.24-log created 160923 10:39:15 at startup

# Warning: this binlog is either in use or was not closed properly.

ROLLBACK/*!*/;
BINLOG '
05XkVw8BAAAAZwAAAGsAAAABAAQANS41LjI0LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAADTleRXEzgNAAgAEgAEBAQEEgAAVAAEGggAAAAICAgCAA==
'/*!*/;
# at 6367510
# at 6367585
# at 6367963
# at 6367990
# at 6368073
# at 6368163
# at 6368275
# at 6368358
# at 6368558
# at 6368585
# at 6368660
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;


最后在
mysql-bin.000083
中检查
show slave status\G;
提示
pos:6369026
, 也不存在.

[root@aipprd1 log]# mysqlbinlog --no-defaults  --start-position=6369026 mysql-bin.000083


aipprd2
上重新发起修改
posfile
pos
操作
检查
aipprd1
上的日志, 既然
mysql-bin.000082
日志末不存在
pos:6369026
, 且
mysql-bin.000083
为下一个日志, 那么重新发起修改
posfile
pos
的操作.

[root@aipprd1 log]# ll
total 526908
-rw-rw---- 1 mysql mysql 104858346 Sep 19 23:04 mysql-bin.000079
-rw-rw---- 1 mysql mysql 104859112 Sep 22 13:01 mysql-bin.000080
-rw-rw---- 1 mysql mysql  42558345 Sep 23 10:23 mysql-bin.000081
-rw-rw---- 1 mysql mysql   6369280 Sep 23 12:13 mysql-bin.000082
-rw-rw---- 1 mysql mysql 104857953 Sep 26 15:02 mysql-bin.000083
-rw-rw---- 1 mysql mysql  70844542 Sep 27 17:42 mysql-bin.000084
-rw-rw---- 1 mysql mysql 104858026 Sep 29 15:48 mysql-bin.000085
-rw-rw---- 1 mysql mysql    297019 Sep 29 15:54 mysql-bin.000086
-rw-rw---- 1 mysql mysql       296 Sep 29 15:48 mysql-bin.log.index
-rw-rw---- 1 mysql mysql       107 Sep 27 18:05 mysql-relay-bin.000182
-rw-rw---- 1 mysql mysql        43 Sep 27 18:05 mysql-relay-bin.index


重新修改
posfile
mysql-bin.000083
,
pos
0
, 启动
Slave
后, 现在同步正常.

mysql> slave stop;
mysql> CHANGE MASTER TO master_host='10.66.1.52', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000083', master_log_pos=0;
mysql> slave start;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.66.1.52
Master_User: root
Master_Port: 3388
Connect_Retry: 60
Master_Log_File: mysql-bin.000086
Read_Master_Log_Pos: 18585
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 8119286
Relay_Master_Log_File: mysql-bin.000083
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 8119140
Relay_Log_Space: 280580804
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 521370
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
1 row in set (0.00 sec)


在同步的过程中, 发现有几个
Last_SQL_Error: Error ‘Duplicate entry 1026
SQL Error
, 这个是因为重复主键导致
Slave
停止工作, 执行以下操作解决(如果有多条重复的主键, 需要执行多次):

mysql> slave stop;
mysql> set GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
mysql> slave start;


还有另一种办法就是修改mysql配置文件
/etc/my.cnf
[mysqld]
下加一行
slave_skip_errors = 1062
,保存后重启
mysql
,
mysql slave
可以正常同步了.

同步
AIPPRD1
的从环境

检查
aipprd1
的Slave状态

mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 10.66.1.51
Master_User: root
Master_Port: 3388
Connect_Retry: 60
Master_Log_File: mysql-bin.000084
Read_Master_Log_Pos: 91941783
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 296
Relay_Master_Log_File: mysql-bin.000084
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 91941783
Relay_Log_Space: 452
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'binlog truncated in the middle of event; consider out of disk space on master; the first event 'mysql-bin.000084' at 91941417, the last event read from '/aip/mysql/data/log/mysql-bin.000084' at 91941783, the last byte read from '/aip/mysql/data/log/mysql-bin.000084' at 91942912.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
1 row in set (0.00 sec)


aipprd2
上检查日志文件
aipprd1
上检查
show slave status\G;
后, 提示需要修改posfile为
mysql-bin.000084
pos91942912
, 因为在
aipprd2
同步完成后, 实际同步的数据是从
aipprd1
过来的, 这些数据在
aipprd1
上本身就存在的.

[root@aipprd2 log]# ll
total 653832
-rw-rw---- 1 mysql mysql 104857774 Sep 21 13:12 mysql-bin.000083
-rw-rw---- 1 mysql mysql  91942912 Sep 23 10:21 mysql-bin.000084
-rw-rw---- 1 mysql mysql   6295552 Sep 23 12:10 mysql-bin.000085
-rw-rw---- 1 mysql mysql      8250 Sep 27 17:39 mysql-bin.000086
-rw-rw---- 1 mysql mysql 104858014 Sep 29 17:20 mysql-bin.000087
-rw-rw---- 1 mysql mysql 104857971 Sep 29 19:18 mysql-bin.000088
-rw-rw---- 1 mysql mysql  55744926 Sep 29 20:23 mysql-bin.000089
-rw-rw---- 1 mysql mysql       259 Sep 29 19:18 mysql-bin.log.index
-rw-rw---- 1 mysql mysql  70844718 Sep 29 15:35 mysql-relay-bin.000005
-rw-rw---- 1 mysql mysql 104857729 Sep 29 15:44 mysql-relay-bin.000006
-rw-rw---- 1 mysql mysql       708 Sep 29 15:44 mysql-relay-bin.000007
-rw-rw---- 1 mysql mysql       199 Sep 29 15:44 mysql-relay-bin.000008
-rw-rw---- 1 mysql mysql  25188420 Sep 30 08:40 mysql-relay-bin.000009
-rw-rw---- 1 mysql mysql       215 Sep 29 18:42 mysql-relay-bin.index


aipprd2
上检查
show master status;
后, 记录
Posfile
mysql-bin.000089
, 既然
aipprd1
的数据为最新的, 且
aipprd2
已经从
aipprd1
后同步完成了(通过检查
show slave status\G;
中的
Seconds_Behind_Master:
, 如果此项值很小, 应该是同步完成了.), 那么两边的数据应该差不多的.

mysql> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000089 | 104341982 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.00 sec)


aipprd1
上发起修改
posfile
pos
操作
所以在此用
posfile
mysql-bin.000089
pos
0
来修改, 启动
Slave
后, 开始同步.

mysql> slave stop;
Query OK, 0 rows affected (0.11 sec)
mysql> CHANGE MASTER TO master_host='10.66.1.51', master_port=3388, master_user='root', master_password='passw0rd', master_log_file='mysql-bin.000089', master_log_pos=0;
Query OK, 0 rows affected (0.06 sec)
mysql> slave start;
Query OK, 0 rows affected (0.00 sec)
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.66.1.51
Master_User: root
Master_Port: 3388
Connect_Retry: 60
Master_Log_File: mysql-bin.000089
Read_Master_Log_Pos: 55744926
Relay_Log_File: mysql-relay-bin.000002
Relay_Log_Pos: 253
Relay_Master_Log_File: mysql-bin.000089
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 55744926
Relay_Log_Space: 409
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
1 row in set (0.00 sec)


转自

Mysql 主主复制失败 - bluetom520的博客 - CSDN博客 http://blog.csdn.net/bluetom520/article/details/54893183
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: