您好,登錄后才能下訂單哦!
寫在前面:
??? 最近一段時間都在做傳統主從復制相關的測試,思考了很多上線主從復制架構后,可能會發生的問題,然后針對性設置了這些故障,再然后思考如何在保證業務可用,或者對業務沖擊盡可能的小的前提下,進行故障的恢復,也算是小有所得,現在呢,筆者就主從復制故障時,需要跳過故障點時所用到的 sql_slave_skip_counte 變量進行記錄描述
sql_slave_skip_counter 介紹:
摘自MySQL官方的解釋(強烈建議閱讀英文原文。中文版,是筆者自己的理解,只能說仁者見仁)
SET GLOBAL sql_slave_skip_counter Syntax:
??????? SET GLOBAL sql_slave_skip_counter = N
This statement skips the next N events from the master. This is useful for recovering from replication stops caused by a statement.
??????? 跳過N個events。注意:以event為單位,而不是以事務為單位,只有在由單條語句組成的事務時,兩者才等價。
??????? 如:一個事務由多個EVENT組成,BEGIN;INSERT;UPDATE;DELETE;COMMOIT; 這種情況下,兩者絕不相等
This statement is valid only when the slave threads are not running. Otherwise, it produces an error.
When using this statement, it is important to understand that the binary log is actually organized as a sequence of groups known as event groups. Each event group consists of a sequence of events.
For transactional tables, an event group corresponds to a transaction.
??? ????對于事務表,一個event group對應一個事務
or nontransactional tables, an event group corresponds to a single SQL statement.
??????? 對于非事務表,一個event group對應一條SQL
When you use SET GLOBAL sql_slave_skip_counter to skip events and the result is in the middle of a group, the slave continues to skip events until it reaches the end of the group. Execution then starts with the next event group
??? ????當你跳過event的時候,如果N的值,處于event group之中,那么slave會繼續跳過event,直至跳過這個event group,從下一個event group開始
對于事務表使用sql_slave_skip_counter的情況:
1、跳過1032復制錯誤(update/delete error)
跳過由單條SQL組成的事務:
在Slave主機上人為的刪除兩條數據:
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '9';
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '11';
而Master在變更上述兩條記錄的時候會報錯,導致復制中斷:
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('孫權', '吳國', '妹妹');
UPDATE `edusoho_e`.`t1` SET xname='游戲' WHERE id=7;
UPDATE `edusoho_e`.`t1` SET age=40 WHERE id=11;????#報錯
DELETE FROM `edusoho_e`.`t1` WHERE age=40;????????????#報錯
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('曹丕', '魏國', '甄姬');
DELETE FROM `edusoho_e`.`t1` WHERE id=1;
UPDATE `edusoho_e`.`t1` SET hobby='Games' WHERE id=3;?
在Slave查看主從復制狀態時,就會發現報錯信息:
mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 2176
Exec_Master_Log_Pos: 874
Last_Errno: 1032
Last_Error: Could not execute Update_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 1127
Slave_IO_Running: Yes
Slave_SQL_Running: No
在Master主機上查看position做了什么操作:
mysql> show binlog events in 'mysql-bin.000002' from 874;
+------------------+------+-------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type? | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 |? 874 | Query?????? |???????? 2 |???????? 956 | BEGIN?????????????????????????? |
| mysql-bin.000002 |? 956 | Table_map?? |???????? 2 |??????? 1017 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1017 | Update_rows |???????? 2 |??????? 1127 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1127 | Xid???????? |???????? 2 |??????? 1158 | COMMIT /* xid=437 */??????????? |
| mysql-bin.000002 | 1158 | Query?????? |???????? 2 |??????? 1240 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1240 | Table_map?? |???????? 2 |??????? 1301 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1301 | Delete_rows |???????? 2 |??????? 1407 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1407 | Xid???????? |???????? 2 |??????? 1438 | COMMIT /* xid=446 */??????????? |
| mysql-bin.000002 | 1438 | Query?????? |???????? 2 |??????? 1520 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1520 | Table_map?? |???????? 2 |??????? 1581 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1581 | Write_rows? |???????? 2 |??????? 1644 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1644 | Xid???????? |???????? 2 |??????? 1675 | COMMIT /* xid=455 */??????????? |
| mysql-bin.000002 | 1675 | Query?????? |???????? 2 |??????? 1757 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1757 | Table_map?? |???????? 2 |??????? 1818 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 1818 | Delete_rows |???????? 2 |??????? 1880 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 1880 | Xid???????? |???????? 2 |??????? 1911 | COMMIT /* xid=464 */??????????? |
| mysql-bin.000002 | 1911 | Query?????? |???????? 2 |??????? 1993 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 1993 | Table_map?? |???????? 2 |??????? 2054 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 2054 | Update_rows |???????? 2 |??????? 2145 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 2145 | Xid???????? |???????? 2 |??????? 2176 | COMMIT /* xid=473 */??????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
在Slave跳過第一個Update_rows event復制報錯:
mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 1158
Last_SQL_Errno: 1032
Last_SQL_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 1407
成功跳過第一個events group
在Slave繼續跳過第二個Delete_rows event復制報錯:
mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_SQL_Errno: 0
Last_SQL_Error:
成功跳過第二個events group
注意:
雖然主從復制出現的故障成功跳過了,但只是暫時恢復了正常的主從復制狀態,需要盡快的對Slave缺失的數據進行補齊,不然Master對Slave不存在的數據做的變更,仍然會重復導致主從復制故障,筆者覺得如果你的數據量差異不是太大的話,可以考慮使用pt-table-checksum和pt-table-sync工具進行恢復,如果你的數據量很大且數據差異很多,還是建議重做Slave較好,因為使用工具會鎖表,會對線上業務造成一定的影響,具體情況,請自行考量。
跳過由多條SQL(event)組成的事務:
在Slave主機上人為的刪除一條數據:
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '7';
在Master主機上產生一個由多條SQL組成的事務:
BEGIN;
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '7';
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('懶死', '不知道', '吃了睡睡了吃');
COMMIT;
因為Slave主機上已經刪除id=7的數據,在Slave查看主從復制狀態時,就會發現報錯信息:
mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 7219
Exec_Master_Log_Pos: 6840
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000002, end_log_pos 7049
在Master主機上查看position做了什么操作:
mysql> show binlog events in 'mysql-bin.000002' from 6840;
+------------------+------+-------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type? | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 | 6840 | Query?????? |???????? 2 |??????? 6922 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 6922 | Table_map?? |???????? 2 |??????? 6983 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 6983 | Delete_rows |???????? 2 |??????? 7049 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7049 | Table_map?? |???????? 2 |??????? 7110 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 7110 | Write_rows? |???????? 2 |??????? 7188 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7188 | Xid???????? |???????? 2 |??????? 7219 | COMMIT /* xid=825 */??????????? |
+------------------+------+-------------+-----------+-------------+---------------------------------+
可以看到,這個事務是由兩個SQL(event)組成的
如果使用 sql_slave_skip_counter=N 跳過由多條SQL組成的事務會怎樣呢?
mysql> set global sql_slave_skip_counter=1;
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************
Read_Master_Log_Pos: 7219
Exec_Master_Log_Pos: 7219
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Last_Errno: 0
Last_Error:
發現問題沒有,在使用sql_slave_skip_counter跳過由多條SQL(event)組成的事務時,從在Master上執行的 show binlog events 可以看到,如果只是跳過出報錯SQL語句,那么 Exec_Master_Log_Pos 值應該為7110,但是現在為7219,說明將整個event group跳過了,但是7110的SQL數據是我們需要的,所以,和單條SQL組成的事務一樣,主從復制狀態雖然恢復,但是數據仍處于不一致狀態,要抓緊時間補齊數據或重做Slave
2、由多條SQL(event)組成的事務時,僅跳過一個event,而不是一個event group:
在Slave主機上人為的刪除一條數據:
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '17';
在Master主機上產生一個由多條SQL組成的事務:
BEGIN;
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '17';
INSERT INTO `edusoho_e`.`t1` (`xname`, `address`, `hobby`) VALUES ('我是誰', '不知道', '吃了睡睡了吃');
COMMIT;
因為Slave主機上已經刪除id=17的數據,在Slave查看主從復制狀態時,就會發現報錯信息:
Exec_Master_Log_Pos: 120
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1032
Last_Error: Could not execute Delete_rows event on table edusoho_e.t1; Can't find record in 't1', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000004, end_log_pos 341
在Master主機上查看position做了什么操作:
mysqlbinlog -v --base64-output=decode --start-position=120 mysql-bin.000004
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 120
#190507 13:52:05 server id 2? end_log_pos 202 CRC32 0x0ca0c280 ?? ?Query?? ?thread_id=3?? ?exec_time=0?? ?error_code=0
SET TIMESTAMP=1557208325/*!*/;
SET @@session.pseudo_thread_id=3/*!*/;
SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/;
SET @@session.sql_mode=1073741824/*!*/;
SET @@session.auto_increment_increment=2, @@session.auto_increment_offset=1/*!*/;
/*!\C utf8 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=33/*!*/;
SET @@session.lc_time_names=0/*!*/;
SET @@session.collation_database=DEFAULT/*!*/;
BEGIN
/*!*/;
# at 202
#190507 13:52:05 server id 2? end_log_pos 263 CRC32 0x20d2e89d ?? ?Table_map: `edusoho_e`.`t1` mapped to number 216
# at 263
#190507 13:52:05 server id 2? end_log_pos 341 CRC32 0xbec6fd45 ?? ?Delete_rows: table id 216 flags: STMT_END_F
### DELETE FROM `edusoho_e`.`t1`
### WHERE
###?? @1=17
###?? @2='懶死'
###?? @3='不知道'
###?? @4=1
###?? @5='吃了睡睡了吃'
###?? @6=18
# at 341
#190507 13:52:05 server id 2? end_log_pos 402 CRC32 0xa37bc5c9 ?? ?Table_map: `edusoho_e`.`t1` mapped to number 216
# at 402
#190507 13:52:05 server id 2? end_log_pos 483 CRC32 0x0d774707 ?? ?Write_rows: table id 216 flags: STMT_END_F
### INSERT INTO `edusoho_e`.`t1`
### SET
###?? @1=21
###?? @2='我是誰'
###?? @3='不知道'
###?? @4=1
###?? @5='吃了睡睡了吃'
###?? @6=18
# at 483
#190507 13:52:05 server id 2? end_log_pos 514 CRC32 0x8c333b30 ?? ?Xid = 411
COMMIT/*!*/;
DELIMITER ;
# End of log file
ROLLBACK /* added by mysqlbinlog */;
/*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/;
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
可以看到,綠色的部分就是我們需要跳過的,而第二個event是需要我們保留的
這個時候,就需要用到slave_exec_mode這個變量了,至于slave_exec_mode詳細介紹,還是請參考MySQL官網資料
mysql> set global slave_exec_mode='IDEMPOTENT';
mysql> start slave sql_thread;
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 514
去Slave上edusoho_e.t1表上查看,數據id=21的數據已經過去了,此時,數據處于一致性狀態
3、跳過主鍵沖突1062錯誤(Duplicate entry):
在Slave主鍵上先插入一條id值:
INSERT INTO `edusoho_e`.`t1` (`id`,`xname`, `address`, `hobby`, `age`) VALUES (19,'小玩子', '明朝', '皇后', '25');
因為Slave已經占用了Master要自動產生的主鍵值id=19,所以Slave主機會報錯:
INSERT INTO `edusoho_e`.`t1` (`id`,`xname`, `address`, `hobby`, `age`) VALUES (19,'朱棣', '明朝', '皇帝', '36');?
查看Slave主從復制狀態發現已經發生了主從復制報錯:
mysql> show slave status\G;
*************************** 1. row ***************************
Slave_IO_Running: Yes
Slave_SQL_Running: No
Last_Errno: 1062
Last_Error: Could not execute Write_rows event on table edusoho_e.t1; Duplicate entry '19' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000002, end_log_pos 7425
Exec_Master_Log_Pos: 7219
查看Master binlog:
mysql> show binlog events in 'mysql-bin.000002' from 7219;
+------------------+------+------------+-----------+-------------+---------------------------------+
| Log_name???????? | Pos? | Event_type | Server_id | End_log_pos | Info??????????????????????????? |
+------------------+------+------------+-----------+-------------+---------------------------------+
| mysql-bin.000002 | 7219 | Query????? |???????? 2 |??????? 7301 | BEGIN?????????????????????????? |
| mysql-bin.000002 | 7301 | Table_map? |???????? 2 |??????? 7362 | table_id: 213 (edusoho_e.t1)??? |
| mysql-bin.000002 | 7362 | Write_rows |???????? 2 |??????? 7425 | table_id: 213 flags: STMT_END_F |
| mysql-bin.000002 | 7425 | Xid??????? |???????? 2 |??????? 7456 | COMMIT /* xid=893 */??????????? |
+------------------+------+------------+-----------+-------------+---------------------------------+
思考:
因為Slave這條數據已經存在,如果在Slave主機上把這條數據刪除了,Slave會不會直接同步過來?(答案是:不會。需要重啟Slave thread):
DELETE FROM `edusoho_e`.`t1` WHERE `id` = '19';
mysql> stop slave;
mysql> start slave user='repliter' password='123456';
驗證的時候,發現數據已經同步過去了
題外:
以上是筆者對于單條SQL組成的事務、多條SQL組成的事務,及在這些單/多條SQL組成的事務下,人為設置的1032和1062復制錯誤和解決方法,還有sql_slave_skip_counter和slave_exec_mode各自的用法和跳過的范圍,當然了,筆者呢,做的只是線上應用前的部署測試,并沒有經過任何的實戰檢測。一方面,僅為廣大同行做個參考;另一方面,記錄筆者自己的心得和針對問題解決的思路做個總結,當問題真正發生的時候,有個方向可以進行參考,而不至于手忙腳亂,不知所措,所以,對其中有誤之處和理解不到位的地方,望請下方留言指正,不勝感激!
還有,筆者做的,只是針對事務表,做的sql_slave_skip_counter和slave_exec_mode測試,對于非事務表,sql_slave_skip_counter和slave_exec_mode用途會稍有不同,請自行百度吧
免責聲明:本站發布的內容(圖片、視頻和文字)以原創、轉載和分享為主,文章觀點不代表本網站立場,如果涉及侵權請聯系站長郵箱:is@yisu.com進行舉報,并提供相關證據,一經查實,將立刻刪除涉嫌侵權內容。