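If you run MySQL 5.6 together with Xtrabackup, watch out for this bug: http://bugs.mysql.com/bug.php?id=70307. It is fixed in 5.6.23.
During a backup, Xtrabackup executes flush tables with read lock and show slave status, and these can deadlock with the SQL Thread. The SQL Thread then stays stuck: STOP has no effect, Kill lost data in our tests, and only restarting the server clears it.
The cause: once the SQL Thread has finished the DML operation it holds rli->data_lock, and at commit time it waits for MDL_COMMIT, which flush tables with read lock is blocking. The show slave status issued after flush tables with read lock in turn waits for rli->data_lock. The fix shortens the lifetime of rli->data_lock so that it is held only while the rli object is being updated, not for the whole commit.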
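To make the lock cycle concrete, here is a minimal Python sketch that models the two sessions as a wait-for graph. It is a toy model and not MySQL code: the function `find_cycle`, the session names `sql_thread` and `backup_client`, and the simplification that flush tables with read lock "owns" MDL_COMMIT are all assumptions made for illustration.

```python
# Toy wait-for graph; a hypothetical model of the deadlock, not MySQL internals.

def find_cycle(holds, waits_for):
    """Return the sessions that form a wait cycle, or [] if there is none.

    holds:     session -> set of locks it currently owns
    waits_for: session -> the single lock it is blocked on
    """
    owner = {lock: sess for sess, locks in holds.items() for lock in locks}
    for start in waits_for:
        path = [start]
        current = start
        while True:
            blocker = owner.get(waits_for.get(current))
            if blocker is None:       # nobody owns the wanted lock: chain ends
                break
            if blocker == start:      # chain leads back to where it began
                return path
            if blocker in path:       # loop that does not include `start`
                break
            path.append(blocker)
            current = blocker
    return []

# Before the fix: the SQL thread keeps rli->data_lock across the commit.
before_holds = {
    "sql_thread": {"rli->data_lock"},
    "backup_client": {"MDL_COMMIT"},   # simplification: FTWRL blocks commits
}
before_waits = {
    "sql_thread": "MDL_COMMIT",           # "Waiting for commit lock"
    "backup_client": "rli->data_lock",    # SHOW SLAVE STATUS
}

# After the fix: rli->data_lock is not held while waiting for the commit lock,
# so SHOW SLAVE STATUS gets it immediately and the client is not waiting.
after_holds = {"sql_thread": set(), "backup_client": {"MDL_COMMIT"}}
after_waits = {"sql_thread": "MDL_COMMIT"}

print(find_cycle(before_holds, before_waits))  # ['sql_thread', 'backup_client']
print(find_cycle(after_holds, after_waits))    # []
```

In the second case the SQL thread still waits, but only until the backup client issues UNLOCK TABLES, which it can now reach because show slave status returns.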
Steps to reproduce:
1. Create the table
CREATE TABLE test (
id int(10) NOT NULL AUTO_INCREMENT,
age int(11) DEFAULT '0',
PRIMARY KEY (id),
KEY idx_age (age)
) ENGINE=InnoDB;
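2. On the master, run update test set age=sleep(20)+53 where id=1; (the sleep(20) is only there to make the timing easy to simulate, so the binlog format must be statement; row format will not work. The table also needs a row with id=1, otherwise the statement has nothing to update.)
3. Wait until the statement has replicated to the slave and is executing there, then run flush tables with read lock; show slave status; on the slave. The show slave status call blocks.
The official detailed explanation: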
Bug#19843808: DEADLOCK ON FLUSH TABLES WITH READ LOCK + SHOW SLAVE STATUS
Problem: If a client thread on an slave does FLUSH TABLES WITH READ LOCK; then master does some updates, SHOW SLAVE STATUS in the same client will be blocked.
Analysis: Execute FLUSH TABLES WITH READ LOCK on slave and at the same time execute a DML on the master. Then the DML should be made to stop at a state "Waiting for commit lock". This state means that sql thread is holding rli->data_lock and waiting for MDL_COMMIT lock. Now in the same client session where FLUSH TABLES WITH READ LOCK was executed issue SHOW SLAVE STATUS command. This command will be blocked waiting for rli->data_lock causing a dead lock. Once this happens it will not be possible to release the global read lock as "UNLOCK TABLES" command has to be issued in the same client where global read lock was acquired. This causes the dead lock.
Fix: Existing code holds the rli->data_lock for the whole duration of commit operation. Instead of holding the lock for entire commit duration the code has been restructured in such a way that the lock is held only during the period when rli object is being updated.