Last October (unrelated to Singles' Day), a fio card switched to read-only mode and caused MySQL to crash. This write-up summarizes the incident.
【Machine configuration】
System     | Dell Inc.; PowerEdge R710
Processors | physical = 2, cores = 12, virtual = 24, hyperthreading = yes
# Memory #####################################################
Total      | 94.40G
Free       | 555.50M
Swappiness | vm.swappiness = 0
# Disk #####################################################
2 ioMemory devices in this system
Fusion-io driver version: 3.1.5 build 126
Fusion-io ioDrive 640GB *2 --> mdadm --> /dev/md0
ibdata, ib_logfile, bin_log, relay_log on SAS 600GB raid1
【Symptoms】
At 13:28, monitoring raised a db_ping alert.
The MySQL error log (alert log) showed the following:
/u01/mysql/libexec/mysqld: Can't create/write to file '/u01/mysql/tmp/ibU5kXB4' ( Errcode: 30)
121104 13:28:10 InnoDB: Error: unable to create temporary file; errno: 30
121104 13:28:10 [ERROR] Plugin 'InnoDB' init function returned error.
121104 13:28:10 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
121104 13:28:10 [ERROR] Aborting
InnoDB: Error: tried to read 16384 bytes at offset 0 41517056.
InnoDB: Was only able to read -1.
121104 13:14:59 InnoDB: Operating system error number 5 in a file operation.
InnoDB: Error number 5 means 'Input/output error'.
InnoDB: Some operating system error numbers are described at
InnoDB: http://dev.mysql.com/doc/refman/5.1/en/operating-system-error-codes.html
InnoDB: File operation call: 'read'.
InnoDB: Cannot continue operation.
mysqld: my_new.cc:51: int __cxa_pure_virtual(): Assertion `! "Aborted: pure virtual method called."' failed.
121104 13:14:59 - mysqld got signal 6 ;
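The two error numbers in this log (30 and 5) are ordinary operating system errno values, and decoding them is what points at the storage layer. The following is a minimal sketch, assuming a Linux host, that pulls errno values out of log text and maps them to symbolic names with Python's standard `errno` module. The helper name `decode_errnos` and the regular expression are illustrative and only cover the three spellings seen in the excerpt above.

```python
import errno
import os
import re

# Matches the three spellings seen in the log excerpt:
# "Errcode: 30", "errno: 30" and "error number 5".
_ERRNO_RE = re.compile(r"(?:Errcode|errno|error number):?\s*(\d+)", re.IGNORECASE)


def decode_errnos(log_text):
    """Return {number: (symbolic_name, message)} for every errno found in log_text."""
    found = {}
    for match in _ERRNO_RE.finditer(log_text):
        number = int(match.group(1))
        name = errno.errorcode.get(number, "UNKNOWN")
        found[number] = (name, os.strerror(number))
    return found


sample = (
    "Can't create/write to file '/u01/mysql/tmp/ibU5kXB4' ( Errcode: 30)\n"
    "InnoDB: Error: unable to create temporary file; errno: 30\n"
    "InnoDB: Operating system error number 5 in a file operation.\n"
)

for number, (name, message) in sorted(decode_errnos(sample).items()):
    print(number, name, message)
```

On Linux, 30 maps to EROFS (read-only file system) and 5 to EIO (input/output error), which matches the diagnosis in this write-up.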
This pointed to a problem with the I/O device. Running touch /u01/mysql/tmp/ibd at that point failed:
touch: cannot touch `/u01/mysql/tmp/ibd': Read-only file system
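The manual touch check above can be automated as a periodic write probe. The sketch below is not part of the original incident response. It assumes a monitoring job may create and delete a small temporary file in the target directory, and the function name `probe_writable` and the `rw_probe_` file prefix are hypothetical.

```python
import errno
import os
import tempfile


def probe_writable(directory):
    """Try to create and remove a temp file in directory.

    Returns (True, None) on success, or (False, errno_name) on failure,
    for example (False, "EROFS") on a read-only file system.
    """
    try:
        fd, path = tempfile.mkstemp(prefix="rw_probe_", dir=directory)
    except OSError as exc:
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    os.unlink(path)
    return True, None


if __name__ == "__main__":
    # The system temp directory stands in for a path such as /u01/mysql/tmp.
    print(probe_writable(tempfile.gettempdir()))
```

A probe like this detects the read-only state without waiting for mysqld to fail on its next temporary file.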
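Because this is a core cluster with strict data-consistency requirements, a DBA manually forced a master/standby switchover, which cleared the outage.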
【Root cause】
The Fusion-io card went read-only.
/var/log/messages:
Nov 4 13:14:59 my160130.cm6 kernel: : fioerr Fusion-io ioDrive 640GB 0000:07:00.0: Single Bit Event Upset Error Detected - interrupt: val[0]: 000ff16
fio-status -a:
fct1 Failed: DEVICE IS OFFLINE. ALL READS AND WRITES WILL FAIL!
ioDrive 640GB MLC, Product Number:2TTK9, SN:436946
!! ---> There are active errors or warnings on this device! Read below for details.
ioDrive 640GB MLC, PN:00214401201
Located in slot 0 Center of Pseudo Low-Profile ioDIMM Adapter SN:436946
WARNING: READ-ONLY MODE. ALL WRITES WILL FAIL!
ACTIVE ERRORS:
The ioMemory has encountered an internal error and has been temporarily disabled. All reads and writes will fail.
The ioMemory is not allowing write operations.
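The fio-status output above carries three signals worth checking mechanically: the device is offline, it is in read-only mode, and it lists active errors. The sketch below takes already captured fio-status text and flags those signals. The marker strings are copied from the excerpt in this write-up and are not a documented output format, and the function name `summarize_fio_status` is hypothetical.

```python
# Marker strings are taken from the fio-status excerpt in this write-up;
# other driver versions may word them differently (assumption).
MARKERS = {
    "offline": "DEVICE IS OFFLINE",
    "read_only": "READ-ONLY MODE",
    "active_errors": "ACTIVE ERRORS",
}


def summarize_fio_status(text):
    """Return a dict of booleans, one per marker found in the captured output."""
    upper = text.upper()
    return {flag: marker in upper for flag, marker in MARKERS.items()}


captured = (
    "fct1 Failed: DEVICE IS OFFLINE. ALL READS AND WRITES WILL FAIL!\n"
    "WARNING: READ-ONLY MODE. ALL WRITES WILL FAIL!\n"
    "ACTIVE ERRORS:\n"
    "The ioMemory is not allowing write operations.\n"
)

print(summarize_fio_status(captured))
```

The sketch deliberately parses text passed in by the caller rather than invoking the tool itself, so it makes no assumptions about command-line flags beyond what the write-up shows.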
【Analysis】
•SEUs are transient soft errors, and are non-destructive. A reset or rewriting of the device results in normal device behavior thereafter
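The fio control logic runs on an FPGA. Its metadata is stored in DRAM and on the SSD and can be recovered after a power loss. With the 2.x driver, the card repairs this error by rewriting. The 3.x driver is stricter for safety: when the error occurs it resets the card directly and leaves it read-only until a power cycle.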
•SEU class errors are caused by cosmic ray particles making their way into the NAND controller or by a failing NAND controller
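Physical damage to the FPGA itself and cosmic rays can both trigger this error. In May, the system log showed a similar Write Path error; the 2.x driver repaired it automatically by rewriting, whereas the 3.x driver, with its higher safety level, resets the card and sets it to read-only.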
•Write Path Parity Error
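This error is a precursor to the SEU error and is recoverable in the vast majority of cases. Three machines in the same cluster have hit it and recovered automatically.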
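•An FPGA is far cheaper than taping out a custom chip and allows fast programming iterations, but it is less robust than a custom chip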
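【Data loss】
Because undo, redo, and binlog all live on the SAS disks under u02, the logs were complete, and the standby had almost no replication lag, so no data was lost;
However, an SEU may have corrupted the blocks being written at that moment and left the data inconsistent, so to be safe we rebuilt the standby and used the binlog to bring all data back in sync.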
•SEU class of error may result in data on the device being corrupted. The database should be verified or restored from backup
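One way to act on the advice to verify the database is to compare per-table checksums between the master and the standby. The sketch below only shows the comparison step. It assumes the checksums have already been collected into dictionaries, for example from MySQL's `CHECKSUM TABLE` statement run on each side while writes are quiesced. The table names and checksum values are made up for illustration, and this is not the procedure used in the incident, where the standby was simply rebuilt.

```python
def diff_checksums(master, standby):
    """Return sorted table names whose checksum differs or is missing on one side.

    master and standby map table name -> checksum value.
    """
    tables = set(master) | set(standby)
    return sorted(t for t in tables if master.get(t) != standby.get(t))


# Hypothetical sample data, not taken from the incident.
master = {"orders": 3141592653, "users": 2718281828, "items": 1618033988}
standby = {"orders": 3141592653, "users": 1414213562}

print(diff_checksums(master, standby))
```

Any table returned by the comparison would need closer inspection or a rebuild before the standby is trusted again.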
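【Improvements】
As an FPGA ages, there is some probability of a Single Event Upset error, so cards in core databases should be replaced in good time;
FPGAs are sensitive to cosmic rays, so the data-center environment needs to be controlled and machines should be spread across racks;
Make alerting on /var/log/messages and dmesg more sensitive.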
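More sensitive log alerting could start with a simple classifier over kernel log lines. The sketch below is one possible shape for it, under stated assumptions: the match patterns come from the messages quoted in this write-up (fioerr, Single Bit Event Upset, Write Path), while the severity labels, the rule order, and the function names are hypothetical choices rather than anything the driver defines.

```python
import re

# Patterns come from the messages quoted in this write-up.
# Severity labels and ordering are hypothetical; first match wins.
RULES = [
    ("critical", re.compile(r"Single Bit Event Upset", re.IGNORECASE)),
    ("warning", re.compile(r"Write Path", re.IGNORECASE)),
    ("info", re.compile(r"\bfioerr\b")),
]


def classify(line):
    """Return the severity of the first matching rule, or None if nothing matches."""
    for severity, pattern in RULES:
        if pattern.search(line):
            return severity
    return None


def scan(lines):
    """Return (severity, line) pairs for every line that matches a rule."""
    hits = []
    for line in lines:
        severity = classify(line)
        if severity is not None:
            hits.append((severity, line.rstrip("\n")))
    return hits


sample_lines = [
    "kernel: : fioerr Fusion-io ioDrive 640GB 0000:07:00.0: Single Bit Event Upset Error Detected\n",
    "kernel: : fioerr Fusion-io ioDrive 640GB 0000:07:00.0: Write Path Parity Error\n",
    "kernel: eth0: link up\n",
]

for severity, line in scan(sample_lines):
    print(severity, line)
```

Since the Write Path error is described above as a precursor to the SEU error, alerting on it at warning level gives operators a chance to schedule a card replacement before a read-only event.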