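Environment: two-node RAC, Oracle 11.2.3
The customer phoned to report that RAC instance 2 was behaving abnormally. The logs checked on site:
Instance 2: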
Fri Aug 25 09:45:16 2017
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMS0 (ospid: 24510820): terminating the instance due to error 481
Fri Aug 25 09:45:16 2017
System state dump requested by (instance=2, osid=24510820 (LMS0)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/11.2.0/diag/rdbms/ins/ins2/trace/ins2_diag_21561818.trc
Instance terminated by LMS0, pid = 24510820
Instance 1:
Fri Aug 25 09:44:25 2017
IPC Send timeout detected. Sender: ospid 35783054 [oracle@db1 (LMS1)]
Receiver: inst 2 binc 2073329022 ospid 24183072
IPC Send timeout to 2.2 inc 28 for msg type 65518 from opid 14
Fri Aug 25 09:44:27 2017
Communications reconfiguration: instance_number 2
Fri Aug 25 09:45:16 2017
Detected an inconsistent instance membership by instance 1
Evicting instance 2 from cluster
Waiting for instances to leave: 2
Fri Aug 25 09:45:16 2017
Dumping diagnostic data in directory=[cdmp_20170825094516], requested by (instance=2, osid=24510820 (LMS0)), summary=[abnormal instance termination].
Reconfiguration started (old inc 28, new inc 32)
List of instances:
1 (myinst: 1)
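
The instance 1 log above shows the sequence that matters for diagnosis: an IPC send timeout, a communications reconfiguration, and the eviction. As a rough sketch of how that timeline could be pulled out of an alert log automatically, the Python below pairs each interesting message with the timestamp line that precedes it. The keyword list, the function names, and the assumption that timestamps are always on their own line in the `Fri Aug 25 09:44:25 2017` form are illustrative choices, not anything Oracle provides.

```python
from datetime import datetime

# Timestamp format as it appears in the alert log excerpts above.
TS_FORMAT = "%a %b %d %H:%M:%S %Y"

# Hypothetical keyword list: messages taken from the excerpts in this post.
KEYWORDS = (
    "IPC Send timeout",
    "Communications reconfiguration",
    "Evicting instance",
    "terminating the instance",
    "Reconfiguration started",
)


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def extract_events(alert_text):
    """Return (timestamp, message) pairs for lines containing a keyword."""
    events = []
    current_ts = None
    for line in alert_text.splitlines():
        ts = parse_timestamp(line)
        if ts is not None:
            current_ts = ts  # applies to the messages that follow
            continue
        if any(keyword in line for keyword in KEYWORDS):
            events.append((current_ts, line.strip()))
    return events


# Excerpt copied from the instance 1 alert log shown above.
SAMPLE = """Fri Aug 25 09:44:25 2017
IPC Send timeout detected. Sender: ospid 35783054 [oracle@db1 (LMS1)]
Receiver: inst 2 binc 2073329022 ospid 24183072
IPC Send timeout to 2.2 inc 28 for msg type 65518 from opid 14
Fri Aug 25 09:44:27 2017
Communications reconfiguration: instance_number 2
Fri Aug 25 09:45:16 2017
Detected an inconsistent instance membership by instance 1
Evicting instance 2 from cluster
"""

if __name__ == "__main__":
    for ts, message in extract_events(SAMPLE):
        print(ts, message)
```

Run against the excerpt, this lists the two IPC send timeout lines, the reconfiguration, and the eviction, which makes the gap of under a minute between the first timeout and the eviction easy to see.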
Contents of /oracle/11.2.0/diag/rdbms/gjj/ins2/trace/ins2_diag_21561818.trc:
*** 2017-08-25 14:24:35.900
I'm the voting node
Group reconfiguration cleanup
confirm->incar_num 22, rcfgctx->prop_incar 0
Send my bitmap to master 0
kjzgmappropose : incar 0, newmap -
3000000000000000000000000000000000000000000000000000000000000000
kjzgmappropose : rc from psnd : 30
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
kjzdattdlm: Can not attach to DLM (LMON up=[TRUE], DB mounted=[FALSE]).
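We suspected a problem with the heartbeat (interconnect) network. This RAC had already had instances evicted several times, but each time the instance restarted automatically; this time instance 2 could not start after being evicted. Parameters had been changed in response to the earlier evictions, and judging from this incident, the parameter settings were probably not the cause.
Tests of the heartbeat network showed no problems with connectivity or throughput, so the plan was to use HAIP to further improve the availability of the heartbeat network. While adding HAIP, packet loss appeared once the new network link was added between the servers and the switch, with a loss rate of 50%, which indicated that the heartbeat network was unstable. On that basis we removed the newly added heartbeat cable, replaced the original heartbeat cable, and restarted the evicted instance 2; the instance came up normally.
The final conclusion was that two conductors in the RJ45 connector of the original heartbeat cable were shorted, which caused this failure.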
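
A loss rate like the 50% seen here is easy to miss if ping output is only checked by eye. As a minimal sketch, assuming the `ping` summary line contains the usual `N% packet loss` text, the Python below extracts the loss percentage and flags an interconnect that loses any packets. The function names, the zero-loss threshold, and the sample summary lines are illustrative assumptions, not output captured from this system.

```python
import re

# Matches the loss figure in a ping summary such as "50% packet loss".
LOSS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)% packet loss")


def packet_loss_percent(ping_output):
    """Return the packet-loss percentage reported in ping's summary."""
    match = LOSS_PATTERN.search(ping_output)
    if match is None:
        raise ValueError("no packet-loss summary found in ping output")
    return float(match.group(1))


def interconnect_is_healthy(ping_output, max_loss_percent=0.0):
    """Treat any loss above the threshold as an unhealthy interconnect."""
    return packet_loss_percent(ping_output) <= max_loss_percent


# Illustrative summary lines (hypothetical, not taken from the incident).
LOSSY = "10 packets transmitted, 5 received, 50% packet loss, time 9012ms"
CLEAN = "5 packets transmitted, 5 received, 0% packet loss, time 4005ms"

if __name__ == "__main__":
    for summary in (LOSSY, CLEAN):
        print(packet_loss_percent(summary), interconnect_is_healthy(summary))
```

A private interconnect is normally expected to lose no packets at all, which is why the default threshold in this sketch is zero; a looser threshold can be passed in if that assumption does not fit.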