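Symptom:
After a node went down it could not be restarted. The heartbeat (interconnect) NIC had to be unplugged and re-plugged several times before the node would start on its own. Preliminary diagnosis: an unexplained HAIP failure was preventing CRS from starting on one node.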
1 Check the network
[grid@gmdb1 trace]$ oifcfg iflist -p -n
bond0 22.1.32.0 UNKNOWN 255.255.254.0
bond1 1.255.255.0 UNKNOWN 255.255.255.0
bond1 169.254.0.0 UNKNOWN 255.255.0.0
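The 169.254.0.0 subnet on bond1 is the link-local range that HAIP addresses are taken from. As a minimal sketch (the function names are illustrative and not part of any Oracle tool), the `oifcfg iflist -p -n` output can be parsed to list the interfaces that carry such a subnet:

```python
import ipaddress

SAMPLE = """\
bond0 22.1.32.0 UNKNOWN 255.255.254.0
bond1 1.255.255.0 UNKNOWN 255.255.255.0
bond1 169.254.0.0 UNKNOWN 255.255.0.0
"""

def parse_iflist(text):
    """Split each 4-column line into (interface, subnet, type, netmask)."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 4:  # skips prompts and blank lines
            rows.append(tuple(parts))
    return rows

def haip_interfaces(rows):
    """Return interfaces whose subnet lies in 169.254.0.0/16 (link-local)."""
    link_local = ipaddress.ip_network("169.254.0.0/16")
    return [iface for iface, subnet, _type, _mask in rows
            if ipaddress.ip_address(subnet) in link_local]

print(haip_interfaces(parse_iflist(SAMPLE)))
```

On a node where HAIP has not come up, the 169.254.0.0 row would be expected to be missing, so an empty result is a quick hint.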
2 Check CRS
[root@gmdb2 tmp]# crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
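When this check is scripted, the failing components can be pulled out of the output automatically. The sketch below is an assumption-laden helper (the function name is made up for illustration); it treats any `CRS-nnnn` line whose message does not end in "is online" as a failure:

```python
import re

CRS_LINE = re.compile(r"^(CRS-\d+):\s*(.*)$")

SAMPLE = """\
CRS-4638: Oracle High Availability Services is online
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4530: Communications failure contacting Cluster Synchronization Services daemon
CRS-4534: Cannot communicate with Event Manager
"""

def failing_components(text):
    """Return (code, message) pairs for every line that is not 'is online'."""
    failures = []
    for line in text.splitlines():
        m = CRS_LINE.match(line.strip())
        if m and not m.group(2).endswith("is online"):
            failures.append((m.group(1), m.group(2)))
    return failures

for code, message in failing_components(SAMPLE):
    print(code, message)
```

Here only OHAS is online; CRS, CSS and the Event Manager are all unreachable, which matches a stack that is stuck below CSSD.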
3 Check the init resources: ASM and HAIP fail to start:
[root@gmdb2 tmp]# crsctl stat res -t -init
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
Cluster Resources
ora.asm
      1        ONLINE  OFFLINE
ora.cluster_interconnect.haip
      1        ONLINE  OFFLINE
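The resources of interest are those whose TARGET is ONLINE while their STATE is OFFLINE. A rough sketch for extracting them from `crsctl stat res -t -init` output follows; it works on whitespace-separated tokens so it does not depend on the exact column layout, and the function name is hypothetical:

```python
def offline_resources(text):
    """Return resource names with TARGET=ONLINE but STATE=OFFLINE.

    Assumes each resource name (starting with 'ora.') is followed by
    an instance number, then TARGET, then STATE.
    """
    tokens = text.split()
    result = []
    for i, tok in enumerate(tokens):
        if not tok.startswith("ora.") or i + 3 >= len(tokens):
            continue
        instance, target, state = tokens[i + 1:i + 4]
        if instance.isdigit() and target == "ONLINE" and state == "OFFLINE":
            result.append(tok)
    return result

SAMPLE = """\
ora.asm 1 ONLINE OFFLINE
ora.cluster_interconnect.haip 1 ONLINE OFFLINE
"""

print(offline_resources(SAMPLE))
```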
4 Test multicast with mcasttest.pl; no problems found:
[grid@gmdb2 mcasttest]$ perl mcasttest.pl -n gmdb2,gmdb1 -i bond0,bond1
########### Setup for node gmdb2 ##########
Checking node access 'gmdb2'
Checking node login 'gmdb2'
Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb2'
Distributing mcast2 binary to node 'gmdb2'
########### Setup for node gmdb1 ##########
Checking node access 'gmdb1'
Checking node login 'gmdb1'
Checking/Creating Directory /tmp/mcasttest for binary on node 'gmdb1'
Distributing mcast2 binary to node 'gmdb1'
########### testing Multicast on all nodes ##########
Test for Multicast address 230.0.1.0
11月 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000
11月 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001
Test for Multicast address 224.0.0.251
11月 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
11月 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
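With more interfaces or nodes, the result lines are easier to review once summarized. The following sketch parses the "Multicast ... for ... using address ..." lines; it assumes that any status word other than "Succeeded" means a failure, which is an assumption about the script's output rather than something shown above:

```python
import re

MCAST = re.compile(r"Multicast (\w+) for (\S+) using address ([\d.]+):(\d+)")

SAMPLE = """\
11月 28 16:42:02 | Multicast Succeeded for bond0 using address 230.0.1.0:42000
11月 28 16:42:03 | Multicast Succeeded for bond1 using address 230.0.1.0:42001
11月 28 16:42:04 | Multicast Succeeded for bond0 using address 224.0.0.251:42002
11月 28 16:42:05 | Multicast Succeeded for bond1 using address 224.0.0.251:42003
"""

def mcast_results(text):
    """Map (interface, multicast address) to True when the test succeeded."""
    results = {}
    for status, iface, addr, _port in MCAST.findall(text):
        results[(iface, addr)] = (status == "Succeeded")
    return results

results = mcast_results(SAMPLE)
failed = [key for key, ok in results.items() if not ok]
print("failed:", failed)
```

An empty `failed` list corresponds to the conclusion of this step: multicast works on both bond0 and bond1 for both test addresses.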
5 Check the CSSD log
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: begin on node(2), waittime 193000
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: set curtime (1040905644) for my node
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: scanning 32 nodes
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: Node gmdb1, number 1, is in an existing cluster with disk state 3
2017-11-28 11:48:02.797: [ CSSD][2139567872]clssnmLocalJoinEvent: takeover aborted due to cluster member node found on disk
2017-11-28 11:48:02.808: [ CSSD][2358462208]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931581, LATS 1040905654, lastSeqNo 39931578, uniqueness 1510056501, timestamp 1511840882/1783220964
2017-11-28 11:48:03.287: [ CSSD][2144298752]clssgmWaitOnEventValue: after CmInfo State val 3, eval 1 waited 0
2017-11-28 11:48:03.782: [ CSSD][2363209472]clssnmvDHBValidateNcopy: node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564, wrtcnt, 39931583, LATS 1040906624,
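The log contains a large number of "has a disk HB, but no network HB" records: the local node sees the other node's disk heartbeat on the voting disk but receives no network heartbeat from it.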
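To quantify how often this happens, the log can be scanned for the `clssnmvDHBValidateNcopy` message. This is a minimal sketch with an illustrative function name; the log path is not given in this write-up, so the helper simply takes an iterable of lines:

```python
import re
from collections import Counter

NO_NHB = re.compile(
    r"clssnmvDHBValidateNcopy: node (\d+), ([^,\s]+), "
    r"has a disk HB, but no network HB"
)

def count_missing_network_hb(lines):
    """Count 'disk HB but no network HB' records per (node number, node name)."""
    counts = Counter()
    for line in lines:
        m = NO_NHB.search(line)
        if m:
            counts[(int(m.group(1)), m.group(2))] += 1
    return counts

SAMPLE = [
    "2017-11-28 11:48:02.808: [ CSSD][2358462208]clssnmvDHBValidateNcopy: "
    "node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564",
    "2017-11-28 11:48:03.287: [ CSSD][2144298752]clssgmWaitOnEventValue: "
    "after CmInfo State val 3, eval 1 waited 0",
    "2017-11-28 11:48:03.782: [ CSSD][2363209472]clssnmvDHBValidateNcopy: "
    "node 1, gmdb1, has a disk HB, but no network HB, DHB has rcfg 405549564",
]

print(count_missing_network_hb(SAMPLE))
```

In practice the function could be fed an open file handle for the CSSD log, since a file object is also an iterable of lines.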
Check which interconnect the database is using:
SQL> select * from v$cluster_interconnects;
NAME     IP_ADDRESS       IS_PUBLIC  SOURCE
eth2:1   169.254.134.65   NO
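The interconnect runs over HAIP (a 169.254.x.x address), but the local HAIP resource cannot start, so CSSD does not come up. Check the dependency relationship between HAIP and CSSD: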
[root@12crac2 ~]# crsctl stat res ora.cluster_interconnect.haip -init -f
NAME=ora.cluster_interconnect.haip
TYPE=ora.haip.type
STATE=OFFLINE
TARGET=ONLINE
ACL=owner:root:rw-,pgrp:oinstall:rw-,other::r--,user:grid:r-x
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=
ACTIVE_PLACEMENT=0
AGENT_FILENAME=%CRS_HOME%/bin/orarootagent%CRS_EXE_SUFFIX%
AUTO_START=always
CARDINALITY=1
CARDINALITY_ID=0
CHECK_INTERVAL=30
CREATION_SEED=15
DEFAULT_TEMPLATE=
DEGREE=1
DESCRIPTION="Resource type for a Highly Available network IP"
ENABLED=0
FAILOVER_DELAY=0
FAILURE_INTERVAL=0
FAILURE_THRESHOLD=0
HOSTING_MEMBERS=
ID=ora.cluster_interconnect.haip
LOAD=1
LOGGING_LEVEL=1
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
PLACEMENT=balanced
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
SERVER_POOLS=
START_DEPENDENCIES=hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)
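The START_DEPENDENCIES value is a compact string of `kind(resource,...)` groups. To make the relationships easier to inspect, it can be split into a dictionary, as in this sketch (helper names are hypothetical; modifiers such as `intermediate:` are stripped only when comparing names):

```python
import re

DEP_GROUP = re.compile(r"(\w+)\(([^)]*)\)")

def parse_dependencies(value):
    """Turn 'hard(a,b)pullup(a)' into {'hard': ['a', 'b'], 'pullup': ['a']}."""
    deps = {}
    for kind, body in DEP_GROUP.findall(value):
        names = [n.strip() for n in body.split(",") if n.strip()]
        deps.setdefault(kind, []).extend(names)
    return deps

def depends_on(deps, resource):
    """Return the dependency kinds under which `resource` is listed."""
    return [kind for kind, names in deps.items()
            if any(n.split(":")[-1] == resource for n in names)]

haip = parse_dependencies("hard(ora.gpnpd,ora.cssd)pullup(ora.cssd)")
print(haip)
print(depends_on(haip, "ora.cssd"))
```

The same helper can be applied to the ora.asm dependency string used in the workaround, for example to confirm that it no longer lists ora.cluster_interconnect.haip.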
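Temporary workaround:
Once the heartbeat network itself has been confirmed to be problem-free,
disable HAIP and remove it from the ora.asm dependencies: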
crsctl modify res ora.cluster_interconnect.haip -attr "ENABLED=0" -init
crsctl modify res ora.asm -attr "START_DEPENDENCIES='hard(ora.cssd,ora.ctssd)pullup(ora.cssd,ora.ctssd)weak(ora.drivers.acfs)', STOP_DEPENDENCIES='hard(intermediate:ora.cssd)' " -init
After the change, check again:
Related articles on MOS:
Known Issues: Grid Infrastructure Redundant Interconnect and ora.cluster_interconnect.haip (Doc ID 1640865.1)
HAIP-related bugs on MOS