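After one database restart, I found that the database instance's services never registered with the listener; only the ASM instance's services were registered: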
lsnrctl status

LSNRCTL for Linux: Version 12.2.0.1.0 - Production on 17-JAN-2020 19:43:44
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=127.0.0.1)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=xxxx)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_MGMT" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_OCR" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
The command completed successfully
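As a rough sketch of how this check could be automated, the snippet below scans saved lsnrctl status text and lists the non-ASM instances that have registered. The function names and the rule "an instance name starting with + is ASM" are assumptions made for illustration, not part of any Oracle tool.

```python
import re


def registered_instances(status_text):
    """Map each instance name to the services it registered with the listener."""
    instances = {}
    service = None
    # In the Services Summary, each Service entry is followed by its Instance entries.
    for kind, name in re.findall(r'(Service|Instance) "([^"]+)"', status_text):
        if kind == "Service":
            service = name
        elif service is not None:
            instances.setdefault(name, []).append(service)
    return instances


def database_instances(status_text):
    """Return registered instances that are not ASM (assumed: ASM names start with '+')."""
    return sorted(n for n in registered_instances(status_text) if not n.startswith("+"))


sample = '''Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
'''

# An empty list means no database instance has registered.
print(database_instances(sample))
```

With the output shown above the list is empty, which matches the symptom: only +ASM1 is registered.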
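In Oracle 12c, registering services with the listener is the job of the LREG process. I used strace to check whether LREG was behaving abnormally, and found that its poll calls kept returning with a timeout: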
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 31995}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32157}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32271}, ...}) = 0
open("/proc/loadavg", O_RDONLY) = 13
fstat(13, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1a71503000
read(13, "0.16 0.20 0.33 4/1395 210929\n", 1024) = 29
close(13) = 0
munmap(0x7f1a71503000, 4096) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
epoll_wait(9, [], 1024, 3000) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 66310}, ru_stime={0, 32503}, ...}) = 0
poll([{fd=4, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=6, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=12, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}, {fd=7, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 4, 0) = 0 (Timeout)
poll is described as follows:
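poll is a way of querying the state of file descriptors.
Prototype: int poll(struct pollfd *fds, nfds_t nfds, int timeout);
fds points to the array of file descriptors to query;
nfds is the number of entries in fds;
timeout is how long, in milliseconds, the process may sleep when none of the descriptors is in the requested state;
Return value: the number of descriptors found in a requested state, or 0 if the timeout expired first.
How it works: when an application calls poll, the kernel takes each descriptor in fds, calls the poll function of its driver, and checks whether the requested state has occurred. After checking every descriptor it has the number that satisfy the request. If that number is 0, the process goes to sleep for the time given by timeout. If one of the descriptors reaches the requested state while the process is asleep, poll returns immediately; otherwise the process sleeps until the timeout expires and poll returns 0. If the number is greater than 0, poll returns the number of descriptors that satisfy the request.
poll is comparable to a blocking read on a device opened with open("/dev/xxx", O_RDWR). The difference is that when the device has no data to read, poll puts the program to sleep for at most the given timeout, whereas a blocking read sleeps until data arrives.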
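The timeout behaviour described above can be tried out with a minimal sketch using Python's select.poll, which wraps the same system call (available on Linux and most Unix systems, not on Windows). The pipe is only a stand-in for the descriptors LREG watches. Note that in the strace output the last argument to poll is 0, so "= 0 (Timeout)" there only means that nothing was ready at the instant of the call; by itself it does not necessarily indicate a fault.

```python
import os
import select

# A pipe gives us a descriptor we can make readable on demand.
r, w = os.pipe()
poller = select.poll()
poller.register(r, select.POLLIN)

# timeout=0: do not sleep at all; an empty result means "nothing ready right now".
idle = poller.poll(0)

# Write one byte so the read end has pending data.
os.write(w, b"x")

# The same non-blocking call now reports the readable descriptor.
ready = poller.poll(0)

print(idle)  # []
print([(fd == r, bool(events & select.POLLIN)) for fd, events in ready])  # [(True, True)]

os.close(r)
os.close(w)
```

The first call is the equivalent of the "= 0 (Timeout)" lines in the trace; the second shows what a successful poll looks like once a descriptor has data.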
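At this point I wondered whether the process itself was in a bad state, so I restarted the database from sqlplus and traced the LREG process again. The poll timeouts no longer appeared: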
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11203}, ru_stime={0, 21388}, ...}) = 0
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11234}, ru_stime={0, 21447}, ...}) = 0
epoll_wait(9, [], 1024, 3000) = 0
getrusage(0x1 /* RUSAGE_??? */, {ru_utime={0, 11264}, ru_stime={0, 21505}, ...}) = 0
But the database instance's services still did not register with the listener.
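Now I was baffled. The listener could register the ASM instance's services, so the listener itself should be fine. The database's LREG process kept running its registration loop, so the registration mechanism should be fine too. The fault had to lie somewhere between the two.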
So I traced the LREG process with oradebug and event 10257:
*** 2020-01-17T20:17:21.365862+08:00 (CDB$ROOT(1))
kmlwait: status: succ=0, wait=0, fail=0
kmmlrl: update for process drop delta: 357 357 149 150 5999
kmmlrl: 149 processes
kmmlrl: instance load 2
kmmgdnu: O12DB goodness=0, delta=1, pdb=1, flags=0x104:unblocked/not overloaded, update=0x2:G/-/-
kmmgdnu: O12DBXDB goodness=0, delta=1, pdb=1, flags=0x105:unblocked/not overloaded, update=0x2:G/-/-
kmmlrl_network_hdlr_state: update
kmmlrl_network_hdlr_state: update for network '-oracledefault-'
kmmlrl_network_hdlr_state: beq handler: load=149, max=5999, flag=0x2002, upd=0x2
------------------------------
Start Registration Information
------------------------------
Last update: 53704792 (3 seconds ago)
Flag: 0x4, 0x0
State: succ=0, wait=0, fail=0
CDB: root pdb 1 last pdb 4098 open max pdb 2
Dispatcher configuration index: cur 1 max 1
Network '-oracledefault-' pdb 1 :
Local listeners:
Remote listeners:
Handlers:
Dedicated
  flg=0x2002, upd=0x2, srvl=1
  services=O12DB
  hdlr load=149, max=5999
  nam=DEDICATED
  adr=(ADDRESS=(PROTOCOL=BEQ)(PROGRAM=/app/oracle/product/12.2.0/dbhome_1/bin/oracle)(ARGV0='oracle./O12DB1')(ARGS='(LOCAL=NO)'))
  inf=LOCAL SERVER
  pri=0x7fea7aa8a208

*** 2020-01-17T20:17:21.365862+08:00 (CDB$ROOT(1))
kmlwait: status: succ=0, wait=0, fail=0
kmmlrl: update for process drop delta: 357 357 149 150 5999
kmmlrl: 149 processes
kmmlrl: instance load 2
kmmgdnu: O12DB goodness=0, delta=1, pdb=1, flags=0x104:unblocked/not overloaded, update=0x2:G/-/-
kmmgdnu: O12DBXDB goodness=0, delta=1, pdb=1, flags=0x105:unblocked/not overloaded, update=0x2:G/-/-
kmmlrl_network_hdlr_state: update
kmmlrl_network_hdlr_state: update for network '-oracledefault-'
kmmlrl_network_hdlr_state: beq handler: load=149, max=5999, flag=0x2002, upd=0x2
------------------------------
Start Registration Information
------------------------------
Last update: 53704792 (3 seconds ago)
Flag: 0x4, 0x0
State: succ=0, wait=0, fail=0
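Here the Local listeners: and Remote listeners: entries are both empty, so LREG has no listener address to register with. Checking the database's local_listener parameter revealed the anomaly: its current value is -oraagent-dummy-: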
SQL> show parameter local

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      -oraagent-dummy-
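A small sketch of how this check might be scripted against captured sqlplus output follows. The helper names are made up for illustration, and treating -oraagent-dummy- as a placeholder value is an assumption drawn from this incident rather than from Oracle documentation.

```python
import re

# Assumption: this is the placeholder value observed in the incident above.
PLACEHOLDER = "-oraagent-dummy-"


def local_listener_value(show_parameter_output):
    """Extract the local_listener value from 'show parameter' text, or None if absent."""
    match = re.search(
        r"^local_listener[ \t]+string[ \t]*(.*)$",
        show_parameter_output,
        re.MULTILINE,
    )
    return match.group(1).strip() if match else None


def is_placeholder(value):
    """True when local_listener still holds the placeholder instead of a real address."""
    return value == PLACEHOLDER


sample = """NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
local_listener                       string      -oraagent-dummy-
"""

value = local_listener_value(sample)
print(value, is_placeholder(value))
```

A True result here would be a hint to look at how the instance was started before digging further into the listener or LREG.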
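Checking the CRS resource status again, I saw that instance 1 was OFFLINE. That was because I had started the database directly from sqlplus rather than through the cluster resource, which suggests that the placeholder value is only replaced with a real listener address when Clusterware starts the instance: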
ora.o12db.db
1 ONLINE OFFLINE STABLE
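The mismatch between target and state in this output is easy to miss by eye. As an illustration only, the sketch below scans such text for instances whose TARGET and STATE columns differ. It assumes the lines come from something like crsctl status resource -t (the source does not name the command) and that the columns are instance number, target, state; the function name is hypothetical.

```python
def target_state_mismatches(crs_output):
    """Return (resource, instance, target, state) for rows where target != state."""
    resource = None
    mismatches = []
    for line in crs_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("ora."):
            # A resource name line introduces the per-instance rows that follow it.
            resource = fields[0]
        elif fields[0].isdigit() and len(fields) >= 3 and fields[1] != fields[2]:
            mismatches.append((resource, int(fields[0]), fields[1], fields[2]))
    return mismatches


sample = """ora.o12db.db
      1        ONLINE  OFFLINE                               STABLE
"""

print(target_state_mismatches(sample))
# [('ora.o12db.db', 1, 'ONLINE', 'OFFLINE')]
```

A non-empty result for the database resource while the instance is actually up points to an instance that was started outside Clusterware, as happened here.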
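So I started the instance with srvctl instead. The cluster resource returned to normal, and the database instance's services registered correctly with the listener: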
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_DATA" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_MGMT" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "+ASM_OCR" has 1 instance(s).
  Instance "+ASM1", status READY, has 1 handler(s) for this service...
Service "O12DB" has 1 instance(s).
  Instance "O12DB1", status READY, has 1 handler(s) for this service...
Service "O12DBXDB" has 1 instance(s).
  Instance "O12DB1", status READY, has 1 handler(s) for this service...
The command completed successfully