This article looks at what to do when the primary database in an Oracle Data Guard environment reports ORA-16198.
In a customer's Oracle Active Data Guard environment, the primary database logged the following errors every day during the peak load period:
Fri Apr 24 17:25:59 2015
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 1 to 'afabdg01'
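Because the error shows up only during the daily peak, it can help to confirm that pattern from the alert log itself. The following is a minimal Python sketch, not part of the MOS note: it assumes the alert log uses English timestamp lines in the format shown above, and the function names are hypothetical. It counts every line that starts with ORA-16198 under the hour of the most recent timestamp line.

```python
import re
from collections import Counter
from datetime import datetime

# Alert-log timestamp lines look like "Fri Apr 24 17:25:59 2015".
# Assumption: English day/month names (the C locale).
TIMESTAMP_FORMAT = "%a %b %d %H:%M:%S %Y"
ORA_16198 = re.compile(r"^ORA-16198\b")


def parse_timestamp(line):
    """Return a datetime if the line is an alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TIMESTAMP_FORMAT)
    except ValueError:
        return None


def count_ora_16198_by_hour(lines):
    """Count lines starting with ORA-16198, grouped by hour of day."""
    counts = Counter()
    current = None  # most recent timestamp seen so far
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            current = stamp
        elif current is not None and ORA_16198.match(line.strip()):
            counts[current.hour] += 1
    return counts


# The excerpt from the customer's alert log shown above.
sample = """Fri Apr 24 17:25:59 2015
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Error 16198 for archive log file 1 to 'afabdg01'
""".splitlines()

for hour, hits in sorted(count_ora_16198_by_hour(sample).items()):
    print(f"{hour:02d}:00  {hits}")
```

Run against a full alert log (read with `open(path).read().splitlines()`), the per-hour counts should make it obvious whether the timeouts cluster around the busiest hours.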
See the following My Oracle Support (MOS) note:
Redo Transport Services fails with ORA-16198 when using SYNC (synchronous) mode (Doc ID 808469.1)
Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
Information in this document applies to any platform.
***Checked for relevance on 26-Feb-2014***
This will affect LGWR SYNC transport mode in 10.2.0.x databases and SYNC transport mode in 11.2.0.x databases
Redo Transport Services failed with ORA-16198 from primary database
to either the physical standby database or logical standby database
using LGWR SYNC mode.
The primary alert log file showed:
Fri Feb 6 21:22:26 2009
ORA-16198: LGWR received timedout error from KSR
LGWR: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (16198)
LGWR: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Fri Feb 6 21:22:26 2009
Errors in file /u01/app/oracle/admin/crthpd01/bdump/crthpd01_lgwr_2793488.trc:
ORA-16198: Timeout incurred on internal channel during remote archival
LGWR: Network asynch I/O wait error 16198 log 2 service
'(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=abc)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=xyz_STANDBY_XPT.world)(INSTANCE_NAME=xyz)(SERVER=dedicated)))'
Fri Feb 6 21:22:26 2009
Destination LOG_ARCHIVE_DEST_2 is UNSYNCHRONIZED
LGWR: Failed to archive log 2 thread 1 sequence 628 (16198)
Fri Feb 6 21:22:27 2009
If you use Data Guard Broker, then the primary drc log showed:
DG 2009-04-12-12:11:08 0 2 678445059 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16778
DG 2009-04-12-12:12:08 0 2 0 RSM detected log transport problem: log
transport for database 'xyz_STANDBY' has the following error.
DG 2009-04-12-12:12:08 0 2 0 ORA-16198: Timeout incurred on internal channel during remote archival
DG 2009-04-12-12:12:08 0 2 0 RSM0: HEALTH CHECK ERROR: ORA-16737: the
redo transport service for standby database "xyz_STANDBY" has an error
DG 2009-04-12-12:12:08 0 2 678445062 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16778
DG 2009-04-12-12:12:08 0 2 678445062 Operation CTL_GET_STATUS cancelled during phase 2, error = ORA-16778
The NET_TIMEOUT attribute of LOG_ARCHIVE_DEST_2 on the primary is set too low: in this example,
LNS could not finish sending the redo block within 10 seconds.
log_archive_dest_2
service="(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=tcp)(HOST=abc)(PORT=1521)))(CONNECT_DATA=(SERVICE_NAME=xyz_STANDBY_XPT.world)(INSTANCE_NAME=xyz)(SERVER=dedicated)))",
LGWR SYNC AFFIRM delay=0 OPTIONAL max_failure=0 max_connections=1 reopen=300
db_unique_name="xyz_STANDBY" register net_timeout=10
valid_for=(online_logfile,primary_role)
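To check a destination setting like the one above without reading it by eye, a small parser can pull out the transport mode and the NET_TIMEOUT value. This is a hedged Python sketch: the function names and the 15-second constant (taken from the note's "at least 15 to 20 seconds" advice) are illustrative assumptions, the sample string shortens the service descriptor to a placeholder, and the value is assumed to have been joined back into one logical string.

```python
import re

# Lower bound suggested by the note for SYNC transport ("at least 15 to 20 seconds").
MIN_SYNC_NET_TIMEOUT = 15


def parse_dest_attributes(value):
    """Extract SYNC usage and NET_TIMEOUT from a LOG_ARCHIVE_DEST_n value."""
    # Collapse runs of whitespace so a multi-line value becomes one line.
    flat = " ".join(value.split())
    timeout = re.search(r"\bnet_timeout\s*=\s*(\d+)", flat, re.IGNORECASE)
    # \bSYNC\b does not match inside ASYNC (no word boundary between A and S).
    # Simple heuristic: a service name containing the word SYNC would fool it.
    is_sync = re.search(r"\bSYNC\b", flat, re.IGNORECASE) is not None
    return {
        "sync": is_sync,
        "net_timeout": int(timeout.group(1)) if timeout else None,
    }


def below_recommended(attrs, minimum=MIN_SYNC_NET_TIMEOUT):
    """True when SYNC transport is used with NET_TIMEOUT under the minimum."""
    timeout = attrs["net_timeout"]
    return attrs["sync"] and timeout is not None and timeout < minimum


# Shortened version of the setting above; the service string is a placeholder.
sample_dest = (
    'service="xyz_STANDBY" LGWR SYNC AFFIRM delay=0 OPTIONAL max_failure=0 '
    'max_connections=1 reopen=300 db_unique_name="xyz_STANDBY" register '
    'net_timeout=10 valid_for=(online_logfile,primary_role)'
)

attrs = parse_dest_attributes(sample_dest)
print(attrs, below_recommended(attrs))
```

For the setting in this example, the check flags the destination because it combines SYNC with a 10-second timeout.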
Note that the LGWR SYNC redo transport mode is in use and NET_TIMEOUT is set to 10.
Increase the NET_TIMEOUT value in LOG_ARCHIVE_DEST_2 on the primary to at least 15 to 20 seconds, depending on your network speed.
If you don't use Data Guard Broker, you can change LOG_ARCHIVE_DEST_2 from SQL*Plus with the ALTER SYSTEM command. For example:
SQL> ALTER SYSTEM SET LOG_ARCHIVE_DEST_2='SERVICE=xyz_STANDBY LGWR SYNC DB_UNIQUE_NAME=xyz_STANDBY NET_TIMEOUT=30 VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE)';
If you use Data Guard Broker, modify the NetTimeout property from DGMGRL or Grid Control instead.
For example, connect to the DGMGRL command-line interface from the primary machine:
DGMGRL> connect sys/<sys password>
DGMGRL> EDIT DATABASE '<primary db_unique_name>' SET PROPERTY NetTimeout = 30;
=======================================================================
Note: If NET_TIMEOUT has already been set to 30 and you still get ORA-16198,
LNS could not finish sending the redo block within 30 seconds.
The slowness may be caused by:
1. Operating system. Keep track of OS usage (for example, with iostat).
2. Network. Keep track of network traffic (for example, with tcpdump).
Note: Don't use the SYNC redo transport mode across a wide area network (WAN) with latencies above 10 ms.
The purpose here is to determine whether the slowness is caused by a temporary OS glitch or a temporary network glitch.
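As a rough first check against the 10 ms guideline, the time to open a TCP connection from the primary host to the standby listener can be sampled. This Python sketch is an assumption-laden approximation, not a replacement for tcpdump or proper network monitoring: connect time includes name resolution and only approximates one network round trip, and both function names are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Time one TCP connect to host:port and return it in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection is closed immediately; only the setup time matters
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms, threshold_ms=10.0):
    """Summarize latency samples against a threshold (10 ms per the note)."""
    return {
        "median_ms": statistics.median(samples_ms),
        "max_ms": max(samples_ms),
        "above_threshold": sum(1 for s in samples_ms if s > threshold_ms),
    }


# Usage on the primary host, with the placeholder host and port from the
# note's example (replace with the real standby listener address):
#   samples = [tcp_connect_ms("abc", 1521) for _ in range(20)]
#   print(summarize(samples))

# Demonstration with made-up sample values, in milliseconds.
print(summarize([0.8, 1.1, 0.9, 14.2, 1.0]))
```

Sampling during both quiet and peak hours helps show whether latency rises only under load, which would point to the network rather than to a permanent misconfiguration.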
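This error occurs because the LGWR process on the primary could not finish sending redo to the standby within the NET_TIMEOUT period (10 seconds in the note's example). Raising NET_TIMEOUT to 15 or 30 seconds gives LGWR more time to ship redo to the standby and makes the error less likely. If the problem persists with NET_TIMEOUT set to 30 seconds, check whether the network between the primary and the standby has a performance problem or a fault. For a standby database reached over a WAN, it is best not to use LGWR SYNC for real-time synchronization; ARCH or ASYNC transport is a better fit.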