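1. I recently handled a case in which poor disk I/O performance on a standby dragged down the performance of the primary.
The primary was mainly waiting on "log file switch completion". Analysis of an ASH dump showed that the actual underlying wait event was "LGWR-LNS wait on channel". This event essentially narrows the problem down to network performance and standby I/O performance, and the customer's protection mode was MAXIMUM AVAILABILITY.
In the end I proposed two solutions:
(1). Replace the standby storage with better-performing storage.
(2). Change the protection mode to MAXIMUM PERFORMANCE and use LGWR ASYNC redo transport.
While on the subject, here are the three standby protection modes and the redo transport settings each one can use: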
Item | Maximum protection | Maximum availability | Maximum performance |
Redo write/transport process | LGWR | LGWR | LGWR or ARCH |
Network transmission mode | SYNC | SYNC | SYNC or ASYNC |
Disk write acknowledgment | AFFIRM | AFFIRM | AFFIRM or NOAFFIRM |
Standby redo logs | Required | Required | Required with LGWR, not required with ARCH |
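As a quick way to reason about the table above, the allowed combinations can be written down as a lookup and used to check a planned destination setting. This is only an illustrative sketch: the names `ALLOWED`, `check_destination`, and `needs_standby_redo_logs` are hypothetical helpers, not part of any Oracle tool, and the rules are transcribed from the table rather than read from a database.

```python
# Allowed redo transport settings per protection mode, transcribed from the table.
# Hypothetical helper for illustration; nothing here talks to a real database.
ALLOWED = {
    "MAXIMUM PROTECTION": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM AVAILABILITY": {
        "process": {"LGWR"}, "network": {"SYNC"}, "disk": {"AFFIRM"},
    },
    "MAXIMUM PERFORMANCE": {
        "process": {"LGWR", "ARCH"},
        "network": {"SYNC", "ASYNC"},
        "disk": {"AFFIRM", "NOAFFIRM"},
    },
}


def check_destination(mode, process, network, disk):
    """Return the settings that the given protection mode does not allow."""
    rules = ALLOWED[mode]
    chosen = {"process": process, "network": network, "disk": disk}
    return [f"{key}={value}" for key, value in chosen.items()
            if value not in rules[key]]


def needs_standby_redo_logs(process):
    """Per the last table row: required with LGWR, not required with ARCH."""
    return process == "LGWR"


# LGWR ASYNC NOAFFIRM is fine under maximum performance ...
print(check_destination("MAXIMUM PERFORMANCE", "LGWR", "ASYNC", "NOAFFIRM"))
# ... but the same settings conflict with maximum availability.
print(check_destination("MAXIMUM AVAILABILITY", "LGWR", "ASYNC", "NOAFFIRM"))
```

The second call reports the two conflicting settings, which is why solution (2) changes the protection mode and the transport attributes together.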
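The root of the problem is the standby's poor I/O performance. Under MAXIMUM AVAILABILITY, redo is shipped in SYNC mode, which requires confirmation that the disk write on the standby succeeded, and that drags down the primary's performance.
2. Below is the documentation on SYNC and ASYNC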
http://docs.oracle.com/cd/B10501_01/server.920/a96653/log_arch_dest_param.htm#77394
SYNC=PARALLEL
SYNC=NOPARALLEL
The SYNC attribute specifies that network I/O is to be performed synchronously for the destination, which means that once the I/O is initiated, the archiving process waits for the I/O to complete before continuing. The SYNC attribute is one requirement for setting up a no-data-loss environment, because it ensures that the redo records were successfully transmitted to the standby site before continuing.
If the log writer process is defined to be the transmitter to multiple standby destinations that use the SYNC attribute, the user has the option of specifying SYNC=PARALLEL or SYNC=NOPARALLEL for each of those destinations.
- If SYNC=NOPARALLEL is used, the log writer process performs the network I/O to each destination in series. In other words, the log writer process initiates an I/O to the first destination and waits until it completes before initiating the I/O to the next destination. Specifying the SYNC=NOPARALLEL attribute is the same as specifying the ASYNC=0 attribute.
- If SYNC=PARALLEL is used, the network I/O is initiated asynchronously, so that I/O to multiple destinations can be initiated in parallel. However, once the I/O is initiated, the log writer process waits for each I/O operation to complete before continuing. This is, in effect, the same as performing multiple, synchronous I/O operations simultaneously. The use of SYNC=PARALLEL is likely to perform better than SYNC=NOPARALLEL.
Because the PARALLEL and NOPARALLEL qualifiers only make a difference if multiple destinations are involved, Oracle Corporation recommends that all destinations use the same value.
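The difference between the two SYNC qualifiers can be sketched with a toy model: with NOPARALLEL the log writer's wait is the sum of the per-destination I/O times, while with PARALLEL it is the slowest single one. The function names and the latency figures below are made up for illustration and are not measurements from any system.

```python
def lgwr_wait_noparallel(latencies_ms):
    """I/O to each destination runs in series, so the waits add up."""
    return sum(latencies_ms)


def lgwr_wait_parallel(latencies_ms):
    """All I/Os are started together; the log writer waits for the slowest."""
    return max(latencies_ms)


# Hypothetical per-destination I/O times in milliseconds (third one is slow).
latencies = [2, 3, 40]
print(lgwr_wait_noparallel(latencies))  # 45
print(lgwr_wait_parallel(latencies))    # 40
```

Note that in both cases a single slow destination dominates the wait; PARALLEL only removes the cost of the faster destinations.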
ASYNC[=blocks]
The ASYNC attribute specifies that network I/O is to be performed asynchronously for the destination. Once the I/O is initiated, the log writer continues processing the next request without waiting for the I/O to complete and without checking the completion status of the I/O. Use of the ASYNC attribute allows standby environments to be maintained with little or no performance effect on the primary database. The optional block count determines the size of the SGA network buffer to be used. In general, the slower the network connection, the larger the block count should be. Also, specifying the ASYNC=0 attribute is the same as specifying the SYNC=NOPARALLEL attribute.
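A careful reading of the documentation yields the following points:
SYNC: after the I/O is initiated, the primary can move on only once the standby has reported that the I/O completed successfully. If the standby's I/O is slow, the primary's performance suffers.
ASYNC: no confirmation of the I/O is needed. Once the primary has initiated the I/O it goes straight on to the next task, so how fast or slow the standby writes does not affect the primary.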
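The two points above can be put into a rough model of how long the primary waits per redo write. It is a deliberate simplification: it assumes the local redo write and the shipment to the standby overlap, and that under SYNC the acknowledgment arrives only after the standby's disk write (as with AFFIRM). The function name and all the numbers are hypothetical.

```python
def primary_wait_ms(mode, local_write_ms, network_rtt_ms, standby_write_ms):
    """Rough per-write wait on the primary under a simplified model.

    Assumption (not from the Oracle docs): the local write and the remote
    shipment run concurrently, so SYNC waits for whichever finishes last.
    """
    if mode == "SYNC":
        remote_ms = network_rtt_ms + standby_write_ms
        return max(local_write_ms, remote_ms)
    if mode == "ASYNC":
        # The primary does not wait for the standby at all.
        return local_write_ms
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical figures: fast local disk, 1 ms network, slow standby disk.
print(primary_wait_ms("SYNC", 2, 1, 50))   # 51
print(primary_wait_ms("ASYNC", 2, 1, 50))  # 2
```

Under this model the standby's write time shows up directly in the primary's wait under SYNC and disappears entirely under ASYNC.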
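3. With these two concepts clear, let's go back to the customer's problem:
The customer had three standbys, but the standby server behind LOG_ARCHIVE_DEST_3 was underpowered. During relatively busy periods, the OSWatcher logs showed the standby's I/O utilization at 100%.
At this point the problem was confirmed: there was a large performance gap between the standby server and the primary, and because LGWR SYNC transport was in use, the standby's I/O came under heavy pressure.
On top of that, the primary had to wait for the standby to confirm that the redo had been received before it could continue, so the primary's performance was hit hard.
4. In summary, the standby's performance should not differ too much from the primary's; it should reach at least 70~80% of the primary's performance. Otherwise, on a switchover or failover, the standby simply cannot take over the primary's workload.
Day-to-day redo transport will also affect the primary's performance.
After reading this far you may have a question: wasn't Maximum availability supposed to switch automatically to Maximum performance? How can it still hurt performance?
5. With that question in mind, let's analyze it, starting with the definition: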
Maximum availability -- This protection mode provides the highest level of data protection that is possible without compromising the availability of the primary database. Like maximum protection mode, a transaction will not commit until the redo needed to recover that transaction is written to the local online redo log and to the standby redo log of at least one transactionally consistent standby database. Unlike maximum protection mode, the primary database does not shut down if a fault prevents it from writing its redo stream to a remote standby redo log. Instead, the primary database operates in maximum performance mode until the fault is corrected, and all gaps in redo log files are resolved. When all gaps are resolved, the primary database automatically resumes operating in maximum availability mode.
This mode ensures that no data loss will occur if the primary database fails, but only if a second fault does not prevent a complete set of redo data from being sent from the primary database to at least one standby database.
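In Maximum availability mode, as long as the connection to the standby is healthy, the database behaves the same way as in Maximum protection mode: a transaction commits only after its redo has been written on both the primary and the standby. If the standby loses contact with the primary, the primary automatically switches to running in Maximum performance mode, which keeps the primary as available as possible.
Notice the wording? "If the standby loses contact with the primary": "loses contact" is the crucial part. In the case described here the connection was perfectly normal; the standby's I/O was merely slow, not completely out of service.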
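The distinction between "slow" and "out of contact" can be sketched as a small decision function. This is a conceptual model only: the function name, the return strings, and the 30-second threshold are assumptions chosen for illustration, standing in for whatever mechanism declares the standby unreachable (Oracle's NET_TIMEOUT attribute of LOG_ARCHIVE_DEST_n plays a comparable role, but its value and exact semantics are not modeled here).

```python
def max_availability_behavior(standby_reachable, standby_ack_ms,
                              unreachable_after_ms=30_000):
    """Conceptual model of a primary in maximum availability mode.

    unreachable_after_ms is a hypothetical threshold, not a documented default.
    """
    if not standby_reachable or standby_ack_ms >= unreachable_after_ms:
        # Contact is considered lost: stop waiting for the standby.
        return "fall back to maximum performance"
    # The standby still answers, however slowly, so the primary keeps waiting.
    return "wait for standby acknowledgment"


print(max_availability_behavior(True, 5))      # healthy standby
print(max_availability_behavior(True, 800))    # slow standby: primary still waits
print(max_availability_behavior(False, 0))     # contact lost: fallback
```

The middle case is the customer's situation: every acknowledgment arrives, just late, so the fallback never triggers and the primary pays the standby's I/O cost on each write.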