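I have recently been investigating the various kinds of stalls that occur in iOS apps and how to fix them.
Stalls in iOS apps are probably more common than most people imagine, especially in the flagship apps of large companies. One reason is that features keep accumulating: product teams coordinate poorly, everyone is busy shipping more functionality, and system resources become a bottleneck. The other is that old devices are replaced slowly. iOS devices are extremely durable: plenty of iPhone 4S units are still in service, the iPhone 6, a problem-prone device, still has a large installed base, and by some estimates devices older than the iPhone 6s account for as much as 40% of the total.
So if you add a stall-detection tool to a production app, you will find that stalls occur surprisingly often. Detecting and fixing them is not easy, though, mainly because they are hard to reproduce on development devices.
I previously wrote an article on monitoring main-thread stalls. The mainstream approach today seems to be to observe RunLoop event callbacks, check whether the interval between callbacks exceeds a threshold, and, if it does, record the call stacks of all threads in the app.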
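To make the idea of threshold-based detection concrete, here is a minimal sketch. Note that it is a simpler ping-style variant, not the RunLoop-observer approach described above: a background thread asks the watched queue to run a trivial block and checks whether that happens within a threshold. The type `StallWatchdog`, the queue label, and the 200 ms threshold are assumptions made for illustration.

```swift
import Dispatch
import Foundation

/// Minimal ping-style watchdog (hypothetical helper, not from the article).
/// A background thread asks the watched queue to run a trivial block and
/// checks whether it does so within `threshold`.
struct StallWatchdog {
    let watched: DispatchQueue
    let threshold: DispatchTimeInterval

    /// Returns true if the watched queue answered the ping in time.
    /// Must not be called from the watched queue itself.
    func ping() -> Bool {
        let answered = DispatchSemaphore(value: 0)
        watched.async { answered.signal() }
        return answered.wait(timeout: .now() + threshold) == .success
    }
}

// Demo: a private serial queue stands in for the main queue.
let watched = DispatchQueue(label: "demo.watched")
let watchdog = StallWatchdog(watched: watched, threshold: .milliseconds(200))

let idleResult = watchdog.ping()                       // the queue is idle
watched.async { Thread.sleep(forTimeInterval: 1.0) }   // simulate a long blocking task
let busyResult = watchdog.ping()                       // this ping is stuck behind the sleep
print("idle ok:", idleResult, "busy ok:", busyResult)
```

In a real app, `watched` would presumably be `DispatchQueue.main`, the pings would be sent in a loop from a dedicated thread, and a failed ping would trigger the capture of all threads' call stacks, which is not shown here.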
A while ago I came across this call stack in the stall logs reported to our backend:
> 0 libsystem_kernel.dylib __workq_kernreturn
> 1 libsystem_pthread.dylib _pthread_workqueue_addthreads
> 2 libdispatch.dylib _dispatch_queue_wakeup_global_slow
> 3 libdispatch.dylib _dispatch_queue_wakeup_with_qos_slow
> 4 libdispatch.dylib dispatch_async
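In other words, the stall occurred inside dispatch_async. Based on my understanding of GCD at the time, dispatch_async should never be able to stall: as I understood it, its main job is to take a worker thread from the system thread pool and hand the block to that thread for execution.
Yet this call stack really did show up, and in a sizeable number of samples. The topmost frame (frame 0) is clearly a kernel call. Judging from the function names, GCD was probably trying to get a thread from the pool, found all existing threads busy, and asked the kernel to create a new one. But is the kernel call that creates a thread slow? Slow enough to stall the main thread? With these questions in mind I searched through a lot of material. The most relevant piece I found was this article: http://newosxbook.com/articles/GCD.html
It contains this passage: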
This isn't due to 10.9's GCD being different - rather, it demonstrates the true asynchronous nature of GCD: The main thread has yet to return from requesting the worker (which it does by pthread_workqueue_addthreads_np, as I'll describe later), and already the worker thread has spawned and is mid execution, possibly on another CPU core. The exact state of the main thread with respect to the worker is largely unpredictable.
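The author's view, as I read it, is that the thread GCD obtains may be one that is busy with another task, and that the main thread has to wait for this busy thread to return before it can continue. I have my doubts about this claim.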
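One way to check whether the dispatch call itself can be slow is to time how long each call takes to return to the caller. The sketch below does only that; the function name `asyncCallDurations` and the sample count are assumptions, and on an idle machine the numbers will likely be tiny, so it is only informative when run under the same resource pressure as the production app.

```swift
import Dispatch

/// Measures how long each `async` call takes to *return* to the caller,
/// in nanoseconds. This is the time spent inside the dispatch call itself,
/// not the time the block takes to run. (Hypothetical helper for illustration.)
func asyncCallDurations(on queue: DispatchQueue, count: Int) -> [UInt64] {
    var durations: [UInt64] = []
    durations.reserveCapacity(count)
    let group = DispatchGroup()
    for _ in 0..<count {
        let start = DispatchTime.now().uptimeNanoseconds
        queue.async(group: group) { /* trivial work */ }
        let end = DispatchTime.now().uptimeNanoseconds
        durations.append(end - start)
    }
    group.wait()   // let all submitted blocks finish before returning
    return durations
}

let samples = asyncCallDurations(on: DispatchQueue.global(qos: .default), count: 1_000)
let worst = samples.max() ?? 0
print("slowest dispatch call: \(worst) ns")
```

In an app, a wrapper like this could log a warning whenever a single call exceeds a frame budget, which would show whether frames such as `__workq_kernreturn` are genuinely slow or merely the last thing sampled.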
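With nowhere else to turn, I decided to spend one of my precious TSIs (Technical Support Incidents) and ask Apple's engineers directly. It is worth mentioning that asking Apple for technical support is a valuable and perfectly practical option: every developer account includes two incidents per year, and it is a pity to let them go unused.
After I sent the question, I got a reply from an engineer on Apple's kernel team. I share a condensed version of the reply below in Q&A form: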
Q: looks like even if it's async dispatching, the main thread still has to wait for the other thread to return, during which time, the other thread happen to be in mid execution of sth. this confuses me, what exactly is the main thread waiting for?
In short: why does the main thread have to wait for dispatch_async to return, and what exactly is it waiting for?
A: It's hard to say with just a user space backtrace. Frame 0 has clearly sent the current thread into the kernel, and this specific kernel call is /way/ too complex to analyse from outside [1].
In short: a user-space call stack alone cannot answer this; the possible kernel states are too complex.
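Q: I know it's suggested that we create a limited number of serial queues and probably use target queues. But what could happen if we don't follow that rule?
Apple has always recommended limiting the number of serial GCD queues you create yourself, and preferably giving them a target queue, or problems will arise. I had long been curious what those problems actually are, so I took this opportunity to ask.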
A:

* On macOS, where the system is happier to over commit, you end up with a thread explosion. That in turn can lead to problems running out of memory, running out of Mach ports, and so on.
* On iOS, which is not happy about over committing, you find that the latency between a block being queued and it running can skyrocket. This can, in turn, have knock-on effects. For example, the last time I looked at a problem like this I found that `NSOperationQueue` was dispatching blocks to the global queue for internal maintenance tasks, so when one subsystem within the app consumed all the dispatch worker threads other subsystems would just stall horribly.

Note: In the context of dispatch, an “over commit” is where the system had to allocate more threads to a queue than there are CPU cores. In theory this should never be necessary because work you dispatch to a queue should never block waiting for resources. In practice it's unavoidable because, at a minimum, the work you queue can end up blocking on the VM subsystem.

Despite this, it's still best to structure your code to avoid the need for over committing, especially when the over commit doesn't buy you anything. For example, code like this:

    group = dispatch_group_create();
    for (url in urlsToFetch) {
        dispatch_group_enter(group);
        dispatch_async(dispatch_get_global_queue(…), ^{
            … fetch `url` synchronously …
            dispatch_group_leave(group);
        });
    }
    dispatch_group_wait(group, …);

is horrible because it ties up 10 dispatch worker threads for a very long time without any benefit. And while this is an extreme example (from dispatch's perspective, networking is /really/ slow), there are less extreme examples that are similarly problematic. From dispatch's perspective, even the disk drive is slow (-:
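One possible way to soften the pattern criticized in the reply is to cap how many blocking tasks are in flight at once. The sketch below does this with a counting semaphore. It is an illustration only: `processBounded`, `ConcurrencyMeter`, the example URLs, and the limit of 2 are all assumptions, and the sleep merely stands in for a slow synchronous fetch.

```swift
import Dispatch
import Foundation

/// Runs `work` once per item on `queue`, with at most `limit` items in flight.
/// Blocks the calling thread while waiting, so call it off the main thread.
/// (Hypothetical helper for illustration.)
func processBounded<T: Sendable>(_ items: [T],
                                 limit: Int,
                                 on queue: DispatchQueue,
                                 work: @escaping @Sendable (T) -> Void) {
    let slots = DispatchSemaphore(value: limit)
    let group = DispatchGroup()
    for item in items {
        slots.wait()                       // wait for a free slot before submitting
        queue.async(group: group) {
            work(item)
            slots.signal()                 // free the slot for the next item
        }
    }
    group.wait()                           // all items have finished
}

/// Records the highest number of tasks that were running at the same time.
final class ConcurrencyMeter: @unchecked Sendable {
    private let lock = NSLock()
    private var current = 0
    private(set) var peak = 0
    func enter() { lock.lock(); current += 1; peak = max(peak, current); lock.unlock() }
    func leave() { lock.lock(); current -= 1; lock.unlock() }
}

let meter = ConcurrencyMeter()
let urls = (1...10).map { "https://example.com/\($0)" }   // hypothetical inputs
processBounded(urls, limit: 2, on: DispatchQueue.global(qos: .utility)) { _ in
    meter.enter()
    Thread.sleep(forTimeInterval: 0.02)   // stands in for a slow synchronous fetch
    meter.leave()
}
print("peak concurrency:", meter.peak)
```

This only bounds the damage: each in-flight task still blocks a worker thread. Where an asynchronous API is available, using it avoids tying up worker threads in the first place.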
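This reply is very interesting. Anyone who has read the GCD source code knows that every queue GCD creates by default has a priority, but that each priority actually corresponds to two queues: if one is default-priority, the other is default-priority-overcommit. As I understand it, dispatch_async first puts the task into the default-priority queue and, if that queue is full, puts it into default-priority-overcommit instead.
On macOS, GCD allows overcommit, which means that a dispatch_async can end up creating a new thread every time. Even when the system is overcommitted, the excess threads simply compete for CPU according to their priority.
On iOS, GCD limits overcommit: once the queue for a given priority is overcommitted, tasks queued behind it have to wait. CPU resources on mobile devices are relatively scarce, so this design makes sense.
So if you create too many serial queues on iOS, tasks submitted later may be left waiting indefinitely. This is why the number of queues and their hierarchy need to be strictly controlled. Ideally, each subsystem in the app is allotted a fixed number of queues with fixed priorities, so that a thread explosion does not keep code from running on time.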
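The queue hierarchy described above can be sketched with target queues. In this illustration, eight serial queues of one subsystem all funnel into a single serial root queue, so together they run at most one block at a time. The subsystem name, the queue labels, and the `ConcurrencyMeter` helper are assumptions made for the example.

```swift
import Dispatch
import Foundation

/// Records the highest number of blocks that were running at the same time.
final class ConcurrencyMeter: @unchecked Sendable {
    private let lock = NSLock()
    private var current = 0
    private(set) var peak = 0
    func enter() { lock.lock(); current += 1; peak = max(peak, current); lock.unlock() }
    func leave() { lock.lock(); current -= 1; lock.unlock() }
}

// One serial root queue for a (hypothetical) "network" subsystem.
let networkRoot = DispatchQueue(label: "app.network.root", qos: .utility)

// Many lightweight serial queues, all funneled into the root via `target:`.
let sessionQueues = (0..<8).map { i in
    DispatchQueue(label: "app.network.session.\(i)", target: networkRoot)
}

let meter = ConcurrencyMeter()
let group = DispatchGroup()
for queue in sessionQueues {
    for _ in 0..<5 {
        queue.async(group: group) {
            meter.enter()
            Thread.sleep(forTimeInterval: 0.002)   // stands in for real work
            meter.leave()
        }
    }
}
group.wait()
print("peak concurrency:", meter.peak)
```

Because the root queue is serial, the subsystem's blocks are serialized through it no matter how many queues it creates. A concurrent root would allow some parallelism while still giving the subsystem a single place to set its QoS.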
Q: I know the system watchdog can kill an app if the main thread takes too long to respond. I have also heard rumors that there are two other cases that may get your app killed by the watchdog. The first is when too many new threads are created, for example by carelessly dispatching work to the global concurrent queue. The second is when the CPU has been kept too busy, say at 100%, for too long. Does the watchdog kill the app in these cases too?
I also took the chance to ask why the system watchdog force-kills apps. Rumor has long had it that, besides the main thread being unresponsive for too long, creating too many threads and running the CPU at full load for too long can also get an app killed.
A: I'm not aware of any specific watchdog check along those lines, but it's not hard to imagine that the above-mentioned knock-on effects might jam up your app sufficiently for the watchdog to kill it for other reasons. Running the CPU for too long generates a crash report but it doesn't actually kill the app. It's essentially a 'warning' crash report about the problem.
Creating too many threads does not directly cause a watchdog kill, but it can keep the main thread from being serviced in time, so the app ends up killed for other reasons. Sustained CPU overload does not get the app killed either, but the system generates a report to warn the developer. I have indeed seen quite a few of these 'this is not a crash' crash logs.
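There were a few more questions and answers, which I omit because they are not directly related to my question. Finally, here is one more reply that I found interesting. Before reading it, think about the following for yourself:

    dispatch_async(myQueue, ^{
        // line A
    });
    // line B

Which runs first, line A or line B?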
Consider a snippet like this:

    dispatch_async(myQueue, ^{
        // line A
    });
    // line B

there's clearly a race condition between lines A and B, that is, between the `dispatch_async` returning and the block running on the queue. This can pan out in multiple ways, including:

* If `myQueue` (which we're assuming is a serial queue) is busy, A has to wait so B will definitely run before A.
* If `myQueue` is empty, there's no idle CPU, and `myQueue` has a higher priority than the thread that called `dispatch_async`, you could imagine the kernel switching the CPU to `myQueue` so that it can run A.
* The thread that called `dispatch_async` could run out of its time quantum after scheduling [A] on `myQueue` but before returning from `dispatch_async`, which again results in A running before B.
* If `myQueue` is empty and there's an idle CPU, A and B could end up running simultaneously.
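Since the relative order of A and B is unspecified, code that needs a particular order has to state it explicitly. The sketch below shows one way, under the assumption that B may run on the same serial queue as A: enqueuing B behind A makes the order deterministic, because a serial queue runs its blocks in FIFO order. `EventLog` and the queue label are assumptions made for the example.

```swift
import Dispatch
import Foundation

/// Thread-safe list of events, used only to observe the execution order.
final class EventLog: @unchecked Sendable {
    private let lock = NSLock()
    private var events: [String] = []
    func record(_ event: String) { lock.lock(); events.append(event); lock.unlock() }
    var snapshot: [String] { lock.lock(); defer { lock.unlock() }; return events }
}

let log = EventLog()
let myQueue = DispatchQueue(label: "demo.serial")   // serial by default
let done = DispatchGroup()

// Racy version, for comparison: nothing orders "A" relative to "B".
//   myQueue.async { log.record("A") }
//   log.record("B")

// Explicit version: B is enqueued behind A on the same serial queue.
myQueue.async(group: done) { log.record("A") }   // line A
myQueue.async(group: done) { log.record("B") }   // line B, now ordered after A
done.wait()
print(log.snapshot)   // prints ["A", "B"]
```

If B has to stay on the calling thread, waiting on a group or semaphore that A signals achieves the same ordering, at the cost of blocking the caller.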
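Answer
In the end I did not get the precise answer I wanted. As the reply says, there are probably too many possible cases, and they are too complex, to infer the kernel's state from a single user-space call stack. Still, I was able to sort out some valuable information:
Takeaway 1
iOS is itself a system for scheduling and allocating resources. CPU, disk I/O, VM and so on are all scarce, and they affect one another. A main-thread stall may look like a CPU bottleneck, but the kernel may equally be busy scheduling other resources: heavy disk reads and writes, or a burst of memory allocation and reclamation, can stall even this simple thread-creating kernel call: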
libsystem_kernel.dylib __workq_kernreturn
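So the only solution is to analyze each thread's call stack yourself and work out, from the user scenario, which system resources are being consumed. By going through recently committed code, I later found that some very time-consuming disk I/O tasks had been added (even though they ran on a background thread), and that is what produced this seemingly unrelated call stack. After reverting the change, the stall alerts disappeared.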
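Takeaway 2
Existing stall-detection tools can only dump call stacks once a timeout has occurred, but the timeout may be the combined effect of tasks A, B and C. A and B may be the truly expensive ones, while C is cheap but happens to run last, so C gets blamed and A and B never appear in the reported logs. I have not yet come up with a particularly good solution. Clearly, libsystem_kernel.dylib __workq_kernreturn is exactly such a cheap task C.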
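The A, B, C scenario can be illustrated with a small timing wrapper. This is not a solution proposed in the article, only a sketch of one partial mitigation: if suspect tasks are wrapped so that any task exceeding a time budget is recorded, the slow A and B show up in the log even when C is the one caught by the timeout dump. `SlowTaskLog`, the 30 ms budget, and the sleeps are assumptions made for the example.

```swift
import Dispatch
import Foundation

/// Collects the names and durations of tasks that exceeded a time budget.
/// (Hypothetical helper for illustration.)
final class SlowTaskLog: @unchecked Sendable {
    private let lock = NSLock()
    private var entries: [(name: String, millis: Double)] = []
    let budgetMillis: Double

    init(budgetMillis: Double) { self.budgetMillis = budgetMillis }

    /// Runs `body` synchronously and records it if it took longer than the budget.
    func measure(_ name: String, _ body: () -> Void) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let millis = Double(DispatchTime.now().uptimeNanoseconds - start) / 1_000_000
        guard millis > budgetMillis else { return }
        lock.lock(); entries.append((name: name, millis: millis)); lock.unlock()
    }

    /// Names of the recorded slow tasks, in the order they finished.
    var slowTaskNames: [String] {
        lock.lock(); defer { lock.unlock() }
        return entries.map { $0.name }
    }
}

let slowLog = SlowTaskLog(budgetMillis: 30)
slowLog.measure("A") { Thread.sleep(forTimeInterval: 0.06) }   // slow
slowLog.measure("B") { Thread.sleep(forTimeInterval: 0.06) }   // slow
slowLog.measure("C") { }                                        // cheap, but runs last
print(slowLog.slowTaskNames)
```

The obvious limitation is that only code you wrap gets measured, so this complements a timeout-based stack dump rather than replacing it.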
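Takeaway 3
When creating queues with GCD, or more generally when an app uses GCD to run work off the main thread, it is best to have a queue-usage policy that every team working on the app can follow. This avoids creating too many threads and running into unexpected thread starvation, where code cannot run on time. It is hard to do, especially in large companies where a team can easily number over a hundred people.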