This article explains how to solve a common Akka Cluster problem: an unreachable node that keeps new nodes from joining the cluster.
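In a recent project we deployed an Akka (2.6.8) cluster on k8s and ran into a problem: as long as an unreachable node was not manually restarted, no other node could be admitted to the cluster.
Concretely, the pod of a non-seed node restarted and was rescheduled onto a different k8s node with a new IP. The cluster kept trying to connect to the old node (IP), which caused the failure.
Let's start with the concept of Gossip Convergence: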
Gossip convergence cannot occur while any nodes are unreachable. The nodes need to become reachable again, or moved to the down and removed states (see the Cluster Membership Lifecycle section). This only blocks the leader from performing its cluster membership management and does not influence the application running on top of the cluster. For example this means that during a network partition it is not possible to add more nodes to the cluster. The nodes can join, but they will not be moved to the up state until the partition has healed or the unreachable nodes have been downed.
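Put differently: as long as even one node is unreachable, gossip cannot converge. The node has to become reachable again, or be moved to the down and removed states. This only blocks the leader's membership management and does not affect the application running on top of the cluster, but it does mean that a node joining during a network partition will not be moved to the up state until the partition heals or the unreachable nodes are downed.
Clearly, Akka requires every node to be either reachable or down before the cluster can converge.
The Membership Lifecycle section says the same thing: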
If a node is unreachable then gossip convergence is not possible and therefore most leader actions are impossible (for instance, allowing a node to become a part of the cluster). To be able to move forward, the node must become reachable again or the node must be explicitly “downed”. This is required because the state of an unreachable node is unknown and the cluster cannot know if the node has crashed or is only temporarily unreachable because of network issues or GC pauses. See the section about User Actions below for ways a node can be downed.
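That is, an unreachable node must either become reachable again or be explicitly downed. An unreachable node may have crashed, or it may only be suffering from a network glitch or a long GC pause. Akka cannot tell these cases apart, so it keeps trying to reconnect indefinitely.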
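The rule described in the two quoted passages can be sketched as a toy model. This is only an illustration, not Akka's implementation: the class `ConvergenceSketch`, its methods, and the IP addresses are all hypothetical, and downing is simplified to removing the member from the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of "no leader actions while any member is unreachable". Not Akka code. */
final class ConvergenceSketch {

    enum Status { JOINING, UP }

    static final class Member {
        final String address;
        Status status;
        boolean reachable;

        Member(String address, Status status, boolean reachable) {
            this.address = address;
            this.status = status;
            this.reachable = reachable;
        }
    }

    /** Promotes JOINING members to UP, but only if every member is reachable. Returns the number promoted. */
    static int leaderPromote(List<Member> members) {
        for (Member m : members) {
            if (!m.reachable) {
                return 0; // convergence is blocked, so the leader does nothing
            }
        }
        int promoted = 0;
        for (Member m : members) {
            if (m.status == Status.JOINING) {
                m.status = Status.UP;
                promoted++;
            }
        }
        return promoted;
    }

    /** Simplification: a downed member is dropped from the membership right away. */
    static void down(List<Member> members, String address) {
        members.removeIf(m -> m.address.equals(address));
    }

    public static void main(String[] args) {
        List<Member> members = new ArrayList<>(Arrays.asList(
                new Member("10.0.0.1", Status.UP, true),
                new Member("10.0.0.2", Status.UP, false),        // old pod IP that will never come back
                new Member("10.0.0.3", Status.JOINING, true)));  // the rescheduled pod

        System.out.println(leaderPromote(members)); // prints 0: the stale member blocks the leader
        down(members, "10.0.0.2");
        System.out.println(leaderPromote(members)); // prints 1: the new pod is moved to UP
    }
}
```

This mirrors what we saw on k8s: the rescheduled pod could join, but it stayed out of the up state until the stale address was downed.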
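With the cause identified, the fix comes from the official documentation: get the unreachable node moved to the down state. There are two ways to do that:
1. Down the node explicitly by sending an HTTP request
2. Bring in the split-brain-resolver (SBR)
We leave the first approach to the reader and use the second. SBR offers five strategies: static-quorum, keep-majority, keep-oldest, down-all, and lease-majority.
We chose keep-majority. For the trade-offs and use cases of all five strategies, see the strategies section of the official documentation.
Here is the Akka configuration for the keep-majority strategy: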
    akka.coordinated-shutdown.exit-jvm = on
    akka.coordinated-shutdown.exit-code = 0
    akka.cluster.downing-provider-class = "akka.cluster.sbr.SplitBrainResolverProvider"
    akka.cluster.split-brain-resolver.down-all-when-unstable = off
    akka.cluster.split-brain-resolver.stable-after = 20s
    akka.cluster.split-brain-resolver.active-strategy = keep-majority
    akka.cluster.split-brain-resolver.keep-majority.role = "admin"
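Setting | Description |
---|---|
akka.coordinated-shutdown.exit-jvm | Whether to exit the JVM when coordinated shutdown completes, for example after the node has been removed from the cluster; on or off |
akka.coordinated-shutdown.exit-code | Exit code returned when the JVM exits |
akka.cluster.downing-provider-class | Set to akka.cluster.sbr.SplitBrainResolverProvider to enable SBR |
akka.cluster.split-brain-resolver.down-all-when-unstable | Downs all nodes if the cluster stays unstable for too long; accepts on, off, or a duration such as 15s |
akka.cluster.split-brain-resolver.stable-after | How long the membership and reachability information must stay unchanged, i.e. how long a node has to remain unreachable, before SBR starts downing nodes |
akka.cluster.split-brain-resolver.active-strategy | The strategy to use, here keep-majority |
akka.cluster.split-brain-resolver.keep-majority.role | When set, only members with this role are counted when SBR makes its decision |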
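As a side note on down-all-when-unstable: per the comments in Akka's reference configuration, the value on is derived as 3/4 of stable-after, but never less than 4 seconds. The sketch below only reproduces that arithmetic. The class and method names are hypothetical, and this is not how you would read the setting in a real application.

```java
import java.time.Duration;

/** Reproduces the documented derivation of down-all-when-unstable = on. Hypothetical helper. */
final class DownAllWhenUnstableSketch {

    static final Duration MINIMUM = Duration.ofSeconds(4);

    /** 3/4 of stable-after, with a lower bound of 4 seconds. */
    static Duration derived(Duration stableAfter) {
        Duration threeQuarters = stableAfter.multipliedBy(3).dividedBy(4);
        return threeQuarters.compareTo(MINIMUM) < 0 ? MINIMUM : threeQuarters;
    }

    public static void main(String[] args) {
        System.out.println(derived(Duration.ofSeconds(20))); // prints PT15S
        System.out.println(derived(Duration.ofSeconds(4)));  // prints PT4S (3s is raised to the minimum)
    }
}
```

With stable-after = 20s as in the configuration above, on would correspond to 15s, which is where the 15s example in the table comes from.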
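Note on akka.cluster.split-brain-resolver.keep-majority.role: if, for whatever reason, only a minority of the cluster's nodes remain (fewer than half), and those remaining nodes hold the majority of the members with this role, they will not shut down.
If the setting is omitted, the minority nodes all shut down, which brings the entire cluster down.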
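The effect of the role setting can be sketched with a simplified model of the keep-majority decision as seen from one side of a partition. This is not SBR's actual code: `KeepMajoritySketch`, `Node`, and the node names are hypothetical. A tie is broken in favor of the side containing the lowest address, as the SBR documentation describes. Returning DOWN_SELF when no counted members exist at all is an assumption of this sketch.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Simplified model of keep-majority with an optional role. Not Akka code. */
final class KeepMajoritySketch {

    enum Decision { DOWN_UNREACHABLE, DOWN_SELF }

    static final class Node {
        final String address;
        final Set<String> roles;

        Node(String address, String... roles) {
            this.address = address;
            this.roles = new HashSet<>(Arrays.asList(roles));
        }
    }

    /** A null role means every member is counted. */
    private static boolean counts(Node n, String role) {
        return role == null || n.roles.contains(role);
    }

    /** Lowest address among counted nodes, or null if none is counted. */
    private static String lowest(List<Node> nodes, String role) {
        String best = null;
        for (Node n : nodes) {
            if (counts(n, role) && (best == null || n.address.compareTo(best) < 0)) {
                best = n.address;
            }
        }
        return best;
    }

    private static int count(List<Node> nodes, String role) {
        int c = 0;
        for (Node n : nodes) {
            if (counts(n, role)) c++;
        }
        return c;
    }

    static Decision decide(List<Node> reachable, List<Node> unreachable, String role) {
        int mine = count(reachable, role);
        int theirs = count(unreachable, role);
        if (mine > theirs) return Decision.DOWN_UNREACHABLE;
        if (mine < theirs) return Decision.DOWN_SELF;
        // Equal sizes: keep the side that contains the lowest address.
        String lowestMine = lowest(reachable, role);
        String lowestTheirs = lowest(unreachable, role);
        if (lowestMine != null && lowestMine.compareTo(lowestTheirs) < 0) {
            return Decision.DOWN_UNREACHABLE;
        }
        return Decision.DOWN_SELF; // also covers "no counted members" (assumption of this sketch)
    }

    public static void main(String[] args) {
        // 7 nodes in total, 3 of them with the "admin" role; this side sees only 3 nodes.
        List<Node> reachable = Arrays.asList(
                new Node("a1", "admin"), new Node("a2", "admin"), new Node("w1"));
        List<Node> unreachable = Arrays.asList(
                new Node("a3", "admin"), new Node("w2"), new Node("w3"), new Node("w4"));

        System.out.println(decide(reachable, unreachable, null));    // prints DOWN_SELF (3 vs 4)
        System.out.println(decide(reachable, unreachable, "admin")); // prints DOWN_UNREACHABLE (2 vs 1)
    }
}
```

Without a role, this side is a minority of the whole cluster and downs itself. With role = "admin" it holds two of the three admin members and survives.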