How to Use Kubernetes PodDisruptionBudgets (PDB)

Published: 2021-12-20 10:06:40 | Source: Yisu Cloud (億速云) | Author: iii | Category: Cloud Computing

This article explains how to use the Kubernetes PodDisruptionBudget (PDB): which problems it addresses, how to define and create one, and how it works internally.

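When to Use a PDB

The PodDisruptionBudget object (PDB for short) was added around Kubernetes 1.4 and promoted to beta in 1.5, yet it was still beta when 1.9 was released. Setting that aside, let's first think about what problem the PDB is meant to solve. The feature had been around for more than a year, but I had never studied it because I had no use case. Recently I have been designing an ElasticSearch as a Service (ESaaS) solution on Kubernetes, which must ensure as far as possible that every ElasticSearch cluster always has at least one healthy, available ES client pod, ES master pod, and ES data pod. Many readers will think: a Deployment lets you set maxUnavailable, so isn't that enough? Besides, isn't the ReplicaSet (RS) controller already maintaining the replica count?

Not so fast. When does a Deployment's maxUnavailable apply? It only guarantees a minimum number of serving replicas while an application deployed through a Deployment is going through a rolling update. And the RS controller? It is just one of the replica controllers. It does not guarantee that a given number of replicas is always present in the cluster; its job is to bring the actual replica count back to the desired count as quickly as possible, and it does not care how many replicas exist at any moment in between. This is where the Kubernetes PDB comes in: it protects application availability by setting budgets for voluntary disruptions.

Since voluntary disruptions have come up, let's sort out what a voluntary disruption is and what an involuntary disruption is.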

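Involuntary Disruption and How to Mitigate It

An involuntary disruption is one caused by external factors that are uncontrollable (or, for now, hard to control), for example:

A hardware failure or kernel panic takes a node down.
The containers run in a VM, and the VM is deleted by mistake or the hypervisor fails.
The cluster suffers a network partition. (Kubernetes handles network partitions through the NodeController, but it still does not take application availability into account when it evicts pods.) For an in-depth look at the NodeController, see my earlier posts:

Kubernetes Node Controller Source Code Analysis: Execution
Kubernetes Node Controller Source Code Analysis: Creation
Kubernetes Node Controller Source Code Analysis: Configuration
Kubernetes Node Controller Source Code Analysis: Taint Controller

A node runs short of compute resources because of unreasonable overcommitment, which triggers kubelet eviction; kubelet eviction does not take application availability into account either. For an in-depth look at kubelet eviction, see my earlier posts:

Kubernetes Eviction Manager Source Code Analysis
How the Kubernetes Eviction Manager Works

The PDB does not address involuntary disruptions. So how can we reduce or mitigate their impact on application availability when running on Kubernetes?

Deploy each application through a replica controller such as a Deployment, RS, or StatefulSet, with replicas greater than 1.
Set resource requests on the application's containers so that it still gets enough resources even when the node is under heavy resource pressure.
Also plan for HA at the physical level: spread the replicas of an application across servers, across racks, and across switches.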

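PDBs Protect Availability During Voluntary Disruptions

The opposite of an involuntary disruption is, naturally, a voluntary disruption: one triggered by a user or a cluster administrator, and therefore something Kubernetes can control. Examples:

Deleting a controller that manages pods, such as a Deployment, RS, RC, or StatefulSet.
Triggering a rolling update of an application.
Deleting pods directly in bulk.
Running kubectl drain on a node (taking a node offline, scaling the cluster in).

The PDB is designed for voluntary disruptions, which fall within what Kubernetes can control. It is not designed for involuntary disruptions.

Once the Kube-Node project goes live, it can integrate with cloud providers such as OpenStack, AWS, and GCE to manage nodes automatically. HNA (Horizontal Node Autoscaler) events may therefore be frequent, and their workflow includes logic similar to draining a node, so PDBs are needed to keep applications highly available.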

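Using PDBs: Instructions and Caveats

Usage Notes

Every application deployed on Kubernetes can have a corresponding PDB object that limits, during voluntary disruptions, either the maximum number of replicas that may be down or the minimum number of replicas that must stay available, thereby keeping the application highly available.

A PDB can protect applications managed by the built-in Kubernetes controllers. In that case the PDB selector must be identical to the selector of the controller object:

Deployment
ReplicationController
ReplicaSet
StatefulSet

A PDB can also protect a set of pods that is chosen solely by the PDB's own selector, with two restrictions:

Only .spec.minAvailable can be configured; maxUnavailable cannot be used.
.spec.minAvailable must be an integer, not a percentage.

Either way, the set of pods a PDB affects is always chosen by its own selector. Take care that different PDB objects in the same namespace do not use overlapping selectors.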
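The warning about overlapping selectors can be checked mechanically. The following is a minimal sketch, assuming plain matchLabels-style selectors represented as Go maps; the names matches and pdbsForPod are hypothetical helpers, not part of any Kubernetes library. It lists every PDB whose selector matches a given pod, so more than one result indicates an overlap.

```go
package main

import (
	"fmt"
	"sort"
)

// matches reports whether every key/value pair of the selector is present in
// the pod labels (matchLabels semantics). In this sketch an empty selector
// matches every pod.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		got, ok := labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// pdbsForPod returns the sorted names of all PDBs whose selector matches the pod.
// pdbs maps a PDB name to its matchLabels selector (hypothetical representation).
func pdbsForPod(pdbs map[string]map[string]string, podLabels map[string]string) []string {
	var names []string
	for name, selector := range pdbs {
		if matches(selector, podLabels) {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	pdbs := map[string]map[string]string{
		"zk-pdb":   {"app": "zookeeper"},
		"tier-pdb": {"tier": "backend"},
	}
	pod := map[string]string{"app": "zookeeper", "tier": "backend"}

	hits := pdbsForPod(pdbs, pod)
	if len(hits) > 1 {
		fmt.Println("overlapping PDBs for this pod:", hits)
	}
}
```

A pod selected by two PDBs cannot be evicted through the eviction subresource, which is why the overlap is worth catching before the PDBs are created.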

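When using a PDB, you need to be clear about what type of application you have and how you want disruptions handled:

Stateless application: for example, you want at least 60% of the replicas to be available.

Solution: create a PDB object with minAvailable set to 60% or maxUnavailable set to 40%.

Single-instance stateful application: the customer must be notified, and must agree, before the instance is terminated.

Solution: create a PDB object with maxUnavailable set to 0 so that Kubernetes blocks eviction of the instance. After notifying the user and obtaining consent, delete the PDB to lift the block, and then recreate the instance. A rolling update of a single-instance StatefulSet always incurs downtime, so avoid single-instance StatefulSets in production.

Multi-instance stateful application: the number of available instances must never drop below some number N (for example, because of the election mechanism of Raft-based applications).

Solution: set maxUnavailable=1 or minAvailable=N. The former allows only one instance to be disrupted at a time; the latter allows up to expected_replicas - minAvailable instances to be disrupted at a time.

Batch Job: the Job needs one pod to eventually complete the task successfully.

The Job controller has its own mechanism for guaranteeing this, so no PDB is needed.
For an in-depth look at the Job controller, see my post: Kubernetes Job Controller Source Code Analysis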
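To make the arithmetic behind these scenarios concrete, here is a small sketch that turns a minAvailable or maxUnavailable setting into a desired healthy count and a number of allowed disruptions. All function names are hypothetical, and rounding percentages up is my understanding of how Kubernetes resolves them, so treat that detail as an assumption.

```go
package main

import "fmt"

// toCount converts an absolute value or a percentage of total into a pod count.
// Assumption: percentages are rounded up to the next whole pod.
func toCount(value int, percent bool, total int) int {
	if !percent {
		return value
	}
	return (value*total + 99) / 100
}

// desiredHealthyMin is the minimum healthy pod count implied by minAvailable.
func desiredHealthyMin(minAvailable int, percent bool, replicas int) int {
	return toCount(minAvailable, percent, replicas)
}

// desiredHealthyMax is the minimum healthy pod count implied by maxUnavailable.
func desiredHealthyMax(maxUnavailable int, percent bool, replicas int) int {
	desired := replicas - toCount(maxUnavailable, percent, replicas)
	if desired < 0 {
		return 0
	}
	return desired
}

// allowedDisruptions is how many pods may be evicted right now.
func allowedDisruptions(currentHealthy, desiredHealthy int) int {
	if currentHealthy <= desiredHealthy {
		return 0
	}
	return currentHealthy - desiredHealthy
}

func main() {
	// Stateless app, 10 replicas: minAvailable 60% and maxUnavailable 40% agree.
	fmt.Println(desiredHealthyMin(60, true, 10), desiredHealthyMax(40, true, 10)) // 6 6

	// Five-member Raft-style cluster that needs N=3 members available.
	fmt.Println(allowedDisruptions(5, desiredHealthyMin(3, false, 5))) // 2
	fmt.Println(allowedDisruptions(5, desiredHealthyMax(1, false, 5))) // 1

	// Single instance with maxUnavailable 0: eviction is always blocked.
	fmt.Println(allowedDisruptions(1, desiredHealthyMax(0, false, 1))) // 0
}
```

The Raft case shows why the two settings are not interchangeable: minAvailable=3 lets two members go at once, while maxUnavailable=1 lets only one go regardless of cluster size.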

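Defining a PDB Object

Having thought this through and decided to create a PDB, let's see how a PodDisruptionBudget is defined. Here is a sample (policy/v1beta1 was the current API version at the time of writing; Kubernetes 1.21 and later serve the object as policy/v1):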

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: zookeeper

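A PDB definition has just three key fields:

.spec.selector selects the set of pods to protect. The best practice is to make it identical to the selector of the application's Deployment or StatefulSet.
.spec.minAvailable is the number or percentage of pods that must remain available during voluntary disruptions.
.spec.maxUnavailable is the maximum number or percentage of pods that may be unavailable during voluntary disruptions. It requires Kubernetes version >= 1.7 and can only be used for pods belonging to a Deployment, RS, RC, or StatefulSet. Prefer .spec.maxUnavailable where possible.

Notes:
.spec.minAvailable and .spec.maxUnavailable cannot both be defined in the same PDB object.
As mentioned earlier, pods deleted or made unavailable by a rolling update also count as voluntary disruptions, but rolling updates are governed by their own strategy (maxSurge and maxUnavailable), so the PDB does not intervene in that process.
A PDB can only guarantee the replica count against voluntary disruptions. Suppose that evicting pods has left the application exactly at .spec.minAvailable or .spec.maxUnavailable, and then a previously healthy pod dies because its node goes down (an involuntary disruption). The actual number of pods is now lower than the PDB requires. A PDB is not a cure-all.

In practice, setting .spec.minAvailable to 100% or .spec.maxUnavailable to 0% blocks pod eviction entirely (rolling updates of Deployments and StatefulSets excepted).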

Creating a PDB Object

Create the PDB object with kubectl apply -f zk-pdb.yaml, then list it:

$ kubectl get poddisruptionbudgets
NAME      MIN-AVAILABLE   ALLOWED-DISRUPTIONS   AGE
zk-pdb    2               1                     7s

Inspect it with kubectl get pdb zk-pdb -o yaml:

$ kubectl get poddisruptionbudgets zk-pdb -o yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  creationTimestamp: 2017-08-28T02:38:26Z
  generation: 1
  name: zk-pdb
...
status:
  currentHealthy: 3
  desiredHealthy: 2
  disruptedPods: null
  disruptionsAllowed: 1
  expectedPods: 3
  observedGeneration: 1

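How PDBs Work: Source Code Analysis

A PDB object defines the state the user expects during voluntary disruptions. What actually maintains that state is a controller run by kube-controller-manager: the Disruption Controller.

The Disruption Controller mainly watches pods and PDBs. When it observes an add, delete, or update event for a pod or PDB, it puts the corresponding PDB object on a rate-limited queue for a worker to process. The worker's main job is to compute the currentHealthy, desiredHealthy, expectedCount, and disruptedPods values of PodDisruptionBudgetStatus and then call the API to update the PDB status.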

pkg/controller/disruption/disruption.go:498

func (dc *DisruptionController) trySync(pdb *policy.PodDisruptionBudget) error {
	pods, err := dc.getPodsForPdb(pdb)
	if err != nil {
		dc.recorder.Eventf(pdb, v1.EventTypeWarning, "NoPods", "Failed to get pods: %v", err)
		return err
	}
	if len(pods) == 0 {
		dc.recorder.Eventf(pdb, v1.EventTypeNormal, "NoPods", "No matching pods found")
	}

	expectedCount, desiredHealthy, err := dc.getExpectedPodCount(pdb, pods)
	if err != nil {
		dc.recorder.Eventf(pdb, v1.EventTypeWarning, "CalculateExpectedPodCountFailed", "Failed to calculate the number of expected pods: %v", err)
		return err
	}

	currentTime := time.Now()
	disruptedPods, recheckTime := dc.buildDisruptedPodMap(pods, pdb, currentTime)
	currentHealthy := countHealthyPods(pods, disruptedPods, currentTime)
	err = dc.updatePdbStatus(pdb, currentHealthy, desiredHealthy, expectedCount, disruptedPods)

	if err == nil && recheckTime != nil {
		// There is always at most one PDB waiting with a particular name in the queue,
		// and each PDB in the queue is associated with the lowest timestamp
		// that was supplied when a PDB with that name was added.
		dc.enqueuePdbForRecheck(pdb, recheckTime.Sub(currentTime))
	}
	return err
}
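The status computation that trySync drives can be approximated in a few lines. This is a simplified sketch, not the controller's code: the pod struct, the disrupted set, and both function names are stand-ins I introduced, and the timeout handling of disrupted pods is left out. It counts ready pods that are neither being deleted nor already approved for eviction, then derives the allowed disruptions from that count.

```go
package main

import "fmt"

// pod is a simplified stand-in for the fields the controller looks at.
type pod struct {
	name     string
	ready    bool
	deleting bool // stands in for a non-nil DeletionTimestamp
}

// countHealthy counts ready pods that are not being deleted and have not
// already been approved for eviction (the disrupted set).
func countHealthy(pods []pod, disrupted map[string]bool) int {
	healthy := 0
	for _, p := range pods {
		if p.deleting || disrupted[p.name] {
			continue
		}
		if p.ready {
			healthy++
		}
	}
	return healthy
}

// disruptionsAllowed never goes below zero and is zero when no pods are expected.
func disruptionsAllowed(currentHealthy, desiredHealthy, expectedPods int) int {
	allowed := currentHealthy - desiredHealthy
	if expectedPods <= 0 || allowed <= 0 {
		return 0
	}
	return allowed
}

func main() {
	pods := []pod{
		{name: "zk-0", ready: true},
		{name: "zk-1", ready: true},
		{name: "zk-2", ready: true},
	}
	desiredHealthy := 2 // from minAvailable: 2

	// No eviction in flight: 3 healthy, 1 disruption allowed.
	healthy := countHealthy(pods, nil)
	fmt.Println(healthy, disruptionsAllowed(healthy, desiredHealthy, len(pods)))

	// zk-2 has been approved for eviction: 2 healthy, nothing more allowed.
	healthy = countHealthy(pods, map[string]bool{"zk-2": true})
	fmt.Println(healthy, disruptionsAllowed(healthy, desiredHealthy, len(pods)))
}
```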

PodDisruptionBudgetStatus is defined as follows:

pkg/apis/policy/types.go:48

type PodDisruptionBudgetStatus struct {
	// Most recent generation observed when updating this PDB status. PodDisruptionsAllowed and other
	// status informatio is valid only if observedGeneration equals to PDB's object generation.
	// +optional
	ObservedGeneration int64 `json:"observedGeneration,omitempty" protobuf:"varint,1,opt,name=observedGeneration"`

	// DisruptedPods contains information about pods whose eviction was
	// processed by the API server eviction subresource handler but has not
	// yet been observed by the PodDisruptionBudget controller.
	// A pod will be in this map from the time when the API server processed the
	// eviction request to the time when the pod is seen by PDB controller
	// as having been marked for deletion (or after a timeout). The key in the map is the name of the pod
	// and the value is the time when the API server processed the eviction request. If
	// the deletion didn't occur and a pod is still there it will be removed from
	// the list automatically by PodDisruptionBudget controller after some time.
	// If everything goes smooth this map should be empty for the most of the time.
	// Large number of entries in the map may indicate problems with pod deletions.
	DisruptedPods map[string]metav1.Time `json:"disruptedPods" protobuf:"bytes,2,rep,name=disruptedPods"`

	// Number of pod disruptions that are currently allowed.
	PodDisruptionsAllowed int32 `json:"disruptionsAllowed" protobuf:"varint,3,opt,name=disruptionsAllowed"`

	// current number of healthy pods
	CurrentHealthy int32 `json:"currentHealthy" protobuf:"varint,4,opt,name=currentHealthy"`

	// minimum desired number of healthy pods
	DesiredHealthy int32 `json:"desiredHealthy" protobuf:"varint,5,opt,name=desiredHealthy"`

	// total number of pods counted by this disruption budget
	ExpectedPods int32 `json:"expectedPods" protobuf:"varint,6,opt,name=expectedPods"`
}

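The most important fields of PodDisruptionBudgetStatus are DisruptedPods and PodDisruptionsAllowed:

DisruptedPods: holds the pods whose eviction has been processed by the apiserver's pod eviction subresource but not yet observed by the PDB controller. It is a map whose key is the pod name and whose value is the time at which the apiserver accepted the eviction request. Each entry has a 2-minute timeout: if the pod still has not been deleted after 2 minutes, it is removed from the map.
PodDisruptionsAllowed: the number of pod disruptions currently allowed.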

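The Disruption Controller's main job is to update the PDB status. That raises a question: who actually enforces maxUnavailable or minAvailable when pods are evicted during a voluntary disruption?

Bear in mind that the PDB controller only deals with pods that go through the pod eviction subresource, so the answer lies in the pod's EvictionREST handler.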

pkg/registry/core/pod/storage/eviction.go:81

// Create attempts to create a new eviction.  That is, it tries to evict a pod.
func (r *EvictionREST) Create(ctx genericapirequest.Context, obj runtime.Object, createValidation rest.ValidateObjectFunc, includeUninitialized bool) (runtime.Object, error) {
	eviction := obj.(*policy.Eviction)

	obj, err := r.store.Get(ctx, eviction.Name, &metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	pod := obj.(*api.Pod)
	var rtStatus *metav1.Status
	var pdbName string
	err = retry.RetryOnConflict(EvictionsRetry, func() error {
		pdbs, err := r.getPodDisruptionBudgets(ctx, pod)
		if err != nil {
			return err
		}

		if len(pdbs) > 1 {
			rtStatus = &metav1.Status{
				Status:  metav1.StatusFailure,
				Message: "This pod has more than one PodDisruptionBudget, which the eviction subresource does not support.",
				Code:    500,
			}
			return nil
		} else if len(pdbs) == 1 {
			pdb := pdbs[0]
			pdbName = pdb.Name
			// Try to verify-and-decrement

			// If it was false already, or if it becomes false during the course of our retries,
			// raise an error marked as a 429.
			if err := r.checkAndDecrement(pod.Namespace, pod.Name, pdb); err != nil {
				return err
			}
		}
		return nil
	})
	if err == wait.ErrWaitTimeout {
		err = errors.NewTimeoutError(fmt.Sprintf("couldn't update PodDisruptionBudget %q due to conflicts", pdbName), 10)
	}
	if err != nil {
		return nil, err
	}

	if rtStatus != nil {
		return rtStatus, nil
	}

	// At this point there was either no PDB or we succeded in decrementing

	// Try the delete
	_, _, err = r.store.Delete(ctx, eviction.Name, eviction.DeleteOptions)
	if err != nil {
		return nil, err
	}

	// Success!
	return &metav1.Status{Status: metav1.StatusSuccess}, nil
}

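When a pod eviction is requested through EvictionREST, the handler looks up the PDBs that match the pod. With more than one match it returns an error; with exactly one it runs checkAndDecrement against that PDB; with none it deletes the pod directly. For how to use the Eviction API, see The Eviction API. Only a simple sample is given here: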

{
  "apiVersion": "policy/v1beta1",
  "kind": "Eviction",
  "metadata": {
    "name": "quux",
    "namespace": "default"
  }
}

$ curl -v -H 'Content-type: application/json' http://127.0.0.1:8080/api/v1/namespaces/default/pods/quux/eviction -d @eviction.json
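The same request can be built programmatically. Below is a sketch using only the Go standard library, mirroring the JSON body and URL from the curl sample; the struct and function names are hypothetical, and a real client would normally use client-go rather than hand-built requests. The function constructs the POST request without sending it, so it can be inspected or passed to an http.Client.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// objectMeta and eviction model only the fields used in the sample body.
type objectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type eviction struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   objectMeta `json:"metadata"`
}

// newEvictionRequest builds, but does not send, the POST request that asks
// the apiserver to evict one pod.
func newEvictionRequest(apiServer, namespace, podName string) (*http.Request, error) {
	body, err := json.Marshal(eviction{
		APIVersion: "policy/v1beta1",
		Kind:       "Eviction",
		Metadata:   objectMeta{Name: podName, Namespace: namespace},
	})
	if err != nil {
		return nil, err
	}
	url := fmt.Sprintf("%s/api/v1/namespaces/%s/pods/%s/eviction", apiServer, namespace, podName)
	req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newEvictionRequest("http://127.0.0.1:8080", "default", "quux")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```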

checkAndDecrement then checks whether the PDB's maxUnavailable or minAvailable is still satisfied; if it is, pdb.Status.PodDisruptionsAllowed is decremented by 1.
If checkAndDecrement succeeds, the pod is actually deleted.

// checkAndDecrement checks if the provided PodDisruptionBudget allows any disruption.
func (r *EvictionREST) checkAndDecrement(namespace string, podName string, pdb policy.PodDisruptionBudget) error {
	if pdb.Status.ObservedGeneration < pdb.Generation {
		// TODO(mml): Add a Retry-After header.  Once there are time-based
		// budgets, we can sometimes compute a sensible suggested value.  But
		// even without that, we can give a suggestion (10 minutes?) that
		// prevents well-behaved clients from hammering us.
		err := errors.NewTooManyRequests("Cannot evict pod as it would violate the pod's disruption budget.", 0)
		err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: "DisruptionBudget", Message: fmt.Sprintf("The disruption budget %s is still being processed by the server.", pdb.Name)})
		return err
	}
	if pdb.Status.PodDisruptionsAllowed < 0 {
		return errors.NewForbidden(policy.Resource("poddisruptionbudget"), pdb.Name, fmt.Errorf("pdb disruptions allowed is negative"))
	}
	if len(pdb.Status.DisruptedPods) > MaxDisruptedPodSize {
		return errors.NewForbidden(policy.Resource("poddisruptionbudget"), pdb.Name, fmt.Errorf("DisruptedPods map too big - too many evictions not confirmed by PDB controller"))
	}
	if pdb.Status.PodDisruptionsAllowed == 0 {
		err := errors.NewTooManyRequests("Cannot evict pod as it would violate the pod's disruption budget.", 0)
		err.ErrStatus.Details.Causes = append(err.ErrStatus.Details.Causes, metav1.StatusCause{Type: "DisruptionBudget", Message: fmt.Sprintf("The disruption budget %s needs %d healthy pods and has %d currently", pdb.Name, pdb.Status.DesiredHealthy, pdb.Status.CurrentHealthy)})
		return err
	}

	pdb.Status.PodDisruptionsAllowed--
	if pdb.Status.DisruptedPods == nil {
		pdb.Status.DisruptedPods = make(map[string]metav1.Time)
	}
	// Eviction handler needs to inform the PDB controller that it is about to delete a pod
	// so it should not consider it as available in calculations when updating PodDisruptions allowed.
	// If the pod is not deleted within a reasonable time limit PDB controller will assume that it won't
	// be deleted at all and remove it from DisruptedPod map.
	pdb.Status.DisruptedPods[podName] = metav1.Time{Time: time.Now()}
	if _, err := r.podDisruptionBudgetClient.PodDisruptionBudgets(namespace).UpdateStatus(&pdb); err != nil {
		return err
	}

	return nil
}

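checkAndDecrement mainly checks that pdb.Status.PodDisruptionsAllowed is greater than 0 and that DisruptedPods holds no more than 2000 pods (the Disruption Controller may not perform well enough to handle more). When PodDisruptionsAllowed is 0, the request is rejected with a 429 (Too Many Requests) error.
If the checks pass, pdb.Status.PodDisruptionsAllowed is decremented by 1 and the pod is added to the DisruptedPods map, with the current time (when the apiserver accepted the eviction request) as the value.
The PDB is then updated. Because the PDB controller watches PDB update events, the update triggers the controller logic again, which recomputes the PDB status.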

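Note: PDBs are also used by the scheduler. During preemptive scheduling based on pod priority, generic_scheduler validates the pods on each node against their PDBs when preempting, counts the pods whose preemption would violate a PDB, and prefers the node with fewer such pods when selecting a node.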
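The node selection rule in this note reduces to picking a minimum. Here is a deliberately small sketch of that idea; the candidate type and pickNode are hypothetical, and the real scheduler applies further tie-breakers that are not modeled here.

```go
package main

import "fmt"

// candidate is a node considered for preemption, with the number of victim
// pods on it whose preemption would violate a PDB.
type candidate struct {
	node       string
	violations int
}

// pickNode returns the candidate with the fewest PDB violations. On a tie the
// earlier candidate wins. The bool is false when there are no candidates.
func pickNode(cands []candidate) (candidate, bool) {
	if len(cands) == 0 {
		return candidate{}, false
	}
	best := cands[0]
	for _, c := range cands[1:] {
		if c.violations < best.violations {
			best = c
		}
	}
	return best, true
}

func main() {
	best, ok := pickNode([]candidate{
		{node: "node-a", violations: 2},
		{node: "node-b", violations: 0},
		{node: "node-c", violations: 1},
	})
	fmt.Println(best.node, ok) // node-b true
}
```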
