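Special handling of critical pods in the DaemonSet controller
When the DaemonSet controller decides whether a DaemonSet's pod should run on a given node, it calls DaemonSetsController.simulate, which returns the list of PredicateFailureReason values explaining why the pod would not fit.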
pkg/controller/daemon/daemon_controller.go:1206
func (dsc *DaemonSetsController) simulate(newPod *v1.Pod, node *v1.Node, ds *apps.DaemonSet) ([]algorithm.PredicateFailureReason, *schedulercache.NodeInfo, error) {
	// DaemonSet pods shouldn't be deleted by NodeController in case of node problems.
	// Add infinite toleration for taint notReady:NoExecute here
	// to survive taint-based eviction enforced by NodeController
	// when node turns not ready.
	v1helper.AddOrUpdateTolerationInPod(newPod, &v1.Toleration{
		Key:      algorithm.TaintNodeNotReady,
		Operator: v1.TolerationOpExists,
		Effect:   v1.TaintEffectNoExecute,
	})

	// DaemonSet pods shouldn't be deleted by NodeController in case of node problems.
	// Add infinite toleration for taint unreachable:NoExecute here
	// to survive taint-based eviction enforced by NodeController
	// when node turns unreachable.
	v1helper.AddOrUpdateTolerationInPod(newPod, &v1.Toleration{
		Key:      algorithm.TaintNodeUnreachable,
		Operator: v1.TolerationOpExists,
		Effect:   v1.TaintEffectNoExecute,
	})

	// According to TaintNodesByCondition, all DaemonSet pods should tolerate
	// MemoryPressure and DiskPressure taints, and the critical pods should tolerate
	// OutOfDisk taint additional.
	v1helper.AddOrUpdateTolerationInPod(newPod, &v1.Toleration{
		Key:      algorithm.TaintNodeDiskPressure,
		Operator: v1.TolerationOpExists,
		Effect:   v1.TaintEffectNoSchedule,
	})

	v1helper.AddOrUpdateTolerationInPod(newPod, &v1.Toleration{
		Key:      algorithm.TaintNodeMemoryPressure,
		Operator: v1.TolerationOpExists,
		Effect:   v1.TaintEffectNoSchedule,
	})

	// TODO(#48843) OutOfDisk taints will be removed in 1.10
	if utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
		kubelettypes.IsCriticalPod(newPod) {
		v1helper.AddOrUpdateTolerationInPod(newPod, &v1.Toleration{
			Key:      algorithm.TaintNodeOutOfDisk,
			Operator: v1.TolerationOpExists,
			Effect:   v1.TaintEffectNoSchedule,
		})
	}

	// nodeInfo is built from the node and the pods already on it;
	// that part is omitted from this excerpt.

	_, reasons, err := Predicates(newPod, nodeInfo)
	return reasons, nodeInfo, err
}
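While building the simulated pod, the DaemonSet controller adds the following tolerations. The NoExecute tolerations keep the pod from being killed by the Node Controller's taint-based eviction when the node becomes not ready or unreachable; the NoSchedule tolerations let the pod be placed on a node that is under memory or disk pressure.
NotReady:NoExecute
Unreachable:NoExecute
MemoryPressure:NoSchedule
DiskPressure:NoSchedule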
When the ExperimentalCriticalPodAnnotation feature gate is enabled and the pod is a critical pod, the pod also receives an OutOfDisk:NoSchedule toleration.
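The toleration logic described above can be sketched in isolation. The following is a minimal, self-contained sketch and not the controller's code: the Pod and Toleration types, the addOrUpdateToleration helper, and the taint key strings are hypothetical stand-ins for the real v1 API types and the algorithm.TaintNode* constants.

```go
package main

import "fmt"

// Toleration is a simplified, hypothetical stand-in for v1.Toleration.
type Toleration struct {
	Key      string
	Operator string
	Effect   string
}

// Pod is a simplified, hypothetical stand-in for v1.Pod.
type Pod struct {
	Name        string
	Critical    bool
	Tolerations []Toleration
}

// addOrUpdateToleration replaces an existing toleration that has the same key
// and effect, or appends the toleration when none matches.
// It reports whether the pod was changed.
func addOrUpdateToleration(p *Pod, t Toleration) bool {
	for i, existing := range p.Tolerations {
		if existing.Key == t.Key && existing.Effect == t.Effect {
			if existing == t {
				return false // identical toleration already present
			}
			p.Tolerations[i] = t
			return true
		}
	}
	p.Tolerations = append(p.Tolerations, t)
	return true
}

// addDaemonSetTolerations adds the four tolerations every DaemonSet pod gets,
// plus the OutOfDisk toleration for critical pods when the gate is enabled.
// The key strings are placeholders, not the real taint keys.
func addDaemonSetTolerations(p *Pod, criticalGateEnabled bool) {
	always := []Toleration{
		{Key: "notReady", Operator: "Exists", Effect: "NoExecute"},
		{Key: "unreachable", Operator: "Exists", Effect: "NoExecute"},
		{Key: "diskPressure", Operator: "Exists", Effect: "NoSchedule"},
		{Key: "memoryPressure", Operator: "Exists", Effect: "NoSchedule"},
	}
	for _, t := range always {
		addOrUpdateToleration(p, t)
	}
	if criticalGateEnabled && p.Critical {
		addOrUpdateToleration(p, Toleration{Key: "outOfDisk", Operator: "Exists", Effect: "NoSchedule"})
	}
}

func main() {
	regular := &Pod{Name: "log-agent"}
	critical := &Pod{Name: "network-agent", Critical: true}
	addDaemonSetTolerations(regular, true)
	addDaemonSetTolerations(critical, true)
	fmt.Println(regular.Name, len(regular.Tolerations))
	fmt.Println(critical.Name, len(critical.Tolerations))
}
```

Because the helper updates in place rather than appending blindly, calling it repeatedly on the same pod does not grow the toleration list, which matters for a controller that re-simulates the same pod on every sync.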
Like the scheduler, simulate then runs a set of predicates against the node. These predicates also treat critical pods differently.
pkg/controller/daemon/daemon_controller.go:1413
// Predicates checks if a DaemonSet's pod can be scheduled on a node using GeneralPredicates
// and PodToleratesNodeTaints predicate
func Predicates(pod *v1.Pod, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	var predicateFails []algorithm.PredicateFailureReason

	// If ScheduleDaemonSetPods is enabled, only check nodeSelector and nodeAffinity.
	if false /*disabled for 1.10*/ && utilfeature.DefaultFeatureGate.Enabled(features.ScheduleDaemonSetPods) {
		fit, reasons, err := nodeSelectionPredicates(pod, nil, nodeInfo)
		if err != nil {
			return false, predicateFails, err
		}
		if !fit {
			predicateFails = append(predicateFails, reasons...)
		}

		return len(predicateFails) == 0, predicateFails, nil
	}

	critical := utilfeature.DefaultFeatureGate.Enabled(features.ExperimentalCriticalPodAnnotation) &&
		kubelettypes.IsCriticalPod(pod)

	fit, reasons, err := predicates.PodToleratesNodeTaints(pod, nil, nodeInfo)
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}
	if critical {
		// If the pod is marked as critical and support for critical pod annotations is enabled,
		// check predicates for critical pods only.
		fit, reasons, err = predicates.EssentialPredicates(pod, nil, nodeInfo)
	} else {
		fit, reasons, err = predicates.GeneralPredicates(pod, nil, nodeInfo)
	}
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}

	return len(predicateFails) == 0, predicateFails, nil
}
For a critical pod, Predicates calls predicates.EssentialPredicates; otherwise it calls predicates.GeneralPredicates.
How do the two differ? GeneralPredicates runs everything EssentialPredicates does plus noncriticalPredicates, which is the scheduler's PodFitsResources predicate.
pkg/scheduler/algorithm/predicates/predicates.go:1076
// noncriticalPredicates are the predicates that only non-critical pods need
func noncriticalPredicates(pod *v1.Pod, meta algorithm.PredicateMetadata, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
	var predicateFails []algorithm.PredicateFailureReason
	fit, reasons, err := PodFitsResources(pod, meta, nodeInfo)
	if err != nil {
		return false, predicateFails, err
	}
	if !fit {
		predicateFails = append(predicateFails, reasons...)
	}

	return len(predicateFails) == 0, predicateFails, nil
}
As a result, the DaemonSet controller skips the PodFitsResources check when it evaluates predicates for a critical pod.
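To make the difference concrete, here is a minimal sketch of the same branching on a toy model. It is not the scheduler's implementation: the Pod and Node types, the single CPU resource, and the reason strings are assumptions made for illustration, and only a node-selector check stands in for the essential predicates.

```go
package main

import "fmt"

// Pod and Node are simplified, hypothetical models for this sketch.
type Pod struct {
	Name         string
	Critical     bool
	CPURequest   int64 // millicores
	NodeSelector map[string]string
}

type Node struct {
	Name    string
	Labels  map[string]string
	FreeCPU int64 // millicores still unallocated
}

// matchesNodeSelector stands in for the essential predicates:
// every selector entry must be present on the node with the same value.
func matchesNodeSelector(p Pod, n Node) bool {
	for k, want := range p.NodeSelector {
		got, ok := n.Labels[k]
		if !ok || got != want {
			return false
		}
	}
	return true
}

// fitsResources stands in for PodFitsResources (the non-critical predicate).
func fitsResources(p Pod, n Node) bool {
	return p.CPURequest <= n.FreeCPU
}

// check returns the failure reasons; an empty result means the pod fits.
// Critical pods (with the gate enabled) skip the resource check entirely.
func check(p Pod, n Node, criticalGateEnabled bool) []string {
	var reasons []string
	if !matchesNodeSelector(p, n) {
		reasons = append(reasons, "node selector mismatch")
	}
	critical := criticalGateEnabled && p.Critical
	if !critical && !fitsResources(p, n) {
		reasons = append(reasons, "insufficient cpu")
	}
	return reasons
}

func main() {
	node := Node{Name: "node-1", Labels: map[string]string{"role": "worker"}, FreeCPU: 100}
	regular := Pod{Name: "log-agent", CPURequest: 500}
	critical := Pod{Name: "network-agent", Critical: true, CPURequest: 500}

	fmt.Println(regular.Name, check(regular, node, true))
	fmt.Println(critical.Name, check(critical, node, true))
}
```

In this model the critical pod is reported as fitting even though the node lacks the CPU it requests, which is exactly the consequence of skipping the resource predicate; the essential checks still apply to it.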
Special handling of critical pods in PriorityClass validation
An important change in Kubernetes 1.11 is that Priority and Preemption graduated from alpha to beta and are now enabled by default.
Kubernetes Version    Priority and Preemption State    Enabled by default
1.8                   alpha                            no
1.9                   alpha                            no
1.10                  alpha                            no
1.11                  beta                             yes
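PriorityClass belongs to the scheduling.k8s.io/v1alpha1 GroupVersion. After a client submits a request to create a PriorityClass, and before the object is written to etcd, the API server validates it. That validation treats the two PriorityClasses SystemClusterCritical and SystemNodeCritical specially.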
pkg/apis/scheduling/validation/validation.go:30
// ValidatePriorityClass tests whether required fields in the PriorityClass are
// set correctly.
func ValidatePriorityClass(pc *scheduling.PriorityClass) field.ErrorList {
	// allErrs is declared earlier in the function; that part is omitted from this excerpt.

	// If the priorityClass starts with a system prefix, it must be one of the
	// predefined system priority classes.
	if strings.HasPrefix(pc.Name, scheduling.SystemPriorityClassPrefix) {
		if is, err := scheduling.IsKnownSystemPriorityClass(pc); !is {
			allErrs = append(allErrs, field.Forbidden(field.NewPath("metadata", "name"), "priority class names with '"+scheduling.SystemPriorityClassPrefix+"' prefix are reserved for system use only. error: "+err.Error()))
		}
	}
	return allErrs
}

// IsKnownSystemPriorityClass checks that "pc" is equal to one of the system PriorityClasses.
// It ignores "description", labels, annotations, etc. of the PriorityClass.
func IsKnownSystemPriorityClass(pc *PriorityClass) (bool, error) {
	for _, spc := range systemPriorityClasses {
		if spc.Name == pc.Name {
			if spc.Value != pc.Value {
				return false, fmt.Errorf("value of %v PriorityClass must be %v", spc.Name, spc.Value)
			}
			if spc.GlobalDefault != pc.GlobalDefault {
				return false, fmt.Errorf("globalDefault of %v PriorityClass must be %v", spc.Name, spc.GlobalDefault)
			}
			return true, nil
		}
	}
	return false, fmt.Errorf("%v is not a known system priority class", pc.Name)
}
During validation, a PriorityClass whose name starts with the system- prefix must be either system-cluster-critical or system-node-critical. Any other name with that prefix fails validation and the request is rejected.
If the submitted name is system-cluster-critical or system-node-critical, its value must equal the predefined value and globalDefault must be false; in other words, neither system class can be the cluster-wide default PriorityClass.
In addition, an update to a PriorityClass currently may not change its Name or Value; only Description and globalDefault can be updated.
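The create and update rules above can be sketched as two small checks. This is a simplified sketch rather than the API server's validation code: the PriorityClass struct, validateCreate, and validateUpdate are hypothetical names, and the numeric priority values are assumptions for illustration, since the article only refers to them symbolically as SystemCriticalPriority.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	systemPrefix = "system-"
	// Assumed values for illustration only.
	highestUserDefinablePriority int32 = 1000000000
	systemCriticalPriority             = 2 * highestUserDefinablePriority
)

// PriorityClass is a simplified, hypothetical model of the API object.
type PriorityClass struct {
	Name          string
	Value         int32
	GlobalDefault bool
	Description   string
}

// systemClasses lists the only names allowed to carry the system- prefix.
var systemClasses = []PriorityClass{
	{Name: "system-node-critical", Value: systemCriticalPriority + 1000},
	{Name: "system-cluster-critical", Value: systemCriticalPriority},
}

// validateCreate rejects system- prefixed classes unless they match a
// predefined system class in name, value, and globalDefault.
func validateCreate(pc PriorityClass) error {
	if !strings.HasPrefix(pc.Name, systemPrefix) {
		return nil // user-defined names are not covered by this sketch
	}
	for _, spc := range systemClasses {
		if spc.Name != pc.Name {
			continue
		}
		if spc.Value != pc.Value {
			return fmt.Errorf("value of %v must be %v", spc.Name, spc.Value)
		}
		if spc.GlobalDefault != pc.GlobalDefault {
			return fmt.Errorf("globalDefault of %v must be %v", spc.Name, spc.GlobalDefault)
		}
		return nil
	}
	return fmt.Errorf("%v is not a known system priority class", pc.Name)
}

// validateUpdate allows changes to Description and GlobalDefault only.
func validateUpdate(oldPC, newPC PriorityClass) error {
	if oldPC.Name != newPC.Name {
		return fmt.Errorf("name is immutable")
	}
	if oldPC.Value != newPC.Value {
		return fmt.Errorf("value is immutable")
	}
	return nil
}

func main() {
	fmt.Println(validateCreate(PriorityClass{Name: "system-foo", Value: 10}))
	fmt.Println(validateCreate(PriorityClass{Name: "system-cluster-critical", Value: systemCriticalPriority}))

	old := PriorityClass{Name: "batch", Value: 100}
	fmt.Println(validateUpdate(old, PriorityClass{Name: "batch", Value: 200}))
	fmt.Println(validateUpdate(old, PriorityClass{Name: "batch", Value: 100, Description: "batch jobs"}))
}
```

Note that validateUpdate in this sketch does not re-run the system class checks on the new object; whether the real update path does so is not covered by the article.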
pkg/apis/scheduling/helpers.go:27
// SystemPriorityClasses define system priority classes that are auto-created at cluster bootstrapping.
// Our API validation logic ensures that any priority class that has a system prefix or its value
// is higher than HighestUserDefinablePriority is equal to one of these SystemPriorityClasses.
var systemPriorityClasses = []*PriorityClass{
	{
		ObjectMeta: metav1.ObjectMeta{
			Name: SystemNodeCritical,
		},
		Value:       SystemCriticalPriority + 1000,
		Description: "Used for system critical pods that must not be moved from their current node.",
	},
	{
		ObjectMeta: metav1.ObjectMeta{
			Name: SystemClusterCritical,
		},
		Value:       SystemCriticalPriority,
		Description: "Used for system critical pods that must run in the cluster, but can be moved to another node if necessary.",
	},
}