How the Kubernetes Node Controller Starts

Starting the Node Controller

if ctx.IsControllerEnabled(nodeControllerName) {
 // Note: the err checks after each call are omitted in this excerpt.
 // Parse the cluster CIDR, the CIDR range for Pods in the cluster.
 _, clusterCIDR, err := net.ParseCIDR(s.ClusterCIDR)
 // Parse the service CIDR, the CIDR range for Services in the cluster.
 _, serviceCIDR, err := net.ParseCIDR(s.ServiceCIDR)
 // Create the NodeController instance.
 nodeController, err := nodecontroller.NewNodeController(
  sharedInformers.Core().V1().Pods(),
  sharedInformers.Core().V1().Nodes(),
  sharedInformers.Extensions().V1beta1().DaemonSets(),
  cloud,
  clientBuilder.ClientOrDie("node-controller"),
  s.PodEvictionTimeout.Duration,
  s.NodeEvictionRate,
  s.SecondaryNodeEvictionRate,
  s.LargeClusterSizeThreshold,
  s.UnhealthyZoneThreshold,
  s.NodeMonitorGracePeriod.Duration,
  s.NodeStartupGracePeriod.Duration,
  s.NodeMonitorPeriod.Duration,
  clusterCIDR,
  serviceCIDR,
  int(s.NodeCIDRMaskSize),
  s.AllocateNodeCIDRs,
  s.EnableTaintManager,
  utilfeature.DefaultFeatureGate.Enabled(features.TaintBasedEvictions),
 )
 // Call Run to start the controller.
 nodeController.Run()
 // Sleep for a random duration of
 // ControllerStartInterval + rand.Float64()*1.0*float64(ControllerStartInterval),
 // where ControllerStartInterval is set with the kube-controller-manager
 // flag --controller-start-interval.
 time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))
}

The snippet boils down to two key steps:

nodeController, err := nodecontroller.NewNodeController(...) creates the NodeController instance.

nodeController.Run() calls Run to start the controller.
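The startup snippet also sleeps for a randomized interval, described there as ControllerStartInterval + rand.Float64()*1.0*float64(ControllerStartInterval). The following is a minimal stand-alone sketch of that formula only. The `jitter` and `jitterInRange` helpers are hypothetical names introduced here and are not the wait.Jitter implementation, and the one-second base is an example value.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitter is a hypothetical stand-in for the formula quoted in the article:
// it returns base plus a random extra of up to maxFactor*base.
func jitter(base time.Duration, maxFactor float64) time.Duration {
	extra := rand.Float64() * maxFactor * float64(base)
	return base + time.Duration(extra)
}

// jitterInRange samples one delay and reports whether it lies in
// [base, base + maxFactor*base].
func jitterInRange(base time.Duration, maxFactor float64) bool {
	d := jitter(base, maxFactor)
	upper := base + time.Duration(maxFactor*float64(base))
	return d >= base && d <= upper
}

func main() {
	// Example value; the real one comes from --controller-start-interval.
	base := time.Second
	fmt.Println("sampled delay:", jitter(base, 1.0))
	fmt.Println("within expected range:", jitterInRange(base, 1.0))
}
```

With a factor of 1.0 the delay is somewhere between one and two times the configured interval, so controllers started back to back do not all hit the apiserver at the same moment.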

Definition of NodeController

Before analyzing how NodeController works, it helps to look at how it is defined. The complete definition is:

type NodeController struct {
 allocateNodeCIDRs bool
 cloud cloudprovider.Interface
 clusterCIDR *net.IPNet
 serviceCIDR *net.IPNet
 knownNodeSet map[string]*v1.Node
 kubeClient clientset.Interface
 // Method for easy mocking in unittest.
 lookupIP func(host string) ([]net.IP, error)
 // Value used if sync_nodes_status=False. NodeController will not proactively
 // sync node status in this case, but will monitor node status updated from kubelet. If
 // it doesn't receive update for this amount of time, it will start posting "NodeReady==
 // ConditionUnknown". The amount of time before which NodeController start evicting pods
 // is controlled via flag 'pod-eviction-timeout'.
 // Note: be cautious when changing the constant, it must work with nodeStatusUpdateFrequency
 // in kubelet. There are several constraints:
 // 1. nodeMonitorGracePeriod must be N times more than nodeStatusUpdateFrequency, where
 // N means number of retries allowed for kubelet to post node status. It is pointless
 // to make nodeMonitorGracePeriod be less than nodeStatusUpdateFrequency, since there
 // will only be fresh values from Kubelet at an interval of nodeStatusUpdateFrequency.
 // The constant must be less than podEvictionTimeout.
 // 2. nodeMonitorGracePeriod can't be too large for user experience - larger value takes
 // longer for user to see up-to-date node status.
 nodeMonitorGracePeriod time.Duration
 // Value controlling NodeController monitoring period, i.e. how often does NodeController
 // check node status posted from kubelet. This value should be lower than nodeMonitorGracePeriod.
 // TODO: Change node status monitor to watch based.
 nodeMonitorPeriod time.Duration
 // Value used if sync_nodes_status=False, only for node startup. When node
 // is just created, e.g. cluster bootstrap or node creation, we give a longer grace period.
 nodeStartupGracePeriod time.Duration
 // per Node map storing last observed Status together with a local time when it was observed.
 // This timestamp is to be used instead of LastProbeTime stored in Condition. We do this
 // to avoid the problem with time skew across the cluster.
 nodeStatusMap map[string]nodeStatusData
 now func() metav1.Time
 // Lock to access evictor workers
 evictorLock sync.Mutex
 // workers that evicts pods from unresponsive nodes.
 zonePodEvictor map[string]*RateLimitedTimedQueue
 // workers that are responsible for tainting nodes.
 zoneNotReadyOrUnreachableTainer map[string]*RateLimitedTimedQueue
 podEvictionTimeout time.Duration
 // The maximum duration before a pod evicted from a node can be forcefully terminated.
 maximumGracePeriod time.Duration
 recorder record.EventRecorder
 nodeLister corelisters.NodeLister
 nodeInformerSynced cache.InformerSynced
 daemonSetStore extensionslisters.DaemonSetLister
 daemonSetInformerSynced cache.InformerSynced
 podInformerSynced cache.InformerSynced
 // allocate/recycle CIDRs for node if allocateNodeCIDRs == true
 cidrAllocator CIDRAllocator
 // manages taints
 taintManager *NoExecuteTaintManager
 forcefullyDeletePod func(*v1.Pod) error
 nodeExistsInCloudProvider func(types.NodeName) (bool, error)
 computeZoneStateFunc func(nodeConditions []*v1.NodeCondition) (int, zoneState)
 enterPartialDisruptionFunc func(nodeNum int) float32
 enterFullDisruptionFunc func(nodeNum int) float32
 zoneStates map[string]zoneState
 evictionLimiterQPS float32
 secondaryEvictionLimiterQPS float32
 largeClusterThreshold int32
 unhealthyZoneThreshold float32
 // if set to true NodeController will start TaintManager that will evict Pods from
 // tainted nodes, if they're not tolerated.
 runTaintManager bool
 // if set to true NodeController will taint Nodes with 'TaintNodeNotReady' and 'TaintNodeUnreachable'
 // taints instead of evicting Pods itself.
 useTaintBasedEvictions bool
}

NodeController behavior configuration

The NodeController struct is large, with more than 30 fields. We focus on the following:

clusterCIDR – set with --cluster-cidr; the CIDR range for Pods in the cluster.

serviceCIDR – set with --service-cluster-ip-range; the CIDR range for Services in the cluster.
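Both CIDR fields are *net.IPNet values produced by net.ParseCIDR. As a small illustration of what such a value allows, the sketch below parses a range and checks whether an address belongs to it. The `inCIDR` helper and the address ranges are assumptions made for this example and do not come from the controller code.

```go
package main

import (
	"fmt"
	"net"
)

// inCIDR parses a CIDR string with net.ParseCIDR, as the startup code does,
// and reports whether ip falls inside the resulting network.
func inCIDR(cidr, ip string) (bool, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return false, err
	}
	parsed := net.ParseIP(ip)
	if parsed == nil {
		return false, fmt.Errorf("invalid IP %q", ip)
	}
	return ipNet.Contains(parsed), nil
}

func main() {
	// Example ranges only; real values come from --cluster-cidr
	// and --service-cluster-ip-range.
	inPods, err := inCIDR("10.244.0.0/16", "10.244.3.7")
	fmt.Println("in pod range:", inPods, err)

	inServices, err := inCIDR("10.96.0.0/12", "10.244.3.7")
	fmt.Println("in service range:", inServices, err)
}
```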

knownNodeSet – the set of nodes that NodeController has observed.

nodeMonitorGracePeriod – set with --node-monitor-grace-period, default 40s; how long a Node may be unresponsive before it is marked unhealthy.

nodeMonitorPeriod – set with --node-monitor-period, default 5s; the period at which NodeController syncs NodeStatus.

nodeStatusMap – records the most recently observed Status of each Node.
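The interplay of nodeStatusMap and nodeMonitorGracePeriod can be sketched as a staleness check against a locally recorded timestamp. This is a simplified illustration, not the controller's code: `observedStatus`, `isStale` and `staleAfterSeconds` are hypothetical names, and the real nodeStatusData stores more than a timestamp.

```go
package main

import (
	"fmt"
	"time"
)

// observedStatus is a simplified, hypothetical stand-in for nodeStatusData:
// it keeps only the local time at which a status was last observed.
type observedStatus struct {
	probeTimestamp time.Time
}

// isStale reports whether a node has gone longer than gracePeriod without
// a status update, measured on the controller's own clock.
func isStale(s observedStatus, now time.Time, gracePeriod time.Duration) bool {
	return now.After(s.probeTimestamp.Add(gracePeriod))
}

// staleAfterSeconds is a helper for experimenting with plain numbers.
func staleAfterSeconds(silentSeconds, graceSeconds int) bool {
	now := time.Now()
	s := observedStatus{
		probeTimestamp: now.Add(-time.Duration(silentSeconds) * time.Second),
	}
	return isStale(s, now, time.Duration(graceSeconds)*time.Second)
}

func main() {
	// 40 is the default grace period in seconds mentioned above.
	fmt.Println("silent for 30s:", staleAfterSeconds(30, 40))
	fmt.Println("silent for 45s:", staleAfterSeconds(45, 40))
}
```

Using the controller's local observation time rather than a timestamp reported by the node is what the struct comment refers to when it mentions avoiding time skew across the cluster.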

zonePodEvictor – workers that evict pods from unresponsive nodes.

zoneNotReadyOrUnreachableTainer – workers that are responsible for tainting nodes.

podEvictionTimeout – set with --pod-eviction-timeout, default 5min; the maximum Pod eviction time allowed when Pods are forcibly deleted.

maximumGracePeriod – the maximum duration before a pod evicted from a node can be forcefully terminated. Not configurable; hard-coded to 5min.

nodeLister – the interface used to fetch Node data.

daemonSetStore – the interface used to fetch DaemonSet data. When Pods are deleted through eviction, all Pods on the Node that belong to a DaemonSet are skipped.

taintManager – a NoExecuteTaintManager object. When runTaintManager (default true) is true:

When the PodInformer and NodeInformer observe PodAdd, PodDelete, PodUpdate and NodeAdd, NodeDelete, NodeUpdate events,

they trigger the TaintManager's NoExecuteTaintManager.PodUpdated and NoExecuteTaintManager.NodeUpdated methods,

which add the events to the corresponding queue (podUpdateQueue and nodeUpdateQueue). The TaintController consumes the messages from these queues

and handles them with handlePodUpdate and handleNodeUpdate respectively.

The TaintController's handling logic will be analyzed separately.
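The event flow described above (informer callbacks enqueue, a consumer dequeues and dispatches) can be sketched with buffered channels. This is a toy model under stated assumptions: the `taintManager` type, its `drain` method and the string payloads are hypothetical simplifications, and any ordering or priority between the two queues is not modelled.

```go
package main

import "fmt"

// Simplified, hypothetical event types; the real manager carries more data.
type podUpdate struct{ podName string }
type nodeUpdate struct{ nodeName string }

type taintManager struct {
	podUpdateQueue  chan podUpdate
	nodeUpdateQueue chan nodeUpdate
}

func newTaintManager(queueSize int) *taintManager {
	return &taintManager{
		podUpdateQueue:  make(chan podUpdate, queueSize),
		nodeUpdateQueue: make(chan nodeUpdate, queueSize),
	}
}

// PodUpdated and NodeUpdated are what the informer event handlers would call.
func (tm *taintManager) PodUpdated(name string) {
	tm.podUpdateQueue <- podUpdate{podName: name}
}

func (tm *taintManager) NodeUpdated(name string) {
	tm.nodeUpdateQueue <- nodeUpdate{nodeName: name}
}

func (tm *taintManager) handlePodUpdate(u podUpdate) {
	fmt.Println("handling pod update:", u.podName)
}

func (tm *taintManager) handleNodeUpdate(u nodeUpdate) {
	fmt.Println("handling node update:", u.nodeName)
}

// drain consumes everything currently queued and dispatches each item to the
// matching handler. It returns how many pod and node updates were handled.
func (tm *taintManager) drain() (pods, nodes int) {
	for {
		select {
		case u := <-tm.podUpdateQueue:
			tm.handlePodUpdate(u)
			pods++
		case u := <-tm.nodeUpdateQueue:
			tm.handleNodeUpdate(u)
			nodes++
		default:
			return pods, nodes
		}
	}
}

func main() {
	tm := newTaintManager(8)
	tm.PodUpdated("pod-a")
	tm.NodeUpdated("node-1")
	pods, nodes := tm.drain()
	fmt.Println("handled pods:", pods, "nodes:", nodes)
}
```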

forcefullyDeletePod – the method NodeController uses to call the apiserver and force-delete a Pod. It deletes Pods scheduled onto Nodes whose kubelet version is lower than v1.1.0, because kubelets before v1.1.0 do not support graceful termination.

computeZoneStateFunc – returns the number of NotReady Nodes in a Zone together with the Zone's state.

If there is no Ready Node, the zone state is FullDisruption;

If the fraction of unhealthy Nodes is greater than or equal to unhealthyZoneThreshold, the zone state is PartialDisruption;

Otherwise the zone state is Normal.
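The three rules above can be written down as a short function. This is a sketch under assumptions rather than the controller's code: it takes one readiness flag per node instead of []*v1.NodeCondition, the state names are modelled as strings, and the minimum of 3 unhealthy nodes for a partial disruption is taken from the unhealthyZoneThreshold description in this article.

```go
package main

import "fmt"

type zoneState string

const (
	stateNormal            zoneState = "Normal"
	stateFullDisruption    zoneState = "FullDisruption"
	statePartialDisruption zoneState = "PartialDisruption"
)

// computeZoneState takes one readiness flag per node in a zone and returns
// the number of not-ready nodes together with the zone state.
func computeZoneState(ready []bool, unhealthyZoneThreshold float32) (int, zoneState) {
	readyNodes, notReadyNodes := 0, 0
	for _, r := range ready {
		if r {
			readyNodes++
		} else {
			notReadyNodes++
		}
	}
	switch {
	case readyNodes == 0 && notReadyNodes > 0:
		// No Ready node at all.
		return notReadyNodes, stateFullDisruption
	case notReadyNodes >= 3 &&
		float32(notReadyNodes)/float32(readyNodes+notReadyNodes) >= unhealthyZoneThreshold:
		// Enough unhealthy nodes, both in absolute number and as a fraction.
		return notReadyNodes, statePartialDisruption
	default:
		return notReadyNodes, stateNormal
	}
}

func main() {
	// 0.55 is the default threshold mentioned in this article.
	n, state := computeZoneState([]bool{true, false, false, false}, 0.55)
	fmt.Println(n, state)
}
```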

enterPartialDisruptionFunc – compares the current node count with largeClusterThreshold:

If nodeNum > largeClusterThreshold, it returns secondaryEvictionLimiterQPS (default 0.01);

Otherwise it returns 0, which stops eviction.

enterFullDisruptionFunc – returns evictionLimiterQPS (default 0.1); evictionLimiterQPS is explained below.
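A minimal sketch of the two rate-selection functions, following the descriptions above. The `rateConfig` type and `defaultRateConfig` constructor are hypothetical containers introduced for this example; the default values are the ones this article lists for the corresponding flags.

```go
package main

import "fmt"

// rateConfig is a hypothetical container for the three fields involved.
type rateConfig struct {
	evictionLimiterQPS          float32
	secondaryEvictionLimiterQPS float32
	largeClusterThreshold       int32
}

// defaultRateConfig uses the defaults quoted in this article.
func defaultRateConfig() rateConfig {
	return rateConfig{
		evictionLimiterQPS:          0.1,
		secondaryEvictionLimiterQPS: 0.01,
		largeClusterThreshold:       50,
	}
}

// enterPartialDisruption returns the reduced rate for large clusters and 0
// (eviction stopped) for clusters at or below the threshold.
func (c rateConfig) enterPartialDisruption(nodeNum int) float32 {
	if int32(nodeNum) > c.largeClusterThreshold {
		return c.secondaryEvictionLimiterQPS
	}
	return 0
}

// enterFullDisruption returns the normal eviction rate regardless of size.
func (c rateConfig) enterFullDisruption(nodeNum int) float32 {
	return c.evictionLimiterQPS
}

func main() {
	c := defaultRateConfig()
	fmt.Println("partial, 51 nodes:", c.enterPartialDisruption(51))
	fmt.Println("partial, 50 nodes:", c.enterPartialDisruption(50))
	fmt.Println("full, 50 nodes:", c.enterFullDisruption(50))
}
```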

zoneStates – the state of each zone. Possible values are:

Initial;

Normal;

FullDisruption;

PartialDisruption.

evictionLimiterQPS – set with --node-eviction-rate, default 0.1; the number of Nodes evicted per second while a Zone is healthy, i.e. one Node every 10s.

secondaryEvictionLimiterQPS – set with --secondary-node-eviction-rate, default 0.01; the number of Nodes evicted per second while a Zone is unhealthy, i.e. one Node every 100s.
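The "one Node every 10s" and "one Node every 100s" figures follow from inverting the rate. The sketch below shows the arithmetic; `evictionInterval` and `intervalSeconds` are hypothetical helpers written for this example, and they use float64 rather than the float32 fields of the struct to keep the arithmetic simple.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// evictionInterval converts a rate in nodes per second into the average time
// between two evictions. The boolean is false when the rate is 0 or negative,
// which means eviction is stopped.
func evictionInterval(qps float64) (time.Duration, bool) {
	if qps <= 0 {
		return 0, false
	}
	return time.Duration(math.Round(float64(time.Second) / qps)), true
}

// intervalSeconds returns the interval in whole seconds, or -1 when stopped.
func intervalSeconds(qps float64) int {
	d, ok := evictionInterval(qps)
	if !ok {
		return -1
	}
	return int(d / time.Second)
}

func main() {
	fmt.Println("rate 0.1  -> seconds between evictions:", intervalSeconds(0.1))
	fmt.Println("rate 0.01 -> seconds between evictions:", intervalSeconds(0.01))
	fmt.Println("rate 0    -> seconds between evictions:", intervalSeconds(0))
}
```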

largeClusterThreshold – set with --large-cluster-size-threshold, default 50; when the cluster formed by healthy nodes has 50 or fewer nodes, secondary-node-eviction-rate is set to 0.

unhealthyZoneThreshold – set with --unhealthy-zone-threshold, default 0.55; a Zone is considered unhealthy once the fraction of unhealthy Nodes in it (at least 3 of them) reaches 0.55.

runTaintManager – set with --enable-taint-manager, default true. If true, NodeController starts the TaintManager, which evicts Pods from Nodes whose Taints they do not tolerate.

useTaintBasedEvictions – set through --feature-gates; TaintBasedEvictions defaults to false and is still an Alpha feature. If true, Pods are evicted by tainting Nodes.
