This article covers how the Kubernetes Node Controller is started and how its behavior is configured, by walking through the relevant kube-controller-manager source code.
Starting the Node Controller
if ctx.IsControllerEnabled(nodeControllerName) {
	// Parse the cluster CIDR (the CIDR range for Pods in the cluster).
	_, clusterCIDR, err := net.ParseCIDR(s.ClusterCIDR)
	// Parse the service CIDR (the CIDR range for Services in the cluster).
	_, serviceCIDR, err := net.ParseCIDR(s.ServiceCIDR)
	// Create the NodeController instance.
	nodeController, err := nodecontroller.NewNodeController(
		sharedInformers.Core().V1().Pods(),
		sharedInformers.Core().V1().Nodes(),
		sharedInformers.Extensions().V1beta1().DaemonSets(),
		cloud,
		clientBuilder.ClientOrDie("node-controller"),
		s.PodEvictionTimeout.Duration,
		s.NodeEvictionRate,
		s.SecondaryNodeEvictionRate,
		s.LargeClusterSizeThreshold,
		s.UnhealthyZoneThreshold,
		s.NodeMonitorGracePeriod.Duration,
		s.NodeStartupGracePeriod.Duration,
		s.NodeMonitorPeriod.Duration,
		clusterCIDR,
		serviceCIDR,
		int(s.NodeCIDRMaskSize),
		s.AllocateNodeCIDRs,
		s.EnableTaintManager,
		utilfeature.DefaultFeatureGate.Enabled(features.TaintBasedEvictions),
	)
	// Call Run to start the controller.
	nodeController.Run()
	// Sleep for a random duration of ControllerStartInterval +
	// rand.Float64()*1.0*float64(ControllerStartInterval), where
	// ControllerStartInterval is set via the kube-controller-manager
	// --controller-start-interval flag.
	time.Sleep(wait.Jitter(s.ControllerStartInterval.Duration, ControllerStartJitter))
}
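To make the jittered sleep concrete, here is a minimal, self-contained sketch of what wait.Jitter computes. The local jitter helper below re-implements its behavior so the snippet runs without Kubernetes imports, and the interval value is an arbitrary stand-in for s.ControllerStartInterval.Duration:

package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitter mirrors wait.Jitter's behavior: it returns
// duration + rand.Float64()*maxFactor*float64(duration),
// so with a factor of 1.0 the result falls in [duration, 2*duration).
func jitter(duration time.Duration, maxFactor float64) time.Duration {
	return duration + time.Duration(rand.Float64()*maxFactor*float64(duration))
}

func main() {
	interval := 100 * time.Millisecond // stand-in for s.ControllerStartInterval.Duration
	for i := 0; i < 3; i++ {
		// Each controller startup sleeps a different randomized amount,
		// spreading startups out instead of firing them all at once.
		fmt.Println(jitter(interval, 1.0))
	}
}

Jittering the start interval staggers the startup of the many controllers inside kube-controller-manager so they do not hit the apiserver at the same moment.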
The key steps, then, are clearly the following two:
nodeController, err := nodecontroller.NewNodeController creates the NodeController instance.
nodeController.Run() calls Run to start the controller.
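Before looking at the definition, it helps to know the general shape of Run. The following is a simplified, runnable sketch, not the verbatim implementation: the real code first waits for the informer caches to sync (elided here), and the method name monitorNodeStatus and the stub types are illustrative assumptions that echo the field names described later in this article.

package main

import (
	"fmt"
	"time"
)

// Minimal stand-in for the real type, just to make the sketch runnable.
type NodeController struct {
	nodeMonitorPeriod time.Duration
}

// monitorNodeStatus is a hypothetical placeholder for the periodic
// node-health check the controller performs.
func (nc *NodeController) monitorNodeStatus() error {
	fmt.Println("monitorNodeStatus: checking node heartbeats...")
	return nil
}

// Run sketches the startup pattern: real code first calls
// cache.WaitForCacheSync for the node/pod/daemonSet informers,
// then polls node status every nodeMonitorPeriod.
func (nc *NodeController) Run() {
	go func() {
		for {
			if err := nc.monitorNodeStatus(); err != nil {
				fmt.Println("error monitoring node status:", err)
			}
			time.Sleep(nc.nodeMonitorPeriod)
		}
	}()
}

func main() {
	nc := &NodeController{nodeMonitorPeriod: 5 * time.Second}
	nc.Run()
	time.Sleep(12 * time.Second) // let the monitor loop fire a few times
}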
Definition of NodeController
Before analyzing how NodeController works, it is worth seeing how it is defined. Its complete definition is as follows:
type NodeController struct {
allocateNodeCIDRs bool
cloud cloudprovider.Interface
clusterCIDR *net.IPNet
serviceCIDR *net.IPNet
knownNodeSet map[string]*v1.Node
kubeClient clientset.Interface
// Method for easy mocking in unittest.
lookupIP func(host string) ([]net.IP, error)
// Value used if sync_nodes_status=False. NodeController will not proactively
// sync node status in this case, but will monitor node status updated from kubelet. If
// it doesn't receive update for this amount of time, it will start posting "NodeReady==
// ConditionUnknown". The amount of time before which NodeController start evicting pods
// is controlled via flag "pod-eviction-timeout".
// Note: be cautious when changing the constant, it must work with nodeStatusUpdateFrequency
// in kubelet. There are several constraints:
// 1. nodeMonitorGracePeriod must be N times more than nodeStatusUpdateFrequency, where
// N means number of retries allowed for kubelet to post node status. It is pointless
// to make nodeMonitorGracePeriod be less than nodeStatusUpdateFrequency, since there
// will only be fresh values from Kubelet at an interval of nodeStatusUpdateFrequency.
// The constant must be less than podEvictionTimeout.
// 2. nodeMonitorGracePeriod can't be too large for user experience - larger value takes
// longer for user to see up-to-date node status.
nodeMonitorGracePeriod time.Duration
// Value controlling NodeController monitoring period, i.e. how often does NodeController
// check node status posted from kubelet. This value should be lower than nodeMonitorGracePeriod.
// TODO: Change node status monitor to watch based.
nodeMonitorPeriod time.Duration
// Value used if sync_nodes_status=False, only for node startup. When node
// is just created, e.g. cluster bootstrap or node creation, we give a longer grace period.
nodeStartupGracePeriod time.Duration
// per Node map storing last observed Status together with a local time when it was observed.
// This timestamp is to be used instead of LastProbeTime stored in Condition. We do this
// to avoid the problem with time skew across the cluster.
nodeStatusMap map[string]nodeStatusData
now func() metav1.Time
// Lock to access evictor workers
evictorLock sync.Mutex
// workers that evict pods from unresponsive nodes.
zonePodEvictor map[string]*RateLimitedTimedQueue
// workers that are responsible for tainting nodes.
zoneNotReadyOrUnreachableTainer map[string]*RateLimitedTimedQueue
podEvictionTimeout time.Duration
// The maximum duration before a pod evicted from a node can be forcefully terminated.
maximumGracePeriod time.Duration
recorder record.EventRecorder
nodeLister corelisters.NodeLister
nodeInformerSynced cache.InformerSynced
daemonSetStore extensionslisters.DaemonSetLister
daemonSetInformerSynced cache.InformerSynced
podInformerSynced cache.InformerSynced
// allocate/recycle CIDRs for node if allocateNodeCIDRs == true
cidrAllocator CIDRAllocator
// manages taints
taintManager *NoExecuteTaintManager
forcefullyDeletePod func(*v1.Pod) error
nodeExistsInCloudProvider func(types.NodeName) (bool, error)
computeZoneStateFunc func(nodeConditions []*v1.NodeCondition) (int, zoneState)
enterPartialDisruptionFunc func(nodeNum int) float32
enterFullDisruptionFunc func(nodeNum int) float32
zoneStates map[string]zoneState
evictionLimiterQPS float32
secondaryEvictionLimiterQPS float32
largeClusterThreshold int32
unhealthyZoneThreshold float32
// if set to true NodeController will start TaintManager that will evict Pods from
// tainted nodes, if they're not tolerated.
runTaintManager bool
// if set to true NodeController will taint Nodes with TaintNodeNotReady and TaintNodeUnreachable
// taints instead of evicting Pods itself.
useTaintBasedEvictions bool
}
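One detail worth pausing on is nodeStatusMap: per the struct comment above, it records each node's last observed status together with the local time of observation, so the controller never has to trust LastProbeTime timestamps written by clocks on other machines. Below is a minimal, self-contained sketch of that bookkeeping pattern; the names recordObservation, observations, and unresponsiveFor are hypothetical stand-ins for the real plumbing:

package main

import (
	"fmt"
	"time"
)

// observation pairs a reported node status with the *local* time we saw it,
// mirroring the idea behind nodeStatusMap: comparing local timestamps
// sidesteps clock skew between cluster machines.
type observation struct {
	status         string    // stand-in for v1.NodeStatus
	probeTimestamp time.Time // local clock, not the node's clock
}

var observations = map[string]observation{}

// recordObservation stores what we saw and when *we* saw it, ignoring any
// timestamps embedded in the reported status itself.
func recordObservation(node, status string) {
	observations[node] = observation{status: status, probeTimestamp: time.Now()}
}

// unresponsiveFor reports how long it has been since the node last checked in.
func unresponsiveFor(node string) time.Duration {
	return time.Since(observations[node].probeTimestamp)
}

func main() {
	recordObservation("node-1", "Ready")
	time.Sleep(50 * time.Millisecond)
	// If this exceeded nodeMonitorGracePeriod (default 40s), the controller
	// would mark the node's Ready condition as Unknown.
	fmt.Println("node-1 silent for:", unresponsiveFor("node-1"))
}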
NodeController Behavior Configuration
The NodeController struct is fairly complex, with more than 30 fields. We will focus on the following:
clusterCIDR – set via --cluster-cidr; the CIDR range for Pods in the cluster.
serviceCIDR – set via --service-cluster-ip-range; the CIDR range for Services in the cluster.
knownNodeSet – the set of nodes the NodeController has observed.
nodeMonitorGracePeriod – set via --node-monitor-grace-period, default 40s; how long a Node may be unresponsive before it is marked unhealthy.
nodeMonitorPeriod – set via --node-monitor-period, default 5s; the period at which the NodeController syncs NodeStatus.
nodeStatusMap – records the most recently observed Status of each Node.
zonePodEvictor – workers that evict pods from unresponsive nodes.
zoneNotReadyOrUnreachableTainer – workers that are responsible for tainting nodes.
podEvictionTimeout – set via --pod-eviction-timeout, default 5m; the maximum time allowed for Pod eviction before Pods are forcibly deleted.
maximumGracePeriod – the maximum duration before a pod evicted from a node can be forcefully terminated. Not configurable; hardcoded to 5m.
nodeLister – the interface used to read Node data.
daemonSetStore – the interface used to read DaemonSet data. When deleting Pods by eviction, all Pods on the Node that belong to DaemonSets are skipped.
taintManager – a NoExecuteTaintManager object. When runTaintManager (default true) is true:
the PodInformer and NodeInformer listen for PodAdd, PodDelete, PodUpdate and NodeAdd, NodeDelete, NodeUpdate events,
which trigger the corresponding NoExecuteTaintManager.PodUpdated and NoExecuteTaintManager.NodeUpdated methods;
these enqueue the events onto the corresponding queues (podUpdateQueue and nodeUpdateQueue), from which the TaintController consumes them,
handling them in handlePodUpdate and handleNodeUpdate respectively.
The TaintController's detailed handling logic will be analyzed separately later.
forcefullyDeletePod – the method the NodeController uses to call the apiserver to force-delete a Pod. It is used to delete Pods scheduled onto Nodes whose kubelet version is older than v1.1.0, because kubelets before v1.1.0 do not support graceful termination.
computeZoneStateFunc – returns the number of NotReady Nodes in a Zone together with that Zone's state (see the sketch after this list):
if not a single Node is Ready, the zone state is FullDisruption;
if the fraction of unhealthy Nodes is greater than or equal to unhealthyZoneThreshold, the zone state is PartialDisruption;
otherwise the zone state is Normal.
enterPartialDisruptionFunc – compares the current node count against largeClusterThreshold:
if nodeNum > largeClusterThreshold, it returns secondaryEvictionLimiterQPS (default 0.01);
otherwise it returns 0, which stops evictions.
enterFullDisruptionFunc – returns evictionLimiterQPS (default 0.1); evictionLimiterQPS is explained below.
zoneStates – the state of each zone; possible values are:
Initial;
Normal;
FullDisruption;
PartialDisruption.
evictionLimiterQPS – set via --node-eviction-rate, default 0.1; the number of Nodes per second to evict from when a Zone's status is healthy, i.e. one Node every 10s.
secondaryEvictionLimiterQPS – set via --secondary-node-eviction-rate, default 0.01; the number of Nodes per second to evict from when a Zone's status is unhealthy, i.e. one Node every 100s.
largeClusterThreshold – set via --large-cluster-size-threshold, default 50; when the cluster of healthy nodes is no larger than 50, secondary-node-eviction-rate is set to 0.
unhealthyZoneThreshold – set via --unhealthy-zone-threshold, default 0.55; when the fraction of unhealthy Nodes in a Zone (at least 3 of them) reaches 0.55, the Zone is considered unhealthy.
runTaintManager – set via --enable-taint-manager, default true. If true, the NodeController starts the TaintManager, which evicts Pods from Nodes whose taints the Pods do not tolerate.
useTaintBasedEvictions – set via --feature-gates, default TaintBasedEvictions=false (still an Alpha feature). If true, Pods are evicted by tainting Nodes rather than by the controller deleting them directly.
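To tie the last few fields together, here is a small, self-contained sketch of the zone-health and eviction-rate logic described above. The function names (computeZoneState, evictionQPS) and the exact condition shapes are illustrative stand-ins, not the real identifiers; the thresholds and defaults are the ones listed above.

package main

import "fmt"

type zoneState string

const (
	stateNormal            zoneState = "Normal"
	statePartialDisruption zoneState = "PartialDisruption"
	stateFullDisruption    zoneState = "FullDisruption"
)

// Defaults listed above: --unhealthy-zone-threshold, --large-cluster-size-threshold,
// --node-eviction-rate, --secondary-node-eviction-rate.
const (
	unhealthyZoneThreshold              = 0.55
	largeClusterThreshold               = 50
	evictionLimiterQPS          float32 = 0.1
	secondaryEvictionLimiterQPS float32 = 0.01
)

// computeZoneState mirrors the rules described for computeZoneStateFunc:
// no Ready Node at all -> FullDisruption; at least 3 unhealthy Nodes making
// up >= unhealthyZoneThreshold of the zone -> PartialDisruption; else Normal.
func computeZoneState(readyNodes, notReadyNodes int) zoneState {
	total := readyNodes + notReadyNodes
	switch {
	case readyNodes == 0 && notReadyNodes > 0:
		return stateFullDisruption
	case notReadyNodes >= 3 && float64(notReadyNodes)/float64(total) >= unhealthyZoneThreshold:
		return statePartialDisruption
	default:
		return stateNormal
	}
}

// evictionQPS mirrors enterPartialDisruptionFunc/enterFullDisruptionFunc:
// a partially disrupted large zone is drained slowly; a small one not at all.
func evictionQPS(state zoneState, nodeNum int) float32 {
	switch state {
	case statePartialDisruption:
		if nodeNum > largeClusterThreshold {
			return secondaryEvictionLimiterQPS // one Node every 100s
		}
		return 0 // small cluster: stop evictions entirely
	default:
		return evictionLimiterQPS // one Node every 10s
	}
}

func main() {
	fmt.Println(computeZoneState(10, 0)) // Normal
	fmt.Println(computeZoneState(4, 6))  // PartialDisruption (60% >= 55%)
	fmt.Println(computeZoneState(0, 5))  // FullDisruption
	fmt.Println(evictionQPS(statePartialDisruption, 100)) // 0.01
	fmt.Println(evictionQPS(statePartialDisruption, 30))  // 0
}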
That concludes this look at how the Kubernetes Node Controller is started. Thanks for reading.