What Happens to Pod Status When a Node Fails in Kubernetes


This article looks at how Pod status changes in Kubernetes when a Node becomes abnormal, working through each case in turn.

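Kubelet process failure: how Pod status changes

Suppose a node is running pods and the kubelet process on it is stopped. Are the pods on that node killed? Are they recreated on other nodes?

Conclusions:

(1) The Node status changes to NotReady.
(2) Pod status does not change during the first 5 minutes. After 5 minutes, DaemonSet Pods change to NodeLost, while Deployment, StatefulSet, and Static Pods first change to NodeLost and then almost immediately to Unknown. Deployment pods are recreated, but if the Deployment uses a node selector that pins it to the node whose kubelet was stopped, the recreated pod stays Pending. Static Pods and StatefulSet Pods stay in the Unknown state.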
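
The 5-minute boundary in conclusion (2) can be pictured as a simple timeout check. The Go sketch below is illustrative only: `podEvictionTimeout` and `shouldEvict` are hypothetical names, and the 5-minute value is taken from the behavior observed above, not read from any cluster configuration.

```go
package main

import (
	"fmt"
	"time"
)

// podEvictionTimeout mirrors the 5-minute window observed in the article.
// It is an assumed constant for this sketch, not a value read from a cluster.
const podEvictionTimeout = 5 * time.Minute

// shouldEvict reports whether a node that has been NotReady for downFor
// has reached the eviction window, after which pod status starts to change.
func shouldEvict(downFor time.Duration) bool {
	return downFor >= podEvictionTimeout
}

func main() {
	for _, downFor := range []time.Duration{2 * time.Minute, 5 * time.Minute, 10 * time.Minute} {
		fmt.Printf("kubelet down for %v: pod status changes=%v\n", downFor, shouldEvict(downFor))
	}
}
```

In a real cluster the window is governed by controller settings, so the constant here should be treated as a placeholder.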

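Kubelet recovery: Pod behavior

If the kubelet comes back up after 10 minutes, what happens to the node and its pods?

Conclusions:

(1) The Node status changes to Ready.
(2) DaemonSet pods are not recreated. The old pods go straight back to Running.
(3) For a Deployment, the old Pod on the node whose kubelet had been stopped is deleted. A likely reason is that when the old Pod's status is updated in the cluster, the Deployment already has enough Pod replicas, so the old Pod is removed.
(4) StatefulSet Pods are recreated.
(5) Static Pods are not restarted, but their reported running time is reset to 0 when the kubelet comes back up.

After the kubelet stops, StatefulSet pods change to NodeLost and then to Unknown, but they are not restarted. Only after the kubelet comes back up are the StatefulSet pods recreated.

Also, a Static Pod does not appear to be restarted when the kubelet restarts, yet the running time that the cluster reports for it changes.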

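Why StatefulSet Pods are not recreated when a Node fails

After a Node goes down, the StatefulSet Pods are not rebuilt. Why?

Looking at the node controller, we found that it calls the delete pod API for every pod except DaemonSet pods.

However, calling the delete pod API does not remove the pod object from apiserver/etcd. It only sets the pod's deletionTimestamp, marking the pod for deletion. The actual removal is done by the kubelet: after it gracefully terminates the pod, it deletes the pod object. Only then does the statefulset controller notice that a replica is missing and recreate the pod.

In this case, though, the kubelet is down and cannot communicate with the master, so the Pod object is never removed from etcd. If the Pod object could be deleted, the Pod could be rebuilt on another Node.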
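
The two-phase deletion described above can be modeled in a few lines of Go. This is a simplified sketch, not Kubernetes code: `Pod`, `Store`, `RequestDelete`, and `ConfirmDelete` are hypothetical names that stand in for the pod object, apiserver/etcd, the delete pod API call, and the kubelet's final removal.

```go
package main

import (
	"fmt"
	"time"
)

// Pod is a hypothetical, stripped-down stand-in for a pod object.
type Pod struct {
	Name              string
	DeletionTimestamp *time.Time
}

// Store is a hypothetical stand-in for apiserver/etcd.
type Store struct {
	pods map[string]*Pod
}

// RequestDelete models a graceful delete call: it only marks the pod.
func (s *Store) RequestDelete(name string) {
	if p, ok := s.pods[name]; ok && p.DeletionTimestamp == nil {
		now := time.Now()
		p.DeletionTimestamp = &now
	}
}

// ConfirmDelete models the kubelet finishing graceful termination and
// removing the object. It does nothing unless the pod was marked first.
func (s *Store) ConfirmDelete(name string) {
	if p, ok := s.pods[name]; ok && p.DeletionTimestamp != nil {
		delete(s.pods, name)
	}
}

// Exists reports whether the pod object is still stored.
func (s *Store) Exists(name string) bool {
	_, ok := s.pods[name]
	return ok
}

func main() {
	store := &Store{pods: map[string]*Pod{"web-0": {Name: "web-0"}}}

	store.RequestDelete("web-0")
	fmt.Println("after delete request, object still stored:", store.Exists("web-0"))

	// While the kubelet is down, ConfirmDelete is never called and the object stays.
	// Once the kubelet is back, it confirms and the object disappears.
	store.ConfirmDelete("web-0")
	fmt.Println("after kubelet confirms, object still stored:", store.Exists("web-0"))
}
```

While the kubelet is down, the model never reaches the `ConfirmDelete` step, which matches the stuck Pod object described above.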

Also note that the statefulset controller deletes a Pod on its own only when isFailed is true for that Pod, and the Pods here are in the Unknown state, not Failed.

// delete and recreate failed pods
if isFailed(replicas[i]) {
	ssc.recorder.Eventf(set, v1.EventTypeWarning, "RecreatingFailedPod",
		"StatefulSetPlus %s/%s is recreating failed Pod %s",
		set.Namespace,
		set.Name,
		replicas[i].Name)
	if err := ssc.podControl.DeleteStatefulPlusPod(set, replicas[i]); err != nil {
		return &status, err
	}
	// keep the status counters in step with the pod that was just deleted
	if getPodRevision(replicas[i]) == currentRevision.Name {
		status.CurrentReplicas--
	}
	if getPodRevision(replicas[i]) == updateRevision.Name {
		status.UpdatedReplicas--
	}
	status.Replicas--
	// replace the deleted pod with a fresh pod object for the same ordinal
	replicas[i] = newVersionedStatefulSetPlusPod(
		currentSet,
		updateSet,
		currentRevision.Name,
		updateRevision.Name,
		i)
}
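
To make the point about isFailed concrete, here is a minimal self-contained Go sketch. It assumes isFailed simply compares the pod phase with Failed. The `Pod` and `PodPhase` types are reduced stand-ins defined only for this example, not the real API types.

```go
package main

import "fmt"

// PodPhase is a reduced stand-in for the pod phase type.
type PodPhase string

const (
	PodRunning PodPhase = "Running"
	PodFailed  PodPhase = "Failed"
	PodUnknown PodPhase = "Unknown"
)

// Pod is a hypothetical, stripped-down pod carrying only what the check needs.
type Pod struct {
	Name  string
	Phase PodPhase
}

// isFailed is assumed here to be true only for the Failed phase.
func isFailed(p Pod) bool {
	return p.Phase == PodFailed
}

func main() {
	pods := []Pod{
		{Name: "web-0", Phase: PodRunning},
		{Name: "web-1", Phase: PodFailed},
		{Name: "web-2", Phase: PodUnknown}, // pod on the node whose kubelet is down
	}
	for _, p := range pods {
		fmt.Printf("%s phase=%s deleteAndRecreate=%v\n", p.Name, p.Phase, isFailed(p))
	}
}
```

Under that assumption, a pod in the Unknown phase never enters the delete-and-recreate branch.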

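Improving StatefulSet Pod behavior

So for node failures, protecting stateful (Non-Quorum) applications calls for the following additional behavior:

Monitor the node's network, kubelet process, operating system, and so on for failures, and handle each case differently.

For example, if the network has failed and the Pod cannot serve traffic, run kubectl delete pod POD_NAME --force --grace-period=0 to forcibly remove the pod from etcd.

After the forced deletion, the statefulset controller automatically recreates the pod on another Node.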
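
If the forced deletion is to be driven from a tool, the command line can be assembled as below. This is only a sketch: `forceDeleteArgs` is a hypothetical helper, and the namespace and pod name are placeholder values. The program prints the command without running it.

```go
package main

import (
	"fmt"
	"strings"
)

// forceDeleteArgs returns kubectl arguments for a forced, zero-grace-period
// pod delete. The helper name and parameters are illustrative only.
func forceDeleteArgs(namespace, pod string) []string {
	return []string{"delete", "pod", pod, "--namespace", namespace, "--grace-period=0", "--force"}
}

func main() {
	args := forceDeleteArgs("default", "web-0")
	// Print the command for review. Running it is left to the operator.
	fmt.Println("kubectl " + strings.Join(args, " "))
}
```

Printing first and executing separately keeps a human in the loop, which seems prudent because a forced delete skips graceful termination.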

Alternatively, a blunter approach is to give up the grace period altogether: if a StatefulSet Pod's GracePeriodSeconds is nil or 0, the object is deleted from etcd directly.

// BeforeDelete tests whether the object can be gracefully deleted.
// If graceful is set, the object should be gracefully deleted. If gracefulPending
// is set, the object has already been gracefully deleted (and the provided grace
// period is longer than the time to deletion). An error is returned if the
// condition cannot be checked or the gracePeriodSeconds is invalid. The options
// argument may be updated with default values if graceful is true. Second place
// where we set deletionTimestamp is pkg/registry/generic/registry/store.go.
// This function is responsible for setting deletionTimestamp during gracefulDeletion,
// other one for cascading deletions.
func BeforeDelete(strategy RESTDeleteStrategy, ctx context.Context, obj runtime.Object, options *metav1.DeleteOptions) (graceful, gracefulPending bool, err error) {
	objectMeta, gvk, kerr := objectMetaAndKind(strategy, obj)
	if kerr != nil {
		return false, false, kerr
	}
	if errs := validation.ValidateDeleteOptions(options); len(errs) > 0 {
		return false, false, errors.NewInvalid(schema.GroupKind{Group: metav1.GroupName, Kind: "DeleteOptions"}, "", errs)
	}
	// Checking the Preconditions here to fail early. They'll be enforced later on when we actually do the deletion, too.
	if options.Preconditions != nil && options.Preconditions.UID != nil && *options.Preconditions.UID != objectMeta.GetUID() {
		return false, false, errors.NewConflict(schema.GroupResource{Group: gvk.Group, Resource: gvk.Kind}, objectMeta.GetName(), fmt.Errorf("the UID in the precondition (%s) does not match the UID in record (%s). The object might have been deleted and then recreated", *options.Preconditions.UID, objectMeta.GetUID()))
	}
	gracefulStrategy, ok := strategy.(RESTGracefulDeleteStrategy)
	if !ok {
		// If we're not deleting gracefully there's no point in updating Generation, as we won't update
		// the object before deleting it.
		return false, false, nil
	}
	// if the object is already being deleted, no need to update generation.
	if objectMeta.GetDeletionTimestamp() != nil {
		// if we are already being deleted, we may only shorten the deletion grace period
		// this means the object was gracefully deleted previously but deletionGracePeriodSeconds was not set,
		// so we force deletion immediately
		// IMPORTANT:
		// The deletion operation happens in two phases.
		// 1. Update to set DeletionGracePeriodSeconds and DeletionTimestamp
		// 2. Delete the object from storage.
		// If the update succeeds, but the delete fails (network error, internal storage error, etc.),
		// a resource was previously left in a state that was non-recoverable. We
		// check if the existing stored resource has a grace period as 0 and if so
		// attempt to delete immediately in order to recover from this scenario.
		if objectMeta.GetDeletionGracePeriodSeconds() == nil || *objectMeta.GetDeletionGracePeriodSeconds() == 0 {
			return false, false, nil
		}
		// ...
	}
	// ...
}
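
The last branch of the excerpt is the one that matters for the nil-or-0 grace period case. The Go sketch below isolates just that condition. `ObjectMeta`, `deleteImmediately`, `markedForDeletion`, and `int64Ptr` are hypothetical names for this example, and the rest of BeforeDelete is deliberately left out.

```go
package main

import (
	"fmt"
	"time"
)

// ObjectMeta is a hypothetical, reduced stand-in for the object metadata
// that the excerpt inspects.
type ObjectMeta struct {
	DeletionTimestamp          *time.Time
	DeletionGracePeriodSeconds *int64
}

// deleteImmediately reports whether the excerpted branch applies: the object
// is already marked for deletion and its stored grace period is nil or 0.
func deleteImmediately(m ObjectMeta) bool {
	if m.DeletionTimestamp == nil {
		return false
	}
	return m.DeletionGracePeriodSeconds == nil || *m.DeletionGracePeriodSeconds == 0
}

// markedForDeletion builds metadata for an object whose deletion was already requested.
func markedForDeletion(grace *int64) ObjectMeta {
	now := time.Now()
	return ObjectMeta{DeletionTimestamp: &now, DeletionGracePeriodSeconds: grace}
}

// int64Ptr returns a pointer to v.
func int64Ptr(v int64) *int64 { return &v }

func main() {
	fmt.Println("grace nil: ", deleteImmediately(markedForDeletion(nil)))
	fmt.Println("grace 0:   ", deleteImmediately(markedForDeletion(int64Ptr(0))))
	fmt.Println("grace 30:  ", deleteImmediately(markedForDeletion(int64Ptr(30))))
	fmt.Println("not marked:", deleteImmediately(ObjectMeta{}))
}
```

A `false` result here only means this particular branch does not apply. The real function goes on to evaluate other conditions that the excerpt does not show.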
