What Are Predicates Policies For?



## Predicates Policies Analysis

The following predicate (pre-selection) policies are implemented in /plugin/pkg/scheduler/algorithm/predicates.go:

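NoDiskConflict: checks whether placing the Pod on this host would cause a volume conflict. If the host already has a volume mounted, another Pod that uses the same volume cannot be scheduled onto it. The rules for GCE, Amazon EBS, and Ceph RBD are:

GCE allows the same volume to be mounted multiple times, as long as every mount is read-only.

Amazon EBS does not allow different Pods to mount the same volume.

Ceph RBD does not allow two Pods to share a volume with the same monitor, pool, and image.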

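NoVolumeZoneConflict: checks, given the zone restrictions, whether placing the Pod on this host would cause a volume conflict. Some volumes may carry zone scheduling constraints, and VolumeZonePredicate evaluates whether a Pod fits based on the volumes' own requirements. The requirement is that every zone label on a volume must exactly match the corresponding zone label on the node. The node itself may carry additional zone-label constraints (for example, a hypothetical replicated volume might allow region-wide access). Currently this is supported only for PersistentVolumeClaims, and labels are looked up only on the PersistentVolume. Handling volumes defined directly in the Pod spec (that is, without a PersistentVolume) is likely to be harder, because the zone of the volume would have to be determined during scheduling, which would probably require a call to the cloud provider.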

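PodFitsResources: checks whether the host has enough resources for the Pod. Scheduling is based on the amount of resources already allocated (requested), not on the amount actually in use.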
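The allocation-based check can be illustrated with a minimal sketch. The `Resources` type and the `fitsResources` function are hypothetical simplifications, not the scheduler's own types; the point is only that the comparison uses the sum of requests already placed on the node, never measured usage.

```go
package main

import "fmt"

// Resources is a simplified, hypothetical stand-in for the scheduler's
// resource bookkeeping: CPU in millicores, memory in bytes.
type Resources struct {
	MilliCPU int64
	Memory   int64
}

// fitsResources reports whether a Pod's requests fit on a node, given the
// node's allocatable capacity and the sum of the requests of the Pods already
// placed there. Actual usage is deliberately not consulted.
func fitsResources(podRequest, allocatable, requested Resources) bool {
	if podRequest.MilliCPU > allocatable.MilliCPU-requested.MilliCPU {
		return false
	}
	if podRequest.Memory > allocatable.Memory-requested.Memory {
		return false
	}
	return true
}

func main() {
	node := Resources{MilliCPU: 4000, Memory: 8 << 30}
	used := Resources{MilliCPU: 3500, Memory: 2 << 30}

	// 250m CPU still fits into the remaining 500m.
	fmt.Println(fitsResources(Resources{MilliCPU: 250, Memory: 1 << 30}, node, used))
	// 1000m CPU does not, even if the node is idle in practice.
	fmt.Println(fitsResources(Resources{MilliCPU: 1000, Memory: 1 << 30}, node, used))
}
```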

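PodFitsHostPorts: checks whether any HostPort required by a container in the Pod is already taken by another container on the host. If any required HostPort is unavailable, the Pod cannot be scheduled onto this host.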
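A rough sketch of the port check follows. `hostPortsFit` is a hypothetical helper, and treating port 0 as "no host port requested" is an assumption of this sketch.

```go
package main

import "fmt"

// hostPortsFit reports whether every host port wanted by the new Pod is still
// free on the node. Port 0 is treated as "no host port requested" (an
// assumption of this sketch) and is skipped.
func hostPortsFit(wanted []int32, inUse map[int32]bool) bool {
	for _, p := range wanted {
		if p == 0 {
			continue
		}
		if inUse[p] {
			return false
		}
	}
	return true
}

func main() {
	inUse := map[int32]bool{80: true, 443: true}
	fmt.Println(hostPortsFit([]int32{8080}, inUse))      // no overlap with ports in use
	fmt.Println(hostPortsFit([]int32{8080, 443}, inUse)) // 443 is already taken
}
```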

HostName: checks whether the host's name is the HostName specified by the Pod.

MatchNodeSelector: checks whether the host's labels satisfy the Pod's nodeSelector.
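As a minimal sketch of exact-match label selection: `matchesNodeSelector` is a hypothetical helper that models both the selector and the node labels as plain string maps.

```go
package main

import "fmt"

// matchesNodeSelector reports whether every key/value pair in the Pod's
// selector is present, with the same value, in the node's labels.
// An empty selector matches every node.
func matchesNodeSelector(selector, nodeLabels map[string]string) bool {
	for k, want := range selector {
		if got, ok := nodeLabels[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	node := map[string]string{"disktype": "ssd", "zone": "a"}
	fmt.Println(matchesNodeSelector(map[string]string{"disktype": "ssd"}, node))
	fmt.Println(matchesNodeSelector(map[string]string{"disktype": "hdd"}, node))
}
```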

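MaxEBSVolumeCount: ensures that the number of attached EBS volumes does not exceed the configured maximum. The default is 39. It counts both volumes used directly and PVCs that are indirectly backed by this type of storage. It totals the distinct volumes, and if placing the new Pod would push the count above the maximum, the Pod cannot be scheduled onto this host.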

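MaxGCEPDVolumeCount: ensures that the number of attached GCE volumes does not exceed the configured maximum. The default is 16. The same rules as for MaxEBSVolumeCount apply.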
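The counting rule shared by both volume-count predicates can be sketched as follows. `fitsVolumeLimit` is a hypothetical helper that identifies volumes by a plain string ID; resolving PVCs to their underlying volumes is left out.

```go
package main

import "fmt"

// fitsVolumeLimit counts the distinct volume IDs that would be attached to
// the node once the incoming Pod is placed, and compares that count with the
// configured maximum. A volume that is already attached is not counted twice.
func fitsVolumeLimit(existing, incoming []string, maxVolumes int) bool {
	distinct := make(map[string]bool, len(existing)+len(incoming))
	for _, id := range existing {
		distinct[id] = true
	}
	for _, id := range incoming {
		distinct[id] = true
	}
	return len(distinct) <= maxVolumes
}

func main() {
	attached := []string{"vol-1", "vol-2", "vol-3"}
	fmt.Println(fitsVolumeLimit(attached, []string{"vol-3"}, 3)) // reuses an attached volume
	fmt.Println(fitsVolumeLimit(attached, []string{"vol-4"}, 3)) // would be a fourth volume
}
```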

Below is the implementation of NoDiskConflict. The other Predicates Policies are implemented similarly, and all share the following function prototype:

    type FitPredicate func(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []PredicateFailureReason, error)

    func NoDiskConflict(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (bool, []algorithm.PredicateFailureReason, error) {
        // Compare every volume of the new pod with every pod already on the node.
        for _, v := range pod.Spec.Volumes {
            for _, ev := range nodeInfo.Pods() {
                if isVolumeConflict(v, ev) {
                    return false, []algorithm.PredicateFailureReason{ErrDiskConflict}, nil
                }
            }
        }
        return true, nil, nil
    }

    func isVolumeConflict(volume v1.Volume, pod *v1.Pod) bool {
        // fast path if there is no conflict checking targets.
        if volume.GCEPersistentDisk == nil && volume.AWSElasticBlockStore == nil && volume.RBD == nil && volume.ISCSI == nil {
            return false
        }
        for _, existingVolume := range pod.Spec.Volumes {
            // Only the RBD check is shown in this excerpt.
            if volume.RBD != nil && existingVolume.RBD != nil {
                mon, pool, image := volume.RBD.CephMonitors, volume.RBD.RBDPool, volume.RBD.RBDImage
                emon, epool, eimage := existingVolume.RBD.CephMonitors, existingVolume.RBD.RBDPool, existingVolume.RBD.RBDImage
                // two RBDs images are the same if they share the same Ceph monitor, are in the same RADOS Pool, and have the same image name
                // only one read-write mount is permitted for the same RBD image.
                // same RBD image mounted by multiple Pods conflicts unless all Pods mount the image read-only
                if haveSame(mon, emon) && pool == epool && image == eimage && !(volume.RBD.ReadOnly && existingVolume.RBD.ReadOnly) {
                    return true
                }
            }
        }
        return false
    }
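The RBD rule in the excerpt above can be tried in isolation with a self-contained sketch. The `rbdVolume` type and `rbdConflict` function are hypothetical simplifications, and `haveSame` is assumed here to mean "the two monitor lists share at least one entry", since its body is not shown in the excerpt.

```go
package main

import "fmt"

// rbdVolume is a hypothetical, trimmed-down stand-in for an RBD volume source.
type rbdVolume struct {
	Monitors []string
	Pool     string
	Image    string
	ReadOnly bool
}

// haveSame reports whether the two slices share at least one element
// (assumed semantics for this sketch).
func haveSame(a, b []string) bool {
	for _, x := range a {
		for _, y := range b {
			if x == y {
				return true
			}
		}
	}
	return false
}

// rbdConflict mirrors the condition in the excerpt: the same image (shared
// monitor, same pool, same image name) conflicts unless both mounts are
// read-only.
func rbdConflict(v, e rbdVolume) bool {
	return haveSame(v.Monitors, e.Monitors) &&
		v.Pool == e.Pool &&
		v.Image == e.Image &&
		!(v.ReadOnly && e.ReadOnly)
}

func main() {
	rw := rbdVolume{Monitors: []string{"mon1", "mon2"}, Pool: "rbd", Image: "data"}
	ro := rbdVolume{Monitors: []string{"mon2"}, Pool: "rbd", Image: "data", ReadOnly: true}

	fmt.Println(rbdConflict(rw, ro)) // one side is read-write
	fmt.Println(rbdConflict(ro, ro)) // both sides are read-only
}
```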

## Priorities Policies Analysis

The following priority functions are currently supported:

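LeastRequestedPriority: when a new Pod is to be assigned to a node, the node's priority is determined by the ratio of its free portion to its total capacity, that is, (total capacity - sum of the capacity requested by Pods on the node - capacity requested by the new Pod) / total capacity. CPU and memory are weighted equally, and the node with the largest ratio scores highest. Note that this priority function has the effect of spreading Pods across nodes according to resource consumption. The formula is: (cpu((capacity - sum(requested)) * 10 / capacity) + memory((capacity - sum(requested)) * 10 / capacity)) / 2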
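A worked example of the LeastRequestedPriority formula, as a minimal sketch. The function names are hypothetical, the use of integer arithmetic is an assumption, and `requested` is taken to include the new Pod's own request.

```go
package main

import "fmt"

// unusedScore computes (capacity - requested) * 10 / capacity for a single
// resource. It returns 0 when the node has no capacity or is over-committed.
func unusedScore(requested, capacity int64) int64 {
	if capacity == 0 || requested > capacity {
		return 0
	}
	return (capacity - requested) * 10 / capacity
}

// leastRequestedScore averages the CPU and memory scores with equal weight.
func leastRequestedScore(cpuReq, cpuCap, memReq, memCap int64) int64 {
	return (unusedScore(cpuReq, cpuCap) + unusedScore(memReq, memCap)) / 2
}

func main() {
	// CPU: 2000m of 4000m requested -> 5; memory: 2 GiB of 8 GiB -> 7.
	fmt.Println(leastRequestedScore(2000, 4000, 2<<30, 8<<30)) // 6
	// An empty node gets the maximum score.
	fmt.Println(leastRequestedScore(0, 4000, 0, 8<<30)) // 10
}
```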

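BalancedResourceAllocation: prefers machines whose resource usage will be more balanced after the Pod is deployed. BalancedResourceAllocation cannot be used on its own and must be combined with LeastRequestedPriority. It computes the CPU fraction and the memory fraction on the host separately, and the host's score is determined by the "distance" between the two. The formula is: score = 10 - abs(cpuFraction - memoryFraction) * 10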
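The BalancedResourceAllocation formula can be sketched as below. `balancedScore` is a hypothetical name; defining each fraction as requested over capacity and returning 0 for a fully committed node are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// balancedScore computes 10 - abs(cpuFraction - memoryFraction) * 10, where
// each fraction is taken to be requested / capacity. Nodes with no capacity,
// or with a fraction at or above 1, score 0 (an assumption of this sketch).
func balancedScore(cpuReq, cpuCap, memReq, memCap float64) int {
	if cpuCap <= 0 || memCap <= 0 {
		return 0
	}
	cpuFraction := cpuReq / cpuCap
	memoryFraction := memReq / memCap
	if cpuFraction >= 1 || memoryFraction >= 1 {
		return 0
	}
	return int(10 - math.Abs(cpuFraction-memoryFraction)*10)
}

func main() {
	// CPU half requested, memory a quarter requested: 10 - 0.25*10 = 7.5, truncated.
	fmt.Println(balancedScore(2000, 4000, 2, 8)) // 7
	// Both a quarter requested: perfectly balanced.
	fmt.Println(balancedScore(1000, 4000, 2, 8)) // 10
}
```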

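SelectorSpreadPriority: for Pods that belong to the same service or replication controller, tries to spread them across different hosts. If zones are specified, it tries to spread the Pods across different hosts in different zones. When scheduling a Pod, it first looks up the service or replication controller the Pod belongs to, then finds the Pods that already exist under it. The fewer such Pods a host is already running, the higher its score.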
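One way to turn "fewer matching Pods, higher score" into numbers is sketched below. The linear normalization against the busiest node is an assumption chosen for illustration, `spreadScores` is a hypothetical name, and zone-aware spreading is left out.

```go
package main

import "fmt"

// spreadScores maps each node to a 0-10 score from the number of Pods of the
// same service or replication controller it already runs. The node with the
// most matching Pods scores 0 and a node with none scores 10.
func spreadScores(matchingPods map[string]int) map[string]int {
	maxCount := 0
	for _, c := range matchingPods {
		if c > maxCount {
			maxCount = c
		}
	}
	scores := make(map[string]int, len(matchingPods))
	for node, c := range matchingPods {
		if maxCount == 0 {
			scores[node] = 10 // no matching Pods anywhere: all nodes are equal
			continue
		}
		scores[node] = 10 * (maxCount - c) / maxCount
	}
	return scores
}

func main() {
	scores := spreadScores(map[string]int{"node-a": 0, "node-b": 2, "node-c": 4})
	fmt.Println(scores["node-a"], scores["node-b"], scores["node-c"]) // 10 5 0
}
```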

CalculateAntiAffinityPriority: for Pods that belong to the same service, tries to spread them across different hosts that carry a specified label.

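ImageLocalityPriority: scores a host by whether it already has the environment the Pod needs to run. ImageLocalityPriority checks whether the images required by the Pod are already present on the host and returns a score from 0 to 10 based on the size of the images found. If none of the required images are present, it returns 0. If some of them are present, the score depends on their size: the larger the images, the higher the score.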

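NodeAffinityPriority (an experimental feature new in Kubernetes 1.2): the affinity mechanism in Kubernetes scheduling. Node selectors, which restrict a Pod to specified nodes at scheduling time, support several operators (In, NotIn, Exists, DoesNotExist, Gt, Lt) and are not limited to exact matches on node labels. Kubernetes also supports two kinds of selectors. The "hard" selector (requiredDuringSchedulingIgnoredDuringExecution) guarantees that the chosen host satisfies all of the Pod's rules for hosts. It resembles the earlier nodeSelector, with a more expressive syntax added on top. The "soft" selector (preferredDuringSchedulingIgnoredDuringExecution) acts as a hint to the scheduler, which tries to satisfy all of the selector's requirements but does not guarantee it.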
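To make the operator list concrete, here is a minimal sketch of evaluating a single requirement against a node's labels. The `requirement` type and `matches` function are hypothetical and are not the Kubernetes API types. Treating a missing label as a match for NotIn, and comparing Gt and Lt as integers against exactly one value, are assumptions of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

// requirement is a hypothetical, simplified node selector requirement.
type requirement struct {
	Key      string
	Operator string // In, NotIn, Exists, DoesNotExist, Gt, Lt
	Values   []string
}

func contains(values []string, v string) bool {
	for _, x := range values {
		if x == v {
			return true
		}
	}
	return false
}

// matches evaluates one requirement against a node's labels.
func matches(r requirement, labels map[string]string) bool {
	val, ok := labels[r.Key]
	switch r.Operator {
	case "In":
		return ok && contains(r.Values, val)
	case "NotIn":
		return !ok || !contains(r.Values, val)
	case "Exists":
		return ok
	case "DoesNotExist":
		return !ok
	case "Gt", "Lt":
		if !ok || len(r.Values) != 1 {
			return false
		}
		labelNum, err1 := strconv.ParseInt(val, 10, 64)
		wantNum, err2 := strconv.ParseInt(r.Values[0], 10, 64)
		if err1 != nil || err2 != nil {
			return false
		}
		if r.Operator == "Gt" {
			return labelNum > wantNum
		}
		return labelNum < wantNum
	}
	return false // unknown operator
}

func main() {
	labels := map[string]string{"zone": "us-east-1a", "cpus": "16"}
	fmt.Println(matches(requirement{"zone", "In", []string{"us-east-1a", "us-east-1b"}}, labels))
	fmt.Println(matches(requirement{"cpus", "Gt", []string{"8"}}, labels))
	fmt.Println(matches(requirement{"gpu", "Exists", nil}, labels))
}
```

A hard selector would use such a check as a filter, while a soft selector would only raise the score of nodes that match.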

Below is the implementation of ImageLocalityPriority. The other Priorities Policies are implemented similarly, and all share the following function prototype:

    type PriorityMapFunction func(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error)

    func ImageLocalityPriorityMap(pod *v1.Pod, meta interface{}, nodeInfo *schedulercache.NodeInfo) (schedulerapi.HostPriority, error) {
        node := nodeInfo.Node()
        if node == nil {
            return schedulerapi.HostPriority{}, fmt.Errorf("node not found")
        }
        // Sum the sizes of the pod's container images that are already on the node.
        var sumSize int64
        for i := range pod.Spec.Containers {
            sumSize += checkContainerImageOnNode(node, &pod.Spec.Containers[i])
        }
        return schedulerapi.HostPriority{
            Host:  node.Name,
            Score: calculateScoreFromSize(sumSize),
        }, nil
    }

    func calculateScoreFromSize(sumSize int64) int {
        var score int
        switch {
        case sumSize == 0 || sumSize < minImgSize:
            // score == 0 means none of the images required by this pod are present on this
            // node or the total size of the images present is too small to be taken into further consideration.
            score = 0
        // If existing images' total size is larger than max, just make it highest priority.
        case sumSize >= maxImgSize:
            score = 10
        default:
            score = int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)
        }
        // Return which bucket the given size belongs to
        return score
    }

For a sumSize between minImgSize and maxImgSize, the score of each Node is computed as: score = int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)

Here minImgSize int64 = 23 * mb, maxImgSize int64 = 1000 * mb, and sumSize is the total size of the container images defined in the Pod that are already present on the Node.

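In other words, the larger the total size of the Pod's required container images already on a Node, the higher that Node scores and the more likely it is to be chosen as the target.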
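The bucket arithmetic can be checked with a self-contained copy of the scoring function. The value of `mb` (1024 * 1024 bytes) is an assumption of this sketch, since the text gives the thresholds only as multiples of `mb`.

```go
package main

import "fmt"

const (
	mb         int64 = 1024 * 1024 // assumed unit, not stated in the text
	minImgSize int64 = 23 * mb
	maxImgSize int64 = 1000 * mb
)

// calculateScoreFromSize maps the total size of the required images already
// on the node to a score between 0 and 10.
func calculateScoreFromSize(sumSize int64) int {
	switch {
	case sumSize == 0 || sumSize < minImgSize:
		return 0
	case sumSize >= maxImgSize:
		return 10
	default:
		return int((10 * (sumSize - minImgSize) / (maxImgSize - minImgSize)) + 1)
	}
}

func main() {
	for _, size := range []int64{0, 10 * mb, 23 * mb, 500 * mb, 1000 * mb} {
		fmt.Printf("%4d MB -> %d\n", size/mb, calculateScoreFromSize(size))
	}
}
```

With these thresholds, 500 MB of local images gives 10 * 477 / 977 = 4 in integer division, plus 1, for a score of 5.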
