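What affects Pod scheduling in Kubernetes?

Inside Kubernetes Scheduling: Nine Weapons for Controlling Where Pods Land

Think Pod scheduling is just random placement? Behind it is a scheduler that filters and scores every candidate node. This article walks through the core factors that influence Pod scheduling in production, so that you stay in control of your cluster's resources.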


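I. The Fundamentals: Resource Requests and Limits (Bronze Tier)

1. Golden rules for resource requests

resources:
  requests:
    cpu: "500m"    # 500 millicores = 0.5 core; requests are what the scheduler counts
    memory: "1Gi"  # use binary units (Gi/Mi)
  limits:
    cpu: "2"       # keep within the node's allocatable CPU
    memory: "4Gi"  # must be >= requests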

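Hard-won lessons from production

  • Without memory limits, a single container can exhaust node memory and trigger the kernel OOM killer
  • Inflated requests reserve capacity that is never used, causing fragmentation: nodes look full while sitting idle

Checking resources

kubectl describe nodes | grep -A5 Allocated
kubectl top pods --containers  # live usage per container (requires metrics-server)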
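
To make the role of requests concrete, here is a minimal Go sketch of the fit check, under the assumption that a node is feasible only when the requests already placed on it plus the new Pod's requests stay within its allocatable capacity. The types Resources and Node and the function Fits are hypothetical names for illustration, not the kube-scheduler's own code.

```go
package main

import "fmt"

// Resources holds CPU in millicores and memory in MiB.
type Resources struct {
	MilliCPU  int64
	MemoryMiB int64
}

// Node is a simplified node: its allocatable capacity plus the summed
// requests of the Pods already bound to it.
type Node struct {
	Name        string
	Allocatable Resources
	Requested   Resources
}

// Fits reports whether the Pod's requests fit into what is left on the node.
// Limits and live usage are deliberately absent: this check ignores them.
func Fits(n Node, podRequests Resources) bool {
	return n.Requested.MilliCPU+podRequests.MilliCPU <= n.Allocatable.MilliCPU &&
		n.Requested.MemoryMiB+podRequests.MemoryMiB <= n.Allocatable.MemoryMiB
}

func main() {
	pod := Resources{MilliCPU: 500, MemoryMiB: 1024} // cpu: "500m", memory: "1Gi"
	nodes := []Node{
		{"node01", Resources{4000, 8192}, Resources{3800, 2048}}, // CPU almost fully requested
		{"node02", Resources{4000, 8192}, Resources{1000, 4096}},
	}
	for _, n := range nodes {
		fmt.Printf("%s fits=%v\n", n.Name, Fits(n, pod))
	}
}
```

In this sketch node01 is rejected even if its real CPU usage is low, because 3800m + 500m exceeds the 4000m allocatable. That is the fragmentation effect of inflated requests described above.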

II. Selecting Nodes by Label (Silver Tier)

1. Hard match: nodeSelector

nodeSelector:
  disktype: ssd
  gpu: "true"

Labeling nodes

kubectl label nodes node01 gpu=true
kubectl get nodes -l gpu=true  # list the nodes carrying the label

2. Flexible scheduling: nodeAffinity

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:  # hard requirement
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: [zoneA]
    preferredDuringSchedulingIgnoredDuringExecution: # soft preference
    - weight: 80
      preference:
        matchExpressions:
        - key: env
          operator: In
          values: [prod]
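
The difference between the required and preferred rules can be sketched in Go as a filter followed by a score. This is a simplified model, not the scheduler's implementation: it supports only the In operator, treats all required expressions as ANDed (a single nodeSelectorTerm), and the names Term, Preferred and Score are hypothetical.

```go
package main

import "fmt"

// Term is a simplified match expression that supports only the In operator.
type Term struct {
	Key    string
	Values []string
}

// Preferred pairs a term with a weight (1-100 in the real API).
type Preferred struct {
	Weight int
	Term   Term
}

// matches reports whether the node labels satisfy the term.
func matches(labels map[string]string, t Term) bool {
	v, ok := labels[t.Key]
	if !ok {
		return false
	}
	for _, want := range t.Values {
		if v == want {
			return true
		}
	}
	return false
}

// Score returns (feasible, score). A node failing any required term is
// filtered out; otherwise the weights of all matching preferences are summed.
func Score(labels map[string]string, required []Term, preferred []Preferred) (bool, int) {
	for _, t := range required {
		if !matches(labels, t) {
			return false, 0
		}
	}
	score := 0
	for _, p := range preferred {
		if matches(labels, p.Term) {
			score += p.Weight
		}
	}
	return true, score
}

func main() {
	required := []Term{{"topology.kubernetes.io/zone", []string{"zoneA"}}}
	preferred := []Preferred{{80, Term{"env", []string{"prod"}}}}
	nodes := map[string]map[string]string{
		"node01": {"topology.kubernetes.io/zone": "zoneA", "env": "prod"},
		"node02": {"topology.kubernetes.io/zone": "zoneA", "env": "dev"},
		"node03": {"topology.kubernetes.io/zone": "zoneB", "env": "prod"},
	}
	for _, name := range []string{"node01", "node02", "node03"} {
		ok, s := Score(nodes[name], required, preferred)
		fmt.Printf("%s feasible=%v score=%d\n", name, ok, s)
	}
}
```

With the rules from the YAML above, node03 is excluded outright (wrong zone), while node02 stays eligible and merely scores lower than node01. A soft preference never blocks scheduling.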

III. Isolation and Co-location (Gold Tier)

1. Sworn enemies: podAntiAffinity

# Forbid two Pods of the same application from landing on the same node
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values: [order-service]
    topologyKey: kubernetes.io/hostname

2. Best buddies: podAffinity

# Place the cache service in the same zone as the database
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: db
        operator: In
        values: [mysql]
    topologyKey: topology.kubernetes.io/zone
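
The effect of topologyKey is easiest to see in code. The following Go sketch models only the anti-affinity check, assuming a single label to match and treating nodes that lack the topology label as belonging to no domain. Node, PlacedPod and ViolatesAntiAffinity are hypothetical names for illustration.

```go
package main

import "fmt"

// Node is a simplified node with its labels.
type Node struct {
	Name   string
	Labels map[string]string
}

// PlacedPod is an already-running Pod and the name of the node it runs on.
type PlacedPod struct {
	Labels map[string]string
	Node   string
}

// ViolatesAntiAffinity reports whether placing a new Pod on candidate would
// put it in the same topology domain (same value of topologyKey) as an
// existing Pod labelled key=value.
func ViolatesAntiAffinity(candidate Node, nodes map[string]Node, pods []PlacedPod,
	key, value, topologyKey string) bool {
	domain, ok := candidate.Labels[topologyKey]
	if !ok {
		return false // in this sketch, an unlabelled node belongs to no domain
	}
	for _, p := range pods {
		if p.Labels[key] != value {
			continue // the existing Pod is not selected by the rule
		}
		other, found := nodes[p.Node].Labels[topologyKey]
		if found && other == domain {
			return true
		}
	}
	return false
}

func main() {
	nodes := map[string]Node{
		"node01": {"node01", map[string]string{"kubernetes.io/hostname": "node01", "topology.kubernetes.io/zone": "zone-a"}},
		"node02": {"node02", map[string]string{"kubernetes.io/hostname": "node02", "topology.kubernetes.io/zone": "zone-a"}},
		"node03": {"node03", map[string]string{"kubernetes.io/hostname": "node03", "topology.kubernetes.io/zone": "zone-b"}},
	}
	pods := []PlacedPod{{map[string]string{"app": "order-service"}, "node01"}}
	for _, tk := range []string{"kubernetes.io/hostname", "topology.kubernetes.io/zone"} {
		for _, name := range []string{"node01", "node02", "node03"} {
			blocked := ViolatesAntiAffinity(nodes[name], nodes, pods, "app", "order-service", tk)
			fmt.Printf("topologyKey=%s %s blocked=%v\n", tk, name, blocked)
		}
	}
}
```

With kubernetes.io/hostname only node01 is blocked. Switching the key to topology.kubernetes.io/zone blocks every node in zone-a, which is why the choice of topologyKey decides how widely replicas are spread. Pod affinity is the mirror image: the node is accepted only when a matching Pod exists in its domain.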

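IV. Taints and Tolerations: the Immune System (Platinum Tier)

1. Taint effects

Effect            Meaning                                                   Typical scenario
NoSchedule        New Pods without a matching toleration are not scheduled  Dedicated GPU nodes
PreferNoSchedule  The scheduler avoids the node when it can                 Nodes about to go down for maintenance
NoExecute         Evicts running Pods without a matching toleration         Node failure

Tainting a node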

kubectl taint nodes node01 special=true:NoSchedule

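2. Pod tolerations

tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
# tolerationSeconds is only valid together with effect NoExecute
- key: "node.kubernetes.io/not-ready"
  operator: "Exists"
  effect: "NoExecute"
  tolerationSeconds: 3600  # stay on the node for up to 1 hour after the taint appears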
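
How a toleration is matched against a taint can be sketched in a few lines of Go. This is a simplified model of the documented matching rules (an empty effect matches every effect, an empty key with Exists matches every key, the operator defaults to Equal); the types and the functions Tolerates and Schedulable are hypothetical, and tolerationSeconds is left out.

```go
package main

import "fmt"

// Taint and Toleration are stripped-down versions of the API objects.
type Taint struct{ Key, Value, Effect string }

type Toleration struct{ Key, Operator, Value, Effect string }

// Tolerates reports whether a single toleration matches a single taint.
func Tolerates(t Toleration, taint Taint) bool {
	if t.Effect != "" && t.Effect != taint.Effect {
		return false // an empty effect would have matched any effect
	}
	if t.Key != "" && t.Key != taint.Key {
		return false // an empty key would have matched any key
	}
	switch t.Operator {
	case "", "Equal": // Equal is the default operator
		return t.Value == taint.Value
	case "Exists":
		return true
	}
	return false
}

// Schedulable reports whether every NoSchedule/NoExecute taint on the node is
// tolerated. PreferNoSchedule is only a soft preference, so it never blocks.
func Schedulable(taints []Taint, tolerations []Toleration) bool {
	for _, taint := range taints {
		if taint.Effect == "PreferNoSchedule" {
			continue
		}
		tolerated := false
		for _, t := range tolerations {
			if Tolerates(t, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return false
		}
	}
	return true
}

func main() {
	taints := []Taint{{"special", "true", "NoSchedule"}} // kubectl taint nodes node01 special=true:NoSchedule
	exact := []Toleration{{"special", "Equal", "true", "NoSchedule"}}
	anyValue := []Toleration{{"special", "Exists", "", ""}}
	fmt.Println("no tolerations:", Schedulable(taints, nil))
	fmt.Println("Equal toleration:", Schedulable(taints, exact))
	fmt.Println("Exists toleration:", Schedulable(taints, anyValue))
}
```

A toleration only removes the barrier. It does not attract the Pod to the tainted node, so dedicated nodes usually combine a taint with a nodeSelector or node affinity rule.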

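V. The Privileged Class: Priority and Preemption (Diamond Tier)

1. Defining a PriorityClass

apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical
value: 1000000              # higher value = higher priority
preemptionPolicy: Never     # Never: do not evict other Pods; the default is PreemptLowerPriority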

2. Declaring priority on a Pod

priorityClassName: mission-critical

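Caveats

  • Use preemptionPolicy: PreemptLowerPriority with care: it is the default, and it lets these Pods evict lower-priority Pods to make room
  • Priority influences queue order and preemption but does not override resource requests: the Pod still needs a node where its requests fit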
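
The two caveats can be illustrated with a small Go sketch. It models only two ideas: pending Pods are ordered by priority, and a Pod may preempt strictly lower-priority Pods unless its policy is Never. The real scheduler also considers arrival time, picks a minimal set of victims and honors PodDisruptionBudgets where it can; Pod, SortQueue and Victims are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod carries only the fields relevant to priority handling.
type Pod struct {
	Name             string
	Priority         int32
	PreemptionPolicy string // "PreemptLowerPriority" (default) or "Never"
}

// SortQueue orders pending Pods so that higher priority is tried first.
// The stable sort keeps the original order among equal priorities.
func SortQueue(pending []Pod) {
	sort.SliceStable(pending, func(i, j int) bool {
		return pending[i].Priority > pending[j].Priority
	})
}

// Victims lists the running Pods the preemptor would be allowed to evict:
// none when its policy is Never, otherwise those with strictly lower priority.
func Victims(preemptor Pod, running []Pod) []Pod {
	if preemptor.PreemptionPolicy == "Never" {
		return nil
	}
	var out []Pod
	for _, p := range running {
		if p.Priority < preemptor.Priority {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	pending := []Pod{
		{"batch-job", 100, "PreemptLowerPriority"},
		{"payment-api", 1000000, "Never"},
		{"web", 5000, "PreemptLowerPriority"},
	}
	SortQueue(pending)
	running := []Pod{{"report", 100, ""}, {"cache", 5000, ""}}
	for _, p := range pending {
		fmt.Printf("%s (priority %d) may evict %d running Pod(s)\n",
			p.Name, p.Priority, len(Victims(p, running)))
	}
}
```

In this model payment-api jumps to the front of the queue but evicts nothing, because its policy is Never. It waits for capacity like any other Pod, just ahead of lower-priority ones.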

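VI. Storage Topology Constraints (Master Tier)

Volume binding across availability zones

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topology-aware
provisioner: kubernetes.io/aws-ebs       # legacy in-tree provisioner; the CSI driver is ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer  # key setting: bind and provision only once a Pod using the PVC is being scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values: [us-west-2a, us-west-2b]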

VII. Extending the Scheduler (Grandmaster Tier)

1. Custom scheduler

spec:
  schedulerName: my-custom-scheduler

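2. Scheduling framework plugins

// Example: a custom Filter plugin. Needs the imports "context",
// v1 "k8s.io/api/core/v1" and "k8s.io/kubernetes/pkg/scheduler/framework".
type SecurityLevel struct{}

func (*SecurityLevel) Name() string { return "SecurityLevel" }

// Filter rejects nodes whose "security" label differs from the Pod's "sec-level" label.
func (*SecurityLevel) Filter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
  if nodeInfo.Node().Labels["security"] != pod.Labels["sec-level"] {
    return framework.NewStatus(framework.Unschedulable, "security level mismatch")
  }
  return nil
}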

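VIII. The Big Picture: Scheduling in Production

Scheduling decision flow (conceptual):
Resource requests → Node filtering → Affinity → Taints/tolerations → Priority → Binding
                          ↑             ↑               ↑
                          │             │               └── dedicated nodes
                          │             └─── workload topology constraints
                          └─── storage topology awareness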

IX. Troubleshooting Scheduling Problems

1. Find out why a Pod is unscheduled

kubectl describe pod [pod-name] | grep -A20 Events
kubectl get events --field-selector involvedObject.name=[pod-name]

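2. Dry-run the Pod against the API server

# A server-side dry run performs validation and admission only; it does not run the scheduler.
# It still shows which scheduler the Pod would be handed to.
kubectl create -f pod.yaml --dry-run=server -o jsonpath='{.spec.schedulerName}'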

3. Scheduler visualization tools


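Remember: scheduling policy is the bedrock of cluster stability! Apply these techniques well and your Pods will land on their target nodes like precision-guided missiles!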

posted on 2025-03-20 13:45  Leo-Yide