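In K8s, what factors influence Pod scheduling?
Inside Kubernetes Scheduling: Nine Weapons for Controlling Where Pods Land
You might think Pod scheduling is just random assignment. Behind it, the scheduler runs a careful calculation. This article walks through the core factors that influence Pod scheduling in production, so that you can take real control of your cluster's resources.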
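I. The Basics: Resource Requests and Limits (Bronze Tier)
1. Golden rules for resource requests
resources:
  requests:
    cpu: "500m"      # millicores; "0.5" is equivalent
    memory: "1Gi"    # use binary units (Gi/Mi)
  limits:
    cpu: "2"         # keep at or below the node's total core count
    memory: "4Gi"    # must be greater than or equal to requests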
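Hard-won production lessons:
- Without limits, a container can exhaust node memory, and the kernel OOM killer starts killing processes
- Inflated requests fragment resources: capacity is reserved but idle, and new Pods fail to fit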
Commands for checking resources:
kubectl describe nodes | grep -A5 Allocated
kubectl top pods --containers # live usage per container (requires metrics-server)
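To make the role of requests concrete: the scheduler decides whether a Pod fits on a node by comparing requests (not live usage, and not limits) against the node's allocatable capacity. The following is a minimal sketch of that comparison, not the scheduler's actual code. The helper names `parse_quantity`, `node_fits`, and `limits_valid` are made up for illustration, and only a subset of quantity suffixes is handled.

```python
from decimal import Decimal

# Subset of Kubernetes quantity suffixes (assumption: enough for this sketch).
SUFFIXES = {
    "Ki": Decimal(2) ** 10, "Mi": Decimal(2) ** 20, "Gi": Decimal(2) ** 30,
    "k": Decimal(10) ** 3, "M": Decimal(10) ** 6, "G": Decimal(10) ** 9,
    "m": Decimal("0.001"),
}


def parse_quantity(text):
    """Convert a quantity such as "500m", "2" or "1Gi" into a plain number."""
    for suffix, factor in SUFFIXES.items():
        if text.endswith(suffix):
            return Decimal(text[: -len(suffix)]) * factor
    return Decimal(text)


def node_fits(allocatable, already_requested, pod_requests):
    """True if every resource the Pod requests still fits on the node."""
    for name, wanted in pod_requests.items():
        free = (parse_quantity(allocatable.get(name, "0"))
                - parse_quantity(already_requested.get(name, "0")))
        if parse_quantity(wanted) > free:
            return False
    return True


def limits_valid(requests, limits):
    """Each limit must be greater than or equal to the matching request."""
    return all(
        parse_quantity(limits[name]) >= parse_quantity(value)
        for name, value in requests.items()
        if name in limits
    )


pod = {"cpu": "500m", "memory": "1Gi"}
node = {"cpu": "4", "memory": "8Gi"}
print(node_fits(node, {"cpu": "3", "memory": "2Gi"}, pod))
print(node_fits(node, {"cpu": "3800m", "memory": "2Gi"}, pod))
```

The second call shows the fragmentation problem from the list above: if other Pods have already requested 3800m of CPU, a 500m Pod is rejected even when real usage on the node is low.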
II. Selecting Nodes by Label (Silver Tier)
1. Hard match: NodeSelector
nodeSelector:
  disktype: ssd
  gpu: "true"
Labeling a node:
kubectl label nodes node01 gpu=true
kubectl get nodes -l gpu=true # list the matching nodes
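The matching rule behind nodeSelector is simple: every key/value pair in the selector must appear among the node's labels. A minimal sketch, with a hypothetical helper name and made-up node data:

```python
def matches_node_selector(node_labels, node_selector):
    """A node qualifies only if it carries every label in the selector."""
    return all(node_labels.get(key) == value
               for key, value in node_selector.items())


# Hypothetical cluster: node name -> labels.
nodes = {
    "node01": {"disktype": "ssd", "gpu": "true"},
    "node02": {"disktype": "ssd"},
    "node03": {"disktype": "hdd", "gpu": "true"},
}
selector = {"disktype": "ssd", "gpu": "true"}

eligible = [name for name, labels in nodes.items()
            if matches_node_selector(labels, selector)]
print(eligible)
```

Note that label values are strings, which is why the YAML above quotes "true".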
2. Smart scheduling: NodeAffinity
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:   # hard requirement
      nodeSelectorTerms:
      - matchExpressions:
        - key: topology.kubernetes.io/zone
          operator: In
          values: [zoneA]
    preferredDuringSchedulingIgnoredDuringExecution:  # soft preference
    - weight: 80
      preference:
        matchExpressions:
        - key: env
          operator: In
          values: [prod]
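The two halves of nodeAffinity behave differently: the required part filters nodes out (terms are ORed, expressions within a term are ANDed), while the preferred part only adds its weight to the score of nodes that match. The sketch below illustrates this under simplifying assumptions: the function names are made up, and the `Gt`/`Lt` operators are left out.

```python
def expr_matches(labels, expr):
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return labels.get(key) not in values  # also true when the key is absent
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not covered by this sketch: {op}")


def term_matches(labels, term):
    """All expressions inside one term must match (AND)."""
    return all(expr_matches(labels, e) for e in term["matchExpressions"])


def required_ok(labels, node_selector_terms):
    """At least one term must match (OR)."""
    return any(term_matches(labels, t) for t in node_selector_terms)


def preferred_score(labels, preferences):
    """Sum the weights of every preference the node satisfies."""
    return sum(p["weight"] for p in preferences
               if term_matches(labels, p["preference"]))


required = [{"matchExpressions": [
    {"key": "topology.kubernetes.io/zone", "operator": "In", "values": ["zoneA"]}]}]
preferred = [{"weight": 80, "preference": {"matchExpressions": [
    {"key": "env", "operator": "In", "values": ["prod"]}]}}]

node_a = {"topology.kubernetes.io/zone": "zoneA", "env": "prod"}
node_b = {"topology.kubernetes.io/zone": "zoneA", "env": "dev"}
node_c = {"topology.kubernetes.io/zone": "zoneB", "env": "prod"}
for labels in (node_a, node_b, node_c):
    print(required_ok(labels, required), preferred_score(labels, preferred))
```

Here node_c is excluded outright despite matching the preference, while node_b stays eligible and simply scores lower than node_a.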
III. Isolation and Co-location (Gold Tier)
1. Sworn enemies: PodAntiAffinity
# Forbid Pods of the same application from landing on the same node
podAntiAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: app
        operator: In
        values: [order-service]
    topologyKey: kubernetes.io/hostname
2. Best buddies: PodAffinity
# Place the cache service in the same zone as the database
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      matchExpressions:
      - key: db
        operator: In
        values: [mysql]
    topologyKey: topology.kubernetes.io/zone
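The topologyKey decides what "same place" means: with kubernetes.io/hostname the domain is a single node, with topology.kubernetes.io/zone it is a whole zone. The sketch below illustrates the idea for both rules. It is a simplification: the selector is reduced to plain equality on labels, and the function names and cluster data are hypothetical.

```python
def selector_matches(pod_labels, selector):
    """Simplified label selector: every key/value pair must be present."""
    return all(pod_labels.get(k) == v for k, v in selector.items())


def pods_in_same_domain(candidate, topology_key, nodes, pods):
    """Labels of all running Pods whose node shares the candidate's domain."""
    domain = nodes[candidate].get(topology_key)
    if domain is None:
        return []
    return [labels for node_name, labels in pods
            if nodes[node_name].get(topology_key) == domain]


def anti_affinity_ok(candidate, topology_key, selector, nodes, pods):
    """Required anti-affinity: no matching Pod may exist in the domain."""
    return not any(selector_matches(labels, selector)
                   for labels in pods_in_same_domain(candidate, topology_key, nodes, pods))


def affinity_ok(candidate, topology_key, selector, nodes, pods):
    """Required affinity: at least one matching Pod must exist in the domain."""
    return any(selector_matches(labels, selector)
               for labels in pods_in_same_domain(candidate, topology_key, nodes, pods))


HOST, ZONE = "kubernetes.io/hostname", "topology.kubernetes.io/zone"
nodes = {
    "n1": {HOST: "n1", ZONE: "zone-a"},
    "n2": {HOST: "n2", ZONE: "zone-a"},
    "n3": {HOST: "n3", ZONE: "zone-b"},
}
# (node name, pod labels) for Pods that are already running.
pods = [("n1", {"app": "order-service"}), ("n3", {"db": "mysql"})]

print(anti_affinity_ok("n1", HOST, {"app": "order-service"}, nodes, pods))
print(anti_affinity_ok("n2", HOST, {"app": "order-service"}, nodes, pods))
print(affinity_ok("n3", ZONE, {"db": "mysql"}, nodes, pods))
```

A second order-service Pod is rejected on n1 but accepted on n2, and a cache Pod that requires a mysql neighbor in its zone can only go to zone-b.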
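IV. The Taint Immune System (Platinum Tier)
1. Node taint effects
| Taint effect | Meaning | Typical scenario |
|---|---|---|
| NoSchedule | New Pods are not scheduled onto the node | Dedicated GPU nodes |
| PreferNoSchedule | The scheduler tries to avoid the node | Nodes about to be taken down for maintenance |
| NoExecute | Running Pods that do not tolerate the taint are evicted | Node failure |
Tainting a node:
kubectl taint nodes node01 special=true:NoSchedule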
2. Pod toleration configuration
tolerations:
- key: "special"
  operator: "Equal"
  value: "true"
  effect: "NoSchedule"
  # tolerationSeconds (a temporary toleration) is only valid with effect: NoExecute, so it is omitted here
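How a toleration is matched against a taint can be sketched as follows. This is an illustration of the documented matching rules (an empty effect matches every effect, `Exists` ignores the value, an empty key with `Exists` matches every taint), not the scheduler's source; the function names are made up.

```python
def tolerates(toleration, taint):
    """True if this toleration covers this taint."""
    effect = toleration.get("effect")
    if effect and effect != taint["effect"]:
        return False
    if toleration.get("operator", "Equal") == "Exists":
        # An empty key combined with Exists tolerates every taint.
        return not toleration.get("key") or toleration["key"] == taint["key"]
    return (toleration.get("key") == taint["key"]
            and toleration.get("value", "") == taint.get("value", ""))


def schedulable(node_taints, pod_tolerations):
    """Hard taints must all be tolerated; PreferNoSchedule only lowers preference."""
    hard = [t for t in node_taints if t["effect"] in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, taint) for tol in pod_tolerations)
               for taint in hard)


taint = {"key": "special", "value": "true", "effect": "NoSchedule"}
toleration = {"key": "special", "operator": "Equal",
              "value": "true", "effect": "NoSchedule"}

print(schedulable([taint], []))
print(schedulable([taint], [toleration]))
```

A toleration only permits scheduling onto the tainted node; it does not attract the Pod there. Pinning a workload to dedicated nodes still needs a nodeSelector or node affinity on top.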
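V. The Privileged Class: Priority and Preemption (Diamond Tier)
1. Priority classes
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: mission-critical
value: 1000000            # a higher value means a higher priority
preemptionPolicy: Never   # Never disables preemption; the default is PreemptLowerPriority
2. Declaring the priority on a Pod
priorityClassName: mission-critical   # set under the Pod's spec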
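Notes:
- Use preemptionPolicy: PreemptLowerPriority with care: it is the default, and it lets a pending Pod evict lower-priority Pods
- Priority affects queue order and preemption but does not override resource requests: the Pod still needs a node with enough free resources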
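The interaction between priority, preemptionPolicy, and resource requests can be illustrated with a deliberately simplified sketch. It greedily evicts the lowest-priority Pods on one node until the pending Pod fits. The real scheduler weighs more factors (PodDisruptionBudgets and graceful termination, for example), and the function name and the Pod dictionaries here are hypothetical.

```python
def pick_victims(pending, running, free_cpu_m):
    """Return the Pods to evict so `pending` fits, [] if none are needed,
    or None if the Pod cannot be placed on this node."""
    if pending["cpu_m"] <= free_cpu_m:
        return []
    if pending.get("preemptionPolicy", "PreemptLowerPriority") == "Never":
        return None  # the Pod waits in the queue instead of evicting others
    victims = []
    lower = sorted((p for p in running if p["priority"] < pending["priority"]),
                   key=lambda p: p["priority"])
    for pod in lower:
        victims.append(pod)
        free_cpu_m += pod["cpu_m"]
        if pending["cpu_m"] <= free_cpu_m:
            return victims
    return None  # even evicting every lower-priority Pod is not enough


running = [
    {"name": "batch", "priority": 10, "cpu_m": 1000},
    {"name": "web", "priority": 500, "cpu_m": 1000},
]
critical = {"name": "pay", "priority": 1000000, "cpu_m": 1500}

print([p["name"] for p in pick_victims(critical, running, free_cpu_m=200)])
print(pick_victims({**critical, "preemptionPolicy": "Never"}, running, free_cpu_m=200))
```

With the default policy the critical Pod evicts both lower-priority Pods; with preemptionPolicy: Never it evicts nothing and stays pending, even though it still sits ahead of lower-priority Pods in the scheduling queue.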
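VI. Storage Topology Constraints (Master Tier)
Volume binding policy across availability zones:
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: topology-aware
provisioner: kubernetes.io/aws-ebs        # legacy in-tree provisioner; CSI-based clusters use ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer   # the key setting: binding waits until a Pod using the claim is scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values: [us-west-2a, us-west-2b]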
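VII. Scheduler Extension Mechanisms (Grandmaster Tier)
1. Custom scheduler:
spec:
  schedulerName: my-custom-scheduler
2. Scheduling framework plugin:
// Example: custom filter logic. In a real plugin, Filter is a method on the plugin type.
func Filter(ctx context.Context, cycleState *framework.CycleState, pod *v1.Pod, nodeInfo *framework.NodeInfo) *framework.Status {
	node := nodeInfo.Node()
	if node == nil {
		return framework.NewStatus(framework.Error, "node not found")
	}
	// Reject the node when its "security" label differs from the Pod's "sec-level" label.
	if node.Labels["security"] != pod.Labels["sec-level"] {
		return framework.NewStatus(framework.Unschedulable, "security level mismatch")
	}
	return nil
}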
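VIII. The Big Picture of Production Scheduling Policy
Scheduling decision tree:
Resource requests → Node filtering → Affinity → Taint toleration → Priority → Final binding
                          ↑             ↑              ↑
                          │             │              └── dedicated-node handling
                          │             └─── business topology constraints
                          └─── storage topology awareness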
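The chain above can be read as two phases: filter out the nodes that cannot run the Pod, then score the remaining ones and pick the best. The sketch below shows that shape only. The `schedule` function, the filter and scorer helpers, and the node data are all hypothetical, and real scheduling also covers queueing, preemption, and binding.

```python
def schedule(pod, nodes, filters, scorers):
    """Filter infeasible nodes, then return the name of the best-scoring one."""
    feasible = [n for n in nodes if all(f(pod, n) for f in filters)]
    if not feasible:
        return None  # no node passed the filters: the Pod stays Pending
    best = max(feasible, key=lambda n: sum(s(pod, n) for s in scorers))
    return best["name"]


def enough_cpu(pod, node):
    """Filter: the request must fit into the node's unreserved CPU."""
    return pod["cpu_m"] <= node["free_cpu_m"]


def selector_ok(pod, node):
    """Filter: the node must carry every nodeSelector label."""
    return all(node["labels"].get(k) == v
               for k, v in pod.get("nodeSelector", {}).items())


def prefer_prod(pod, node):
    """Scorer: a soft preference worth 80 points for env=prod nodes."""
    return 80 if node["labels"].get("env") == "prod" else 0


nodes = [
    {"name": "a", "free_cpu_m": 1000, "labels": {"disktype": "ssd", "env": "dev"}},
    {"name": "b", "free_cpu_m": 2000, "labels": {"disktype": "ssd", "env": "prod"}},
    {"name": "c", "free_cpu_m": 100, "labels": {"disktype": "ssd", "env": "prod"}},
]
pod = {"cpu_m": 500, "nodeSelector": {"disktype": "ssd"}}

print(schedule(pod, nodes, [enough_cpu, selector_ok], [prefer_prod]))
```

Node c is dropped by the CPU filter even though it matches the preference, which is why hard constraints deserve attention first when a Pod lands somewhere unexpected.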
IX. Scheduling Troubleshooting Handbook
1. Find out why a Pod is unscheduled:
kubectl describe pod [pod-name] | grep -A20 Events
kubectl get events --field-selector involvedObject.name=[pod-name]
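2. Validate the Pod with a server-side dry run (this runs admission and defaulting but not the scheduler, so it checks the spec without simulating placement):
kubectl create -f pod.yaml --dry-run=server -o jsonpath='{.spec.schedulerName}'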
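3. Scheduler visualization tools:
Remember: scheduling policy is the foundation of cluster stability. Use these techniques well, and your Pods will hit their target nodes like precision-guided missiles.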