(K8s Study Notes, Part 6) Pod Scheduling
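RC (ReplicationController) supports only equality-based label selectors, while RS (ReplicaSet) also supports set-based selectors that can match several values of a label. For example, if APPTest has released two versions, v1 and v2, and we want 3 replicas that may include Pods of both versions: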
selector:
  matchLabels:
    app: APPTest        # matchLabels and matchExpressions are ANDed together
  matchExpressions:
    - {key: version, operator: In, values: [v1, v2]}
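The AND semantics of a set-based selector can be modeled in a few lines of Python. This is a simplified sketch, not Kubernetes code: `selects` and the sample Pod labels are hypothetical, and only the `In` operator is handled.

```python
# Simplified model of a set-based label selector: every matchLabels pair
# AND every matchExpressions entry must hold for a Pod to be selected.
def selects(selector, pod_labels):
    for key, value in selector.get("matchLabels", {}).items():
        if pod_labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions", []):
        # Only the "In" operator is modeled in this sketch.
        if expr["operator"] != "In":
            raise ValueError("unsupported operator: " + expr["operator"])
        if pod_labels.get(expr["key"]) not in expr["values"]:
            return False
    return True


selector = {
    "matchLabels": {"app": "APPTest"},
    "matchExpressions": [{"key": "version", "operator": "In", "values": ["v1", "v2"]}],
}

pods = [
    {"app": "APPTest", "version": "v1"},
    {"app": "APPTest", "version": "v2"},
    {"app": "APPTest", "version": "v3"},
    {"app": "other", "version": "v1"},
]
selected = [p["version"] + "/" + p["app"] for p in pods if selects(selector, p)]
print(selected)  # ['v1/APPTest', 'v2/APPTest']
```

Had matchLabels been `version: v2`, the AND with `version In [v1, v2]` would leave only v2 Pods selected.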
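1. Deployment or RC/RS: fully automatic scheduling
A Deployment or RC/RS automatically deploys multiple replicas of a containerized application, performs version updates and rollbacks, and continuously maintains the specified number of replicas.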
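# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: test
spec:
  replicas: 3          # three replicas, maintained through the Deployment's ReplicaSet
  selector:            # required in apps/v1; must match the template labels
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80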
2. nodeSelector: targeted scheduling
To schedule a Pod onto specific Nodes, combine Node labels with the Pod's nodeSelector field.
1. First label the target node with kubectl label:
kubectl label nodes <node-name> <label-key>=<label-value>
Example:
kubectl label nodes work01 zone=frontend   # labels node work01 with zone=frontend, marking it as a "frontend" node
2. Then add nodeSelector to the Pod definition:
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-test
  labels:
    name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nginx-test
  template:
    metadata:
      labels:
        name: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      nodeSelector:      # schedule only onto nodes labeled zone=frontend
        zone: frontend
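The nodeSelector filter above amounts to an "all key/value pairs must match" check. Here is a minimal Python sketch of that rule; the function name and the two-node cluster are hypothetical illustrations.

```python
# Simplified model of nodeSelector filtering: a node is eligible only if it
# carries every key=value pair listed in the Pod's nodeSelector.
def node_matches_selector(node_selector, node_labels):
    return all(node_labels.get(k) == v for k, v in node_selector.items())


# Hypothetical cluster: only work01 has been labeled zone=frontend.
nodes = {
    "work01": {"kubernetes.io/hostname": "work01", "zone": "frontend"},
    "work02": {"kubernetes.io/hostname": "work02"},
}
node_selector = {"zone": "frontend"}

eligible = [name for name, labels in nodes.items() if node_matches_selector(node_selector, labels)]
print(eligible)  # ['work01']
```

If no node matches, the Pod stays Pending rather than falling back to another node.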
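3. nodeAffinity: node affinity scheduling
There are two kinds of node affinity expressions:
1) requiredDuringSchedulingIgnoredDuringExecution
The rules must be satisfied for the Pod to be scheduled onto a Node; this is a hard constraint.
2) preferredDuringSchedulingIgnoredDuringExecution
The scheduler tries to place the Pod on Nodes that satisfy the rules but does not insist; this is a soft constraint. Each preference carries a weight (1-100), and the scheduler favors Nodes whose satisfied preferences add up to a higher total weight.
Note: IgnoredDuringExecution means that if the labels of the Node a Pod is running on change while the Pod runs and no longer satisfy the Pod's node affinity, the system ignores the label change and the Pod keeps running on that Node.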
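The difference between the hard and soft rules can be sketched as "filter, then score by summed weights". This is a simplified model under stated assumptions: `pick_node` and the node data are hypothetical, only the `In` operator is handled, and the real scheduler normalizes these scores and combines them with other scoring plugins.

```python
# Simplified model of node affinity: required expressions filter nodes,
# preferred terms add their weight to each node that satisfies them.
def expr_ok(expr, labels):
    # Only the "In" operator is modeled here.
    return labels.get(expr["key"]) in expr["values"]


def pick_node(nodes, required_exprs, preferences):
    # Hard constraint: drop nodes that fail any required expression.
    feasible = {n: l for n, l in nodes.items() if all(expr_ok(e, l) for e in required_exprs)}
    if not feasible:
        return None, {}
    # Soft constraint: sum the weights of the preferences each node satisfies.
    scores = {
        n: sum(p["weight"] for p in preferences if all(expr_ok(e, l) for e in p["exprs"]))
        for n, l in feasible.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


nodes = {
    "work01": {"kubernetes.io/arch": "amd64"},
    "work02": {"kubernetes.io/arch": "amd64", "disk-type": "ssd"},
    "work03": {"kubernetes.io/arch": "arm64", "disk-type": "ssd"},
}
required = [{"key": "kubernetes.io/arch", "values": ["amd64"]}]
prefs = [{"weight": 1, "exprs": [{"key": "disk-type", "values": ["ssd"]}]}]

best, scores = pick_node(nodes, required, prefs)
print(best, scores)  # work02 {'work01': 0, 'work02': 1}
```

work03 has an SSD but is filtered out by the hard rule; work01 remains a valid fallback if work02 is unavailable.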
apiVersion: v1
kind: Pod
metadata:
  name: busybox-test
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch      # older clusters: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
  containers:
  - name: busybox-test
    image: busybox:latest
    command: ["sleep", "3600"]           # keep the container running
The operator field supports In, NotIn, Exists, DoesNotExist, Gt and Lt:
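In: the label's value is in the given list
NotIn: the label's value is not in the given list
Exists: the label exists
DoesNotExist: the label does not exist
Gt: the label's value, parsed as an integer, is greater than the given value
Lt: the label's value, parsed as an integer, is less than the given value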
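The six operators can be sketched as a single evaluation function. This is a simplified model, not scheduler source: `match_expression` and the sample labels are hypothetical, and Gt/Lt assume a single integer value as in the list above.

```python
# Simplified evaluation of one node-affinity matchExpression against a
# node's labels, covering the six operators listed above.
def match_expression(expr, labels):
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    present = key in labels
    if op == "In":
        return present and labels[key] in values
    if op == "NotIn":
        # A node without the label also satisfies NotIn.
        return not present or labels[key] not in values
    if op == "Exists":
        return present
    if op == "DoesNotExist":
        return not present
    if op in ("Gt", "Lt"):
        # Gt/Lt take exactly one value; both sides are parsed as integers.
        if not present:
            return False
        try:
            actual, bound = int(labels[key]), int(values[0])
        except ValueError:
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError("unknown operator: " + op)


labels = {"disk-type": "ssd", "cpu-count": "16"}
exprs = [
    {"key": "disk-type", "operator": "In", "values": ["ssd"]},
    {"key": "disk-type", "operator": "NotIn", "values": ["hdd"]},
    {"key": "gpu", "operator": "Exists"},
    {"key": "gpu", "operator": "DoesNotExist"},
    {"key": "cpu-count", "operator": "Gt", "values": ["8"]},
    {"key": "cpu-count", "operator": "Lt", "values": ["8"]},
]
results = [match_expression(e, labels) for e in exprs]
print(results)  # [True, True, False, True, True, False]
```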
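- If both nodeSelector and nodeAffinity are defined, a Node must satisfy both for the Pod to be scheduled onto it.
- If nodeSelectorTerms contains multiple terms, a Node only needs to match one of them.
- If a single nodeSelectorTerms entry contains multiple matchExpressions, a Node must satisfy all of them.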
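The three combination rules above reduce to "nodeSelector AND (term1 OR term2 OR ...)", where each term is an AND of its expressions. A minimal Python sketch, with hypothetical names and node data, modeling only the In and Exists operators:

```python
# Simplified model of combining nodeSelector with required node affinity:
# nodeSelector AND (any term), where a term needs ALL its matchExpressions.
def expr_ok(expr, labels):
    # Only In/Exists are modeled to keep the sketch short.
    if expr["operator"] == "Exists":
        return expr["key"] in labels
    return labels.get(expr["key"]) in expr.get("values", [])


def node_feasible(node_labels, node_selector, node_selector_terms=None):
    if any(node_labels.get(k) != v for k, v in node_selector.items()):
        return False  # nodeSelector must always be satisfied
    if node_selector_terms is None:
        return True   # no required node affinity defined
    return any(
        all(expr_ok(e, node_labels) for e in term["matchExpressions"])
        for term in node_selector_terms
    )


node_selector = {"zone": "frontend"}
terms = [
    {"matchExpressions": [
        {"key": "disk-type", "operator": "In", "values": ["ssd"]},
        {"key": "gpu", "operator": "Exists"},
    ]},
    {"matchExpressions": [{"key": "kubernetes.io/arch", "operator": "In", "values": ["arm64"]}]},
]
nodes = {
    "a": {"zone": "frontend", "disk-type": "ssd", "gpu": "true"},
    "b": {"zone": "frontend", "disk-type": "ssd", "kubernetes.io/arch": "amd64"},
    "c": {"zone": "frontend", "kubernetes.io/arch": "arm64"},
    "d": {"zone": "backend", "kubernetes.io/arch": "arm64"},
}
feasible = [n for n, labels in nodes.items() if node_feasible(labels, node_selector, terms)]
print(feasible)  # ['a', 'c']
```

Node b fails because it satisfies only half of the first term, and node d fails the nodeSelector even though it matches the second term.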
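4. podAffinity: Pod affinity and anti-affinity scheduling
Pod affinity and anti-affinity place a Pod relative to other Pods along two axes: the X axis is a topology domain defined by a Node label, and the Y axis is a condition on the labels of Pods already running in that domain. The new Pod is attracted to (affinity) or kept away from (anti-affinity) domains that contain matching Pods.
X axis:
The topology domain can be a node name, a rack, a zone or a region, and is selected with topologyKey, which can be any Node label. Commonly used values are kubernetes.io/hostname (a single node), failure-domain.beta.kubernetes.io/zone (usually different areas of the same data center) and failure-domain.beta.kubernetes.io/region (usually data centers in different locations); in newer versions the zone and region labels are topology.kubernetes.io/zone and topology.kubernetes.io/region.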
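Y axis:
- The conditions for Pod affinity and anti-affinity can be set with requiredDuringSchedulingIgnoredDuringExecution or preferredDuringSchedulingIgnoredDuringExecution.
- Inter-Pod affinity is defined in podAffinity under spec.affinity.
- Inter-Pod anti-affinity is defined in podAntiAffinity under spec.affinity.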
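The X/Y-axis description can be modeled as: find the candidate node's topology domain (its topologyKey value), then look for matching Pods anywhere in that domain. The sketch below is a simplification under assumptions: only required rules and equality selectors are modeled, all names and cluster data are hypothetical, and it ignores the special case where the very first Pod of a group may match its own affinity rule.

```python
# Simplified model of *required* inter-Pod affinity and anti-affinity.
# A node's topology domain is the value of its topologyKey label; a rule
# looks for Pods with matching labels anywhere in the candidate's domain.
def matching_pods_in_domain(candidate, rule, nodes, running_pods):
    key = rule["topologyKey"]
    domain = nodes[candidate].get(key)
    if domain is None:
        return []  # the candidate node has no such topology label
    return [
        pod for pod in running_pods
        if nodes[pod["node"]].get(key) == domain
        and all(pod["labels"].get(k) == v for k, v in rule["selector"].items())
    ]


def feasible_nodes(nodes, running_pods, affinity=None, anti_affinity=None):
    result = []
    for name in nodes:
        # podAffinity: the domain must already contain a matching Pod.
        if affinity and not matching_pods_in_domain(name, affinity, nodes, running_pods):
            continue
        # podAntiAffinity: the domain must not contain any matching Pod.
        if anti_affinity and matching_pods_in_domain(name, anti_affinity, nodes, running_pods):
            continue
        result.append(name)
    return result


# Hypothetical cluster: work01 and work02 share zone z1, work03 is in z2.
nodes = {
    "work01": {"kubernetes.io/hostname": "work01", "topology.kubernetes.io/zone": "z1"},
    "work02": {"kubernetes.io/hostname": "work02", "topology.kubernetes.io/zone": "z1"},
    "work03": {"kubernetes.io/hostname": "work03", "topology.kubernetes.io/zone": "z2"},
}
running = [{"name": "busybox-one", "node": "work01", "labels": {"security": "A1", "app": "busybox"}}]

same_host = {"topologyKey": "kubernetes.io/hostname", "selector": {"security": "A1"}}
same_zone = {"topologyKey": "topology.kubernetes.io/zone", "selector": {"security": "A1"}}
not_same_host = {"topologyKey": "kubernetes.io/hostname", "selector": {"app": "busybox"}}

print(feasible_nodes(nodes, running, affinity=same_host))  # ['work01']
print(feasible_nodes(nodes, running, affinity=same_zone, anti_affinity=not_same_host))  # ['work02']
```

The two calls mirror the two examples that follow: affinity on hostname pins the new Pod next to busybox-one, while zone affinity plus hostname anti-affinity keeps it in the same zone but off busybox-one's node.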
Example 1: Pod affinity (podAffinity)
Create the first Pod with the labels security=A1 and app=busybox:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
  - name: busybox-one
    image: busybox:latest
    command: ["sleep", "3600"]      # keep the container running
Create the second Pod with an affinity rule matching Pods labeled security=A1, with topologyKey "kubernetes.io/hostname":
apiVersion: v1
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - A1
        topologyKey: kubernetes.io/hostname
  containers:
  - name: busybox-two
    image: busybox:latest
    command: ["sleep", "3600"]
Running kubectl get pods -o wide shows the two Pods on the same node.
Example 2: Pod anti-affinity (podAntiAffinity)
Create the first Pod with the labels security=A1 and app=busybox:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
  - name: busybox-test
    image: busybox:latest
    command: ["sleep", "3600"]      # keep the container running
Create the second Pod with an affinity rule for security=A1 on the zone topologyKey, and an anti-affinity rule for app=busybox on topologyKey "kubernetes.io/hostname":
apiVersion: v1
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - A1
        topologyKey: topology.kubernetes.io/zone    # older clusters: failure-domain.beta.kubernetes.io/zone
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - busybox
        topologyKey: kubernetes.io/hostname
  containers:
  - name: busybox-two
    image: busybox:latest
    command: ["sleep", "3600"]
Running kubectl get pods -o wide shows the two Pods in the same zone but on different nodes.
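Notes:
- topologyKey must not be empty in requiredDuringSchedulingIgnoredDuringExecution.
- Early Kubernetes versions interpreted an empty topologyKey in preferredDuringSchedulingIgnoredDuringExecution as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region; current versions reject an empty topologyKey in both required and preferred rules.
- A podAffinity term can also set namespaces, a list of namespaces whose Pods the labelSelector is matched against; if it is omitted or empty, only the Pod's own namespace is used. Newer versions also provide namespaceSelector, where an empty selector ({}) matches all namespaces.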
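5. Taints and Tolerations
Taints work together with tolerations to let a node repel Pods. A node can carry one or more taints, and a Pod cannot run on it unless the Pod declares that it tolerates those taints. Tolerations are a Pod property that allows the Pod to run on nodes with matching taints.
Example: use kubectl taint to mark work01 so that Pods are not scheduled onto it, then declare tolerations on a Pod so that it tolerates work01's taint and can run there.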
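kubectl taint nodes work01 key=value:NoSchedule
The effect (the part after the colon) can be NoSchedule, PreferNoSchedule or NoExecute:
NoSchedule: the scheduler does not place Pods on this node; a hard constraint.
PreferNoSchedule: the scheduler tries not to place Pods on this node; a soft constraint.
NoExecute: Pods without a matching toleration are evicted; Pods with a matching toleration and no tolerationSeconds keep running indefinitely; Pods with a matching toleration and tolerationSeconds are evicted after that time.
tolerations:
- key: "key"            # must match the taint's key
  operator: "Equal"     # the value must equal the taint's value
  value: "value"
  effect: "NoSchedule"  # must match the taint's effect
or
tolerations:
- key: "key"
  operator: "Exists"    # no value needed
  effect: "NoSchedule"  # setting PreferNoSchedule here would tolerate a PreferNoSchedule (soft) taint instead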
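If operator is not specified, it defaults to Equal. An empty key with operator Exists matches all keys and values, i.e. it tolerates every taint. An empty effect matches all effects.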
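These matching defaults can be captured in one small function. The sketch below models a single toleration against a single taint; `tolerates` and the sample dictionaries are hypothetical and mirror the YAML fields only loosely.

```python
# Simplified model of whether one toleration tolerates one taint,
# following the defaults described above.
def tolerates(toleration, taint):
    # An empty effect in the toleration matches every effect.
    if toleration.get("effect") and toleration["effect"] != taint["effect"]:
        return False
    operator = toleration.get("operator", "Equal")  # Equal is the default
    key = toleration.get("key", "")
    if operator == "Exists":
        # An empty key with Exists matches every key (and value).
        return key == "" or key == taint["key"]
    return key == taint["key"] and toleration.get("value", "") == taint.get("value", "")


taint = {"key": "key", "value": "value", "effect": "NoSchedule"}
print(tolerates({"key": "key", "value": "value", "effect": "NoSchedule"}, taint))  # True
print(tolerates({"operator": "Exists"}, taint))                                     # True
print(tolerates({"key": "key", "operator": "Exists", "effect": "NoExecute"}, taint))  # False
```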
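How Kubernetes processes multiple taints and tolerations: start from all taints on the node, ignore those matched by one of the Pod's tolerations, and the remaining taints take effect on the Pod. If a remaining taint has the NoSchedule effect, the Pod is not scheduled onto the node; if only PreferNoSchedule taints remain, the scheduler tries to avoid the node; if a remaining taint has the NoExecute effect, the Pod is not scheduled there and is evicted if it is already running there.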
Example: set several taints on one node and several tolerations on one Pod:
kubectl taint nodes work01 key1=value1:NoSchedule
kubectl taint nodes work01 key1=value1:NoExecute
kubectl taint nodes work01 key1=value2:NoSchedule
tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # after the NoExecute taint is added, the Pod may keep running on this node for 3600 s, then it is evicted
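The third taint is not tolerated, so the Pod cannot be scheduled onto this node. If the Pod is already running on the node when the third taint is added, it is not evicted because of it: the Pod tolerates the first two taints, and the only untolerated taint has the NoSchedule effect, which does not affect running Pods. (Because of tolerationSeconds, the tolerated NoExecute taint still evicts the Pod 3600 s after that taint is added.)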
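The filtering procedure can be replayed on the three taints and two tolerations of this example. This is a simplified sketch: the function names are hypothetical, and tolerationSeconds is deliberately not modeled, so the delayed NoExecute eviction does not appear in the result.

```python
# Simplified model of taint filtering: drop every taint that some
# toleration matches, then decide based on the effects that remain.
def tolerates(tol, taint):
    if tol.get("effect") and tol["effect"] != taint["effect"]:
        return False
    if tol.get("operator", "Equal") == "Exists":
        return tol.get("key", "") in ("", taint["key"])
    return tol.get("key") == taint["key"] and tol.get("value") == taint["value"]


def untolerated(taints, tolerations):
    return [t for t in taints if not any(tolerates(tol, t) for tol in tolerations)]


def verdict(taints, tolerations, already_running):
    effects = {t["effect"] for t in untolerated(taints, tolerations)}
    if already_running:
        # Only an untolerated NoExecute taint evicts a running Pod.
        return "evicted" if "NoExecute" in effects else "keeps running"
    if effects & {"NoSchedule", "NoExecute"}:
        return "not scheduled"
    if "PreferNoSchedule" in effects:
        return "avoided if possible"
    return "schedulable"


taints = [
    {"key": "key1", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "value": "value1", "effect": "NoExecute"},
    {"key": "key1", "value": "value2", "effect": "NoSchedule"},
]
tolerations = [
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoSchedule"},
    {"key": "key1", "operator": "Equal", "value": "value1", "effect": "NoExecute", "tolerationSeconds": 3600},
]
print(untolerated(taints, tolerations))                     # only key1=value2:NoSchedule remains
print(verdict(taints, tolerations, already_running=False))  # not scheduled
print(verdict(taints, tolerations, already_running=True))   # keeps running
```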
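1) Dedicated nodes: to reserve some nodes for specific applications, taint them:
kubectl taint nodes <nodename> dedicated=<groupname>:NoSchedule
Then add the matching toleration to those applications' Pods; Pods with the right toleration are allowed to use the tainted nodes. A toleration only permits these nodes, so to also keep those Pods off other nodes, add a nodeSelector or node affinity on a label of the dedicated nodes.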
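2) Nodes with special hardware: to schedule Pods that need specific hardware onto the nodes that have it, taint those nodes with either of:
kubectl taint nodes <nodename> special=true:NoSchedule
kubectl taint nodes <nodename> special=true:PreferNoSchedule
Then add the matching toleration to the Pods that need the hardware; Pods with the right toleration are allowed to use the tainted nodes.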
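6. Pod Priority and Preemption: priority-based scheduling
Priority-based scheduling involves two mechanisms: Eviction and Preemption.
Eviction: performed by the kubelet. When a node runs short of resources, the node's kubelet decides which Pods to evict based on Pod priority, resource requests and actual usage; among Pods of the same priority, the one using the most resources relative to its requests is evicted first.
Preemption: performed by the scheduler. When a new Pod cannot be scheduled because no node has enough resources, the scheduler evicts lower-priority Pods to make room for it.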
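The kubelet's victim ordering can be sketched as a sort. This is a simplified model assuming a single resource (memory in MiB) and the order given in the Kubernetes node-pressure eviction documentation: Pods whose usage exceeds their requests first, then lower priority, then larger usage above requests. The Pod data is hypothetical.

```python
# Simplified model of the order in which the kubelet picks eviction victims
# under node memory pressure (one resource only, values in MiB):
#   1) Pods whose usage exceeds their request go first,
#   2) then lower priority before higher priority,
#   3) then the Pod using the most above its request.
def eviction_order(pods):
    return sorted(
        pods,
        key=lambda p: (
            p["usage"] <= p["request"],    # False (exceeds request) sorts first
            p["priority"],                 # lower priority sorts first
            -(p["usage"] - p["request"]),  # larger overage sorts first
        ),
    )


pods = [
    {"name": "a", "priority": 1000, "request": 256, "usage": 512},
    {"name": "b", "priority": 0, "request": 256, "usage": 200},
    {"name": "c", "priority": 0, "request": 128, "usage": 384},
    {"name": "d", "priority": 0, "request": 128, "usage": 200},
]
order = [p["name"] for p in eviction_order(pods)]
print(order)  # ['c', 'd', 'a', 'b']
```

Pod b has the lowest priority but stays within its request, so it is considered last.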
Example:
First define a PriorityClass, which does not belong to any namespace:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority      # name of the priority class
value: 50000               # a larger number means a higher priority
globalDefault: false
Then reference the priority class in a Pod:
apiVersion: v1
kind: Pod
metadata:
  name: busybox
  labels:
    env: test
spec:
  containers:
  - name: busybox
    image: busybox:latest
    command: ["sleep", "3600"]     # keep the container running
  priorityClassName: high-priority
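Note: while lower-priority Pods on the initially nominated node N are being evicted for a high-priority Pod, if another node becomes able to meet the high-priority Pod's needs, the Pod may be scheduled there instead; it does not have to land on N. If, while N is evicting lower-priority Pods, a Pod with an even higher priority than the nominated Pod appears, the scheduler schedules that highest-priority Pod first.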
7. Job: batch scheduling
Job batch tasks have three working modes:
a. Job Template Expansion: each Job object corresponds to one work item of the batch.
First define a Job template, job.yaml.txt:
apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["sh", "-c", "echo the item $ITEM && sleep 3"]
      restartPolicy: Never
Generate three Job definition files from the template and create the Jobs:
# mkdir -p jobs
# for i in one two three
> do
>   cat job.yaml.txt | sed "s/\$ITEM/$i/" > ./jobs/job-$i.yaml
> done
# ls jobs
job-one.yaml  job-three.yaml  job-two.yaml
# kubectl create -f jobs
# kubectl get jobs -l jobgroup=jobexample
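The same template expansion can be done in Python with `string.Template`, which substitutes `$ITEM` much like the sed loop. This is a sketch under assumptions: the template is inlined instead of read from job.yaml.txt, `expand` is a hypothetical helper, and files go to a temporary directory.

```python
# Python equivalent of the sed-based template expansion: substitute $ITEM
# in the Job template once per work item and write one file per Job.
# The template is inlined so the sketch runs without job.yaml.txt.
import pathlib
import string
import tempfile

TEMPLATE = """\
apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["sh", "-c", "echo the item $ITEM && sleep 3"]
      restartPolicy: Never
"""


def expand(items, out_dir):
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for item in items:
        manifest = string.Template(TEMPLATE).substitute(ITEM=item)
        path = out / f"job-{item}.yaml"
        path.write_text(manifest)
        paths.append(path)
    return paths


jobs_dir = pathlib.Path(tempfile.mkdtemp()) / "jobs"
files = expand(["one", "two", "three"], jobs_dir)
print(sorted(p.name for p in files))  # ['job-one.yaml', 'job-three.yaml', 'job-two.yaml']
```

The resulting directory can then be passed to `kubectl create -f` exactly as in the shell version.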
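b. Queue with Pod Per Work Item: work items are stored in a task queue, and one Job object acts as the consumer that processes them. The Job starts N Pods, each handling one work item.
c. Queue with Variable Pod Count: similar to Queue with Pod Per Work Item, except that the number of Pods the Job starts is variable.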
8. CronJob: scheduled tasks
CronJob schedule format (standard five-field cron, with no Year field):
Minute | Hour | Day of Month | Month | Day of Week
Example:
# cron.yaml: creates a CronJob named hello
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/10 * * * *"     # run every 10 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:latest
            command: ["sh", "-c", "date; echo Hello from the K8s cluster"]
          restartPolicy: OnFailure
Check the task status (e.g. every 10 minutes) with:
kubectl get cronjob hello
kubectl get jobs --watch
Delete the CronJob named hello with:
kubectl delete cronjob hello
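To see what a schedule field like `*/10` means, here is a minimal sketch that expands one cron field into the values at which it fires. It is an illustration, not the CronJob controller's parser: `expand_field` is hypothetical and handles only `*`, `*/N`, `A-B`, `A-B/N` and comma-separated numbers.

```python
# Minimal sketch of expanding one field of a standard 5-field cron schedule
# into the values at which it fires.
def expand_field(field, low, high):
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = low, high
        elif "-" in rng:
            start, end = (int(x) for x in rng.split("-"))
        else:
            start = end = int(rng)
        values.update(range(start, end + 1, step))
    return sorted(values)


minute, hour, dom, month, dow = "*/10 * * * *".split()
print(expand_field(minute, 0, 59))  # [0, 10, 20, 30, 40, 50]
print(len(expand_field(hour, 0, 23)))  # 24
```

So `"*/10 * * * *"` fires at minutes 0, 10, 20, 30, 40 and 50 of every hour, on every day.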
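9. Init Container
Init containers perform initialization before the application containers start, for example: waiting for dependent components (such as a database or a backend service) to be running; generating configuration files from environment variables or configuration templates; fetching required configuration from a remote database, or registering the application with a central database; downloading dependencies or doing system pre-configuration.
Example: before the Nginx container starts, an init container based on busybox creates an index.html home page for nginx.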
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  initContainers:
  # use busybox to download the Baidu home page as index.html, the initial nginx home page
  - name: create-html        # container names must be lowercase (DNS-1123 labels)
    image: busybox:latest
    command:
    - wget
    - "-O"
    - "/website/index.html"
    - https://www.baidu.com
    volumeMounts:
    - name: website
      mountPath: "/website"
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
    volumeMounts:
    - name: website
      mountPath: /usr/share/nginx/html
  dnsPolicy: Default
  volumes:
  - name: website
    emptyDir: {}
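- If multiple init containers are defined, they run one at a time in order; the application containers are created and started only after all init containers have completed successfully.
- An init container can define resource limits, volumes and security settings. When several init containers define resource requests or limits, the highest value among them is used as the effective init request or limit.
- Init containers cannot define a readinessProbe (nor a livenessProbe, startupProbe or lifecycle hooks).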
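The resource rule for init containers can be written as a one-line formula. This sketch assumes CPU requests in millicores and follows the rule in the Kubernetes "Init Containers" documentation, ignoring sidecar containers and Pod overhead: because init containers run one at a time, only the largest init request matters, while app containers run together and add up. The function name and numbers are hypothetical.

```python
# Simplified model of a Pod's effective request for one resource (CPU in
# millicores) when it has init containers: the larger of the biggest init
# container request and the sum of the app container requests.
def effective_request(init_requests, app_requests):
    init_peak = max(init_requests, default=0)
    app_total = sum(app_requests)
    return max(init_peak, app_total)


print(effective_request([500, 200], [100, 150]))  # 500: the init phase dominates
print(effective_request([100], [300, 400]))       # 700: the app containers dominate
```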