(K8s Study Notes 6) Pod Scheduling

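An RC (ReplicationController) supports only equality-based label selectors, whereas an RS (ReplicaSet) also supports set-based selectors. For example, suppose APPTest has released versions v1 and v2 and should run 3 replicas that may include Pods of both versions:

selector:
  matchExpressions:           # set-based: the version label must be v1 or v2
  - {key: version, operator: In, values: [v1, v2]}
  # matchLabels (e.g. version: v2) can be combined with matchExpressions,
  # but all conditions are ANDed, which would narrow the match to v2 only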
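The matching rule can be modeled in a few lines. The Python below is a simplified sketch, not Kubernetes' own implementation: `selector_matches` is a hypothetical helper, selectors are plain dicts, and only the four set-based operators are handled. It shows that every matchLabels pair and every matchExpressions entry must hold at the same time.

```python
from typing import Dict, List


def selector_matches(labels: Dict[str, str],
                     match_labels: Dict[str, str],
                     match_expressions: List[dict]) -> bool:
    """Toy model of a set-based label selector: every matchLabels pair and
    every matchExpressions requirement must hold, i.e. they are ANDed."""
    for key, value in match_labels.items():
        if labels.get(key) != value:
            return False
    for req in match_expressions:
        key, op, values = req["key"], req["operator"], req.get("values", [])
        if op == "In":
            ok = key in labels and labels[key] in values
        elif op == "NotIn":
            ok = key not in labels or labels[key] not in values
        elif op == "Exists":
            ok = key in labels
        elif op == "DoesNotExist":
            ok = key not in labels
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


# The set-based requirement from the selector above: version In [v1, v2]
version_in = [{"key": "version", "operator": "In", "values": ["v1", "v2"]}]
print(selector_matches({"version": "v1"}, {}, version_in))                 # True
print(selector_matches({"version": "v1"}, {"version": "v2"}, version_in))  # False
```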

1. Deployment or RC/RS: fully automatic scheduling

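A Deployment or RC/RS automatically deploys multiple replicas of a containerized application, performs version updates and rollbacks, and continuously maintains the specified number of replicas.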

# nginx-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  namespace: test
spec:
  replicas: 3            # the Deployment's ReplicaSet keeps three replicas
  selector:              # required in apps/v1; must match the template labels
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80

2. nodeSelector: targeted scheduling

To schedule a Pod onto specific Nodes, combine Node labels with the Pod's nodeSelector field.

1. First, label the target node with kubectl label:
kubectl label nodes <node-name> <label-key>=<label-value>
Example:
kubectl label nodes work01 zone=frontend    # adds the label zone=frontend to work01, marking it as a "frontend" node

2. Then add a nodeSelector to the Pod definition (here, in the Pod template of a ReplicaSet):
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx-test
  labels:
    name: nginx-test
spec:
  replicas: 1
  selector:
    matchLabels:
      name: nginx-test
  template:
    metadata:
      labels:
        name: nginx-test
    spec:
      containers:
      - name: nginx
        image: nginx:latest
        ports:
        - containerPort: 80
      nodeSelector:            # schedule only onto nodes labeled zone=frontend
        zone: frontend
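nodeSelector is an exact-match AND over all the listed pairs. The minimal Python sketch below filters nodes the way the example above does. It uses a hypothetical three-node cluster and a hypothetical helper named `nodes_for_selector`. In a real cluster, an empty result means the Pod stays Pending.

```python
from typing import Dict, List


def nodes_for_selector(nodes: Dict[str, Dict[str, str]],
                       node_selector: Dict[str, str]) -> List[str]:
    """Return the nodes whose labels contain every key=value pair of node_selector."""
    return [name for name, labels in nodes.items()
            if all(labels.get(k) == v for k, v in node_selector.items())]


# Hypothetical cluster: only work01 carries zone=frontend
nodes = {
    "work01": {"kubernetes.io/hostname": "work01", "zone": "frontend"},
    "work02": {"kubernetes.io/hostname": "work02", "zone": "backend"},
    "work03": {"kubernetes.io/hostname": "work03"},
}
print(nodes_for_selector(nodes, {"zone": "frontend"}))  # ['work01']
```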

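3. nodeAffinity: Node affinity scheduling

There are two kinds of node affinity expressions:

1) requiredDuringSchedulingIgnoredDuringExecution

The Pod is scheduled onto a Node only if the specified rules are satisfied. This is a hard constraint.

2) preferredDuringSchedulingIgnoredDuringExecution

The scheduler tries to place the Pod on Nodes that satisfy the rules but does not insist. This is a soft constraint. Each preference carries a weight (1-100), and Nodes that satisfy higher-weighted preferences score higher and are favored.

Note: IgnoredDuringExecution means the following. Suppose a Node's labels change while a Pod is running on it, so that the Node no longer satisfies the Pod's node affinity. The system ignores the label change, and the Pod keeps running on that Node.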

apiVersion: v1
kind: Pod
metadata:
  name: busybox-test
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:   # hard: amd64 nodes only
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch      # older clusters: beta.kubernetes.io/arch
            operator: In
            values:
            - amd64
      preferredDuringSchedulingIgnoredDuringExecution:  # soft: prefer SSD nodes
      - weight: 1
        preference:
          matchExpressions:
          - key: disk-type
            operator: In
            values:
            - ssd
  containers:
  - name: busybox-test
    image: busybox:latest
    command: ["sleep", "3600"]    # keep the container running (busybox otherwise exits immediately)
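Soft preferences only rank the nodes that already passed the hard requirement. The sketch below is a simplified model that rests on several assumptions. Each preference holds a single In expression. The node labels and the second preference (zone, weight 50) are hypothetical. The real scheduler also normalizes this score and combines it with other scoring plugins.

```python
from typing import Dict, List, Tuple

# (weight, label key, allowed values): a simplified preference with one "In" expression
Preference = Tuple[int, str, List[str]]


def preference_score(labels: Dict[str, str], prefs: List[Preference]) -> int:
    """Sum the weights of all preferences the node satisfies."""
    return sum(weight for weight, key, values in prefs if labels.get(key) in values)


def rank_nodes(nodes: Dict[str, Dict[str, str]], prefs: List[Preference]) -> List[str]:
    """Order nodes from most to least preferred; ties keep their input order."""
    return sorted(nodes, key=lambda name: preference_score(nodes[name], prefs), reverse=True)


# Hypothetical preferences and node labels
prefs = [(1, "disk-type", ["ssd"]), (50, "zone", ["frontend"])]
nodes = {
    "work01": {"disk-type": "ssd"},
    "work02": {"zone": "frontend"},
    "work03": {},
}
print(rank_nodes(nodes, prefs))  # ['work02', 'work01', 'work03']
```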

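The operator can be In, NotIn, Exists, DoesNotExist, Gt or Lt:
In: the label's value is in the given list
NotIn: the label's value is not in the given list
Exists: the label exists
DoesNotExist: the label does not exist
Gt: the label's value is greater than the given value (compared as integers)
Lt: the label's value is less than the given value (compared as integers)
  • If both nodeSelector and nodeAffinity are defined, a Node must satisfy both for the Pod to be scheduled on it
  • If nodeAffinity lists multiple nodeSelectorTerms, the Pod can be scheduled on a Node that matches any one of them
  • If a nodeSelectorTerm contains multiple matchExpressions, a Node must satisfy all of them for the Pod to be scheduled on it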
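The operator semantics and the OR/AND rules above can be expressed as a small evaluator. This is an illustrative Python sketch, not the scheduler's code. The functions `expr_matches` and `node_matches` and the `cpu-count` label are hypothetical. Expressions are plain dicts shaped like matchExpressions entries.

```python
from typing import Dict, List


def expr_matches(labels: Dict[str, str], expr: dict) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op, values = expr["key"], expr["operator"], expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    if op in ("Gt", "Lt"):
        # Gt/Lt take a single value and compare it with the label as integers
        try:
            actual, bound = int(labels[key]), int(values[0])
        except (KeyError, ValueError, IndexError):
            return False
        return actual > bound if op == "Gt" else actual < bound
    raise ValueError(f"unsupported operator: {op}")


def node_matches(labels: Dict[str, str], node_selector_terms: List[List[dict]]) -> bool:
    """Terms are ORed; the matchExpressions inside one term are ANDed."""
    return any(all(expr_matches(labels, e) for e in term) for term in node_selector_terms)


# Hypothetical terms: (arch In [arm64]) OR (disk-type Exists AND cpu-count Gt 4)
terms = [
    [{"key": "kubernetes.io/arch", "operator": "In", "values": ["arm64"]}],
    [{"key": "disk-type", "operator": "Exists"},
     {"key": "cpu-count", "operator": "Gt", "values": ["4"]}],
]
print(node_matches({"kubernetes.io/arch": "amd64", "disk-type": "ssd", "cpu-count": "8"}, terms))  # True
print(node_matches({"kubernetes.io/arch": "amd64", "disk-type": "ssd", "cpu-count": "2"}, terms))  # False
```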

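4. podAffinity: Pod affinity and anti-affinity scheduling

Pod affinity and anti-affinity schedule a Pod according to conditions along two axes. The Node side (topology) is the X axis, and the Pod-matching conditions are the Y axis. A Pod is attracted to, or kept away from, topology domains that already run Pods matching the conditions.

X axis:

Nodes are grouped by hostname, rack, zone, and so on, and the grouping is given by topologyKey. Commonly used values are kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone (usually different zones within the same data center) and failure-domain.beta.kubernetes.io/region (usually data centers in different locations). Newer Kubernetes versions replace the last two with topology.kubernetes.io/zone and topology.kubernetes.io/region. Any Node label key can serve as a topologyKey.

Y axis:

  • Pod affinity/anti-affinity conditions come in two forms: requiredDuringSchedulingIgnoredDuringExecution and preferredDuringSchedulingIgnoredDuringExecution
  • Pod affinity is defined under podAffinity in spec.affinity
  • Pod anti-affinity is defined under podAntiAffinity in spec.affinity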

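Example 1: Pod affinity (podAffinity)

Create the first Pod with the labels security=A1 and app=busybox:
apiVersion: v1
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
  - name: busybox-one
    image: busybox:latest
    command: ["sleep", "3600"]    # keep the container running (busybox otherwise exits immediately)

Create the second Pod. Its affinity rule requires a Pod labeled security=A1 in the same topology domain, with topologyKey "kubernetes.io/hostname":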

apiVersion: v1 
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - A1
        topologyKey: kubernetes.io/hostname
  containers:
  - name: busybox-two
    image: busybox:latest
    command: ["sleep", "3600"]

Running kubectl get pods -o wide shows that the two Pods run on the same node.

Example 2: Pod anti-affinity (podAntiAffinity)

Create the first Pod with the labels security=A1 and app=busybox:
apiVersion: v1 
kind: Pod
metadata:
  name: busybox-one
  labels:
    security: "A1"
    app: "busybox"
spec:
  containers:
  - name: busybox-one
    image: busybox:latest
    command: ["sleep", "3600"]

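Create the second Pod. Its affinity rule selects Pods labeled security=A1 with topologyKey "failure-domain.beta.kubernetes.io/zone". Its anti-affinity rule selects Pods labeled app=busybox with topologyKey "kubernetes.io/hostname":
apiVersion: v1
kind: Pod
metadata:
  name: busybox-two
spec:
  affinity:
    podAffinity:              # same zone as a Pod labeled security=A1
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - A1
        topologyKey: failure-domain.beta.kubernetes.io/zone   # newer clusters: topology.kubernetes.io/zone
    podAntiAffinity:          # but not on the same node as a Pod labeled app=busybox
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - busybox
        topologyKey: kubernetes.io/hostname
  containers:
  - name: busybox-two
    image: busybox:latest
    command: ["sleep", "3600"]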

Running kubectl get pods -o wide shows that the two Pods run in the same zone but on different nodes.
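Both examples come down to checking topology domains. First, collect the topologyKey values of nodes that already run matching Pods. Then keep the nodes that lie inside the affinity domains and outside the anti-affinity domains. The Python sketch below models example 2 under simplifying assumptions. It uses a hypothetical three-node cluster and reduces each term to a single key=value match. It also ignores a special case: Kubernetes lets the first Pod of a group schedule when that Pod matches its own affinity term.

```python
from typing import Dict, List, Set, Tuple

Nodes = Dict[str, Dict[str, str]]           # node name -> node labels
Running = List[Tuple[str, Dict[str, str]]]  # (node name, Pod labels) of existing Pods
Term = Tuple[str, str, str]                 # (Pod label key, label value, topologyKey)


def occupied_domains(nodes: Nodes, running: Running, term: Term) -> Set[str]:
    """Topology domains (values of topologyKey) that already run a matching Pod."""
    key, value, topology_key = term
    return {nodes[node][topology_key] for node, pod_labels in running
            if pod_labels.get(key) == value and topology_key in nodes[node]}


def feasible_nodes(nodes: Nodes, running: Running,
                   affinity: Term, anti_affinity: Term) -> List[str]:
    """Nodes inside an affinity domain and outside every anti-affinity domain."""
    wanted = occupied_domains(nodes, running, affinity)
    avoided = occupied_domains(nodes, running, anti_affinity)
    return [name for name, labels in nodes.items()
            if labels.get(affinity[2]) in wanted
            and labels.get(anti_affinity[2]) not in avoided]


ZONE = "failure-domain.beta.kubernetes.io/zone"
HOST = "kubernetes.io/hostname"
# Hypothetical cluster: work01 and work02 share zone-a, work03 is in zone-b
nodes = {
    "work01": {HOST: "work01", ZONE: "zone-a"},
    "work02": {HOST: "work02", ZONE: "zone-a"},
    "work03": {HOST: "work03", ZONE: "zone-b"},
}
running = [("work01", {"security": "A1", "app": "busybox"})]  # busybox-one
print(feasible_nodes(nodes, running, ("security", "A1", ZONE), ("app", "busybox", HOST)))  # ['work02']
```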

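Notes:

  • The topologyKey in requiredDuringSchedulingIgnoredDuringExecution must not be empty
  • Older Kubernetes versions allowed an empty topologyKey in preferredDuringSchedulingIgnoredDuringExecution (for anti-affinity). They interpreted it as the combination of kubernetes.io/hostname, failure-domain.beta.kubernetes.io/zone and failure-domain.beta.kubernetes.io/region. Current versions reject an empty topologyKey in both forms
  • A podAffinity term can also set namespaces. In current versions, an omitted or empty namespaces list means only the Pod's own namespace, and an empty namespaceSelector {} matches all namespaces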

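5. Taints and Tolerations

Taints work together with tolerations to let a Node repel Pods. A Node can carry one or more taints. A Pod cannot run on that Node unless it declares that it tolerates those taints. Tolerations are a Pod attribute that allows the Pod to run on Nodes with matching taints.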

Example: use kubectl taint to give work01 a taint that keeps Pods from being scheduled on it. A Pod that declares a matching toleration can still run there.

kubectl taint nodes work01 key=value:NoSchedule

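The taint's effect can be NoSchedule, PreferNoSchedule or NoExecute:
NoSchedule: the scheduler does not place Pods on this node (hard constraint)
PreferNoSchedule: the scheduler tries not to place Pods on this node (soft constraint)
NoExecute: Pods without a matching toleration are evicted. Pods with a matching toleration but no tolerationSeconds keep running indefinitely. Pods with a matching toleration and tolerationSeconds are evicted after that many seconds

A Pod declares a toleration either as:

tolerations:
- key: "key"            # must match the taint's key
  operator: "Equal"     # the value must equal the taint's value
  value: "value"
  effect: "NoSchedule"  # must match the taint's effect

or as:

tolerations:
- key: "key"
  operator: "Exists"    # no value needed
  effect: "NoSchedule"  # set to PreferNoSchedule to tolerate a soft (PreferNoSchedule) taint instead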

If operator is omitted, it defaults to Equal. An empty key with operator Exists matches all keys and values. An empty effect matches all effects.

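How Kubernetes handles multiple taints and tolerations: it starts with all of the node's taints and ignores those matched by one of the Pod's tolerations. The remaining taints take effect on the Pod.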

Example: set multiple taints on a node and multiple tolerations in a Pod

kubectl taint nodes work01 key1=value1:NoSchedule
kubectl taint nodes work01 key1=value1:NoExecute
kubectl taint nodes work01 key1=value2:NoSchedule

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoExecute"
  tolerationSeconds: 3600   # the Pod may keep running on this node for 3600s after the taint is added, then it is evicted

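Suppose the third taint is added while the Pod is already running on the node. The Pod is not evicted, because the third taint is the only one it does not tolerate, and its effect is NoSchedule, which does not affect running Pods. A new Pod with the same tolerations could not be scheduled onto work01, because nothing tolerates the third taint.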
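The filtering procedure described above can be checked against this example. A minimal Python sketch follows. The `Taint`/`Toleration` classes and the helper names are hypothetical stand-ins for the Kubernetes fields, and the tolerationSeconds countdown is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Taint:
    key: str
    value: str
    effect: str                 # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass
class Toleration:
    key: str = ""               # empty key + operator Exists tolerates every taint
    operator: str = "Equal"     # Equal is the default operator
    value: str = ""
    effect: str = ""            # empty effect matches every effect
    toleration_seconds: Optional[int] = None  # recorded only; the eviction delay is not modeled


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def remaining_taints(taints: List[Taint], tols: List[Toleration]) -> List[Taint]:
    """Drop every taint matched by some toleration; the rest constrain the Pod."""
    return [t for t in taints if not any(tolerates(tol, t) for tol in tols)]


def can_schedule(taints: List[Taint], tols: List[Toleration]) -> bool:
    # PreferNoSchedule is soft, so only NoSchedule/NoExecute leftovers block scheduling
    return all(t.effect == "PreferNoSchedule" for t in remaining_taints(taints, tols))


def evicts_running_pod(taints: List[Taint], tols: List[Toleration]) -> bool:
    # Only an untolerated NoExecute taint evicts a Pod that is already running
    return any(t.effect == "NoExecute" for t in remaining_taints(taints, tols))


# The three taints on work01 and the two tolerations from the example above
taints = [Taint("key1", "value1", "NoSchedule"),
          Taint("key1", "value1", "NoExecute"),
          Taint("key1", "value2", "NoSchedule")]
tols = [Toleration("key1", "Equal", "value1", "NoSchedule"),
        Toleration("key1", "Equal", "value1", "NoExecute", toleration_seconds=3600)]
print(can_schedule(taints, tols))        # False: key1=value2:NoSchedule is not tolerated
print(evicts_running_pod(taints, tols))  # False: every NoExecute taint is tolerated
```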

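1) To reserve a set of nodes for particular applications, make them dedicated nodes:

kubectl taint nodes <nodename> dedicated=<groupname>:NoSchedule

Then add the corresponding toleration to those applications' Pods. Pods with the right toleration are allowed onto the tainted nodes. A toleration only permits placement there and does not force it, so to keep these Pods off other nodes, also label the dedicated nodes and use nodeSelector or nodeAffinity.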

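2) To schedule Pods that need special hardware onto the nodes that have it:

kubectl taint nodes <nodename> special=true:NoSchedule
kubectl taint nodes <nodename> special=true:PreferNoSchedule    # or the soft variant

Then add the corresponding toleration to the Pods that need the hardware. Pods with the right toleration are allowed onto nodes with that taint.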

6. Pod Priority Preemption: Pod priority scheduling

 

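Priority-based scheduling involves two mechanisms: Eviction and Preemption.

Eviction is performed by the kubelet. When a node runs short of resources, its kubelet decides which Pods to evict based on Pod priority, resource requests and actual usage. Among Pods of the same priority, the Pod whose usage exceeds its request by the most is evicted first.

Preemption is performed by the scheduler. When a new Pod cannot be scheduled because no node has enough resources, the scheduler evicts lower-priority Pods to make room for it.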
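According to the Kubernetes documentation on node-pressure eviction, the kubelet ranks Pods in three steps. First, it checks whether a Pod's usage exceeds its request. Then it compares priority, and finally usage relative to the request. The Python below sketches that ordering for a single resource (memory). Two assumptions apply: "relative to requests" is taken as usage minus request, and the Pod figures are invented for illustration. This is not kubelet code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PodUsage:
    name: str
    priority: int
    request_mib: int   # memory request
    usage_mib: int     # observed memory usage


def eviction_order(pods: List[PodUsage]) -> List[str]:
    """Rank Pods for node-pressure eviction, first to be evicted first."""
    def rank(p: PodUsage):
        exceeds = p.usage_mib > p.request_mib
        # 1) Pods using more than they requested come first (False sorts before True)
        # 2) then lower priority first
        # 3) then the largest usage above the request first
        return (not exceeds, p.priority, -(p.usage_mib - p.request_mib))
    return [p.name for p in sorted(pods, key=rank)]


pods = [
    PodUsage("a", priority=0, request_mib=100, usage_mib=150),
    PodUsage("b", priority=0, request_mib=100, usage_mib=300),
    PodUsage("c", priority=1000, request_mib=100, usage_mib=400),
    PodUsage("d", priority=0, request_mib=500, usage_mib=200),
]
print(eviction_order(pods))  # ['b', 'a', 'c', 'd']
```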

Example:

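First define a PriorityClass. It is cluster-scoped and does not belong to any namespace:
apiVersion: scheduling.k8s.io/v1     # scheduling.k8s.io/v1beta1 in older clusters
kind: PriorityClass
metadata:
  name: high-priority     # name of the priority class
value: 50000              # the larger the value, the higher the priority
globalDefault: false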

Reference the priority class in the Pod:
apiVersion: v1 
kind: Pod
metadata:
  name: busybox
  labels:
    env: test
spec:
  containers:
  - name: busybox
    image: busybox:latest
  priorityClassName: high-priority

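Note: when a high-priority Pod is being scheduled, the scheduler nominates a node N and evicts lower-priority Pods there. Another node may become able to satisfy the high-priority Pod while those Pods are being evicted. In that case, the Pod may be scheduled onto that node rather than N. If a Pod with even higher priority appears while N is evicting, the scheduler gives the highest-priority Pod precedence.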

7. Job: batch scheduling

 

Batch Jobs follow three work patterns:

a. Job Template Expansion: each Job object corresponds to one batch work item

First define a Job template, job.yaml.txt:
apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
  labels:
    jobgroup: jobexample
spec:
  template:
    metadata:
      name: jobexample
      labels:
        jobgroup: jobexample
    spec:
      containers:
      - name: busybox
        image: busybox:latest
        command: ["sh", "-c", "echo the item $ITEM && sleep 3"]
      restartPolicy: Never

Generate three Job definition files from the template and create the Jobs:
# mkdir -p jobs
# for i in one two three
> do
>    sed "s/\$ITEM/$i/" job.yaml.txt > ./jobs/job-$i.yaml
> done

# ls jobs
job-one.yaml  job-three.yaml  job-two.yaml

# kubectl create -f jobs
# kubectl get jobs -l jobgroup=jobexample
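The sed loop is plain text substitution. For comparison, here is the same expansion sketched in Python with `string.Template`, whose `$ITEM` placeholder syntax happens to match the template. `expand_jobs` is a hypothetical helper, and the template is trimmed to a few lines. Writing each manifest into `jobs/` (for example with `pathlib.Path.write_text`) would let `kubectl create -f jobs` pick them up.

```python
from string import Template
from typing import Dict, Iterable


def expand_jobs(template_text: str, items: Iterable[str]) -> Dict[str, str]:
    """Render one Job manifest per work item by replacing $ITEM, like the sed loop."""
    tmpl = Template(template_text)
    return {f"job-{item}.yaml": tmpl.safe_substitute(ITEM=item) for item in items}


# A trimmed-down stand-in for job.yaml.txt, enough to show the substitution
template_text = """apiVersion: batch/v1
kind: Job
metadata:
  name: work-item-$ITEM
"""
for filename, manifest in expand_jobs(template_text, ["one", "two", "three"]).items():
    print(filename, "->", manifest.splitlines()[3].strip())
# job-one.yaml -> name: work-item-one
# job-two.yaml -> name: work-item-two
# job-three.yaml -> name: work-item-three
```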

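b. Queue with Pod Per Work Item: work items are stored in a task queue, and a single Job acts as the consumer that processes them. The Job starts N Pods, and each Pod handles one work item.

c. Queue with Variable Pod Count: similar to Queue with Pod Per Work Item, but the number of Pods the Job starts is variable.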

8. CronJob: scheduled tasks

CronJob schedule format (five fields, with no Year field):

Minute Hour Day-of-Month Month Day-of-Week
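To make the five fields concrete, the Python sketch below checks whether a given time matches a schedule. It supports only `*`, `*/n`, ranges, single numbers and comma lists. Month/day names and macros such as `@hourly` are left out. `cron_matches` is a hypothetical helper, not the CronJob controller's parser.

```python
from datetime import datetime


def field_matches(expr: str, value: int, lowest: int) -> bool:
    """Match one cron field; supports '*', '*/n', 'a-b', 'a' and comma lists."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            if (value - lowest) % int(part[2:]) == 0:  # steps count from the field's minimum
                return True
        elif "-" in part:
            start, end = map(int, part.split("-"))
            if start <= value <= end:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(schedule: str, when: datetime) -> bool:
    minute, hour, dom, month, dow = schedule.split()  # exactly five fields
    cron_dow = (when.weekday() + 1) % 7               # cron counts Sunday as 0
    dom_ok = field_matches(dom, when.day, 1)
    dow_ok = field_matches(dow, cron_dow, 0)
    # classic cron rule: if both day fields are restricted, either one may match
    day_ok = (dom_ok or dow_ok) if dom != "*" and dow != "*" else (dom_ok and dow_ok)
    return (field_matches(minute, when.minute, 0) and field_matches(hour, when.hour, 0)
            and field_matches(month, when.month, 1) and day_ok)


print(cron_matches("*/10 * * * *", datetime(2020, 7, 12, 0, 50)))  # True
print(cron_matches("*/10 * * * *", datetime(2020, 7, 12, 0, 54)))  # False
```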

Example:

# cron.yaml: create a CronJob named hello
apiVersion: batch/v1            # batch/v1beta1 on clusters older than 1.21
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/10 * * * *"      # run every 10 minutes
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox:latest
            command: ["sh", "-c", "date; echo Hello from the K8s cluster"]
          restartPolicy: OnFailure

# Check the task status (a new Job starts every 10 minutes)
kubectl get cronjob hello
kubectl get jobs --watch

# Delete the CronJob named hello
kubectl delete cronjob hello

 

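9. Init Container

Init containers perform initialization before the application containers start. Typical tasks include:
  • waiting until dependent components (such as a database or a backend service) are running
  • generating configuration files from environment variables or configuration templates
  • fetching required configuration from a remote database, or registering the Pod with a central database
  • downloading dependencies, or performing system pre-configuration

Example: before the Nginx container starts, an init container uses busybox to create an index.html home page for nginx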

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  initContainers:    # busybox downloads the Baidu home page as nginx's initial index.html
  - name: create-html
    image: busybox:latest
    command:
    - wget
    - "-O"
    - "/website/index.html"
    - https://www.baidu.com
    volumeMounts:
    - name: website
      mountPath: "/website"
  containers:
  - name: nginx
    image: nginx:latest
    ports:
    - containerPort: 80
    volumeMounts:
    - name: website
      mountPath: /usr/share/nginx/html
  dnsPolicy: Default
  volumes:
  - name: website
    emptyDir: {}

 

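  • If multiple init containers are defined, they run one at a time in order. Application containers are created and started only after all init containers have completed successfully
  • An init container can define resource limits, volumes, security settings and so on. When several init containers define resource limits, the largest value becomes the effective limit for all init containers
  • Init containers cannot define a readinessProbe (nor a livenessProbe or startupProbe)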
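The resource rule above feeds into the Pod's effective request. Init containers run one at a time, so only the largest init request counts. Application containers run together, so their requests add up. The Pod reserves the larger of the two. A minimal Python sketch for one resource follows. It ignores Pod overhead and sidecar containers, and the helper name is hypothetical.

```python
from typing import List


def effective_request(init_requests: List[int], app_requests: List[int]) -> int:
    """Effective Pod request for one resource (e.g. CPU millicores)."""
    effective_init = max(init_requests, default=0)  # init containers run sequentially
    return max(effective_init, sum(app_requests))   # app containers run concurrently


print(effective_request([200, 500], [100, 250]))  # 500
print(effective_request([100], [300, 400]))       # 700
```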

 

posted @ 2020-07-12 00:54  不倒翁Jason