Kubernetes Cluster Series, Part 14: Introduction to the Scheduler

I. Introduction to the scheduler

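  The scheduler's main task is to assign defined pods to nodes in the cluster according to a set of rules. It runs as a separate program; once started, it keeps watching the API server for pods whose PodSpec.NodeName is empty, and for each such pod it creates a binding that records which node the pod should be placed on. Scheduling has to balance several concerns:
  Efficiency: scheduling must perform well and finish quickly even for large batches of pods;
  Fairness: every node should be able to receive its share of work;
  Efficient resource use: the cluster's resources should be used as fully as possible;
  Flexibility: users can control the scheduling logic according to their own needs.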
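Scheduling process:
  1) Filter out the nodes that do not meet the pod's requirements; this step is called predicate. If no node passes, the pod stays in the Pending state and scheduling is retried until some node qualifies. If several nodes pass, scheduling continues with the priority step.
  2) Rank the nodes that passed by priority; this step is called priority.
  3) Finally, select the node with the highest priority.
  If any step fails, the error is returned immediately.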
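The predicate step can use a range of algorithms:
  PodFitsResources: whether the node's remaining resources are enough for the resources the pod requests; if not, the node is filtered out.
  PodFitsHost: if the pod specifies nodeName, whether the node's name matches it; if not, the node is filtered out.
  PodFitsHostPorts: whether the ports already in use on the node conflict with the ports the pod requests; if they conflict, the node is filtered out.
  PodSelectorMatches: whether the node's labels match the labels the pod selects on; if not, the node is filtered out.
  NoDiskConflict: whether the volumes already mounted on the node conflict with the volumes the pod specifies; if they conflict, the node is filtered out, unless both are read-only.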
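The effect of a failing predicate can be observed with a pod that asks for more than any node can offer. The following is a minimal sketch, not part of the original cases: the file name and pod name are made up for illustration, and the request of 1000 CPU cores is assumed to exceed every node in the cluster.

```shell
# Write a pod manifest whose CPU request no node is expected to satisfy.
cat >pending_demo.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: pending-demo
spec:
    containers:
    - name: pending-demo
      image: hub.atguigu.com/library/nginx:latest
      resources:
          requests:
              cpu: "1000"      # 1000 cores, assumed larger than any node
              memory: 1Gi
eof
# With a cluster available:
#   kubectl apply -f pending_demo.yaml
#   kubectl get pod pending-demo        # STATUS is expected to stay Pending
#   kubectl describe pod pending-demo   # Events should show FailedScheduling
```

Because every node fails the resource check, the pod should remain Pending until it is deleted or a large enough node joins the cluster.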
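Priority: priorities are expressed as key-value pairs, where the key is the name of a priority item and the value is its weight (how important the item is). The priority items include:
  LeastRequestedPriority: the score is computed from CPU and memory utilization; the lower the utilization, the higher the score. In other words, this item favors nodes with a lower proportion of resources in use.
  BalancedResourceAllocation: the closer a node's CPU and memory utilization are to each other, the higher the score. This item should be used together with LeastRequestedPriority, not on its own.
  ImageLocalityPriority: favors nodes that already hold the images the pod needs; the larger the total size of those images, the higher the score.
  The scores of all priority items are combined according to their weights to produce the final result.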
Custom scheduler:
  Besides the built-in default-scheduler, you can write your own scheduler. Setting spec.schedulerName on a pod selects which scheduler handles that pod.
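As an illustration of the schedulerName field, the sketch below writes a pod manifest that hands the pod to a scheduler called my-scheduler. That name, the file name, and the pod name are hypothetical; the manifest only works as intended if a scheduler registered under that name is actually running in the cluster.

```shell
# Write a pod manifest that selects a non-default scheduler by name.
cat >custom_scheduler_pod.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: custom-scheduler-demo
spec:
    schedulerName: my-scheduler    # hypothetical scheduler name
    containers:
    - name: custom-scheduler-demo
      image: hub.atguigu.com/library/nginx:latest
eof
# With a cluster available:
#   kubectl apply -f custom_scheduler_pod.yaml
#   kubectl get pod custom-scheduler-demo -o wide
```

If no scheduler with that name is running, nothing binds the pod to a node and it stays Pending. Leaving the field out gives the pod to default-scheduler.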
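Node affinity
  pod.spec.affinity.nodeAffinity
  preferredDuringSchedulingIgnoredDuringExecution: soft policy. The pod is preferably scheduled to a matching node; if none matches, it can still be scheduled elsewhere. When a pod has several soft policies, nodes matching the higher-weight ones are preferred.
  requiredDuringSchedulingIgnoredDuringExecution: hard policy. The pod must be scheduled to a matching node; if none matches, the pod is not scheduled and a scheduling failure is reported.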

kubectl get node --show-labels #list the labels on each node

kubernetes.io/hostname is the label key, and k8s-node02 is its value.

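Operators for matching label values:
  In: the label's value is in a given list;
  NotIn: the label's value is not in a given list;
  Gt: the label's value is greater than a given value;
  Lt: the label's value is less than a given value;
  Exists: the label exists;
  DoesNotExist: the label does not exist.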
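The cases in this post only use In and NotIn, so here is a minimal sketch of two of the other operators. The label keys disktype and cpu-count are hypothetical and would have to be set on the nodes first (for example with kubectl label). Exists takes no values, while Gt takes exactly one value that is interpreted as an integer.

```shell
# Write a pod manifest that combines the Exists and Gt operators.
cat >node_operator_demo.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node-operator-demo
spec:
    containers:
    - name: node-operator-demo
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: disktype        # hypothetical label; Exists takes no values
                    operator: Exists
                  - key: cpu-count       # hypothetical label holding an integer
                    operator: Gt
                    values:
                    - "4"
eof
# With a cluster available:
#   kubectl apply -f node_operator_demo.yaml
```

Expressions listed under the same matchExpressions entry must all hold, so this pod only fits nodes that carry a disktype label and whose cpu-count label is greater than 4. Separate entries under nodeSelectorTerms are alternatives: matching any one of them is enough.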

Pod affinity
  pod.spec.affinity.podAffinity/podAntiAffinity
  preferredDuringSchedulingIgnoredDuringExecution: soft policy;
  requiredDuringSchedulingIgnoredDuringExecution: hard policy.
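The affinity and anti-affinity scheduling policies compare as follows:
  nodeAffinity: matches node labels; operators In, NotIn, Exists, DoesNotExist, Gt, Lt; no topology domain; schedules the pod to the specified nodes.
  podAffinity: matches pod labels; operators In, NotIn, Exists, DoesNotExist; uses a topology domain (topologyKey); places the pod in the same topology domain as the specified pods.
  podAntiAffinity: matches pod labels; operators In, NotIn, Exists, DoesNotExist; uses a topology domain (topologyKey); keeps the pod out of the topology domain of the specified pods.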
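A common use of podAntiAffinity is to keep the replicas of one workload on different nodes. The sketch below is an illustration under assumptions, not one of the original cases: the name spread-demo and the file name are made up, and the cluster is assumed to have at least two schedulable worker nodes.

```shell
# Write a Deployment whose replicas repel each other per hostname.
cat >spread_deployment.yaml<<eof
apiVersion: apps/v1
kind: Deployment
metadata:
    name: spread-demo
spec:
    replicas: 2
    selector:
        matchLabels:
            app: spread-demo
    template:
        metadata:
            labels:
                app: spread-demo
        spec:
            containers:
            - name: nginx
              image: hub.atguigu.com/library/nginx:latest
            affinity:
                podAntiAffinity:
                    requiredDuringSchedulingIgnoredDuringExecution:
                    - labelSelector:
                        matchExpressions:
                        - key: app
                          operator: In
                          values:
                          - spread-demo
                      topologyKey: kubernetes.io/hostname
eof
# With a cluster available:
#   kubectl apply -f spread_deployment.yaml
#   kubectl get pod -o wide      # each replica should land on a different node
```

Because the rule is a hard policy, asking for more replicas than there are schedulable nodes leaves the extra pods Pending. Switching to preferredDuringSchedulingIgnoredDuringExecution turns the spread into a preference.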

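Taints
  Node affinity is a property of a pod (a preference or a hard requirement) that attracts it to a particular class of nodes. A taint does the opposite: it lets a node repel a particular class of pods.
  Taints and tolerations work together to keep pods off unsuitable nodes. One or more taints can be applied to each node, which means the node will not accept pods that do not tolerate those taints. A toleration applied to a pod means the pod can (but is not required to) be scheduled onto nodes with matching taints.
  The kubectl taint command sets a taint on a node. From then on the node repels certain pods: it can refuse to have them scheduled onto it and can even evict pods that are already running on it.
  Each taint has the following form:
    key=value:effect
  Each taint has a key and a value (both user-defined) that act as its label; the value may be empty. The effect describes what the taint does. Three effects are currently supported:
    NoSchedule: Kubernetes will not schedule pods onto a node with this taint.
    PreferNoSchedule: Kubernetes will try to avoid scheduling pods onto a node with this taint.
    NoExecute: Kubernetes will not schedule pods onto a node with this taint, and it will also evict pods that are already running on the node.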
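A toleration does not have to name the taint's value. The following sketch shows the Exists operator in a toleration; the pod name, the file name, and the maintenance key are hypothetical, while the kill key is the one used in the taint cases of this post.

```shell
# Write a pod manifest with two Exists-style tolerations.
cat >toleration_demo.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: toleration-demo
spec:
    containers:
    - name: toleration-demo
      image: hub.atguigu.com/library/nginx:latest
    tolerations:
    - key: "kill"            # any value of key "kill" is tolerated,
      operator: "Exists"     # because Exists ignores the value
      effect: "NoSchedule"
    - key: "maintenance"     # hypothetical key; no effect is given,
      operator: "Exists"     # so every effect of this key is tolerated
eof
# With a cluster available:
#   kubectl apply -f toleration_demo.yaml
```

With operator Equal, the key, value, and effect must all match the taint. With Exists, only the key (and the effect, if one is given) has to match.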

Case 1: node affinity, hard policy

cat >node_hard_policy.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: nod-affinity-required
    labels:
        app: nod-affinity-required
spec:
    containers:
    - name: nod-affinity-required
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: NotIn
                    values:
                    - k8s-node02
eof
kubectl apply -f node_hard_policy.yaml
kubectl get pod -o wide
kubectl delete -f node_hard_policy.yaml
kubectl get pod -o wide
kubectl apply -f node_hard_policy.yaml
kubectl get pod -o wide
kubectl delete -f node_hard_policy.yaml
kubectl apply -f node_hard_policy.yaml
kubectl get pod -o wide

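Result: with the hard policy, the pod is never scheduled to k8s-node02. If the operator is changed to In, the pod is scheduled to k8s-node02 right away; if the value is changed to a node that does not exist, the pod stays Pending.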

Case 2: node affinity, soft policy

cat >node_soft_policy.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: nod-affinity-preferred
    labels:
        app: nod-affinity-preferred
spec:
    containers:
    - name: nod-affinity-preferred
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        nodeAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              preference:
                  matchExpressions:
                    - key: kubernetes.io/hostname
                      operator: In
                      values:
                      - k8s-node02
eof

kubectl apply -f node_soft_policy.yaml
kubectl get pod -o wide
kubectl delete -f node_soft_policy.yaml
kubectl apply -f node_soft_policy.yaml
kubectl get pod -o wide
kubectl delete -f node_soft_policy.yaml
kubectl apply -f node_soft_policy.yaml
kubectl get pod -o wide

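Result: the soft policy prefers k8s-node02. If the value is changed to a node that does not exist, the pod is still scheduled, to whichever node the scheduler picks.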

Case 3: pod affinity

cat >pod-1.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node01
    labels:
        app: node01
spec:
    containers:
    - name: node01
      image: hub.atguigu.com/library/nginx:latest
eof
kubectl apply -f pod-1.yaml
kubectl get pod -o wide --show-labels

cat >pod-2.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node02
    labels:
        app: node02
spec:
    containers:
    - name: node02
      image: hub.atguigu.com/library/nginx:latest
eof
kubectl apply -f pod-2.yaml
kubectl get pod -o wide --show-labels

cat >pod-affinity.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: pod-3
    labels:
        app: pod-3
spec:
    containers:
    - name: pod-3
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        podAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchExpressions:
                - key: app
                  operator: In
                  values:
                  - node01
              topologyKey: kubernetes.io/hostname
        podAntiAffinity:
            preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 1
              podAffinityTerm:
                labelSelector:
                    matchExpressions:
                    - key: app
                      operator: In
                      values:
                      - node02
                topologyKey: kubernetes.io/hostname              
eof
kubectl apply -f pod-affinity.yaml
kubectl get pod -o wide
kubectl delete -f pod-affinity.yaml
kubectl apply -f pod-affinity.yaml
kubectl get pod -o wide
kubectl delete -f pod-affinity.yaml
kubectl apply -f pod-affinity.yaml
kubectl get pod -o wide
kubectl delete -f pod-affinity.yaml
kubectl apply -f pod-affinity.yaml
kubectl get pod -o wide

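Result: every time, pod-3 is scheduled to the node running the pod it has affinity with (app=node01). Hard and soft policies behave the same way as they do for node affinity.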

Case 4: taints

cat >pod-1.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node01
    labels:
        app: node01
spec:
    containers:
    - name: node01
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                    - k8s-node01
eof
kubectl apply -f pod-1.yaml

cat >pod-2.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node02
    labels:
        app: node02
spec:
    containers:
    - name: node02
      image: hub.atguigu.com/library/nginx:latest
    affinity:
        nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: kubernetes.io/hostname
                    operator: In
                    values:
                    - k8s-node02
eof
kubectl apply -f pod-2.yaml
kubectl get pod -o wide --show-labels

kubectl describe node k8s-master01 |grep Taints #k8s-master01 carries a NoSchedule taint, so pods are not scheduled onto it
kubectl taint nodes k8s-node01 kill=node01:NoExecute
kubectl get pod -o wide --show-labels

Result: after the NoExecute taint is set, the pod running on the node is evicted.

Case 5: tolerations

kubectl describe node k8s-node01 |grep Taints #check the taint set in case 4 so the toleration below can be filled in
cat >pod-1-1.yaml<<eof
apiVersion: v1
kind: Pod
metadata:
    name: node01
    labels:
        app: node01
spec:
    containers:
    - name: node01
      image: hub.atguigu.com/library/nginx:latest
    tolerations:
    - key: "kill"
      operator: "Equal"
      value: "node01"
      effect: "NoExecute"
      tolerationSeconds: 3600
eof
kubectl apply -f pod-1-1.yaml
kubectl get pod -o wide --show-labels

Result: a pod whose toleration matches the taint can be scheduled onto the tainted node.
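The taint set in case 4 stays on k8s-node01 until it is removed. A minimal sketch of the cleanup, assuming the same node and taint as in case 4: kubectl removes a taint when the same specification is given with a trailing "-". The commands are only printed here so they can be reviewed before being run by hand.

```shell
# Build the add and remove commands for the taint used in case 4.
node=k8s-node01
taint='kill=node01:NoExecute'
add_cmd="kubectl taint nodes ${node} ${taint}"
remove_cmd="kubectl taint nodes ${node} ${taint}-"   # trailing '-' removes the taint
echo "$add_cmd"
echo "$remove_cmd"
# After running the remove command on a cluster, verify with:
#   kubectl describe node k8s-node01 |grep Taints
```

Once the taint is gone, pods without a toleration can be scheduled onto k8s-node01 again.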

Case 6: scheduling to a specific node with nodeName

cat >assign_node_schedule.yaml<<eof
apiVersion: apps/v1
kind: Deployment
metadata:
    name: nginx-deployment
spec:
    replicas: 7
    # apps/v1 requires a selector that matches the template labels
    selector:
        matchLabels:
            app: nginx
    template:
        metadata:
            labels:
                app: nginx
        spec:
            nodeName: k8s-node02
            containers:
            - name: nginx
              image: hub.atguigu.com/library/nginx:latest
              ports:
              - containerPort: 80
eof
kubectl apply -f assign_node_schedule.yaml
kubectl get pod -o wide --show-labels

posted on 2021-06-29 15:07 by chalon