Pod probes and restart policy

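Probe types:
livenessProbe    # Liveness probe. Checks whether the container is still running properly. If the liveness probe fails, the kubelet kills the container, and the container is then subject to its
restart policy. If a container does not define a liveness probe, the default state is Success. livenessProbe controls whether the container is restarted.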

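readinessProbe    # Readiness probe. If the readiness probe fails, the endpoints controller removes the Pod's IP address from the endpoints of every Service that matches the Pod. Before the initial
delay has elapsed, the readiness state defaults to Failure. If a container does not define a readiness probe, the default state is Success. readinessProbe controls whether the Pod
is added to a Service (the Pod itself may still be running, but it cannot be reached through the Service).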

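Both probes share a number of configuration fields that give precise control over liveness and readiness checking:
initialDelaySeconds: 120    # Initial delay: how many seconds the kubelet waits after the container starts before running the first probe. Default 0, minimum 0.
periodSeconds: 60           # Probe interval: how often (in seconds) the kubelet runs the probe. Default 10, minimum 1.
timeoutSeconds: 5           # Per-probe timeout: how many seconds the kubelet waits for a response before the probe counts as failed. Default 1, minimum 1.
successThreshold: 1         # Minimum number of consecutive successes for the probe to be considered successful after having failed. Default 1, minimum 1; must be 1 for liveness probes.
failureThreshold: 3         # Number of consecutive failures after which Kubernetes gives up. For a liveness probe, giving up means restarting the container; for a readiness probe, the Pod is marked Unready. Default 3, minimum 1.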
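To get a feel for how these fields combine, the following Python sketch estimates how long a container that never becomes healthy runs before failureThreshold is reached. It is only a rough model: the function name is made up for this note, it assumes every probe fails immediately, and it ignores the jitter the kubelet adds to the first probe.

```python
def seconds_until_failure_action(initial_delay: int, period: int, failure_threshold: int) -> int:
    """Rough estimate of the time from container start to the last of
    `failure_threshold` consecutive failed probes (hypothetical helper)."""
    if initial_delay < 0 or period < 1 or failure_threshold < 1:
        raise ValueError("initial_delay >= 0, period >= 1, failure_threshold >= 1 required")
    # First probe after the initial delay, then one probe every `period` seconds.
    return initial_delay + (failure_threshold - 1) * period


# Values from the field list above: 120 + 2 * 60
print(seconds_until_failure_action(120, 60, 3))  # 240
# Values used in the nginx manifests in this post: 5 + 2 * 3
print(seconds_until_failure_action(5, 3, 3))     # 11
```

With the small values used in the manifests in this post, a broken container is acted on after roughly ten seconds, which makes the effect easy to observe in a test cluster.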



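A probe is a periodic diagnostic that the kubelet performs on a container to verify that the Pod stays healthy. To run a diagnostic, the kubelet calls a Handler implemented by the container.
There are three types of handler:
    ExecAction:         Runs the specified command inside the container. The diagnostic succeeds if the command exits with status code 0.
    TCPSocketAction:    Performs a TCP check against the container's IP address on the specified port. The diagnostic succeeds if the port is open.
    HTTPGetAction:      Performs an HTTP GET request against the container's IP address on the specified port and path. The diagnostic succeeds if the response status code is >= 200 and < 400.
        An HTTP probe can set additional fields under httpGet:
        host:            # Host name to connect to; defaults to the Pod IP. Setting "Host" in httpHeaders is usually the better choice.
        scheme: HTTP     # How to connect to the host (HTTP or HTTPS); defaults to HTTP.
        httpHeaders:     # Custom headers to send with the request; HTTP allows repeated headers.
        port: 80         # Port number or name to access on the container; a number must be between 1 and 65535.
        path: /monitor/index.html    # Path to request on the HTTP server.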
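The success rules of the three handlers can be mimicked in a few lines of Python. This is only an illustration of the rules described above, not how the kubelet is implemented; the function names and the default timeout are assumptions made for this sketch.

```python
import socket
import subprocess


def exec_probe(command: list, timeout: float = 1.0) -> bool:
    """ExecAction rule: success only if the command exits with status 0."""
    try:
        completed = subprocess.run(
            command,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # A missing binary or a timeout counts as a failed probe.
        return False
    return completed.returncode == 0


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """TCPSocketAction rule: success if a TCP connection can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_status_is_success(status_code: int) -> bool:
    """HTTPGetAction rule: success if 200 <= status code < 400."""
    return 200 <= status_code < 400


print(http_status_is_success(200))  # True
print(http_status_is_success(404))  # False
```

A command whose binary does not exist, such as a wrong redis-cli path, makes `exec_probe` return False, which is the situation the redis example later in this post provokes on purpose.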

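Each probe returns one of three results:
    Success: the container passed the diagnostic.
    Failure: the container failed the diagnostic.
    Unknown: the diagnostic itself could not be completed, so no action is taken.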
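How a stream of individual results turns into a healthy or unhealthy state through successThreshold and failureThreshold can be sketched as a small state machine. The class below is a simplified model written for this note (the class and field names are hypothetical); it treats an Unknown result as "no action", as described above.

```python
from dataclasses import dataclass


@dataclass
class ProbeState:
    """Simplified model of threshold handling for one probe."""
    success_threshold: int = 1
    failure_threshold: int = 3
    healthy: bool = False   # readiness starts out as Failure
    last_result: str = ""
    run_length: int = 0

    def observe(self, result: str) -> bool:
        """Feed one result ("success", "failure" or "unknown"); return the state."""
        if result == "unknown":
            # No action: the state and the consecutive counter stay untouched.
            return self.healthy
        if result == self.last_result:
            self.run_length += 1
        else:
            self.last_result = result
            self.run_length = 1
        if result == "success" and self.run_length >= self.success_threshold:
            self.healthy = True
        elif result == "failure" and self.run_length >= self.failure_threshold:
            self.healthy = False
        return self.healthy


state = ProbeState(healthy=True)
history = [state.observe(r) for r in ["failure", "failure", "unknown", "failure", "success"]]
print(history)  # [True, True, True, False, True]
```

The state only flips on the third consecutive failure, and a single success is enough to recover because successThreshold is 1.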



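livenessProbe compared with readinessProbe:
Consecutive livenessProbe failures cause the container to be restarted; readinessProbe failures never restart or recreate the Pod.
After repeated livenessProbe failures and restarts, the Pod ends up in CrashLoopBackOff and is unavailable; readinessProbe failures do not cause this.
Consecutive readinessProbe failures remove the Pod from the Service's endpoints; livenessProbe does not do this by itself, although a container that is being restarted cannot serve traffic either.
livenessProbe controls whether the container is restarted (availability of the Pod); readinessProbe controls whether the Pod is added to a Service (availability of the service inside the Pod).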

Configuration advice:
Configure both probes and use them together.

Example: livenessProbe with HTTPGetAction:
[root@localhost7C ~]#cat nginx-http.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-linux39
  template:
    metadata:
      labels:
        app: nginx-linux39
    spec:
      containers:
      - name: nginx
        image: harbor.linux39.com/baseimages/nginx:1.14.2
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            #path: /monitor/monitor.html
            path: /index.html
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        #readinessProbe:                    # probe type
        #  httpGet:                         # probe handler
        #    #path: /monitor/monitor.html   # test path
        #    path: /index.html
        #    port: 80
        #  initialDelaySeconds: 5           # probe parameters
        #  periodSeconds: 3
        #  timeoutSeconds: 5
        #  successThreshold: 1
        #  failureThreshold: 3
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: magedu-nginx-service-label
  name: magedu-nginx-service
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30004
  selector:
    app: nginx-linux39


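# When the liveness probe keeps failing (for example with the commented-out path /monitor/monitor.html, assuming that page does not exist), the container is restarted many times and the status becomes CrashLoopBackOff.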
[root@localhost7C ~]# kubectl  get pod -o wide
NAME                              READY   STATUS             RESTARTS   AGE     IP           NODE                      NOMINATED NODE   READINESS GATES
nginx-deployment-67c7cf88-m5nqt   0/1     CrashLoopBackOff   4          2m24s   10.10.3.54   localhost7f.localdomain   <none>           <none>

Example: readinessProbe with HTTPGetAction:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-linux39
  template:
    metadata:
      labels:
        app: nginx-linux39
    spec:
      containers:
      - name: nginx
        image: harbor.linux39.com/baseimages/nginx:1.14.2
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            #path: /monitor/monitor.html  # test path
            path: /index.html
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        #livenessProbe:
        #  httpGet:
        #    #path: /monitor/monitor.html
        #    path: /index.html
        #    port: 80
        #  initialDelaySeconds: 5
        #  periodSeconds: 3
        #  timeoutSeconds: 5
        #  successThreshold: 1
        #  failureThreshold: 3
          
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: magedu-nginx-service-label
  name: magedu-nginx-service
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30004
  selector:
    app: nginx-linux39



# The container stays Running and is not restarted, but READY is 0/1.
[root@localhost7C ~]# kubectl  get pod -o wide
NAME                                READY   STATUS    RESTARTS   AGE     IP           NODE                      NOMINATED NODE   READINESS GATES
nginx-deployment-6b9856b557-cnsvd   0/1     Running   0          3m12s   10.10.3.55   localhost7f.localdomain   <none>           <none>

# Endpoints is empty: the Pod has been removed from the Service.
[root@localhost7C ~]# kubectl  describe  services -n default magedu-nginx-service
Name:                     magedu-nginx-service
Namespace:                default
Labels:                   app=magedu-nginx-service-label
Annotations:              kubectl.kubernetes.io/last-applied-configuration:
                            {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"app":"magedu-nginx-service-label"},"name":"magedu-nginx-servic...
Selector:                 app=nginx-linux39
Type:                     NodePort
IP:                       10.20.101.71
Port:                     http  80/TCP
TargetPort:               80/TCP
NodePort:                 http  30004/TCP
Endpoints:                
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
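The empty Endpoints field follows from a simple rule: a Pod is listed only if its labels match the Service selector and it is ready. The Python sketch below models that rule with plain dictionaries; the function and the data layout are hypothetical and exist only to illustrate the behaviour shown above.

```python
def service_endpoints(pods: list, selector: dict, target_port: int) -> list:
    """Return "ip:port" for every ready Pod whose labels match the selector."""
    return [
        f"{pod['ip']}:{target_port}"
        for pod in pods
        if pod["ready"] and selector.items() <= pod["labels"].items()
    ]


selector = {"app": "nginx-linux39"}
pod = {"ip": "10.10.3.55", "ready": False, "labels": {"app": "nginx-linux39"}}

# Readiness probe failing: the Pod matches the selector but is not listed.
print(service_endpoints([pod], selector, 80))  # []

# Once the readiness probe succeeds, the Pod is added back.
pod["ready"] = True
print(service_endpoints([pod], selector, 80))  # ['10.10.3.55:80']
```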

# The Pod events show the error.
[root@localhost7C ~]# kubectl  describe  pod nginx-deployment-6b9856b557-cnsvd 
   ...
   ...
Warning  Unhealthy  57s (x48 over 3m18s)  kubelet, localhost7f.localdomain  Readiness probe failed: HTTP probe failed with statuscode: 404

# TCPSocketAction handler: the result is the same.
[root@localhost7C case5]# cat nginx-tcp.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-linux39
  template:
    metadata:
      labels:
        app: nginx-linux39
    spec:
      containers:
      - name: nginx
        image: harbor.linux39.com/baseimages/nginx:1.14.2
        ports:
        - containerPort: 80
        livenessProbe:
          tcpSocket:   # TCP handler
            port: 80  
            #port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          tcpSocket:    # TCP handler
            port: 80
            #port: 8080
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: magedu-nginx-service-label
  name: magedu-nginx-service
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 80
    nodePort: 30004
  selector:
    app: nginx-linux39

# ExecAction handler: the result is the same.
[root@localhost7C case5]# cat redis-ExecAction.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-linux39
  template:
    metadata:
      labels:
        app: redis-linux39
    spec:
      containers:
      - name: redis
        image: redis:4.0.14 
        ports:
        - containerPort: 6379
        livenessProbe:
          exec:                # ExecAction handler
            command:
            #- /apps/redis/bin/redis-cli
            - /usr/local/bin/redis-cli
            - quit
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          exec:                # ExecAction handler
            command:
            #- /apps/redis/bin/redis-cli   # wrong path
            - /usr/local/bin/redis-cli 
            - quit
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: magedu-redis-service-label
  name: magedu-redis-service
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 6379
    nodePort: 30005
  selector:
    app: redis-linux39

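When a Pod runs into trouble, Kubernetes restarts its containers automatically to recover the service, according to the Pod's restart policy.
restartPolicy:
    Always:     Restart the container whenever it exits, regardless of the exit code. This is the default, and the only value allowed for Pods managed by a ReplicationController/ReplicaSet/Deployment.
    OnFailure:  Restart the container only when it fails (it stops running with a non-zero exit code).
    Never:      Never restart the container, whatever state it exits in. Pods of a Job or CronJob must use OnFailure or Never.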
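The three policies boil down to a small decision rule on the container's exit code. The Python function below is a sketch of that rule for illustration; the function name is made up for this note.

```python
def should_restart(restart_policy: str, exit_code: int) -> bool:
    """Decide whether an exited container is restarted under the given policy."""
    if restart_policy == "Always":
        return True
    if restart_policy == "OnFailure":
        return exit_code != 0
    if restart_policy == "Never":
        return False
    raise ValueError(f"unknown restartPolicy: {restart_policy!r}")


print(should_restart("Always", 0))     # True
print(should_restart("OnFailure", 0))  # False
print(should_restart("OnFailure", 1))  # True
print(should_restart("Never", 1))      # False
```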

[root@localhost7C case5]# cat redis-ExecAction.yml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-deployment
  labels:
    app: redis
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-linux39
  template:
    metadata:
      labels:
        app: redis-linux39
    spec:
      containers:
      - name: redis
        image: redis:4.0.14 
        ports:
        - containerPort: 6379
        livenessProbe:
          exec:                # ExecAction handler
            command:
            - /apps/redis/bin/redis-cli        # wrong path
            #- /usr/local/bin/redis-cli
            - quit
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
        readinessProbe:
          exec:                # ExecAction handler
            command:
            - /apps/redis/bin/redis-cli      # wrong path
            #- /usr/local/bin/redis-cli 
            - quit
          initialDelaySeconds: 5
          periodSeconds: 3
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
      restartPolicy: Always

---
kind: Service
apiVersion: v1
metadata:
  labels:
    app: magedu-redis-service-label
  name: magedu-redis-service
  namespace: default
spec:
  type: NodePort
  ports:
  - name: http
    port: 80
    protocol: TCP
    targetPort: 6379
    nodePort: 30005
  selector:
    app: redis-linux39

# With both probes configured (and failing), the container is restarted in an endless loop.
[root@localhost7C ~]# kubectl  get pod -o wide
NAME                                READY   STATUS             RESTARTS   AGE   IP           NODE                      NOMINATED NODE   READINESS GATES
redis-deployment-6c594c6dd6-8dpst   0/1     CrashLoopBackOff   12         21m   10.10.3.63   localhost7f.localdomain   <none>           <none>
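The restarts behind CrashLoopBackOff are not immediate: according to the Kubernetes documentation, the kubelet waits with an exponential back-off (10s, 20s, 40s, ...) capped at five minutes between restarts. The Python sketch below reproduces that sequence; the function name is hypothetical and the base and cap values are taken from the documented defaults, which may differ between versions or configurations.

```python
def crash_loop_delays(restarts: int, base: int = 10, cap: int = 300) -> list:
    """Back-off delay in seconds before each of the first `restarts` restarts."""
    return [min(base * 2 ** attempt, cap) for attempt in range(restarts)]


print(crash_loop_delays(7))  # [10, 20, 40, 80, 160, 300, 300]
```

This is why the RESTARTS counter grows quickly at first and then only about once every five minutes.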

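In day-to-day cluster maintenance we often use kubectl get pod to check the status of Pods.
Status                                   Meaning
CrashLoopBackOff                         A container keeps exiting; the kubelet is restarting it
Terminating                              The Pod is being terminated
Completed                                The containers ran to completion (for example, a Job finished normally)
InvalidImageName                         The image name cannot be parsed
ImageInspectError                        The image cannot be inspected
ErrImageNeverPull                        The pull policy forbids pulling the image
ImagePullBackOff                         The image pull is being retried
RegistryUnavailable                      The image registry (e.g. Harbor) cannot be reached
ErrImagePull                             Pulling the image failed
CreateContainerConfigError               The container configuration used by the kubelet cannot be created
CreateContainerError                     Creating the container failed
m.internalLifecycle.PreStartContainer    Running the hook failed
RunContainerError                        Starting the container failed
PostStartHookError                       Running the hook failed
ContainersNotInitialized                 The containers have not finished initializing
ContainersNotReady                       The containers are not ready yet
ContainerCreating                        The container is being created
PodInitializing                          The Pod is initializing
DockerDaemonNotReady                     Docker has not fully started
NetworkPluginNotReady                    The network plugin has not fully started
Evicted                                  The Pod was evicted (when a node becomes abnormal, Kubernetes has a corresponding ...
Pending:        # The Pod is being created, but not all of its containers have been created yet. For a Pod stuck in this state, check whether the storage it depends on can be mounted with the right permissions, whether the image can be downloaded, whether scheduling works, and so on.
Failed          # All containers in the Pod have terminated and at least one of them terminated in failure, so the Pod is not working properly.
Unknown         # The current state of the Pod cannot be obtained for some reason, usually a communication error with the node hosting the Pod.
Succeeded       # All containers in the Pod have terminated successfully.
Unschedulable:  # The Pod cannot be scheduled; kube-scheduler found no suitable node.
PodScheduled    # The Pod is being scheduled. When kube-scheduler starts scheduling, the Pod has not yet been assigned to a node; once a suitable node has been selected, the data in etcd is updated and the Pod is assigned to that node.
Initialized     # All init containers in the Pod have completed.
ImagePullBackOff: # The node hosting the Pod failed to download the image.
Running         # The containers in the Pod have been created and started.
Ready           # The containers in the Pod are able to serve requests.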

posted @ 2023-03-08 11:55 yuanbangchen
