31 - Container Probes
I. Probe types, check mechanisms, and probe outcomes
1. Probe types
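livenessProbe:
    Liveness check. Periodically checks whether the service is still alive. If the check fails, the container is "restarted" (the original container is killed and a new one is created).
    If the container defines no liveness probe, the default state is Success.
readinessProbe:
    Readiness check. Periodically checks whether the service is available, and from that decides whether the container is ready.
    If the service in the Pod is found to be unavailable, the Pod is marked as not ready, and the Service's Endpoints (ep) object moves the Pod's address from the Addresses list to the NotReadyAddresses list.
    If the service in the Pod is found to be available again, the Endpoints object moves the Pod's address from NotReadyAddresses back to Addresses.
    If the container defines no readiness probe, the default state is Success.
startupProbe: (supported in Kubernetes 1.16 and later)
    If a startup probe is defined, all other probes are disabled until it succeeds.
    If the startup probe fails, the kubelet kills the container, and the container is restarted according to its restart policy.
    If the container defines no startup probe, the default state is Success.
    The startup probe only runs while the container is starting. Once it has succeeded, the other probes take over and the startup probe no longer runs.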
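2. Probe check mechanisms
exec:
    Runs a command inside the container and judges the result by its exit status: 0 means success, non-zero means failure, much like checking "echo $?" in a shell.
httpGet:
    Sends an HTTP GET request and judges the service by the returned status code. A code from 200 up to (but not including) 400 counts as success; any other code counts as failure.
    Common status codes:
    200: OK
    301: Moved Permanently (permanent redirect)
    302: Found (temporary redirect)
    401: Unauthorized (authentication failed)
    403: Forbidden (permission denied)
    404: Not Found
    413: Payload Too Large (uploaded file too big)
    500: Internal Server Error
    502: Bad Gateway
    504: Gateway Timeout (the backend did not answer in time)
    ...
tcpSocket:
    Tests whether a TCP connection to the given port can be established, similar to tools such as telnet or nc.
grpc:
    Performs a gRPC health check. It was introduced as an alpha feature in Kubernetes 1.23 (beta in 1.24, stable in 1.27), so on 1.23 it is still alpha.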
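The exec and httpGet rules can be tried locally without a cluster. The following is a minimal sketch, not kubelet code: the helper names exec_probe and http_probe and the marker file path are made up for this illustration. It only imitates how a result is judged (exit status for exec, a status code of at least 200 and below 400 for httpGet).

```shell
#!/bin/sh
# Illustration only: these helpers imitate how probe results are judged.
# They are hypothetical and not part of Kubernetes or kubectl.

# exec-style check: run a command and judge it purely by its exit status.
exec_probe() {
    if "$@" >/dev/null 2>&1; then
        echo Success
    else
        echo Failure
    fi
}

# httpGet-style check: a status code in [200, 400) counts as success.
http_probe() {
    if [ "$1" -ge 200 ] && [ "$1" -lt 400 ]; then
        echo Success
    else
        echo Failure
    fi
}

# Hypothetical marker file, similar in spirit to the one used in the manifests below.
marker="${TMPDIR:-/tmp}/probe-demo-healthy.$$"

touch "$marker"
with_file=$(exec_probe cat "$marker")      # the file exists, so cat exits 0
rm -f "$marker"
without_file=$(exec_probe cat "$marker")   # the file is gone, so cat exits non-zero

echo "exec, file present: $with_file"
echo "exec, file missing: $without_file"
echo "http 200: $(http_probe 200)"
echo "http 302: $(http_probe 302)"
echo "http 404: $(http_probe 404)"
echo "http 502: $(http_probe 502)"
```

Note that a redirect such as 302 still counts as a successful httpGet probe, while 404 and 502 do not.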
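3. Probe outcomes
Every probe yields one of the following three results:
Success
    The container passed the diagnostic.
Failure
    The container failed the diagnostic.
Unknown
    The diagnostic itself could not be completed, so no action is taken.
References: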
https://kubernetes.io/zh/docs/concepts/workloads/pods/pod-lifecycle/#types-of-probe
https://kubernetes.io/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#probe-check-methods
https://kubernetes.io/zh-cn/docs/concepts/workloads/pods/pod-lifecycle/#probe-outcome
II. livenessProbe
1. exec probe
[root@master231 probe]# cat 01-deploy-livenessProbe-exec.yaml
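apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-exec
spec:
  replicas: 5
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        name: c1
        command:
        - /bin/sh
        - -c
        - touch /tmp/dingzhiyan-linux-healthy; sleep 20; rm -f /tmp/dingzhiyan-linux-healthy; sleep 600
        # Liveness check: periodically checks whether the service is alive; if the check fails, the container is restarted.
        livenessProbe:
          # Use the exec mechanism for the health check.
          exec:
            # The command to run as the check.
            command:
            - cat
            - /tmp/dingzhiyan-linux-healthy
          # Probe interval in seconds. Default: 10, minimum: 1.
          periodSeconds: 1
          # Number of consecutive failures before the probe counts as failed. Default: 3, minimum: 1. A successful probe resets the counter.
          failureThreshold: 3
          # Number of consecutive successes required to count as successful again. Default: 1, minimum: 1 (must be 1 for liveness and startup probes).
          successThreshold: 1
          # Seconds to wait after the container starts before the first probe runs, so nothing is counted towards failureThreshold during this time.
          initialDelaySeconds: 30
          # Timeout of a single probe in seconds. Default: 1, minimum: 1.
          timeoutSeconds: 1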
[root@master231 probe]#
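To get a feel for when this manifest triggers its first restart, the parameters can be plugged into a small calculation. This is only a sketch under stated assumptions: it pretends the first probe runs exactly at initialDelaySeconds and ignores kubelet scheduling jitter and the time needed to stop the container, so real timings will differ slightly. The variable names are made up for this example.

```shell
#!/bin/sh
# Values copied from 01-deploy-livenessProbe-exec.yaml.
file_removed_at=20        # the container deletes the marker file after "sleep 20"
initial_delay=30          # initialDelaySeconds
period=1                  # periodSeconds
failure_threshold=3       # failureThreshold

# Assumption: the first probe runs at initialDelaySeconds (jitter is ignored).
# The first failing probe is whichever comes later: the first probe at all,
# or the removal of the marker file.
if [ "$file_removed_at" -gt "$initial_delay" ]; then
    first_failure=$file_removed_at
else
    first_failure=$initial_delay
fi

# failureThreshold consecutive failures are needed, one per period.
restart_at=$(( first_failure + (failure_threshold - 1) * period ))

echo "first failed probe at about ${first_failure}s"
echo "liveness threshold reached at about ${restart_at}s"
```

With these values the file is already gone when probing begins, so every probe fails and the threshold is reached roughly 32 seconds after the container starts.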
2. httpGet probe
[root@master231 probe]# cat 02-deploy-livenessProbe-httpGet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-httpget
spec:
  replicas: 5
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        name: c1
        # Liveness check: periodically checks whether the service is alive; if the check fails, the container is restarted.
        livenessProbe:
          # Use the httpGet mechanism for the health check.
          httpGet:
            # Port to send the request to.
            port: 80
            # Path to request.
            path: /index.html
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
[root@master231 probe]#
3. tcpSocket probe
[root@master231 probe]# cat 03-deploy-livenessProbe-tcpSocket.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-tcpsocket
spec:
  replicas: 5
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        name: c1
        # Liveness check: periodically checks whether the service is alive; if the check fails, the container is restarted.
        livenessProbe:
          # Use the tcpSocket mechanism for the health check.
          tcpSocket:
            port: 80
          failureThreshold: 3
          initialDelaySeconds: 30
          periodSeconds: 5
          successThreshold: 1
          timeoutSeconds: 1
[root@master231 probe]#
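How to test:
    From inside the cluster, access the IP address of one of the Pods. Then enter that Pod, change the port nginx listens on, and reload nginx. The container is restarted automatically within about 15 seconds (failureThreshold 3 x periodSeconds 5).
4. grpc probe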
[root@master231 probe]# cat 04-deploy-livenessProbe-grpc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-grpc
spec:
  replicas: 5
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/etcd:3.5.10
        name: web
        imagePullPolicy: IfNotPresent
        command:
        - /opt/bitnami/etcd/bin/etcd
        - --data-dir=/tmp/etcd
        - --listen-client-urls=http://0.0.0.0:2379
        - --advertise-client-urls=http://127.0.0.1:2379
        - --log-level=debug
        ports:
        - containerPort: 2379
        livenessProbe:
          # Sends a gRPC call to the gRPC port. This is still an alpha feature here; if you really want to use it, look at a newer release such as Kubernetes 1.24+.
          # On 1.23.17, a failed check only produces a warning event; the container is not restarted.
          grpc:
            port: 2379
            # Service name to check. The value here is made up for the demo; in real work the developers will tell you the name.
            service: /health
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
[root@master231 probe]#
III. readinessProbe
1. exec probe
[root@master231 probe]# cat 05-deploy-readinessprobe-livenessProbe-exec.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-readinessprobe-exec
spec:
  revisionHistoryLimit: 1
  strategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 3
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        command:
        - /bin/sh
        - -c
        - nginx; touch /tmp/dingzhiyan-linux-healthy; sleep 30; rm -f /tmp/dingzhiyan-linux-healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/dingzhiyan-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 65
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        # Readiness check: periodically checks whether the service is available, and from that decides whether the container is ready.
        readinessProbe:
          # Use the exec mechanism for the readiness check.
          exec:
            # The command to run as the check.
            command:
            - cat
            - /tmp/dingzhiyan-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 15
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxain
spec:
  clusterIP: 10.200.20.25
  selector:
    apps: xiuxian
  ports:
  - port: 80
[root@master231 probe]#
How to test:
[root@master231 sts]# while true; do curl 10.200.20.25 ; sleep 0.1;done
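While the curl loop runs, it helps to know roughly when each Pod should change state. The sketch below is a simplified model, not kubelet code: it assumes the first probe runs exactly at initialDelaySeconds and then once per period, and that the check succeeds only while the marker file exists. The function name threshold_reached_at and its parameters are made up for this illustration; the numbers come from 05-deploy-readinessprobe-livenessProbe-exec.yaml.

```shell
#!/bin/sh
# Simplified model of a single probe with a consecutive-failure counter.
# Arguments: initialDelaySeconds periodSeconds failureThreshold healthy_until horizon
# healthy_until: second at which the marker file disappears.
# Prints the second at which failureThreshold consecutive failures are reached.
threshold_reached_at() {
    initial_delay=$1 period=$2 threshold=$3 healthy_until=$4 horizon=$5
    t=$initial_delay
    failures=0
    while [ "$t" -le "$horizon" ]; do
        if [ "$t" -lt "$healthy_until" ]; then
            failures=0                      # a success resets the counter
        else
            failures=$(( failures + 1 ))
            if [ "$failures" -ge "$threshold" ]; then
                echo "$t"
                return 0
            fi
        fi
        t=$(( t + period ))
    done
    echo never
}

# The container removes the marker file after "sleep 30".
not_ready_at=$(threshold_reached_at 15 1 3 30 600)   # readinessProbe
restart_at=$(threshold_reached_at 65 1 3 30 600)     # livenessProbe

echo "ready from about 15s (first successful readiness probe)"
echo "marked NotReady at about ${not_ready_at}s"
echo "container restarted at about ${restart_at}s"
```

Under these assumptions a Pod is ready from about 15s, drops out of the Service at about 32s, and is restarted at about 67s, after which the cycle starts again. Requests from the curl loop should therefore fail during the window in which no Pod is ready.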
2. httpGet probe
[root@master231 probe]# cat 06-deploy-readinessProbe-livenessProbe-httpGet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-readinessprobe-httpget
spec:
  revisionHistoryLimit: 1
  strategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 3
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        command:
        - /bin/sh
        - -c
        # nginx is started first (as in the exec example) so that port 80 is actually listening for the readiness probe.
        - nginx; touch /tmp/dingzhiyan-linux-healthy; sleep 30; rm -f /tmp/dingzhiyan-linux-healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/dingzhiyan-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 180
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        # Readiness check: periodically checks whether the service is available, and from that decides whether the container is ready.
        readinessProbe:
          # Use the httpGet mechanism for the readiness check.
          httpGet:
            # Port to send the request to.
            port: 80
            path: /index.html
          failureThreshold: 3
          initialDelaySeconds: 15
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxain
spec:
  clusterIP: 10.200.20.25
  selector:
    apps: xiuxian
  ports:
  - port: 80
[root@master231 probe]#
How to test:
[root@master231 ~]# while true;do curl 10.200.20.25;sleep 0.5;done
3. tcpSocket probe
[root@master231 probe]# cat 07-deploy-readinessProbe-livenessProbe-tcpSocket.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-readinessprobe-tcpsocket
spec:
  revisionHistoryLimit: 1
  strategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 3
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      restartPolicy: Always
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        imagePullPolicy: Always
        ports:
        - containerPort: 80
        command:
        - /bin/sh
        - -c
        # nginx is started first (as in the exec example) so that port 80 is actually listening for the readiness probe.
        - nginx; touch /tmp/dingzhiyan-linux-healthy; sleep 30; rm -f /tmp/dingzhiyan-linux-healthy; sleep 600
        livenessProbe:
          exec:
            command:
            - cat
            - /tmp/dingzhiyan-linux-healthy
          failureThreshold: 3
          initialDelaySeconds: 300
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        # Readiness check: periodically checks whether the service is available, and from that decides whether the container is ready.
        readinessProbe:
          # Use the tcpSocket mechanism for the readiness check.
          tcpSocket:
            # Check whether port 80 accepts connections.
            port: 80
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxain
spec:
  clusterIP: 10.200.20.25
  selector:
    apps: xiuxian
  ports:
  - port: 80
[root@master231 probe]#
How to test:
[root@master231 ~]# while true;do curl 10.200.20.25;sleep 0.5;done
IV. startupProbe in practice
[root@master231 probe]# cat 08-deploy-startupProbe-httpGet.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-livenessprobe-readinessprobe-startupprobe-httpget
spec:
  revisionHistoryLimit: 1
  strategy:
    type: "RollingUpdate"
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 2
  replicas: 3
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      volumes:
      - name: data
        emptyDir: {}
      # Init containers run only once, when the Pod is created; they are not run again when a container restarts.
      initContainers:
      - name: init01
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        volumeMounts:
        - name: data
          mountPath: /dingzhiyan
        command:
        - /bin/sh
        - -c
        - echo "liveness probe test page" >> /dingzhiyan/huozhe.html
      - name: init02
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v2
        volumeMounts:
        - name: data
          mountPath: /dingzhiyan
        command:
        - /bin/sh
        - -c
        - echo "readiness probe test page" >> /dingzhiyan/dingzhiyan.html
      - name: init03
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v3
        volumeMounts:
        - name: data
          mountPath: /dingzhiyan
        command:
        - /bin/sh
        - -c
        - echo "startup probe test page" >> /dingzhiyan/start.html
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
        # Periodic: checks whether the service is healthy; if the check fails, the container is restarted.
        livenessProbe:
          httpGet:
            port: 80
            path: /huozhe.html
          failureThreshold: 3
          initialDelaySeconds: 5
          periodSeconds: 1
          successThreshold: 1
          timeoutSeconds: 1
        # Periodic: checks whether the service is ready; if the check fails, the Pod is marked as not ready.
        readinessProbe:
          httpGet:
            port: 80
            path: /dingzhiyan.html
          failureThreshold: 3
          initialDelaySeconds: 10
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
        # Startup only: checked while the container starts; if the check fails, the container is killed and restarted.
        # readinessProbe and livenessProbe only start running after the startupProbe has succeeded.
        startupProbe:
          httpGet:
            port: 80
            path: /start.html
          failureThreshold: 3
          # Even though the pages for readinessProbe and livenessProbe already exist, both probes must wait until the startupProbe has succeeded.
          initialDelaySeconds: 35
          periodSeconds: 3
          successThreshold: 1
          timeoutSeconds: 1
---
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxain
spec:
  clusterIP: 10.200.20.25
  selector:
    apps: xiuxian
  ports:
  - port: 80
[root@master231 probe]#
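The startupProbe settings in this manifest also define how long a container is given to come up. The calculation below is a rough sketch under stated assumptions: it treats the first startup probe as running exactly at initialDelaySeconds and ignores kubelet jitter and probe timeouts. The variable names are made up for this example.

```shell
#!/bin/sh
# Values copied from the startupProbe in 08-deploy-startupProbe-httpGet.yaml.
initial_delay=35       # initialDelaySeconds
period=3               # periodSeconds
failure_threshold=3    # failureThreshold

# Assumption: the first startup probe runs at initialDelaySeconds, so this is
# also the earliest moment at which the other probes can begin.
first_probe_at=$initial_delay

# If every startup probe fails, the last allowed failure happens here ...
gives_up_at=$(( initial_delay + (failure_threshold - 1) * period ))

# ... and the common rule of thumb (failureThreshold x periodSeconds on top of
# the initial delay) gives a slightly more generous upper bound.
budget_upper=$(( initial_delay + failure_threshold * period ))

echo "earliest startup success: about ${first_probe_at}s"
echo "startup treated as failed between about ${gives_up_at}s and ${budget_upper}s"
```

In this demo the page /start.html already exists when nginx starts, so the startup probe should pass on its first attempt; the later figures only matter for an application that is slow to come up.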
Verification:
[root@master231 sts]# while true; do curl 10.200.20.25/huozhe.html ; sleep 0.1;done
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
liveness probe test page
^C
[root@master231 sts]# while true; do curl 10.200.20.25/dingzhiyan.html ; sleep 0.1;done
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
readiness probe test page
^C
[root@master231 sts]# while true; do curl 10.200.20.25/start.html ; sleep 0.1;done
startup probe test page
startup probe test page
startup probe test page
startup probe test page
startup probe test page
startup probe test page
startup probe test page
^C
[root@master231 sts]#
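Bonus: viewing the probe requests in the container log. To read the log of the previous container instance (the one from before a restart), add -p (--previous) to kubectl logs.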
[root@master231 probe]# kubectl logs -f deploy-livenessprobe-readinessprobe-startupprobe-httpget-96k7bw
...
10.0.0.233 - - [20/Apr/2025:06:54:34 +0000] "GET /start.html HTTP/1.1" 200 24 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:31 +0000] "GET /dingzhiyan.html HTTP/1.1" 200 26 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:31 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:32 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:33 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:34 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:34 +0000] "GET /dingzhiyan.html HTTP/1.1" 200 26 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:35 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:36 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
...
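The log excerpt above can be summarized per probe path with standard tools. The following sketch uses only the nine log lines shown above, copied verbatim; the variable names probe_log and counts are made up for this example, and because the excerpt is truncated on both sides the numbers describe the excerpt only.

```shell
#!/bin/sh
# The nine access-log lines from the excerpt above.
probe_log='10.0.0.233 - - [20/Apr/2025:06:54:34 +0000] "GET /start.html HTTP/1.1" 200 24 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:31 +0000] "GET /dingzhiyan.html HTTP/1.1" 200 26 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:31 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:32 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:33 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:34 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:34 +0000] "GET /dingzhiyan.html HTTP/1.1" 200 26 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:35 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"
10.0.0.233 - - [20/Apr/2025:06:55:36 +0000] "GET /huozhe.html HTTP/1.1" 200 25 "-" "kube-probe/1.23" "-"'

# Keep only kubelet probe requests, take the request path (7th field),
# and count how often each path was requested.
counts=$(printf '%s\n' "$probe_log" |
    awk '/kube-probe/ { n[$7]++ } END { for (p in n) print p, n[p] }' |
    sort)

echo "$counts"
```

For this excerpt the result is 2 requests for /dingzhiyan.html, 6 for /huozhe.html and 1 for /start.html. That matches the manifest: the startup probe is requested once and then stops, the liveness page is requested every second, and the readiness page every three seconds.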
This article comes from cnblogs (博客园). Author: 丁志岩. When reposting, please cite the original link: https://www.cnblogs.com/dezyan/p/18888848
