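K8S Vertical Pod Autoscaler (VPA) Hands-On Case Study
Author: Yin Zhengjie (尹正杰)
Copyright notice: this is an original work. Reproduction is not permitted; violations will be pursued legally.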
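I. K8S VPA Overview
1. Kubernetes Autoscaler Overview
The Kubernetes Autoscaler project consists of three main parts: Cluster Autoscaler, Vertical Pod Autoscaler, and Addon Resizer.
Cluster Autoscaler ("CA" for short)
A component that automatically adjusts the size of a Kubernetes cluster so that all Pods have a place to run and there are no unneeded nodes.
It supports several public cloud providers. Version 1.0 (GA) was released alongside Kubernetes 1.8.
Vertical Pod Autoscaler ("VPA" for short)
A set of components that automatically adjust the amount of CPU and memory requested by Pods running in a Kubernetes cluster. Current status: beta.
Addon Resizer:
A simplified version of VPA that modifies the resource requests of a Deployment based on the number of nodes in the Kubernetes cluster. Current status: beta.
GitHub repository: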
https://github.com/kubernetes/autoscaler
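This course focuses on VPA in practice.
2. VPA Architecture Illustrated
In brief, VPA works as follows:
- 1. Check the metrics;
- 2. Determine whether the predefined thresholds have been reached;
- 3. Change the resource requests (CPU/memory);
- 4. Redeploy new Pods that run with the new resource values;
Notes on using VPA:
- 1. Resources cannot be changed without restarting the Pod, so the Pod is restarted and rescheduled according to its newly assigned resources;
- 2. VPA and HPA are not yet compatible and must not act on the same Pods for the same resource metric (CPU or memory). If you need both in one cluster, make sure their scopes are kept separate in your configuration;
- 3. VPA adjusts container resource requests only on the basis of observed past and current usage; it does not set resource limits. This can be a problem for misbehaving applications that keep consuming more and more resources until Kubernetes kills the Pod;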
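The decision in steps 2 to 4 can be pictured with a small sketch. It assumes a deliberately simplified rule (a Pod is a candidate for replacement when its current request falls outside the recommended range); the real updater weighs more factors, and the function name and numbers below are hypothetical.

```python
def needs_update(current_request: int, lower_bound: int, upper_bound: int) -> bool:
    """Return True when the current request lies outside the recommended range.

    All values are CPU millicores. This is a simplified illustration,
    not the actual VPA updater logic.
    """
    return current_request < lower_bound or current_request > upper_bound


# Hypothetical numbers: a Pod requesting 100m while the recommended range is 200m..400m.
print(needs_update(100, 200, 400))
# A Pod already inside the range is left alone.
print(needs_update(250, 200, 400))
```

Under this rule the first Pod would be evicted and recreated with new requests, while the second would keep running untouched.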
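II. Deploying VPA
1. Deploy metrics-server to provide Pod resource monitoring
1.1 Deploy the metrics-server component
Tip:
If metrics-server is already deployed, skip this step. If it is not, see my earlier notes.
Reference notes:
https://www.cnblogs.com/yinzhengjie/p/19003670
1. Deploy metrics-server and confirm that it is running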
[root@master241 ~]# kubectl get pods,svc -n kube-system -l k8s-app=metrics-server -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/metrics-server-7cd44b454f-rgb26 1/1 Running 0 3m9s 10.100.207.18 worker243 <none> <none>
pod/metrics-server-7cd44b454f-zbn4g 1/1 Running 0 3m9s 10.100.165.155 worker242 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/metrics-server ClusterIP 10.193.112.117 <none> 443/TCP 3m9s k8s-app=metrics-server
[root@master241 ~]#
2. Verify the metrics-server component
[root@master241 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master241 Ready control-plane 130d v1.31.9 10.0.0.241 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
worker242 Ready <none> 130d v1.31.9 10.0.0.242 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
worker243 Ready <none> 130d v1.31.9 10.0.0.243 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
[root@master241 ~]#
[root@master241 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
master241 115m 5% 1828Mi 48%
worker242 77m 3% 1507Mi 19%
worker243 45m 2% 2428Mi 31%
[root@master241 ~]#
1.2 Verify the metrics-server API
1. Query the metrics API and check that the NodeMetrics and PodMetrics kinds are present.
[root@master241 ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1 | python3 -m json.tool
{
"kind": "APIResourceList",
"apiVersion": "v1",
"groupVersion": "metrics.k8s.io/v1beta1",
"resources": [
{
"name": "nodes",
"singularName": "",
"namespaced": false,
"kind": "NodeMetrics",
"verbs": [
"get",
"list"
]
},
{
"name": "pods",
"singularName": "",
"namespaced": true,
"kind": "PodMetrics",
"verbs": [
"get",
"list"
]
}
]
}
[root@master241 ~]#
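The same check can be scripted instead of read by eye. The sketch below works on a trimmed copy of the response shown above (embedded as a string so it runs without a cluster); the helper name `metric_kinds` is hypothetical.

```python
import json

# Trimmed copy of the APIResourceList returned by /apis/metrics.k8s.io/v1beta1.
response = """
{
  "kind": "APIResourceList",
  "groupVersion": "metrics.k8s.io/v1beta1",
  "resources": [
    {"name": "nodes", "namespaced": false, "kind": "NodeMetrics", "verbs": ["get", "list"]},
    {"name": "pods", "namespaced": true, "kind": "PodMetrics", "verbs": ["get", "list"]}
  ]
}
"""


def metric_kinds(raw: str) -> set:
    """Collect the resource kinds advertised by the metrics API."""
    return {resource["kind"] for resource in json.loads(raw)["resources"]}


# Both kinds must be served for kubectl top and VPA to get usage data.
missing = {"NodeMetrics", "PodMetrics"} - metric_kinds(response)
print("missing kinds:", sorted(missing))
```

On a live cluster the string could be replaced by the output of the `kubectl get --raw` command above.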
2. Fetch Pod metrics through the metrics API
[root@master241 ~]# kubectl get pods,svc -n kube-system -l k8s-app=metrics-server -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/metrics-server-7cd44b454f-rgb26 1/1 Running 0 16m 10.100.207.18 worker243 <none> <none>
pod/metrics-server-7cd44b454f-zbn4g 1/1 Running 0 16m 10.100.165.155 worker242 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/metrics-server ClusterIP 10.193.112.117 <none> 443/TCP 16m k8s-app=metrics-server
[root@master241 ~]#
[root@master241 ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/kube-system/pods/metrics-server-7cd44b454f-zbn4g | python3 -m json.tool
{
"kind": "PodMetrics",
"apiVersion": "metrics.k8s.io/v1beta1",
"metadata": {
"name": "metrics-server-7cd44b454f-zbn4g",
"namespace": "kube-system",
"creationTimestamp": "2025-09-23T10:55:51Z",
"labels": {
"k8s-app": "metrics-server",
"pod-template-hash": "7cd44b454f"
}
},
"timestamp": "2025-09-23T10:55:28Z",
"window": "10.257s",
"containers": [
{
"name": "metrics-server",
"usage": {
"cpu": "1688992n",
"memory": "17Mi"
}
}
]
}
[root@master241 ~]#
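The usage values in this response use Kubernetes quantity suffixes (`n` for nanocores, `Mi` for mebibytes), which are hard to compare at a glance. A minimal conversion sketch follows; it handles only the common suffixes listed in the code and is not a full quantity parser, and both function names are hypothetical.

```python
# How many of each CPU unit make up one millicore.
CPU_UNITS_PER_MILLICORE = {"n": 1_000_000, "u": 1_000, "m": 1}

# Bytes per memory unit; two-letter binary suffixes are listed before decimal ones.
MEMORY_UNIT_BYTES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "k": 1000, "M": 1000**2, "G": 1000**3,
}


def cpu_to_millicores(quantity: str) -> float:
    """Convert a CPU quantity such as '1688992n' or '100m' to millicores."""
    for suffix, per_millicore in CPU_UNITS_PER_MILLICORE.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) / per_millicore
    return float(quantity) * 1000  # no suffix means whole cores


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '17Mi' or '262144k' to bytes."""
    for suffix, factor in MEMORY_UNIT_BYTES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix means plain bytes


# Values taken from the PodMetrics response above.
print(round(cpu_to_millicores("1688992n"), 3), "millicores")
print(memory_to_bytes("17Mi"), "bytes")
```

So the metrics-server container was using under 2 millicores of CPU at the time of the sample.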
3. Fetch Node metrics through the metrics API
[root@master241 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master241 Ready control-plane 130d v1.31.9 10.0.0.241 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
worker242 Ready <none> 130d v1.31.9 10.0.0.242 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
worker243 Ready <none> 130d v1.31.9 10.0.0.243 <none> Ubuntu 22.04.4 LTS 5.15.0-144-generic containerd://1.6.36
[root@master241 ~]#
[root@master241 ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes/worker243 | python3 -m json.tool
{
"kind": "NodeMetrics",
"apiVersion": "metrics.k8s.io/v1beta1",
"metadata": {
"name": "worker243",
"creationTimestamp": "2025-09-23T10:57:07Z",
"labels": {
"beta.kubernetes.io/arch": "amd64",
"beta.kubernetes.io/os": "linux",
"kubernetes.io/arch": "amd64",
"kubernetes.io/hostname": "worker243",
"kubernetes.io/os": "linux"
}
},
"timestamp": "2025-09-23T10:56:53Z",
"window": "20.062s",
"usage": {
"cpu": "54830026n",
"memory": "2518172Ki"
}
}
[root@master241 ~]#
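2. Deploy the vertical-pod-autoscaler components
2.1 Clone the code
The VPA installation documentation lists which VPA versions correspond to which Kubernetes versions; check it before cloning.
Reference link: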
https://github.com/kubernetes/autoscaler/blob/master/vertical-pod-autoscaler/docs/installation.md#compatibility
2.2 Clone with git
[root@master241 ~]# git clone https://github.com/kubernetes/autoscaler.git
Cloning into 'autoscaler'...
remote: Enumerating objects: 212247, done.
remote: Total 212247 (delta 0), reused 0 (delta 0), pack-reused 212247 (from 1)
Receiving objects: 100% (212247/212247), 241.68 MiB | 679.00 KiB/s, done.
Resolving deltas: 100% (136251/136251), done.
Updating files: 100% (20944/20944), done.
[root@master241 ~]#
2.3 Install VPA
1. Enter the VPA source directory
[root@master241 ~]# cd autoscaler/vertical-pod-autoscaler/
[root@master241 vertical-pod-autoscaler]#
2. Install VPA
[root@master241 vertical-pod-autoscaler]# ./hack/vpa-up.sh
HEAD is now at 6569b7734 Merge pull request #7178 from raywainman/vpa-release-1.2
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalercheckpoints.autoscaling.k8s.io created
customresourcedefinition.apiextensions.k8s.io/verticalpodautoscalers.autoscaling.k8s.io created
clusterrole.rbac.authorization.k8s.io/system:metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:vpa-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-status-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:evictioner created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-actor created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-status-actor created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-checkpoint-actor created
clusterrole.rbac.authorization.k8s.io/system:vpa-target-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-target-reader-binding created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-evictioner-binding created
serviceaccount/vpa-admission-controller created
serviceaccount/vpa-recommender created
serviceaccount/vpa-updater created
clusterrole.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-admission-controller created
clusterrole.rbac.authorization.k8s.io/system:vpa-status-reader created
clusterrolebinding.rbac.authorization.k8s.io/system:vpa-status-reader-binding created
role.rbac.authorization.k8s.io/system:leader-locking-vpa-updater created
rolebinding.rbac.authorization.k8s.io/system:leader-locking-vpa-updater created
role.rbac.authorization.k8s.io/system:leader-locking-vpa-recommender created
rolebinding.rbac.authorization.k8s.io/system:leader-locking-vpa-recommender created
deployment.apps/vpa-updater created
deployment.apps/vpa-recommender created
deployment.apps/vpa-admission-controller created
service/vpa-webhook created
[root@master241 vertical-pod-autoscaler]#
2.4 Check that the VPA components are healthy
[root@master241 ~]# kubectl get pods -n kube-system -l 'app in (vpa-admission-controller,vpa-recommender,vpa-updater)' -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
vpa-admission-controller-7466785b46-z92sn 1/1 Running 0 4s 10.100.165.188 worker242 <none> <none>
vpa-recommender-7bc485dbd7-hrzxf 1/1 Running 0 4s 10.100.207.5 worker243 <none> <none>
vpa-updater-7d6445968b-m7lpn 1/1 Running 0 4s 10.100.165.185 worker242 <none> <none>
[root@master241 ~]#
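Tips:
- 1. The upstream manifests use the image pull policy Always. If the images cannot be pulled directly, pull them manually and change the Deployments' image pull policy to IfNotPresent;
- 2. If this step fails, for example with errors about resources not being found, uninstall first by running './hack/vpa-down.sh' and then run './hack/vpa-up.sh' again;
2.5 Location of the VPA manifests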
[root@master241 vertical-pod-autoscaler]# pwd
/root/autoscaler/vertical-pod-autoscaler
[root@master241 vertical-pod-autoscaler]#
[root@master241 vertical-pod-autoscaler]# ll deploy/
total 100
drwxr-xr-x 2 root root 4096 Sep 24 17:48 ./
drwxr-xr-x 10 root root 4096 Sep 24 10:01 ../
-rw-r--r-- 1 root root 1487 Sep 24 10:01 admission-controller-deployment.yaml
-rw-r--r-- 1 root root 144 Sep 23 19:32 kustomization.yaml
-rw-r--r-- 1 root root 980 Sep 23 19:32 recommender-deployment-high.yaml
-rw-r--r-- 1 root root 960 Sep 23 19:32 recommender-deployment-low.yaml
-rw-r--r-- 1 root root 747 Sep 24 10:01 recommender-deployment.yaml
-rw-r--r-- 1 root root 897 Sep 24 10:01 updater-deployment.yaml
-rw-r--r-- 1 root root 1623 Sep 23 19:32 vpa-beta2-crd.yaml
-rw-r--r-- 1 root root 1380 Sep 23 19:32 vpa-beta-crd.yaml
-rw-r--r-- 1 root root 1398 Sep 23 19:32 vpa-crd.yaml
-rw-r--r-- 1 root root 8206 Sep 23 19:32 vpa-rbac.yaml
-rw-r--r-- 1 root root 40381 Sep 23 19:32 vpa-v1-crd-gen.yaml
-rw-r--r-- 1 root root 2705 Sep 23 19:32 vpa-v1-crd.yaml
[root@master241 vertical-pod-autoscaler]#
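III. Using VPA
1. VPA update policies
VPA has four update modes (updateMode): Initial, Auto, Recreate, and Off.
- Initial:
Sets resource requests only when a Pod is created and never changes them afterwards.
- Auto:
The default mode. VPA sets resource requests when a Pod is created and also updates them on existing Pods.
- Recreate:
Similar to Auto: VPA sets resource requests at Pod creation and updates them later. The difference is that whenever a Pod's requests differ from the new recommendation, VPA evicts the Pod and starts a new one with the new values.
For that reason this mode is rarely used; prefer Auto unless you really need the requests to always match the latest recommendation.
- Off:
Does not change the Pod's resource requests, but still records the recommended values in the VPA object.
In production we normally use the default Auto mode.
2. VPA hands-on example
2.1 Prepare the test environment
1. Write the manifest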
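The four modes differ along two axes: whether VPA applies requests when a Pod is created, and whether it goes on to change Pods that are already running. The lookup table below restates the descriptions above in code form; the type and field names are hypothetical and chosen only for this illustration.

```python
from typing import NamedTuple


class ModeBehavior(NamedTuple):
    sets_requests_on_creation: bool  # requests are applied when a Pod is created
    updates_running_pods: bool       # existing Pods may be evicted and recreated


# Summary of the updateMode descriptions in this section.
UPDATE_MODES = {
    "Off": ModeBehavior(False, False),      # recommendations only
    "Initial": ModeBehavior(True, False),   # applied at creation, never afterwards
    "Recreate": ModeBehavior(True, True),   # evicts Pods to apply new values
    "Auto": ModeBehavior(True, True),       # default mode
}


def may_evict(mode: str) -> bool:
    """Return True when the given updateMode can cause Pods to be evicted."""
    return UPDATE_MODES[mode].updates_running_pods


for mode, behavior in UPDATE_MODES.items():
    print(f"{mode:9} on_creation={behavior.sets_requests_on_creation} "
          f"running_pods={behavior.updates_running_pods}")
```

A quick way to use the table: `may_evict("Off")` is False, which is why Off is the safe choice for observing recommendations before letting VPA act.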
[root@master241 case-demo]# cat 01-deploy-svc.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: deploy-vpa-case
spec:
  replicas: 3
  selector:
    matchLabels:
      apps: xiuxian
  template:
    metadata:
      labels:
        apps: xiuxian
    spec:
      containers:
      - name: c1
        image: registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1
        resources:
          requests:
            memory: 100Mi
            cpu: 100m
        ports:
        - containerPort: 80
          name: web
---
apiVersion: v1
kind: Service
metadata:
  name: svc-xiuxian
spec:
  ports:
  - port: 80
  selector:
    apps: xiuxian
  type: ClusterIP
[root@master241 case-demo]#
2. Create the resources
[root@master241 case-demo]# kubectl apply -f 01-deploy-svc.yaml
deployment.apps/deploy-vpa-case created
service/svc-xiuxian created
[root@master241 case-demo]#
[root@master241 case-demo]# kubectl get deploy,rs,po,svc -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
deployment.apps/deploy-vpa-case 3/3 3 3 14s c1 registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1 apps=xiuxian
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
replicaset.apps/deploy-vpa-case-76d548698d 3 3 3 14s c1 registry.cn-hangzhou.aliyuncs.com/yinzhengjie-k8s/apps:v1 apps=xiuxian,pod-template-hash=76d548698d
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/deploy-vpa-case-76d548698d-64525 1/1 Running 0 14s 10.100.165.163 worker242 <none> <none>
pod/deploy-vpa-case-76d548698d-bqq98 1/1 Running 0 14s 10.100.207.40 worker243 <none> <none>
pod/deploy-vpa-case-76d548698d-nzlqp 1/1 Running 0 14s 10.100.207.39 worker243 <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.192.0.1 <none> 443/TCP 65d <none>
service/svc-xiuxian ClusterIP 10.206.46.20 <none> 80/TCP 14s apps=xiuxian
[root@master241 case-demo]#
3. Check the Pods' initial resource requests
[root@master241 case-demo]# kubectl get pods -o yaml | grep resources -A 3
resources:
requests:
cpu: 100m
memory: 100Mi
--
resources:
requests:
cpu: 100m
memory: 100Mi
--
resources:
requests:
cpu: 100m
memory: 100Mi
[root@master241 case-demo]#
2.2 Create the VPA
1. Write the manifest
[root@master241 case-demo]# cat 02-vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: vpa-case
spec:
  # Update policy
  updatePolicy:
    # Valid values: Off, Initial, Recreate, Auto (default).
    updateMode: "Auto"
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: deploy-vpa-case
  resourcePolicy:
    containerPolicies:
    - containerName: c1
      minAllowed:
        cpu: 200m
        memory: 300Mi
      maxAllowed:
        cpu: 1500m
        memory: 2048Mi
[root@master241 case-demo]#
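When many workloads need a VPA, writing each manifest by hand gets repetitive. As a sketch only, the same object can be generated as JSON, which `kubectl apply -f` accepts alongside YAML. The helper name `build_vpa_manifest` and its parameters are hypothetical; the field names mirror the manifest above.

```python
import json


def build_vpa_manifest(name, deployment, container, min_allowed, max_allowed, mode="Auto"):
    """Build a VerticalPodAutoscaler object equivalent to the YAML manifest above."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "updatePolicy": {"updateMode": mode},
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "resourcePolicy": {
                "containerPolicies": [
                    {
                        "containerName": container,
                        "minAllowed": min_allowed,
                        "maxAllowed": max_allowed,
                    }
                ]
            },
        },
    }


manifest = build_vpa_manifest(
    name="vpa-case",
    deployment="deploy-vpa-case",
    container="c1",
    min_allowed={"cpu": "200m", "memory": "300Mi"},
    max_allowed={"cpu": "1500m", "memory": "2048Mi"},
)
# The printed JSON could be saved to a file and applied like any other manifest.
print(json.dumps(manifest, indent=2))
```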
2. Create the VPA
[root@master241 case-demo]# kubectl apply -f 02-vpa.yaml
verticalpodautoscaler.autoscaling.k8s.io/vpa-case created
[root@master241 case-demo]#
[root@master241 case-demo]# kubectl get vpa
NAME MODE CPU MEM PROVIDED AGE
vpa-case Auto 5s
[root@master241 case-demo]#
[root@master241 case-demo]# kubectl get vpa
NAME MODE CPU MEM PROVIDED AGE
vpa-case Auto 200m 300Mi True 5s
[root@master241 case-demo]#
[root@master241 case-demo]# kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-vpa-case-76d548698d-2s8r8 1/1 Running 0 88s 10.100.207.23 worker243 <none> <none>
deploy-vpa-case-76d548698d-fxd2q 1/1 Running 0 6s 10.100.207.19 worker243 <none> <none>
deploy-vpa-case-76d548698d-qsxrn 1/1 Running 0 88s 10.100.165.156 worker242 <none> <none>
[root@master241 case-demo]#
3. View the VPA details
[root@master241 case-demo]# kubectl describe vpa vpa-case
Name: vpa-case
Namespace: default
Labels: <none>
Annotations: <none>
API Version: autoscaling.k8s.io/v1
Kind: VerticalPodAutoscaler
Metadata:
Creation Timestamp: 2025-09-25T02:56:47Z
Generation: 1
Resource Version: 5146218
UID: d531c410-08ff-4c8d-8243-e1f673fadeb0
Spec:
Resource Policy:
Container Policies:
Container Name: c1
Max Allowed:
Cpu: 1500m
Memory: 2048Mi
Min Allowed:
Cpu: 200m
Memory: 300Mi
Target Ref:
API Version: apps/v1
Kind: Deployment
Name: deploy-vpa-case
Update Policy:
Update Mode: Auto
Status:
Conditions:
Last Transition Time: 2025-09-25T02:57:36Z
Status: True
Type: RecommendationProvided
Recommendation:
Container Recommendations:
Container Name: c1
Lower Bound: # lower bound
Cpu: 200m
Memory: 300Mi
Target: # recommended value
Cpu: 200m
Memory: 300Mi
Uncapped Target: # the recommendation before the minAllowed/maxAllowed bounds are applied
Cpu: 25m
Memory: 262144k
Upper Bound: # upper bound
Cpu: 200m
Memory: 300Mi
Events: <none>
[root@master241 case-demo]#
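The relationship between Uncapped Target and Target in this output comes down to clamping: the raw recommendation is forced into the range set by minAllowed and maxAllowed. The sketch below reproduces the numbers from the output above; the helper `clamp` is hypothetical and only illustrates the arithmetic.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Force value into the inclusive range [low, high]."""
    return max(low, min(value, high))


MI = 1024**2  # bytes in one mebibyte

# CPU in millicores: uncapped target 25m, minAllowed 200m, maxAllowed 1500m.
cpu_target = clamp(25, 200, 1500)

# Memory in bytes: uncapped target 262144k (decimal kilobytes, i.e. 250Mi),
# minAllowed 300Mi, maxAllowed 2048Mi.
memory_uncapped = 262144 * 1000
memory_target = clamp(memory_uncapped, 300 * MI, 2048 * MI)

print(f"cpu target: {cpu_target}m")
print(f"memory target: {memory_target // MI}Mi")
```

Because the idle Pods need less than the configured minimum, both targets land exactly on minAllowed (200m and 300Mi), which matches the Target section of the describe output.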
2.3 Test and verify
1. Install the ab load-testing tool
[root@master241 case-demo]# apt -y install apache2-utils
2. Run the load test
[root@master241 case-demo]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.192.0.1 <none> 443/TCP 65d
svc-xiuxian ClusterIP 10.206.46.20 <none> 80/TCP 4m34s
[root@master241 case-demo]#
[root@master241 case-demo]# ab -c 100 -n 1000000 http://10.206.46.20/
This is ApacheBench, Version 2.3 <$Revision: 1879490 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/
Benchmarking 10.206.46.20 (be patient)
3. Check the Pod status
[root@master241 case-demo]# kubectl get pods -o wide # It is easy to see that the Pods have been recreated.
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
deploy-vpa-case-76d548698d-4q7dz 1/1 Running 0 23s 10.100.207.36 worker243 <none> <none>
deploy-vpa-case-76d548698d-8fjkx 1/1 Running 0 83s 10.100.165.161 worker242 <none> <none>
deploy-vpa-case-76d548698d-fxd2q 1/1 Running 0 2m23s 10.100.207.19 worker243 <none> <none>
[root@master241 case-demo]#
[root@master241 case-demo]# kubectl get pods -o yaml | grep resources -A 3
vpaUpdates: 'Pod resources updated by vpa-case: container 0: cpu request, memory
request'
creationTimestamp: "2025-09-25T03:04:37Z"
generateName: deploy-vpa-case-76d548698d-
--
resources:
requests:
cpu: 200m # Note: the CPU request has been raised from the original 100m to 200m.
memory: 300Mi # Note: the memory request has been raised from the original 100Mi to 300Mi.
--
vpaUpdates: 'Pod resources updated by vpa-case: container 0: cpu request, memory
request'
creationTimestamp: "2025-09-25T03:03:37Z"
generateName: deploy-vpa-case-76d548698d-
--
resources:
requests:
cpu: 200m
memory: 300Mi
--
vpaUpdates: 'Pod resources updated by vpa-case: container 0: cpu request, memory
request'
creationTimestamp: "2025-09-25T03:02:37Z"
generateName: deploy-vpa-case-76d548698d-
--
resources:
requests:
cpu: 200m
memory: 300Mi
[root@master241 case-demo]#
3. Troubleshooting tips
If you do not see the results shown in the experiment above, check the logs of the VPA components.
For example:
kubectl -n kube-system logs -f vpa-updater-7d6445968b-m7lpn
Tip:
Do not set the Deployment's replica count to 1; otherwise the Pod cannot be evicted and you will not see the effect of the experiment.
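The replica-count tip can be expressed as a simple guard. The sketch assumes the updater refuses to evict Pods from a workload with fewer than two replicas, which is my understanding of the upstream default for its minimum-replicas setting; treat the threshold as an assumption and verify it for your VPA version. The function name is hypothetical.

```python
def eviction_allowed(replicas: int, min_replicas: int = 2) -> bool:
    """Return True when a workload has enough replicas for the updater to evict a Pod.

    min_replicas=2 is an assumed default, not a value taken from this article.
    """
    return replicas >= min_replicas


# A single-replica Deployment is skipped, so its requests never change.
print(eviction_allowed(1))
# The three-replica Deployment used in this article is eligible.
print(eviction_allowed(3))
```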
This article comes from cnblogs (博客园), author: Yin Zhengjie. When reposting, cite the original link: https://www.cnblogs.com/yinzhengjie/p/19127675