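Monitoring Kubernetes resources with Prometheus + Grafana
What is kube-state-metrics?
To monitor Kubernetes resource metrics comprehensively, the corresponding exporters must be running in the cluster, for example cAdvisor and kube-state-metrics: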
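- cAdvisor: built into the kubelet, so nothing extra needs to be installed. It collects CPU, memory and other usage metrics for the containers in the cluster.
- kube-state-metrics: talks to the API server and watches add, delete and update events. cAdvisor's basic metrics alone do not cover enough dimensions: they say nothing about Kubernetes resource objects such as Deployment, Pod, DaemonSet and CronJob, for example how many replicas a Deployment has, or whether a Pod is currently Pending or Running. Because cAdvisor does not monitor these objects, a second exporter is needed to expose them, and that exporter is kube-state-metrics.
- kube-state-metrics focuses on the latest state of Kubernetes resources such as Deployments and DaemonSets. It was not folded into metrics-server because the two have fundamentally different concerns. metrics-server only fetches and formats data that already exists (CPU and memory usage reported by the kubelets) and serves it through the Metrics API, whereas kube-state-metrics keeps a snapshot of the cluster state in memory and generates new metrics from it. It does not push those metrics anywhere itself; it only exposes them over HTTP for a scraper such as Prometheus to collect.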
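To make the object-state metrics concrete, here is a minimal sketch that lists the pods in a given phase from kube-state-metrics output. It assumes the kube_pod_status_phase series (one series per pod and phase, with value 1 for the current phase). The function name pods_in_phase and the sample pod names are made up for illustration.

```shell
#!/usr/bin/env bash
# pods_in_phase PHASE
# Reads Prometheus text-format metrics on stdin and prints "namespace/pod"
# for each kube_pod_status_phase series that has value 1 for PHASE.
pods_in_phase() {
  phase="$1"
  grep '^kube_pod_status_phase{' \
    | grep "phase=\"${phase}\"" \
    | awk '$NF == 1 { print $1 }' \
    | sed -E 's|.*namespace="([^"]*)".*pod="([^"]*)".*|\1/\2|'
}

# Illustrative sample; real data comes from the /metrics endpoint.
sample='kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="kube-system",pod="job-x",phase="Pending"} 1
kube_deployment_status_replicas{namespace="default",deployment="web"} 3'

printf '%s\n' "$sample" | pods_in_phase Pending
# kube-system/job-x
```

Once kube-state-metrics is reachable, the same filter can be fed from the live endpoint, for example: curl -s http://127.0.0.1:30860/metrics | pods_in_phase Pending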
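Deploying kube-state-metrics
The kube-state-metrics version must be compatible with the Kubernetes version; see the compatibility matrix in the GitHub repository: https://github.com/kubernetes/kube-state-metrics/tree/main. My cluster runs Kubernetes 1.23.9, and I downloaded kube-state-metrics 2.2.0.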
tar -xf kube-state-metrics-2.2.0.tar.gz
cd kube-state-metrics-2.2.0/examples/standard/
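# Change the Service type to NodePort:
# Port 8080 returns information about Kubernetes objects, for example node-related data
# Port 8081 exposes KSM's own metrics; KSM calls the API server and watches objects, and the health of those operations needs to be measured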
cat service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.2.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    nodePort: 30860
  - name: telemetry
    port: 8081
    targetPort: telemetry
    nodePort: 30861
  selector:
    app.kubernetes.io/name: kube-state-metrics
# Deploy. Everything goes into the kube-system namespace by default; edit the YAML files if you want a different one
kubectl apply -f .
# If the image pull fails:
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0
docker tag swr.cn-north-4.myhuaweicloud.com/ddn-k8s/k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0 k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0
Check that it works:
curl http://127.0.0.1:30861/healthz
<html>
<head><title>Kube-State-Metrics Metrics Server</title></head>
<body>
<h1>Kube-State-Metrics Metrics</h1>
<ul>
<li><a href='/metrics'>metrics</a></li>
</ul>
</body>
</html>
curl http://127.0.0.1:30860/healthz
OK
Configuring Prometheus to scrape the metrics
Create the k8s.token file
# Create the Secret
cat kube-state-metrics-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: kube-state-metrics
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: kube-state-metrics
type: kubernetes.io/service-account-token
kubectl apply -f kube-state-metrics-token.yaml
# Show the token
kubectl describe secret kube-state-metrics -n kube-system
Name: kube-state-metrics
Namespace: kube-system
Labels: <none>
Annotations: kubernetes.io/service-account.name: kube-state-metrics
kubernetes.io/service-account.uid: 4dc9afc8-c427-4300-bf38-ec7a6c0a5560
Type: kubernetes.io/service-account-token
Data
====
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IkZKaVhObFZCeHZIYndIY084cWRmLS1Xam1mbkZTRHZMV3JHYm5lZFZKSGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlLXN0YXRlLW1ldHJpY3MiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia3ViZS1zdGF0ZS1tZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNGRjOWFmYzgtYzQyNy00MzAwLWJmMzgtZWM3YTZjMGE1NTYwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmt1YmUtc3RhdGUtbWV0cmljcyJ9.5OforSvZKdfWZpQfg5vbrcoXn5VhpPpaziKpFNY0VDrsaMziMZEZWTV9OMandE9nbLHdSBgMK-i2JV-4_488Aw5UQSFMJFdGn_AVJ6hdaDiXFQpvzIqNwHswMioMgyWjUBsmGlN1qkLUo0P3TIo8_-Rc4obzaB5qSMzQKMlxMW6MntiKLxrmlMcMODzVJj3A0F6pbB7YOqWU33bcDRsVb9vzQwO-9DWEDWIGkBaeTs8ZXf8ecuyN0Vr0PkAQKEA8Xn84dKsCunkTN-6FmL-eGba-TBinNhBbApc6zztz5fLWXrRtiZQwYSkIHC0KnpKQtxBCxDjZsSstOhBTc2ItYg
ca.crt: 1099 bytes
namespace: 11 bytes
# Write this token to k8s.token and copy it to the Prometheus host; Prometheus reads this file
cat /data/prometheus/k8s.token
eyJhbGciOiJSUzI1NiIsImtpZCI6IkZKaVhObFZCeHZIYndIY084cWRmLS1Xam1mbkZTRHZMV3JHYm5lZFZKSGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlLXN0YXRlLW1ldHJpY3MiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia3ViZS1zdGF0ZS1tZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNGRjOWFmYzgtYzQyNy00MzAwLWJmMzgtZWM3YTZjMGE1NTYwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmt1YmUtc3RhdGUtbWV0cmljcyJ9.5OforSvZKdfWZpQfg5vbrcoXn5VhpPpaziKpFNY0VDrsaMziMZEZWTV9OMandE9nbLHdSBgMK-i2JV-4_488Aw5UQSFMJFdGn_AVJ6hdaDiXFQpvzIqNwHswMioMgyWjUBsmGlN1qkLUo0P3TIo8_-Rc4obzaB5qSMzQKMlxMW6MntiKLxrmlMcMODzVJj3A0F6pbB7YOqWU33bcDRsVb9vzQwO-9DWEDWIGkBaeTs8ZXf8ecuyN0Vr0PkAQKEA8Xn84dKsCunkTN-6FmL-eGba-TBinNhBbApc6zztz5fLWXrRtiZQwYSkIHC0KnpKQtxBCxDjZsSstOhBTc2ItYg
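Before wiring the token into Prometheus, it can be useful to confirm which service account it belongs to. A service-account token is a JWT, so its middle segment is base64url-encoded JSON. The helper below is a sketch under that assumption; the name jwt_payload and the throwaway demo token are illustrative, and base64 -d assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# jwt_payload TOKEN
# Prints the decoded JSON payload (second dot-separated segment) of a JWT.
jwt_payload() {
  # Map the base64url alphabet back to standard base64.
  payload="$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')"
  # base64url drops the trailing "=" padding; restore it for base64 -d.
  while [ $(( ${#payload} % 4 )) -ne 0 ]; do
    payload="${payload}="
  done
  printf '%s' "$payload" | base64 -d
}

# Build a throwaway token with a known payload to show the round trip.
claims='{"sub":"system:serviceaccount:kube-system:kube-state-metrics"}'
segment="$(printf '%s' "$claims" | base64 | tr -d '=\n' | tr '/+' '_-')"
jwt_payload "header.${segment}.signature"
echo
```

On the Prometheus host the same helper could be run as jwt_payload "$(cat /data/prometheus/k8s.token)" and the sub claim checked against the expected service account.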
Add the scrape configuration
vim /data/prometheus/prometheus.yml
  - job_name: 'k8s-cadvisor'
    scrape_interval: 60s
    scrape_timeout: 60s
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:  # Kubernetes service discovery
    - api_server: https://10.0.0.41:6443  # API server address
      role: node  # discover node objects
      namespaces:
        names:
        - kube-system
      # credentials for querying the API server
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    # credentials for scraping the discovered targets
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:10255'
      target_label: __address__
      action: replace
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    metric_relabel_configs:
    - source_labels: [instance]
      separator: ;
      regex: (.+)
      target_label: node
      replacement: $1
      action: replace
    - source_labels: [pod_name]
      separator: ;
      regex: (.+)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [container_name]
      separator: ;
      regex: (.+)
      target_label: container
      replacement: $1
      action: replace
  - job_name: kube-state-metrics-1
    kubernetes_sd_configs:
    - api_server: https://10.0.0.41:6443  # API server address
      role: endpoints  # discover endpoints objects
      namespaces:
        names:
        - kube-system
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 10.0.0.41:30860
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
  - job_name: kube-state-metrics-2
    kubernetes_sd_configs:
    - api_server: https://10.0.0.41:6443  # API server address
      role: endpoints  # discover endpoints objects
      namespaces:
        names:
        - kube-system
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 10.0.0.41:30861
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
Hot-reload Prometheus (this endpoint only works if Prometheus was started with --web.enable-lifecycle):
curl -X POST http://localhost:9090/-/reload
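Current state: the k8s-cadvisor targets are down.
This is because the kubelet's port 10255 is disabled by default.
10250 (kubelet API): the kubelet's authenticated HTTPS port, which the API server uses to talk to the kubelet. Node resources and status can be queried through it.
10255 (read-only API): serves pod and node information over plain HTTP without authentication. The k8s-cadvisor job above rewrites its targets to this port, so unless 10255 is open, no container metrics will be collected.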
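The reason the scrape lands on 10255 at all is the relabel rule in the k8s-cadvisor job (regex '(.*):10250', replacement '${1}:10255'). As a rough illustration of what that rule does to a discovered address, here is the same substitution expressed with sed. The function name rewrite_kubelet_port and the second sample address are made up for the example.

```shell
#!/usr/bin/env bash
# rewrite_kubelet_port: mimic the relabel rule
#   regex: '(.*):10250'   replacement: '${1}:10255'
# Prometheus anchors relabel regexes at both ends, hence the ^ and $.
rewrite_kubelet_port() {
  sed -E 's/^(.*):10250$/\1:10255/'
}

printf '%s\n' '10.0.0.41:10250' '10.0.0.42:9100' | rewrite_kubelet_port
# 10.0.0.41:10255
# 10.0.0.42:9100
```

Addresses that do not end in :10250 pass through unchanged, which matches how a non-matching replace rule behaves in Prometheus.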
Enable port 10255 in the kubelet configuration
This has to be done on every Kubernetes node.
# Append the --read-only-port=10255 flag at the end
cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6 --read-only-port=10255"
systemctl restart kubelet
netstat -lntup |grep 10255
tcp6 0 0 :::10255 :::* LISTEN 113973/kubelet
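Since the change has to be repeated on every node, a small idempotent helper can reduce copy-paste mistakes. The following is only a sketch: it assumes the file contains a single KUBELET_KUBEADM_ARGS="..." line as shown above and that GNU sed is available; the function name enable_readonly_port is hypothetical. The demo runs against a temporary file rather than the real one.

```shell
#!/usr/bin/env bash
# enable_readonly_port FILE
# Appends --read-only-port=10255 inside the quoted KUBELET_KUBEADM_ARGS value,
# unless a --read-only-port flag is already present. Uses GNU sed -i.
enable_readonly_port() {
  file="$1"
  if grep -q -e '--read-only-port=' "$file"; then
    return 0
  fi
  sed -i -E 's/^(KUBELET_KUBEADM_ARGS=".*)"$/\1 --read-only-port=10255"/' "$file"
}

# Demonstrate on a temporary file with the same content as the original env file.
demo="$(mktemp)"
printf '%s\n' 'KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6"' > "$demo"
enable_readonly_port "$demo"
enable_readonly_port "$demo"   # second call is a no-op
cat "$demo"
```

On a real node the call would be enable_readonly_port /var/lib/kubelet/kubeadm-flags.env followed by systemctl restart kubelet.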
After enabling the port, check the status in the Prometheus web UI again:
Add the Grafana dashboard
Import dashboard template ID 13105