Monitoring Kubernetes Resources with Prometheus and Grafana

What is kube-state-metrics?

To monitor Kubernetes resources comprehensively, the corresponding exporters must be running in the cluster, for example cAdvisor and kube-state-metrics:

  • cAdvisor: built into the kubelet, so it needs no separate installation. It collects CPU, memory, and other metrics for the containers in the cluster;

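  • kube-state-metrics: queries the api-server and listens for add, delete, and update events. The basic metrics from cAdvisor alone do not cover enough dimensions: Kubernetes resource objects such as Deployment, Pod, DaemonSet, and CronJob are not monitored. For example, how many replicas are there? What is the current state of a Pod (Pending or Running)? cAdvisor does not monitor these resource objects, so another exporter is needed to expose such metrics, and that exporter is kube-state-metrics;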

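  • kube-state-metrics focuses on the latest state of Kubernetes resources such as Deployments or DaemonSets. It was not folded into metrics-server because the two have fundamentally different concerns. metrics-server only fetches and formats data that already exists (resource usage reported by the kubelets) and serves it through the Metrics API. kube-state-metrics keeps a snapshot of the Kubernetes state in memory and generates new metrics from it. It does not store or push those metrics anywhere itself; it only exposes them over HTTP for a system such as Prometheus to scrape.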
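As a rough illustration of the object-level state described above, the sketch below filters Prometheus text-format lines for pods whose active phase is not Running. kube_pod_status_phase is a kube-state-metrics metric; the pod names, the values, and the helper names sample_metrics and pods_not_running are made up for this example.

```shell
#!/usr/bin/env bash
# Sample lines in the Prometheus text format. The pods and values are
# hypothetical; real data would come from the kube-state-metrics /metrics endpoint.
sample_metrics() {
  cat <<'EOF'
kube_pod_status_phase{namespace="default",pod="web-1",phase="Running"} 1
kube_pod_status_phase{namespace="default",pod="web-1",phase="Pending"} 0
kube_pod_status_phase{namespace="default",pod="job-7",phase="Running"} 0
kube_pod_status_phase{namespace="default",pod="job-7",phase="Pending"} 1
EOF
}

# Print "namespace/pod phase" for every pod whose active phase (value 1)
# is something other than Running.
pods_not_running() {
  awk '
    index($0, "kube_pod_status_phase{") == 1 && $NF == 1 {
      ns = pod = phase = ""
      if (match($0, /namespace="[^"]*"/)) ns    = substr($0, RSTART + 11, RLENGTH - 12)
      if (match($0, /pod="[^"]*"/))       pod   = substr($0, RSTART + 5,  RLENGTH - 6)
      if (match($0, /phase="[^"]*"/))     phase = substr($0, RSTART + 7,  RLENGTH - 8)
      if (phase != "Running") print ns "/" pod, phase
    }'
}

sample_metrics | pods_not_running
```

Against a live cluster, the same filter could be fed from the metrics endpoint instead of the sample data, for example: curl -s http://127.0.0.1:30860/metrics | pods_not_running (using the NodePort configured later in this post).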

Deploying kube-state-metrics

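The kube-state-metrics version must be compatible with the Kubernetes version; see the compatibility information in the GitHub repository: https://github.com/kubernetes/kube-state-metrics/tree/main. My cluster runs Kubernetes 1.23.9, and I downloaded kube-state-metrics 2.2.0.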

tar -xf kube-state-metrics-2.2.0.tar.gz
cd kube-state-metrics-2.2.0/examples/standard/

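# Change the Service to type NodePort:
# Port 8080 returns information about Kubernetes objects, for example node-related data
# Port 8081 exposes KSM's own metrics: KSM calls the APIServer and watches resources, and the health of those operations needs to be measured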
cat service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/name: kube-state-metrics
    app.kubernetes.io/version: 2.2.0
  name: kube-state-metrics
  namespace: kube-system
spec:
  type: NodePort
  ports:
  - name: http-metrics
    port: 8080
    targetPort: http-metrics
    nodePort: 30860
  - name: telemetry
    port: 8081
    targetPort: telemetry
    nodePort: 30861
  selector:
    app.kubernetes.io/name: kube-state-metrics

# Deploy. By default everything goes into the kube-system namespace; edit the YAML files if you need a different one
kubectl apply -f . 

# If the image pull fails:
docker pull swr.cn-north-4.myhuaweicloud.com/ddn-k8s/k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0
docker tag  swr.cn-north-4.myhuaweicloud.com/ddn-k8s/k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0  k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.2.0

Verify that it works:

curl http://127.0.0.1:30861/healthz
<html>
             <head><title>Kube-State-Metrics Metrics Server</title></head>
             <body>
             <h1>Kube-State-Metrics Metrics</h1>
			 <ul>
             <li><a href='/metrics'>metrics</a></li>
			 </ul>
             </body>
             </html>

curl http://127.0.0.1:30860/healthz
OK

Configuring Prometheus to scrape the metrics

Create the k8s.token file

# Create the Secret
cat kube-state-metrics-token.yaml
apiVersion: v1
kind: Secret
metadata:
  name: kube-state-metrics
  namespace: kube-system
  annotations:
    kubernetes.io/service-account.name: kube-state-metrics
type: kubernetes.io/service-account-token

kubectl apply -f kube-state-metrics-token.yaml

# View the token
kubectl describe secret kube-state-metrics -n kube-system
Name:         kube-state-metrics
Namespace:    kube-system
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: kube-state-metrics
              kubernetes.io/service-account.uid: 4dc9afc8-c427-4300-bf38-ec7a6c0a5560

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IkZKaVhObFZCeHZIYndIY084cWRmLS1Xam1mbkZTRHZMV3JHYm5lZFZKSGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlLXN0YXRlLW1ldHJpY3MiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia3ViZS1zdGF0ZS1tZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNGRjOWFmYzgtYzQyNy00MzAwLWJmMzgtZWM3YTZjMGE1NTYwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmt1YmUtc3RhdGUtbWV0cmljcyJ9.5OforSvZKdfWZpQfg5vbrcoXn5VhpPpaziKpFNY0VDrsaMziMZEZWTV9OMandE9nbLHdSBgMK-i2JV-4_488Aw5UQSFMJFdGn_AVJ6hdaDiXFQpvzIqNwHswMioMgyWjUBsmGlN1qkLUo0P3TIo8_-Rc4obzaB5qSMzQKMlxMW6MntiKLxrmlMcMODzVJj3A0F6pbB7YOqWU33bcDRsVb9vzQwO-9DWEDWIGkBaeTs8ZXf8ecuyN0Vr0PkAQKEA8Xn84dKsCunkTN-6FmL-eGba-TBinNhBbApc6zztz5fLWXrRtiZQwYSkIHC0KnpKQtxBCxDjZsSstOhBTc2ItYg
ca.crt:     1099 bytes
namespace:  11 bytes

# Write this token to k8s.token and copy it to the Prometheus host; Prometheus reads this file
cat /data/prometheus/k8s.token
eyJhbGciOiJSUzI1NiIsImtpZCI6IkZKaVhObFZCeHZIYndIY084cWRmLS1Xam1mbkZTRHZMV3JHYm5lZFZKSGcifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlLXN5c3RlbSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJrdWJlLXN0YXRlLW1ldHJpY3MiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoia3ViZS1zdGF0ZS1tZXRyaWNzIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiNGRjOWFmYzgtYzQyNy00MzAwLWJmMzgtZWM3YTZjMGE1NTYwIiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmUtc3lzdGVtOmt1YmUtc3RhdGUtbWV0cmljcyJ9.5OforSvZKdfWZpQfg5vbrcoXn5VhpPpaziKpFNY0VDrsaMziMZEZWTV9OMandE9nbLHdSBgMK-i2JV-4_488Aw5UQSFMJFdGn_AVJ6hdaDiXFQpvzIqNwHswMioMgyWjUBsmGlN1qkLUo0P3TIo8_-Rc4obzaB5qSMzQKMlxMW6MntiKLxrmlMcMODzVJj3A0F6pbB7YOqWU33bcDRsVb9vzQwO-9DWEDWIGkBaeTs8ZXf8ecuyN0Vr0PkAQKEA8Xn84dKsCunkTN-6FmL-eGba-TBinNhBbApc6zztz5fLWXrRtiZQwYSkIHC0KnpKQtxBCxDjZsSstOhBTc2ItYg
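Before copying the file, it can help to confirm that the token really belongs to the kube-state-metrics service account. A service account token is a JWT, so its payload is base64url-encoded JSON. The following is a minimal sketch, assuming bash and GNU coreutils (base64 -d); the helper name jwt_payload and the throwaway token built in the demo are hypothetical.

```shell
#!/usr/bin/env bash
# Decode the payload (second dot-separated segment) of a JWT.
# Assumes GNU coreutils for "base64 -d".
jwt_payload() {
  local seg
  # base64url uses "-" and "_" where standard base64 uses "+" and "/".
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # base64url omits padding; restore it so that base64 -d accepts the input.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do seg="${seg}="; done
  printf '%s' "$seg" | base64 -d
}

# Demo with a throwaway token whose payload is made up for illustration.
claims='{"sub":"system:serviceaccount:kube-system:kube-state-metrics"}'
payload=$(printf '%s' "$claims" | base64 | tr -d '=\n' | tr '/+' '_-')
jwt_payload "header.${payload}.signature"
echo
```

On the Prometheus host, jwt_payload "$(cat /data/prometheus/k8s.token)" should print claims whose sub field names the kube-state-metrics service account in kube-system.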

Add the scrape configuration

vim /data/prometheus/prometheus.yml
  - job_name: 'k8s-cadvisor'
    scrape_interval: 60s
    scrape_timeout: 60s
    metrics_path: /metrics/cadvisor
    kubernetes_sd_configs:  # Kubernetes service discovery
    - api_server: https://10.0.0.41:6443  # apiserver address
      role: node  # discover targets of type node
      namespaces:
        names:
        - kube-system
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - source_labels: [__address__]
      regex: '(.*):10250'
      replacement: '${1}:10255'
      target_label: __address__
      action: replace
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    metric_relabel_configs:
    - source_labels: [instance]
      separator: ;
      regex: (.+)
      target_label: node
      replacement: $1
      action: replace
    - source_labels: [pod_name]
      separator: ;
      regex: (.+)
      target_label: pod
      replacement: $1
      action: replace
    - source_labels: [container_name]
      separator: ;
      regex: (.+)
      target_label: container
      replacement: $1
      action: replace
  - job_name: kube-state-metrics-1
    kubernetes_sd_configs:
    - api_server: https://10.0.0.41:6443  # apiserver address
      role: endpoints  # discover targets of type endpoints
      namespaces:
        names:
        - kube-system
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 10.0.0.41:30860
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
  - job_name: kube-state-metrics-2
    kubernetes_sd_configs:
    - api_server: https://10.0.0.41:6443  # apiserver address
      role: endpoints  # discover targets of type endpoints
      namespaces:
        names:
        - kube-system
      bearer_token_file: k8s.token
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: k8s.token
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.+)
    - separator: ;
      regex: (.*)
      target_label: __address__
      replacement: 10.0.0.41:30861
    - source_labels: [__meta_kubernetes_service_label_app_kubernetes_io_name]
      regex: kube-state-metrics
      replacement: $1
      action: keep
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    - source_labels: [__meta_kubernetes_namespace]
      action: replace
      target_label: k8s_namespace
    - source_labels: [__meta_kubernetes_service_name]
      action: replace
      target_label: k8s_sname
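The address rewrite in the k8s-cadvisor job can be tried out in isolation. Prometheus anchors relabel regexes at both ends and leaves the label untouched when the regex does not match; the sed expression below mimics that behavior. This is only an approximation for experimenting, not how Prometheus itself is implemented, and the function name rewrite_kubelet_port is hypothetical.

```shell
#!/usr/bin/env bash
# Mimic the relabel rule
#   regex: '(.*):10250'   replacement: '${1}:10255'   target_label: __address__
# The ^ and $ reproduce the implicit anchoring of Prometheus relabel regexes.
rewrite_kubelet_port() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:10255/'
}

rewrite_kubelet_port "10.0.0.41:10250"   # address discovered for a node
rewrite_kubelet_port "10.0.0.41:9100"    # no match, so it is printed unchanged
```

The first call prints 10.0.0.41:10255, which is why the cAdvisor targets end up on the kubelet read-only port.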

Hot-reload Prometheus (the reload endpoint only works if Prometheus was started with --web.enable-lifecycle):

curl -X POST http://localhost:9090/-/reload
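A reload with a broken configuration is rejected, so it can be convenient to validate the file first. The sketch below wraps both steps, assuming promtool (shipped with Prometheus) is on the PATH; the function name safe_reload is hypothetical, and the default path and URL are the ones used in this post.

```shell
#!/usr/bin/env bash
# Validate the configuration and only then ask Prometheus to reload it.
# Arguments (both optional): config file path, Prometheus base URL.
safe_reload() {
  local config="${1:-/data/prometheus/prometheus.yml}"
  local url="${2:-http://localhost:9090}"
  if ! promtool check config "$config"; then
    echo "config check failed, not reloading" >&2
    return 1
  fi
  # -f makes curl return a non-zero status on an HTTP error response.
  curl -fsS -X POST "$url/-/reload"
}

# Usage: safe_reload
#        safe_reload /data/prometheus/prometheus.yml http://localhost:9090
```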

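Current status in the Prometheus web UI:
[Screenshot: Prometheus target status]
This is because kubelet port 10255 is disabled by default.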

10250 (kubelet API): the kubelet's HTTPS API port, which requires authentication. The API Server connects to the kubelet on this port, and node resources and status can be read through it.

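10255 (read-only API): an unauthenticated, read-only HTTP port that serves pod and node information. The k8s-cadvisor job above rewrites its targets to this port, so container metrics cannot be collected while the port is closed.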

Modify the kubelet configuration to enable port 10255

This must be done on every Kubernetes node.

# Append the --read-only-port=10255 flag at the end
cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.6 --read-only-port=10255"

systemctl restart kubelet

netstat -lntup |grep 10255
tcp6       0      0 :::10255                :::*                    LISTEN      113973/kubelet
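Since the port has to be open on every node, a small loop can confirm it from the Prometheus host. This is a sketch assuming curl is available and the nodes are reachable over plain HTTP on 10255; the function name check_readonly_port is hypothetical, and the node addresses passed to it are whatever your cluster uses.

```shell
#!/usr/bin/env bash
# Probe the cAdvisor endpoint on the kubelet read-only port of each node.
# Prints one status line per node and returns non-zero if any probe fails.
check_readonly_port() {
  local node failed=0
  for node in "$@"; do
    if curl -fsS -o /dev/null --max-time 5 "http://$node:10255/metrics/cadvisor"; then
      echo "$node ok"
    else
      echo "$node unreachable"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_readonly_port 10.0.0.41 <other-node-address> ...
```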

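After enabling the port, check the status in the Prometheus web UI again:
[Screenshot: Prometheus target status after enabling port 10255]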

Add a Grafana dashboard

Import dashboard template 13105:
[Screenshot: the imported Grafana dashboard]

Posted 2025-07-30 17:14 by 阿峰博客站