2025, 10 Minutes a Day, Learn K8s with Me (47) - Prometheus (4): Auto-Discovery

        In the previous chapter we walked through how Prometheus monitors an application. Prometheus also gives us a way to discover applications automatically and monitor them. Its Kubernetes service discovery supports five roles: Node, Service, Pod, Endpoints, and Ingress, and we can enable automatic monitoring by adding extra scrape configuration. Each role suits a different scenario: node is for host-level resources, such as the state of the Kubernetes components running on a node, the containers running on it, and the node itself; service and ingress suit black-box monitoring, such as checking a service's availability and quality of service; endpoints and pod both expose Pod-level metrics, such as user- or administrator-deployed applications that support Prometheus. This article uses Service and Pod as examples.

        To solve the service-discovery problem, kube-prometheus provides an additional scrape configuration mechanism: by adding extra configuration we can have targets discovered and monitored automatically. For example, we can have kube-prometheus automatically discover and monitor every Service that carries the annotation prometheus.io/scrape=true.

Official configuration reference: https://prometheus.io/docs/prometheus/latest/configuration/configuration/

Monitoring of the various resource types is supported through kubernetes_sd_configs. Kubernetes SD configuration lets Prometheus retrieve scrape targets from the Kubernetes REST API and stay synchronized with the cluster state; any of the role types can be configured to discover the objects we want.

The configuration is written in YAML. The top-level sections of the file are shown below; auto-discovery of Kubernetes metrics endpoints is done through scrape_configs:

# Global configuration
global:

# Rule files, mainly alerting rules
rule_files:

# Scrape configuration, describing the targets to scrape
scrape_configs:

# Alerting configuration
alerting:

# Remote write configuration
remote_write:

# Remote read configuration
remote_read:
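
Putting the two together: a scrape job that uses Kubernetes service discovery is simply an entry under scrape_configs that carries a kubernetes_sd_configs block. A minimal sketch (the job name here is hypothetical):

scrape_configs:
- job_name: kubernetes-nodes          # hypothetical job name
  kubernetes_sd_configs:
  - role: node                        # discover every node registered in the cluster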

        When we finished setting up Prometheus earlier, we noticed that kube-apiserver was monitored out of the box even though we never added it by hand. How is that done? Let's look at its ServiceMonitor file.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: apiserver
    app.kubernetes.io/part-of: kube-prometheus
  name: kube-apiserver
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    metricRelabelings:

......

    - action: drop
      regex: apiserver_request_duration_seconds_bucket;(0.15|0.25|0.3|0.35|0.4|0.45|0.6|0.7|0.8|0.9|1.25|1.5|1.75|2.5|3|3.5|4.5|6|7|8|9|15|25|30|50)
      sourceLabels:
      - __name__
      - le
    port: https
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 5s
    metricRelabelings:
    - action: drop
      regex: process_start_time_seconds
      sourceLabels:
      - __name__
    path: /metrics/slis
    port: https
    scheme: https
    tlsConfig:
      caFile: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      serverName: kubernetes
  jobLabel: component
  namespaceSelector:
    matchNames:
    - default
  selector:
    matchLabels:
      component: apiserver
      provider: kubernetes

Relabel action behaviors (a small worked example follows this list):
    replace: the default when no action is configured; it matches the value of the source_labels against regex and writes the matched result into target_label
    labelmap: matches label names against regex and uses the captured part as the new label name, with the original label's value as the new label's value
    keep: keeps only the entries whose concatenated source label values match regex and drops everything else; used for selection
    drop: drops the entries whose concatenated source label values match regex and keeps everything else; used for exclusion
    labeldrop: matches label names against regex; matching labels are removed from the target, i.e. they are neither collected nor stored
    labelkeep: matches label names against regex; only matching labels are kept, everything else is dropped
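
As a small worked example (label values here are hypothetical): suppose a discovered pod carries the annotation prometheus.io/scrape: "true" and the label app: redis. The keep rule below retains the target because the annotation's meta label matches the regex, and labelmap then copies the meta label __meta_kubernetes_pod_label_app="redis" onto the scraped series as a plain app="redis" label:

relabel_configs:
- action: keep            # drop any target whose scrape annotation is not "true"
  regex: true
  source_labels:
  - __meta_kubernetes_pod_annotation_prometheus_io_scrape
- action: labelmap        # turn every __meta_kubernetes_pod_label_<name> into <name>
  regex: __meta_kubernetes_pod_label_(.+)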

        The configuration above is visible not only in the ServiceMonitor file but also on the web UI's config page. From it we can roughly see that we need to add a job_name and then apply the regex rules that match the objects we want; the rules for the different object types are documented on the official site. Next, let's actually add auto-discovery for Pods and for Services.

Pod auto-discovery

1. Create prometheus-additional.yaml

vim  manifests/prometheus-additional.yaml

---
- job_name: kubernetes-pods
  kubernetes_sd_configs:
  - role: pod
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: kubernetes_pod_name
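
To see what the __address__ rewrite above does, take a hypothetical pod whose discovered address is 10.244.1.5:6379 and whose prometheus.io/port annotation is "9121". The source label values are joined with ";" before the regex is applied:

# before relabeling (hypothetical values)
__address__ = 10.244.1.5:6379
__meta_kubernetes_pod_annotation_prometheus_io_port = 9121
# joined value matched by the regex: 10.244.1.5:6379;9121
# ([^:]+)(?::\d+)?;(\d+) captures $1 = 10.244.1.5 and $2 = 9121
# after replacement $1:$2
__address__ = 10.244.1.5:9121

so Prometheus scrapes the exporter port instead of the application port.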

2. Create the secret

Create a secret named additional-configs from the file:

# kubectl create secret generic additional-configs --from-file=manifests/prometheus-additional.yaml -n monitoring
secret/additional-configs created

3. Check the secret just created

View the newly created secret additional-configs:

# kubectl get secret additional-configs -n monitoring -o yaml
apiVersion: v1
data:
  prometheus-additional.yaml: LS0tCi0gam9iX25hbWU6IGt1YmVybmV0ZXMtcG9kcwogIGt1YmVybmV0ZXNfc2RfY29uZmlnczoKICAtIHJvbGU6IHBvZAogIHJlbGFiZWxfY29uZmlnczoKICAtIGFjdGlvbjoga2VlcAogICAgcmVnZXg6IHRydWUKICAgIHNvdXJjZV9sYWJlbHM6CiAgICAtIF9fbWV0YV9rdWJlcm5ldGVzX3BvZF9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fc2NyYXBlCiAgLSBhY3Rpb246IHJlcGxhY2UKICAgIHJlZ2V4OiAoLispCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX21ldGFfa3ViZXJuZXRlc19wb2RfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3BhdGgKICAgIHRhcmdldF9sYWJlbDogX19tZXRyaWNzX3BhdGhfXwogIC0gYWN0aW9uOiByZXBsYWNlCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX2FkZHJlc3NfXwogICAgLSBfX21ldGFfa3ViZXJuZXRlc19wb2RfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3BvcnQKICAgIHRhcmdldF9sYWJlbDogX19hZGRyZXNzX18KICAtIGFjdGlvbjogbGFiZWxtYXAKICAgIHJlZ2V4OiBfX21ldGFfa3ViZXJuZXRlc19wb2RfbGFiZWxfKC4rKQogIC0gYWN0aW9uOiByZXBsYWNlCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX21ldGFfa3ViZXJuZXRlc19uYW1lc3BhY2UKICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19uYW1lc3BhY2UKICAtIGFjdGlvbjogcmVwbGFjZQogICAgc291cmNlX2xhYmVsczoKICAgIC0gX19tZXRhX2t1YmVybmV0ZXNfcG9kX25hbWUKICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19wb2RfbmFtZQo=
kind: Secret
metadata:
  creationTimestamp: "2025-04-15T03:18:34Z"
  name: additional-configs
  namespace: monitoring
  resourceVersion: "2195855"
  uid: b0138dc1-0071-432e-a6d0-12e2b9621c21
type: Opaque

4. Edit prometheus-prometheus.yaml

Add three lines: the additionalScrapeConfigs setting that points at the secret created above, as shown below.
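
The three lines are the additionalScrapeConfigs field under spec of the Prometheus resource, pointing at the secret and key created in step 2. A sketch, assuming the names used above and leaving the rest of spec unchanged:

spec:
  ......
  additionalScrapeConfigs:
    name: additional-configs            # secret created in step 2
    key: prometheus-additional.yaml     # key inside that secret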

Once added, update the prometheus CRD resource object directly:

# kubectl apply -f manifests/prometheus-prometheus.yaml 
prometheus.monitoring.coreos.com/k8s configured

5. Check the web UI

Wait a few minutes, then open the Prometheus dashboard's Status > Configuration page and check whether the job_name: kubernetes-pods we just added has appeared.
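
If the Prometheus UI is not already exposed outside the cluster, a quick way to reach it is to port-forward the prometheus-k8s service (a common approach, not something added by this article) and open http://localhost:9090:

# kubectl port-forward -n monitoring svc/prometheus-k8s 9090:9090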

6. Verify

Create a new pod, using Redis as an example, to check whether pod auto-discovery is working.

For pods in the cluster to be discovered automatically, we also need to add the declaration prometheus.io/scrape=true in the pod's annotations.

# cat prome-redis.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis
#  namespace: kube-system
spec:
  selector:  # a selector is required in apps/v1 Deployments
    matchLabels:
      app: redis
  template:
    metadata:
      annotations:  # these two annotations are required; auto-discovery only finds pods that carry them
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: m.daocloud.io/docker.io/redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: m.daocloud.io/docker.io/oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121



# kubectl apply -f prome-redis.yaml 
deployment.apps/redis created
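
Note that the redis-exporter sidecar works here without extra configuration because redis_exporter connects to redis://localhost:6379 by default and shares the pod's network namespace with the redis container. If Redis listened on a different address, the target could be overridden with the REDIS_ADDR environment variable, for example (optional; shown only to make the default explicit):

        env:
        - name: REDIS_ADDR
          value: "redis://localhost:6379"   # redis_exporter's default target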

7. Fix the RBAC error

At this point, however, switching to the targets page shows no corresponding scrape job. Checking the Prometheus Pod's logs reveals a large number of permission errors:

# kubectl logs -f prometheus-k8s-0 prometheus -n monitoring

ts=2025-04-15T03:26:04.637Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"
ts=2025-04-15T03:26:04.637Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.29.3/tools/cache/reflector.go:229: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"

This is an RBAC permission problem. From the prometheus resource object's configuration we know that Prometheus is bound to a ServiceAccount named prometheus-k8s, which in turn is bound to a ClusterRole also named prometheus-k8s (prometheus-clusterRole.yaml).

The original permissions are as follows:
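
In recent kube-prometheus releases the stock rules look roughly like this (a sketch; check your own manifests/prometheus-clusterRole.yaml). Note that there is no permission to list or watch pods, services, or endpoints at cluster scope:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get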

After modification it looks like this:

# cat prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get


# kubectl apply -f  manifests/prometheus-clusterRole.yaml 
clusterrole.rbac.authorization.k8s.io/prometheus-k8s configured

Going back to the targets page on the Prometheus Dashboard, the redis target we just configured has now been discovered.

Service auto-discovery

A Service is associated with Pods through its selector; Kubernetes combines the IPs of the Pods the Service selects into an Endpoints object.

If a Service is defined without a selector, the endpoints controller does not create an Endpoints object for it automatically when the Service is created. Since every Service with a selector is backed by an Endpoints object, monitoring endpoints is enough to monitor Services.

1. Add a Service for redis

Create a Service for the redis deployment created above:

vim prome-redis-service.yaml
kind: Service
apiVersion: v1
metadata:  # the annotations below are required
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
  name: redis
#  namespace: kube-system
spec:
  selector:  # a selector is required
    app: redis
  ports:
  - name: redis
    port: 6379
    targetPort: 6379
  - name: metrics  # the scrape config matches on the annotated port number, so this name can be anything
    port: 9121
    targetPort: 9121
    
kubectl apply -f prome-redis-service.yaml
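
Because the endpoints role discovers targets through the Endpoints object Kubernetes generates for this Service, you can confirm that the object exists and lists the pod IP with both ports:

# kubectl get endpoints redis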

2. Add a job_name for the Service

Modify prometheus-additional.yaml. Note: do not start the newly added job with a --- document separator.

vim prometheus-additional.yaml
# Append the content below. Do not use a --- separator here, otherwise the content after it will not be recognized.
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scrape
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_service_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_service_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: kubernetes_namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_service_name
    target_label: kubernetes_name

 

3. Re-apply the secret

Delete the secret generated earlier, then create it again:

# kubectl delete secret -n monitoring additional-configs
secret "additional-configs" deleted
# kubectl create secret generic additional-configs  --from-file=manifests/prometheus-additional.yaml -n monitoring
secret/additional-configs created

4. Verify the secret

View the newly re-created secret additional-configs:

# kubectl get secret additional-configs -n monitoring -o yaml
apiVersion: v1
data:
  prometheus-additional.yaml: LS0tCi0gam9iX25hbWU6IGt1YmVybmV0ZXMtcG9kcwogIGt1YmVybmV0ZXNfc2RfY29uZmlnczoKICAtIHJvbGU6IHBvZAogIHJlbGFiZWxfY29uZmlnczoKICAtIGFjdGlvbjoga2VlcAogICAgcmVnZXg6IHRydWUKICAgIHNvdXJjZV9sYWJlbHM6CiAgICAtIF9fbWV0YV9rdWJlcm5ldGVzX3BvZF9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fc2NyYXBlCiAgLSBhY3Rpb246IHJlcGxhY2UKICAgIHJlZ2V4OiAoLispCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX21ldGFfa3ViZXJuZXRlc19wb2RfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3BhdGgKICAgIHRhcmdldF9sYWJlbDogX19tZXRyaWNzX3BhdGhfXwogIC0gYWN0aW9uOiByZXBsYWNlCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX2FkZHJlc3NfXwogICAgLSBfX21ldGFfa3ViZXJuZXRlc19wb2RfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3BvcnQKICAgIHRhcmdldF9sYWJlbDogX19hZGRyZXNzX18KICAtIGFjdGlvbjogbGFiZWxtYXAKICAgIHJlZ2V4OiBfX21ldGFfa3ViZXJuZXRlc19wb2RfbGFiZWxfKC4rKQogIC0gYWN0aW9uOiByZXBsYWNlCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX21ldGFfa3ViZXJuZXRlc19uYW1lc3BhY2UKICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19uYW1lc3BhY2UKICAtIGFjdGlvbjogcmVwbGFjZQogICAgc291cmNlX2xhYmVsczoKICAgIC0gX19tZXRhX2t1YmVybmV0ZXNfcG9kX25hbWUKICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19wb2RfbmFtZQotIGpvYl9uYW1lOiBrdWJlcm5ldGVzLXNlcnZpY2UtZW5kcG9pbnRzCiAga3ViZXJuZXRlc19zZF9jb25maWdzOgogIC0gcm9sZTogZW5kcG9pbnRzCiAgcmVsYWJlbF9jb25maWdzOgogIC0gYWN0aW9uOiBrZWVwCiAgICByZWdleDogdHJ1ZQogICAgc291cmNlX2xhYmVsczoKICAgIC0gX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fc2NyYXBlCiAgLSBhY3Rpb246IHJlcGxhY2UKICAgIHJlZ2V4OiAoaHR0cHM/KQogICAgc291cmNlX2xhYmVsczoKICAgIC0gX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9hbm5vdGF0aW9uX3Byb21ldGhldXNfaW9fc2NoZW1lCiAgICB0YXJnZXRfbGFiZWw6IF9fc2NoZW1lX18KICAtIGFjdGlvbjogcmVwbGFjZQogICAgcmVnZXg6ICguKykKICAgIHNvdXJjZV9sYWJlbHM6CiAgICAtIF9fbWV0YV9rdWJlcm5ldGVzX3NlcnZpY2VfYW5ub3RhdGlvbl9wcm9tZXRoZXVzX2lvX3BhdGgKICAgIHRhcmdldF9sYWJlbDogX19tZXRyaWNzX3BhdGhfXwogIC0gYWN0aW9uOiByZXBsYWNlCiAgICByZWdleDogKFteOl0rKSg/OjpcZCspPzsoXGQrKQogICAgcmVwbGFjZW1lbnQ6ICQxOiQyCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX2FkZHJlc3NfXwogICAgLSBfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX2Fubm90YXRpb25fcHJvbWV0aGV1c19pb19wb3J0CiAgICB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgLSBhY3Rpb246IGxhYmVsbWFwCiAgICByZWdleDogX19tZXRhX2t1YmVybmV0ZXNfc2VydmljZV9sYWJlbF8oLispCiAgLSBhY3Rpb246IHJlcGxhY2UKICAgIHNvdXJjZV9sYWJlbHM6CiAgICAtIF9fbWV0YV9rdWJlcm5ldGVzX25hbWVzcGFjZQogICAgdGFyZ2V0X2xhYmVsOiBrdWJlcm5ldGVzX25hbWVzcGFjZQogIC0gYWN0aW9uOiByZXBsYWNlCiAgICBzb3VyY2VfbGFiZWxzOgogICAgLSBfX21ldGFfa3ViZXJuZXRlc19zZXJ2aWNlX25hbWUKICAgIHRhcmdldF9sYWJlbDoga3ViZXJuZXRlc19uYW1lCg==
kind: Secret
metadata:
  creationTimestamp: "2025-04-15T05:26:34Z"
  name: additional-configs
  namespace: monitoring
  resourceVersion: "2286438"
  uid: cedfc809-a8b3-4006-b432-651d47f7a366
type: Opaque

Once added, just wait for the configuration to be picked up. If the new job never appears, you can delete and re-create the resource, although this is not recommended. Update the prometheus CRD resource object directly:

kubectl delete -f prometheus-prometheus.yaml
kubectl apply -f prometheus-prometheus.yaml
prometheus.monitoring.coreos.com/k8s configured

5. Check the web UI

Wait a few minutes, then open the Prometheus dashboard's Status > Configuration page and check whether the job_name: kubernetes-service-endpoints we just added has appeared.

 

Then go to the targets page and check whether the new target is there.

With that, Service auto-discovery is done as well. As can be seen, the pod auto-discovery and service auto-discovery targets overlap; monitoring through just one of the two is normally enough.

Summary:

Whether it is pods or services being auto-discovered, the steps break down roughly as follows:

1. Add a job_name for the object type in prometheus-additional.yaml.

2. Delete and regenerate the additional-configs secret:

# Delete the secret
# kubectl delete secret -n monitoring additional-configs

# Regenerate the secret
# kubectl create secret generic additional-configs  --from-file=manifests/prometheus-additional.yaml -n monitoring

3. Edit prometheus-prometheus.yaml to mount the secret just generated via additionalScrapeConfigs, then apply it again (only needed once).

4. If a permissions error appears, update the RBAC ClusterRole to grant the required permissions (only needed once).

5. Create the corresponding pod or svc YAML and add the required annotations. For a pod, the application's metrics are usually exposed by attaching a sidecar exporter container that serves a /metrics endpoint; this sidecar pattern will be covered separately in the logging chapter.

# For a pod, add the annotations on the pod template
    metadata:
      annotations:  # these two annotations are required; auto-discovery only finds pods that carry them
        prometheus.io/scrape: "true"
        prometheus.io/port: "9121"

# For a svc, add the annotations on the Service
metadata:  # the annotations below are required
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9121"
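
Once a target shows as UP on the targets page, a quick sanity check is to query it in the Prometheus UI against the jobs defined in prometheus-additional.yaml, for example:

up{job="kubernetes-pods"}
up{job="kubernetes-service-endpoints"}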
