K8s Cluster Monitoring in Practice

Introduction

This post describes a monitoring setup for a self-hosted Kubernetes cluster, built from Prometheus, Node Exporter, Grafana, and related components.

Installing Prometheus

At first I went back and forth on how to install Prometheus.

  • Option 1: deploy with the Prometheus Operator
    • Pros: Prometheus is managed through YAML custom resources, which automates and simplifies its configuration
    • Cons: steeper learning curve, since the Operator's own resource types have to be learned; its CRD set has no NodeMonitor, so scraping the Kubernetes cAdvisor endpoints is awkward to set up
  • Option 2: deploy with a Deployment plus a ConfigMap
    • Pros: the configuration is explicit, with no hidden magic
    • Cons: every resource file has to be written by hand, and updating the configuration needs an extra component, a config reloader, to make Prometheus pick up changes

In the end I went with option 2, although I have not implemented an automatic restart of Prometheus after config updates (a lighter-weight reload sidecar is sketched just below). The setup is "agent plus central storage": Prometheus runs inside the cluster in Agent mode, so it keeps no data itself; it scrapes the targets and pushes the samples to a remote server for storage.
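
For completeness, automatic reloads could be added later with a small sidecar in the same Pod that watches the mounted config directory and calls Prometheus's reload endpoint. The snippet below is only a sketch and is not part of my setup: it uses the widely used configmap-reload image (the tag is illustrative) and assumes "--web.enable-lifecycle" has been added to the Prometheus args so that POST /-/reload is accepted.

# Hypothetical extra container for the prometheus-write Deployment in the next section.
# It watches the ConfigMap mount and asks Prometheus to reload when the file changes.
- name: config-reloader
  image: jimmidyson/configmap-reload:v0.8.0   # illustrative tag
  args:
    - "--volume-dir=/etc/prometheus"                   # directory to watch (ConfigMap mount)
    - "--webhook-url=http://localhost:9090/-/reload"   # Prometheus reload endpoint
  volumeMounts:
    - name: config-volume
      mountPath: /etc/prometheus
      readOnly: true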

Resource manifests

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: prometheus

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources:
  - nodes
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs: ["get", "list", "watch"]
- apiGroups: [""]
  resources:
  - configmaps
  verbs: ["get"]
- apiGroups:
  - discovery.k8s.io
  resources:
  - endpointslices
  verbs: ["get", "list", "watch"]
- apiGroups:
  - networking.k8s.io
  resources:
  - ingresses
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]

---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
  labels:
    app: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: prometheus

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-write
  namespace: prometheus
spec:
  selector:
    matchLabels:
      app: prometheus-write
  replicas: 1
  template:
    metadata:
      labels:
        app: prometheus-write
    spec:
      containers:
      - name: prometheus
        image: xxx/prometheus:latest # image repository (registry elided)
        ports:
        - containerPort: 9090
          name: web-port
        volumeMounts:
        - name: config-volume
          mountPath: /etc/prometheus
        - name: cache-volume
          mountPath: /prometheus
        args:
          - "--config.file=/etc/prometheus/prometheus.yml"
          - "--agent"
        resources:
          requests:
            memory: "125Mi"
            cpu: "60m"
          limits:
            memory: "200Mi"
            cpu: "2000m"
        livenessProbe: # liveness probe: checks whether the container is healthy
          httpGet:
            path: /-/healthy
            port: web-port
        readinessProbe: # readiness probe: checks whether the container is ready to receive traffic
          httpGet:
            path: /-/ready
            port: web-port
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
        - name: cache-volume
          emptyDir:
            sizeLimit: 500Mi
      serviceAccountName: prometheus
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-write-svc
  namespace: prometheus
spec:
  clusterIP: None
  selector:
    app: prometheus-write
  ports:
  - port: 9090
    targetPort: 9090
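
The Deployment above mounts a ConfigMap named prometheus-config at /etc/prometheus, but that ConfigMap itself is not listed here. As a minimal sketch, the configuration file from the next section would be wrapped into it roughly like this:

# Sketch: ConfigMap referenced by the prometheus-write Deployment's config-volume.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    # ... the remaining scrape_configs and remote_write from the next section ...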

prometheus.yml configuration

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "kubernetes-cadvisor"

    # Default to scraping over https. If required, just disable this or change to
    # `http`.
    scheme: https

    # Starting Kubernetes 1.7.3 the cAdvisor metrics are under /metrics/cadvisor.
    # Kubernetes CIS Benchmark recommends against enabling the insecure HTTP
    # servers of Kubernetes, therefore the cAdvisor metrics on the secure handler
    # are used.
    metrics_path: /metrics/cadvisor

    # This TLS & authorization config is used to connect to the actual scrape
    # endpoints for cluster components. This is separate to discovery auth
    # configuration because discovery & scraping are two separate concerns in
    # Prometheus. The discovery auth config is automatic if Prometheus runs inside
    # the cluster. Otherwise, more config options have to be provided within the
    # <kubernetes_sd_config>.
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      # If your node certificates are self-signed or use a different CA to the
      # master CA, then disable certificate verification below. Note that
      # certificate verification is an integral part of a secure infrastructure
      # so this should only be disabled in a controlled environment. You can
      # disable certificate verification by uncommenting the line below.
      #
      # insecure_skip_verify: true
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token

    kubernetes_sd_configs:
      - role: node
    honor_timestamps: false
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: "java-app"
    # Default to scraping over https. If required, just disable this or change to
    # `http`.
    scheme: http
    metrics_path: /assesManageApi/actuator/prometheus
    kubernetes_sd_configs:
      - role: pod
    basic_auth:
      username: admin
      password: xxx
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        action: keep
        regex: asset-management
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: namespace
  - job_name: "node-exporter"
    # Default to scraping over https. If required, just disable this or change to
    # `http`.
    scheme: https
    metrics_path: /metrics
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name,__meta_kubernetes_pod_container_port_number]
        action: keep
        regex: node-exporter;9100
    tls_config:
      insecure_skip_verify: true
    authorization:
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
# Remote write to the central Prometheus server
remote_write:
- url: https://prometheus.zhihuipk.com/api/v1/write
  name: center-prometheus
  basic_auth:
    username: xxx
    password: xxx
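
The central Prometheus has to be able to accept these pushed samples, which the post does not cover. Assuming the receiving side is itself a plain Prometheus (with TLS and basic auth typically handled by a reverse proxy in front of it), it needs the remote-write receiver enabled, roughly like this:

# Sketch of the central (receiving) Prometheus container args; not part of this
# cluster's manifests. The last flag exposes /api/v1/write, the endpoint that the
# remote_write block above pushes to.
args:
  - "--config.file=/etc/prometheus/prometheus.yml"
  - "--storage.tsdb.path=/prometheus"
  - "--web.enable-remote-write-receiver"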

Installing Node Exporter

The Node Exporter installation follows the Node Exporter manifests from the kube-prometheus project.
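
For a cluster that does not want to pull in all of kube-prometheus, a stripped-down DaemonSet along the lines below would also work. Note that it serves plain HTTP on port 9100, whereas the kube-prometheus manifests put kube-rbac-proxy in front of the exporter, which is why the node-exporter scrape job above uses HTTPS and a service-account token; with this simplified variant the job's scheme would be changed to http. Names and the image tag are illustrative.

# Simplified node-exporter DaemonSet (sketch; not the kube-prometheus version).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: node-exporter   # label the scrape job above keys on
  template:
    metadata:
      labels:
        app.kubernetes.io/name: node-exporter
    spec:
      hostNetwork: true   # expose metrics on the node's own IP
      hostPID: true
      containers:
      - name: node-exporter
        image: quay.io/prometheus/node-exporter:v1.8.1
        args:
          - "--path.rootfs=/host"   # read host filesystem metrics via the mount below
        ports:
        - containerPort: 9100       # port the scrape job above keeps
          name: metrics
        volumeMounts:
        - name: root
          mountPath: /host
          readOnly: true
      volumes:
      - name: root
        hostPath:
          path: /
      tolerations:
      - operator: Exists            # also run on tainted (e.g. control-plane) nodes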

Grafana

Grafana was installed following the official documentation; a minimal data-source provisioning sketch is given after the dashboard list.
Dashboards used:

  • Node Exporter (dashboard ID 1860)
  • Spring Boot 3.0 (dashboard ID 19004)
  • Kubernetes Monitoring Dashboard (dashboard ID 12740)
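
Since the dashboards query the central Prometheus rather than the in-cluster agent, Grafana needs that server configured as a data source. This is not shown in the post; assuming a file-provisioned Grafana, a minimal datasource definition could look like the sketch below (credentials are placeholders).

# Sketch of a Grafana datasource provisioning file, e.g. dropped under
# /etc/grafana/provisioning/datasources/. Values are placeholders.
apiVersion: 1
datasources:
  - name: center-prometheus
    type: prometheus
    access: proxy
    url: https://prometheus.zhihuipk.com   # the central server targeted by remote_write
    basicAuth: true
    basicAuthUser: xxx
    secureJsonData:
      basicAuthPassword: xxx
    isDefault: true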