Deploying Prometheus on Kubernetes and Adding Monitoring for Non-Cluster Nodes
1. Preparation
- Cluster environment: a binary-deployed Kubernetes cluster (v1.34.2), not hosted by a cloud provider; the nodes have no public IPs.
- Private image registry: self-hosted Nexus (or Harbor) at 192.168.0.122:1443. Since it serves plain HTTP, the container runtime needs insecure-registry access configured (a sketch follows this list).
- Offline chart package: download kube-prometheus-stack-82.10.3.tgz and upload it to the deployment node:
  https://github.com/prometheus-community/helm-charts/releases/download/kube-prometheus-stack-82.10.3/kube-prometheus-stack-82.10.3.tgz
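
The chart itself does not care that the registry is HTTP, but every machine that pulls from or pushes to it does. Below is a minimal sketch assuming Docker as the runtime; clusters running containerd need an equivalent hosts.toml entry instead, and the key should be merged into any existing daemon.json rather than pasted over it.

```bash
# Minimal sketch (assumes Docker): trust the plain-HTTP registry.
# Merge this key into any existing /etc/docker/daemon.json instead of overwriting it.
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "insecure-registries": ["192.168.0.122:1443"]
}
EOF
sudo systemctl restart docker
```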
2. Image Preparation (the core of an offline deployment)
- Build the complete image list: use helm template to extract every image referenced by the main chart and its sub-charts into all-images.txt.

```bash
helm template prometheus ./kube-prometheus-stack-82.10.3.tgz --namespace monitoring \
  | grep "image:" | awk '{print $2}' | sed 's/^"//;s/"$//' | sort -u > all-images.txt
```
- Pull, retag, push: on a machine with internet access, pull the upstream images, retag them for the private registry, and push them. Note that busybox must be pushed under the fully qualified name 192.168.0.122:1443/library/busybox:1.37.0; otherwise the kubelet expands the short name itself and the image pull fails.

```bash
# Pull the upstream images
docker pull docker.io/busybox:1.37
docker pull registry.k8s.io/kubectl:v1.34.2
docker pull quay.io/prometheus/alertmanager:v0.31.1
docker pull quay.io/prometheus-operator/admission-webhook:v0.89.0
docker pull ghcr.io/jkroepke/kube-webhook-certgen:1.7.8
docker pull quay.io/prometheus-operator/prometheus-operator:v0.89.0
docker pull quay.io/prometheus-operator/prometheus-config-reloader:v0.89.0
docker pull quay.io/thanos/thanos:v0.41.0
docker pull quay.io/prometheus/prometheus:v3.10.0
docker pull registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.18.0
docker pull quay.io/prometheus/node-exporter:v1.10.2
docker pull docker.io/grafana/grafana:12.4.1
docker pull quay.io/kiwigrid/k8s-sidecar:2.5.0
docker pull quay.io/brancz/kube-rbac-proxy:v0.14.0
docker pull docker.io/jimmidyson/configmap-reload:v0.8.0

# Retag for the private registry
docker tag docker.io/busybox:1.37 192.168.0.122:1443/library/busybox:1.37.0
docker tag registry.k8s.io/kubectl:v1.34.2 192.168.0.122:1443/kubectl:v1.34.2
docker tag quay.io/prometheus/alertmanager:v0.31.1 192.168.0.122:1443/prometheus/alertmanager:v0.31.1
docker tag quay.io/prometheus-operator/admission-webhook:v0.89.0 192.168.0.122:1443/prometheus-operator/admission-webhook:v0.89.0
docker tag ghcr.io/jkroepke/kube-webhook-certgen:1.7.8 192.168.0.122:1443/jkroepke/kube-webhook-certgen:1.7.8
docker tag quay.io/prometheus-operator/prometheus-operator:v0.89.0 192.168.0.122:1443/prometheus-operator/prometheus-operator:v0.89.0
docker tag quay.io/prometheus-operator/prometheus-config-reloader:v0.89.0 192.168.0.122:1443/prometheus-operator/prometheus-config-reloader:v0.89.0
docker tag quay.io/thanos/thanos:v0.41.0 192.168.0.122:1443/thanos/thanos:v0.41.0
docker tag quay.io/prometheus/prometheus:v3.10.0 192.168.0.122:1443/prometheus/prometheus:v3.10.0
docker tag registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.18.0 192.168.0.122:1443/kube-state-metrics/kube-state-metrics:v2.18.0
docker tag quay.io/prometheus/node-exporter:v1.10.2 192.168.0.122:1443/prometheus/node-exporter:v1.10.2
docker tag docker.io/grafana/grafana:12.4.1 192.168.0.122:1443/grafana/grafana:12.4.1
docker tag quay.io/kiwigrid/k8s-sidecar:2.5.0 192.168.0.122:1443/kiwigrid/k8s-sidecar:2.5.0
docker tag quay.io/brancz/kube-rbac-proxy:v0.14.0 192.168.0.122:1443/brancz/kube-rbac-proxy:v0.14.0
docker tag docker.io/jimmidyson/configmap-reload:v0.8.0 192.168.0.122:1443/jimmidyson/configmap-reload:v0.8.0

# Optional: save everything to a tarball for transfer into the offline environment
docker save -o prometheus.tar \
  192.168.0.122:1443/library/busybox:1.37.0 \
  192.168.0.122:1443/kubectl:v1.34.2 \
  192.168.0.122:1443/prometheus/alertmanager:v0.31.1 \
  192.168.0.122:1443/prometheus-operator/admission-webhook:v0.89.0 \
  192.168.0.122:1443/jkroepke/kube-webhook-certgen:1.7.8 \
  192.168.0.122:1443/prometheus-operator/prometheus-operator:v0.89.0 \
  192.168.0.122:1443/prometheus-operator/prometheus-config-reloader:v0.89.0 \
  192.168.0.122:1443/thanos/thanos:v0.41.0 \
  192.168.0.122:1443/prometheus/prometheus:v3.10.0 \
  192.168.0.122:1443/kube-state-metrics/kube-state-metrics:v2.18.0 \
  192.168.0.122:1443/prometheus/node-exporter:v1.10.2 \
  192.168.0.122:1443/grafana/grafana:12.4.1 \
  192.168.0.122:1443/kiwigrid/k8s-sidecar:2.5.0 \
  192.168.0.122:1443/brancz/kube-rbac-proxy:v0.14.0 \
  192.168.0.122:1443/jimmidyson/configmap-reload:v0.8.0

# Push to the private registry
docker push 192.168.0.122:1443/library/busybox:1.37.0
docker push 192.168.0.122:1443/kubectl:v1.34.2
docker push 192.168.0.122:1443/prometheus/alertmanager:v0.31.1
docker push 192.168.0.122:1443/prometheus-operator/admission-webhook:v0.89.0
docker push 192.168.0.122:1443/jkroepke/kube-webhook-certgen:1.7.8
docker push 192.168.0.122:1443/prometheus-operator/prometheus-operator:v0.89.0
docker push 192.168.0.122:1443/prometheus-operator/prometheus-config-reloader:v0.89.0
docker push 192.168.0.122:1443/thanos/thanos:v0.41.0
docker push 192.168.0.122:1443/prometheus/prometheus:v3.10.0
docker push 192.168.0.122:1443/kube-state-metrics/kube-state-metrics:v2.18.0
docker push 192.168.0.122:1443/prometheus/node-exporter:v1.10.2
docker push 192.168.0.122:1443/grafana/grafana:12.4.1
docker push 192.168.0.122:1443/kiwigrid/k8s-sidecar:2.5.0
docker push 192.168.0.122:1443/brancz/kube-rbac-proxy:v0.14.0
docker push 192.168.0.122:1443/jimmidyson/configmap-reload:v0.8.0
```
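
An optional sanity check after pushing: the Docker Registry v2 API lists what actually landed in the registry, and both Nexus and Harbor docker repositories expose it. The admin:yourpassword credentials below are placeholders for your own.

```bash
# List repositories and spot-check one tag via the Registry v2 API
curl -u admin:yourpassword http://192.168.0.122:1443/v2/_catalog
curl -u admin:yourpassword http://192.168.0.122:1443/v2/prometheus/prometheus/tags/list
```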
- Write the values file: in values-offline.yaml, override every image's repository and tag so that they resolve to the private registry.

```yaml
# values-offline.yaml
# Global image registry (used by all images that do not specify their own registry)
global:
  imageRegistry: "192.168.0.122:1443"

# ---------- Prometheus Operator components ----------
prometheusOperator:
  # Operator main image
  image:
    repository: prometheus-operator/prometheus-operator
    tag: v0.89.0
  # Prometheus Config Reloader image
  prometheusConfigReloader:
    image:
      repository: prometheus-operator/prometheus-config-reloader
      tag: v0.89.0
  # Admission Webhook image
  admissionWebhooks:
    image:
      repository: prometheus-operator/admission-webhook
      tag: v0.89.0
    # Webhook certificate generation image
    patch:
      image:
        repository: jkroepke/kube-webhook-certgen
        tag: 1.7.8
  # kube-rbac-proxy image (protects the metrics endpoint)
  kubeRbacProxy:
    image:
      repository: brancz/kube-rbac-proxy
      tag: v0.14.0
  # configmap-reload image (hot-reloads configuration)
  configmapReload:
    image:
      repository: jimmidyson/configmap-reload
      tag: v0.8.0

# ⚠️ Note: the config paths for the busybox and kubectl images are inferred from the
# structure found via the earlier grep; they may actually live elsewhere
# (e.g. prometheusOperator.prometheusConfigReloader.initContainer).
# Run helm template first to confirm the actual reference paths and adjust if needed.
images:
  busybox:
    repository: busybox
    tag: "1.37"
  kubectl:
    repository: kubectl
    tag: v1.34.2

# ---------- Alertmanager ----------
alertmanager:
  alertmanagerSpec:
    image:
      repository: prometheus/alertmanager
      tag: v0.31.1
    # If alertmanager also uses configmap-reload, override it here
    configmapReload:
      image:
        repository: jimmidyson/configmap-reload
        tag: v0.8.0

# ---------- Prometheus ----------
prometheus:
  prometheusSpec:
    # Prometheus main image
    image:
      repository: prometheus/prometheus
      tag: v3.10.0
    # Thanos sidecar image
    thanos:
      image: "192.168.0.122:1443/thanos/thanos:v0.41.0"
    # configmap-reload sidecar image
    configmapReload:
      image:
        repository: jimmidyson/configmap-reload
        tag: v0.8.0
    # kube-rbac-proxy sidecar image
    kubeRbacProxy:
      image:
        repository: brancz/kube-rbac-proxy
        tag: v0.14.0
    # ---------- Persistence ----------
    retention: 30d                 # keep data for 30 days
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: nfs-sc # your StorageClass name
          accessModes: ["ReadWriteOnce"]
          resources:
            requests:
              storage: 3Gi         # adjust size as needed

# ---------- Standalone Thanos component (if enabled) ----------
thanos:
  image: "192.168.0.122:1443/thanos/thanos:v0.41.0"

# ---------- Sub-chart: kube-state-metrics ----------
kube-state-metrics:
  image:
    repository: kube-state-metrics/kube-state-metrics
    tag: v2.18.0

# ---------- Sub-chart: prometheus-node-exporter ----------
prometheus-node-exporter:
  image:
    repository: prometheus/node-exporter
    tag: v1.10.2

# ---------- Sub-chart: grafana ----------
grafana:
  image:
    repository: grafana/grafana
    tag: 12.4.1
  sidecar:
    image:
      repository: kiwigrid/k8s-sidecar
      tag: 2.5.0
  service:
    type: NodePort
    nodePort: 31234
  # ---------- Persistence ----------
  persistence:
    enabled: true
    storageClassName: nfs-sc
    accessModes: ["ReadWriteOnce"]
    size: 1Gi   # Grafana's data footprint is small; 1Gi is enough
```
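
Before deploying, it is worth rendering the chart with the offline values and checking that no image still points at a public registry. Assuming every image was overridden, the grep below should print nothing.

```bash
# Any output here is an image that still escapes the private registry
helm template prometheus ./kube-prometheus-stack-82.10.3.tgz \
  --namespace monitoring -f values-offline.yaml \
  | grep "image:" | grep -v "192.168.0.122:1443"
```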
3. Key Configuration Adjustments
- Global image registry: set global.imageRegistry to 192.168.0.122:1443 so the registry address does not have to be repeated everywhere.
- Prometheus Operator components: point each image under prometheusOperator (operator, config-reloader, admission-webhook, certgen, kube-rbac-proxy, etc.) at its private-registry path.
4. Deployment
- Install (first time):

```bash
helm install prometheus ./kube-prometheus-stack-82.10.3.tgz --namespace monitoring --create-namespace -f values-offline.yaml
```

- Upgrade (after changing the configuration):

```bash
helm upgrade prometheus ./kube-prometheus-stack-82.10.3.tgz --namespace monitoring -f values-offline.yaml
```
5. Exposing the Services
- Grafana: exposed as a NodePort Service on fixed port 31234; open http://<any-node-IP>:31234.
- Prometheus UI: reach it temporarily with kubectl port-forward (a sketch follows this list), or switch it to NodePort as well.
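
A sketch of the temporary port-forward. The Service name below is the default that kube-prometheus-stack creates for a release named prometheus; verify it with kubectl get svc -n monitoring if your naming differs.

```bash
# Forward the Prometheus UI to localhost:9090 (Ctrl-C to stop)
kubectl -n monitoring port-forward svc/prometheus-kube-prometheus-prometheus 9090:9090
```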
6. Persistence (preventing data loss)
- Prometheus: set retention (e.g. 30d) and storageSpec.volumeClaimTemplate, backed by the existing NFS StorageClass (nfs-sc); see values-offline.yaml above.
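
A quick check that the claims were actually bound by the NFS provisioner after deployment:

```bash
# Both the Prometheus TSDB PVC and the Grafana PVC should show STATUS Bound
kubectl get pvc -n monitoring
```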
7. Verification and Usage
- Watch the Pod status:

```bash
kubectl get pods -n monitoring -w
```

- Check the Prometheus targets: open the Prometheus UI via port-forward and confirm that every target is UP.
- Log in to Grafana at http://192.168.0.61:31234/login; the default password comes from a Secret, and changing it after the first login is recommended.
- Initial password: stored in the prometheus-grafana Secret; decode it with base64 -d (either command below works):

```bash
kubectl get secret --namespace monitoring prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo
kubectl get secret --namespace monitoring -l app.kubernetes.io/component=admin-secret -o jsonpath="{.items[0].data.admin-password}" | base64 --decode ; echo
```

- After changing the password: it lives in Grafana's database (/var/lib/grafana/grafana.db), so with persistence enabled it is kept permanently.
Bringing External Servers into Kubernetes Prometheus Monitoring
1. Deploy Node Exporter on the External Server
- Download and install node_exporter (default port 9100):
  https://github.com/prometheus/node_exporter/releases/download/v1.10.2/node_exporter-1.10.2.linux-amd64.tar.gz

```bash
tar -xvf node_exporter-1.10.2.linux-amd64.tar.gz
sudo mv node_exporter-1.10.2.linux-amd64/node_exporter /usr/local/bin/
```
- Create the systemd unit, then enable and start the service:

```ini
# /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target
```

```bash
sudo systemctl daemon-reload
sudo systemctl enable node_exporter
sudo systemctl start node_exporter
```
- Verify locally:

```bash
curl http://127.0.0.1:9100/metrics
```
- Firewall: open port 9100, allowing access only from the Kubernetes node subnet (a firewalld sketch follows below).
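
One way to scope the rule with firewalld, assuming the K8s nodes sit in 192.168.0.0/24 (adjust the subnet, and translate to iptables/nftables if the server does not run firewalld):

```bash
# Allow TCP 9100 only from the Kubernetes node subnet
sudo firewall-cmd --permanent --add-rich-rule='rule family="ipv4" source address="192.168.0.0/24" port port="9100" protocol="tcp" accept'
sudo firewall-cmd --reload
```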
2. Create a Service and Endpoints in the Kubernetes Cluster
- Create a Service without a selector plus an Endpoints object of the same name, pointing at the external server's IP and port 9100:

```yaml
# external-node.yaml
apiVersion: v1
kind: Service
metadata:
  name: external-node-122      # service name, customizable
  namespace: monitoring        # same namespace as Prometheus (e.g. monitoring)
  labels:
    k8s-app: node-exporter     # label for the ServiceMonitor to select
spec:
  type: ClusterIP
  ports:
    - name: metrics            # port name; must match the one in the ServiceMonitor
      port: 9100
      protocol: TCP
      targetPort: 9100
  # Note: no selector — this Service is not backed by Pods
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-node-122      # must exactly match the Service name
  namespace: monitoring
subsets:
  - addresses:
      - ip: 192.168.0.122      # replace with the real IP
    ports:
      - name: metrics          # must match the Service port name
        port: 9100
        protocol: TCP
```

- Make sure both objects live in the same namespace as Prometheus (usually monitoring), then apply:

```bash
kubectl apply -f external-node.yaml
```
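
To confirm the plumbing before touching Prometheus, you can resolve the Service from inside the cluster. A sketch, reusing the busybox image pushed earlier (the Pod name netcheck is arbitrary):

```bash
# The Endpoints should list 192.168.0.122:9100, and wget should print metrics
kubectl -n monitoring get svc,endpoints external-node-122
kubectl -n monitoring run netcheck --rm -i --restart=Never \
  --image=192.168.0.122:1443/library/busybox:1.37.0 \
  -- wget -qO- http://external-node-122:9100/metrics | head
```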
3. Configure Prometheus Scraping (ServiceMonitor Recommended)
- Create a ServiceMonitor that discovers the Service above via a label selector (k8s-app: node-exporter):

```yaml
# servicemonitor-external.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: external-node-122
  namespace: monitoring
  labels:
    release: prometheus        # must match the Prometheus instance's serviceMonitorSelector
spec:
  jobLabel: k8s-app
  endpoints:
    - port: metrics            # matches the port name in the Service
      interval: 30s
      path: /metrics
      scheme: http
  selector:
    matchLabels:
      k8s-app: node-exporter   # matches the Service's labels
  namespaceSelector:
    matchNames:
      - monitoring
```

- Key field: port: metrics (must match the Service's port name).
- Make sure the ServiceMonitor's release label matches the Prometheus instance's serviceMonitorSelector (usually release: prometheus; a command to check follows below), then apply:

```bash
kubectl apply -f servicemonitor-external.yaml
```
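
If you are unsure which label your Prometheus instance expects, you can read the selector straight off the Prometheus custom resource:

```bash
# Prints e.g. {"matchLabels":{"release":"prometheus"}}
kubectl -n monitoring get prometheus \
  -o jsonpath='{.items[0].spec.serviceMonitorSelector}' ; echo
```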
4. Verification
- Check the Prometheus targets page (http://localhost:9090/targets); the new target's state should be UP.
- Query node_cpu_seconds_total to confirm data is arriving (an API sketch follows this list).
- Import a Node Exporter dashboard into Grafana (e.g. dashboard ID 1860); the new node appears automatically in the instance dropdown.
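
The same check can be scripted against the Prometheus HTTP API while the port-forward from section 5 is running. The instance label 192.168.0.122:9100 is the default Prometheus assigns to this Endpoints entry.

```bash
# "value":[...,"1"] in the response means the target is up
curl -s 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=up{instance="192.168.0.122:9100"}'
```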
Key Points
- Network connectivity: the K8s nodes must be able to reach port 9100 on the external server.
- Label consistency: the Service's labels must match the ServiceMonitor's selector.
- Scaling to more nodes: create a separate Service/Endpoints pair per server; the same ServiceMonitor can be reused as long as the labels match (a sketch follows below).
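
A sketch of the scale-out pattern: a second server (hypothetical IP 192.168.0.123) gets its own Service/Endpoints pair carrying the same k8s-app: node-exporter label, and the existing ServiceMonitor picks it up without changes.

```bash
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: external-node-123
  namespace: monitoring
  labels:
    k8s-app: node-exporter   # same label, so the same ServiceMonitor matches it
spec:
  type: ClusterIP
  ports:
    - name: metrics
      port: 9100
      protocol: TCP
      targetPort: 9100
---
apiVersion: v1
kind: Endpoints
metadata:
  name: external-node-123
  namespace: monitoring
subsets:
  - addresses:
      - ip: 192.168.0.123    # hypothetical second server
    ports:
      - name: metrics
        port: 9100
        protocol: TCP
EOF
```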
