Cluster Overview
[root@minikube ~]# kubectl version
Client Version: v1.28.15
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.28.15
[root@minikube ~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)
[root@minikube ~]# uname -a
Linux minikube 3.10.0-1160.119.1.el7.x86_64 #1 SMP Tue Jun 4 14:43:51 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
[root@minikube ~]# kubectl get nodes
NAME       STATUS   ROLES           AGE   VERSION
minikube   Ready    control-plane   13d   v1.28.15
worker     Ready    <none>          13d   v1.28.15
[root@minikube ~]# kubectl get all -n trafficguard
NAME                                           READY   STATUS    RESTARTS        AGE
pod/trafficguard-db-0                          1/1     Running   4 (3h39m ago)   3d1h
pod/trafficguard-deployment-59d47d447f-zww6h   1/1     Running   4 (3h39m ago)   3d

NAME                  TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/app-service   NodePort    10.110.129.210   <none>        8090:30000/TCP   3d
service/db-service    ClusterIP   10.102.252.92    <none>        3306/TCP         3d1h

NAME                                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trafficguard-deployment   1/1     1            1           3d

NAME                                                 DESIRED   CURRENT   READY   AGE
replicaset.apps/trafficguard-deployment-59d47d447f   1         1         1       3d

NAME                               READY   AGE
statefulset.apps/trafficguard-db   1/1     3d1h
[root@minikube ~]# kubectl get all -n pacenotes
NAME                                        READY   STATUS    RESTARTS        AGE
pod/pacenotes-deployment-5dbd8bff7f-gpnjk   1/1     Running   3 (3h39m ago)   43h
pod/pacenotes-deployment-5dbd8bff7f-qhzlk   1/1     Running   3 (3h39m ago)   44h

NAME                  TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
service/app-service   NodePort   10.104.116.231   <none>        6000:30005/TCP   2d2h

NAME                                   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/pacenotes-deployment   2/2     2            2           2d2h

NAME                                              DESIRED   CURRENT   READY   AGE
replicaset.apps/pacenotes-deployment-5dbd8bff7f   2         2         2       44h
Install Helm
We use Helm, the Kubernetes package manager, to simplify deployment; it is the recommended way to manage cloud-native applications.
(omitted)
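For completeness, a minimal sketch of the omitted step, using the official Helm 3 install script (assumes the node has internet access and `curl` installed):

```shell
# Download and run the official Helm 3 install script
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Confirm the client is installed and on the PATH
helm version
```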
Install Metrics Server
Metrics Server is a lightweight component that collects CPU/memory metrics for nodes and Pods. It is not installed by default, and it complements Prometheus rather than replacing it.
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
Edit the Deployment to allow insecure TLS (needed on many self-managed clusters whose kubelet certificates are self-signed)
kubectl edit deployment metrics-server -n kube-system # add --kubelet-insecure-tls to the container args section
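The same flag can also be added non-interactively with a JSON patch (a sketch; the `/0/` index assumes metrics-server is the first container in the Pod spec):

```shell
# Append --kubelet-insecure-tls to the metrics-server container args
kubectl patch deployment metrics-server -n kube-system --type=json \
  -p='[{"op":"add","path":"/spec/template/spec/containers/0/args/-","value":"--kubelet-insecure-tls"}]'
```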
Verify:
[root@minikube ~]# kubectl top pods -n pacenotes
NAME                                    CPU(cores)   MEMORY(bytes)
pacenotes-deployment-66f9b8c8d6-ccnmw   0m           3Mi
pacenotes-deployment-66f9b8c8d6-x8q64   0m           4Mi
[root@minikube ~]# kubectl top pods -n trafficguard
NAME                                     CPU(cores)   MEMORY(bytes)
trafficguard-db-0                        18m          237Mi
trafficguard-deployment-bb68d856-v46sl   0m           12Mi
[root@minikube ~]# kubectl top nodes
NAME       CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
minikube   524m         13%    1608Mi          58%
worker     370m         9%     1876Mi          68%
Deploy the Prometheus Operator
Prerequisite configuration (for kubeadm)
By default, kubeadm binds kube-controller-manager and kube-scheduler to 127.0.0.1, which prevents Prometheus from scraping them. Edit the static Pod manifests:
sudo sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' /etc/kubernetes/manifests/kube-scheduler.yaml
# restart kubelet to recreate the static Pods
sudo systemctl restart kubelet
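To confirm the change took effect, you can check that both components now listen on all interfaces (ports 10257 for kube-controller-manager and 10259 for kube-scheduler on current kubeadm defaults):

```shell
# Both ports should show 0.0.0.0 (or *) rather than 127.0.0.1
sudo ss -tlnp | grep -E '10257|10259'
```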
The Prometheus Operator deploys Prometheus, Alertmanager, Node Exporter, and kube-state-metrics in a single step, and provides powerful management capabilities.
# Add the Prometheus community Helm chart repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Create a dedicated namespace for the monitoring components
kubectl create namespace monitoring
# Deploy the Prometheus Operator with Helm
# --set can override chart values; the defaults are fine here
helm install prometheus prometheus-community/kube-prometheus-stack -n monitoring
After running the commands above, Prometheus, Alertmanager, and Grafana are all deployed into the monitoring namespace. Check the Pod status:
[root@minikube ~]# kubectl get pod -n monitoring
NAME                                                     READY   STATUS    RESTARTS   AGE
alertmanager-prometheus-kube-prometheus-alertmanager-0   2/2     Running   0          5m2s
prometheus-grafana-749f6586cf-n6brx                      3/3     Running   0          5m49s
prometheus-kube-prometheus-operator-55cc755d66-zmb4p     1/1     Running   0          5m49s
prometheus-kube-state-metrics-7779f5768f-4k9cn           1/1     Running   0          5m49s
prometheus-prometheus-kube-prometheus-prometheus-0       2/2     Running   0          5m2s
prometheus-prometheus-node-exporter-pj6xc                1/1     Running   0          5m49s
prometheus-prometheus-node-exporter-wrkcl                1/1     Running   0          5m49s
Access the local Grafana instance
Get the access port
[root@minikube ~]# kubectl get service prometheus-grafana -n monitoring
NAME                 TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)   AGE
prometheus-grafana   ClusterIP   10.102.109.241   <none>        80/TCP    14m
# Grafana is deployed as a `ClusterIP` Service by default; port forwarding is the recommended way to access it
export POD_NAME=$(kubectl --namespace monitoring get pod -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=prometheus" -oname)
kubectl --namespace monitoring port-forward $POD_NAME 3000
If you need long-term external access to Grafana, change the Service type.
[root@minikube ~]# kubectl edit service prometheus-grafana -n monitoring # change type: ClusterIP to type: NodePort
service/prometheus-grafana edited
[root@minikube ~]# kubectl get service prometheus-grafana -n monitoring
NAME                 TYPE       CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
prometheus-grafana   NodePort   10.102.109.241   <none>        80:31738/TCP   22m
Get the default Grafana login password
Get Grafana 'admin' user password by running:
kubectl --namespace monitoring get secrets prometheus-grafana -o jsonpath="{.data.admin-password}" | base64 -d ; echo
Visit http://192.168.31.215:31738/dashboards; you should see monitoring data for the cluster and its nodes.
Monitoring applications
Prometheus collects metrics from applications and services using a pull-based model. This means each application or service must expose an HTTP(S) endpoint serving metrics in the Prometheus format; Prometheus then scrapes those endpoints periodically according to its configuration.
The Prometheus Operator ships a custom resource definition for ServiceMonitor objects. A ServiceMonitor declares which Kubernetes applications to scrape; the operator watches the ServiceMonitors we define and automatically generates the required Prometheus configuration.
Query serviceMonitorSelector.matchLabels
[root@minikube ~]# kubectl get prometheus -n monitoring -o yaml
apiVersion: v1
items:
- apiVersion: monitoring.coreos.com/v1
  kind: Prometheus
  metadata:
    labels:
      app: kube-prometheus-stack-prometheus
      app.kubernetes.io/instance: prometheus
      app.kubernetes.io/managed-by: Helm
      app.kubernetes.io/part-of: kube-prometheus-stack
      app.kubernetes.io/version: 77.5.0
      chart: kube-prometheus-stack-77.5.0
      heritage: Helm
      release: prometheus
    name: prometheus-kube-prometheus-prometheus
    namespace: monitoring
  spec:
    serviceMonitorSelector:
      matchLabels:
        release: prometheus
Write a ServiceMonitor
(omitted)
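As a sketch of what the omitted manifest might look like, the following ServiceMonitor targets the pacenotes app-service shown earlier. The `app: pacenotes` Service label and the `http` port name are assumptions for illustration; adjust them to the actual Service definition. The important part is that the ServiceMonitor itself carries `release: prometheus`, so that the `serviceMonitorSelector` queried above picks it up:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: pacenotes-monitor
  namespace: monitoring
  labels:
    release: prometheus      # must match serviceMonitorSelector.matchLabels
spec:
  namespaceSelector:
    matchNames:
      - pacenotes
  selector:
    matchLabels:
      app: pacenotes         # hypothetical label; use your Service's actual labels
  endpoints:
    - port: http             # hypothetical port name on the Service
      path: /metrics
      interval: 30s
```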
Query the data source from inside the Grafana Pod
[root@minikube ~]# kubectl exec -n monitoring prometheus-grafana-749f6586cf-n6brx -c grafana -- curl -s \
> 'http://prometheus-kube-prometheus-prometheus.monitoring:9090/api/v1/query?query=grpc_server_handled_total'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{...},"value":[1757442694.585,"2"]}]}}
[root@minikube ~]# kubectl exec -n monitoring prometheus-grafana-749f6586cf-n6brx -c grafana -- curl -s \
> 'http://prometheus-kube-prometheus-prometheus.monitoring:9090/api/v1/query?query=gin_request_size_bytes_sum'
{"status":"success","data":{"resultType":"vector","result":[{"metric":{...},"value":[1757442734.462,"2455"]}]}}
Access the data source in the Grafana UI
This failed; the cause has not been found yet. Checking the Grafana logs:
kubectl logs -f -n monitoring -l app.kubernetes.io/name=grafana
Deploy the Loki logging backend
The simplest way to deploy Loki is with its Helm chart.
Add the Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
Create the Loki namespace
kubectl create namespace loki
Install the Loki Helm chart
helm install loki grafana/loki-stack -n loki --set grafana.enabled=false
We set grafana.enabled=false here because a Grafana instance already exists; there is no need to install another one.
Once the install succeeds, Loki runs in the loki namespace and is exposed through a Service.
Verify
kubectl get all -n loki
You should see Pods, Services, and other resources for loki and promtail. Promtail is Loki's log-collection agent; it is deployed automatically as a DaemonSet.
Add Loki as a data source in Grafana
This failed; the cause has not been found yet.
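For reference when retrying: in the Grafana UI the data source URL should point at the Loki Service by its in-cluster DNS name, and the loki-stack chart exposes it on port 3100. The same data source can also be declared as a Grafana provisioning file; a sketch (the file path and `name` are illustrative):

```yaml
# e.g. /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    url: http://loki.loki.svc.cluster.local:3100
    access: proxy
```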