Installing prometheus-operator monitoring on Kubernetes 1.16.1
I. Installation
1. cd /app
wget https://zxytest.zhixueyun.com/installer/prometheus-operator.zip
unzip prometheus-operator.zip
2. Replace 10.80.154.143 with the IP of the new k8s master
sed -i 's/10.80.154.143/x.x.x.x/g' /app/prometheus-operator/manifests/prometheus-rules.yaml
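In prometheus-rules.yaml every for value has been changed to for: 1m so that alert emails go out promptly: once a problem has lasted 1 minute, an email is sent. The rules look like this: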
- alert: KubeAPIErrorsHigh
  annotations:
    message: k8s-master-电信生产环境 API server is returning errors for {{ $value }}% of requests.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh
  expr: |
    sum(rate(apiserver_request_count{job="apiserver",code=~"^(?:5..)$"}[5m])) without(instance, pod)
      /
    sum(rate(apiserver_request_count{job="apiserver"}[5m])) without(instance, pod) * 100 > 5
  for: 1m
  labels:
    severity: warning
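To check that every rule really uses for: 1m, or to apply the same change to a different rules file, something like the following could be used. This is a minimal sketch: the helper names list_for_values and set_for_1m are made up for illustration, and it assumes each for: value sits on its own line as a simple duration such as 10m or 1h.

```shell
# Hypothetical helper: show the distinct "for:" values in a rules file, with counts.
list_for_values() {
  grep -E '^[[:space:]]*for:' "$1" | sed -E 's/^[[:space:]]+//' | sort | uniq -c
}

# Hypothetical helper: print the file with every "for: <duration>" line set to "for: 1m".
# The input file is not modified; redirect the output to save the result.
set_for_1m() {
  sed -E 's/^([[:space:]]*for:)[[:space:]]*[0-9]+[smhd][[:space:]]*$/\1 1m/' "$1"
}
```

For example, list_for_values /app/prometheus-operator/manifests/prometheus-rules.yaml should list only for: 1m if the file matches the description above.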
3. Start
kubectl create namespace monitoring
kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=/app/prometheus-operator/alertmanager.yaml -n monitoring
sleep 10
kubectl apply -f /app/prometheus-operator/manifests
kubectl apply -f /app/prometheus-operator/bundle.yaml
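Instead of polling by hand after the apply commands, the wait can be scripted. The sketch below assumes the operator Deployment is named prometheus-operator (consistent with the pod names in the next step); the function name wait_for_monitoring and the timeouts are illustrative choices. Note that kubectl wait only covers pods that already exist when it is called, so it may need to be rerun if the operator has not yet created the Prometheus and Alertmanager pods.

```shell
# Hypothetical helper: block until the operator is rolled out and all
# currently existing pods in the namespace report Ready.
wait_for_monitoring() {
  ns="${1:-monitoring}"
  kubectl -n "$ns" rollout status deployment/prometheus-operator --timeout=120s &&
  kubectl -n "$ns" wait --for=condition=Ready pod --all --timeout=300s
}
```

Calling wait_for_monitoring with no argument uses the monitoring namespace and returns non-zero if either step times out.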
4. Confirm that startup succeeded
[root@iZbp1g2sdxrkqfpcvlo4qoZ ~]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          87m
grafana-cdcfb7675-pjtlx                1/1     Running   0          87m
kube-state-metrics-679b55565d-nfr2g    4/4     Running   0          87m
node-exporter-5k4wx                    2/2     Running   0          87m
prometheus-adapter-795bc54d5d-9ghgm    1/1     Running   0          87m
prometheus-k8s-0                       3/3     Running   1          87m
prometheus-k8s-1                       3/3     Running   1          87m
prometheus-operator-657bb7d47b-6hs4z   1/1     Running   0          113m
[root@iZbp1g2sdxrkqfpcvlo4qoZ ~]# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.254.83.166    <none>        9093:30093/TCP      87m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   87m
grafana                 NodePort    10.254.47.241    <none>        3000:30000/TCP      87m
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   87m
node-exporter           ClusterIP   None             <none>        9100/TCP            87m
prometheus-adapter      ClusterIP   10.254.124.228   <none>        443/TCP             87m
prometheus-k8s          NodePort    10.254.14.99     <none>        9090:30041/TCP      87m
prometheus-operated     ClusterIP   None             <none>        9090/TCP            87m
Note that alertmanager-main must be scheduled onto a server that can send email. The node running alertmanager-main must be able to telnet smtp.exmail.qq.com 25; otherwise alert emails cannot be sent.
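The telnet check is interactive, so a scripted equivalent can be handy. The sketch below assumes bash and the coreutils timeout command are installed on the node; can_connect is a made-up helper name. It has to be run on the node that hosts the pod, which kubectl get po alertmanager-main-0 -n monitoring -o wide shows in the NODE column.

```shell
# Hypothetical helper: succeed if a TCP connection to host ($1) and port ($2)
# can be opened within 5 seconds, using bash's /dev/tcp pseudo-device.
can_connect() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Same target as the telnet check above.
if can_connect smtp.exmail.qq.com 25; then
  echo "smtp.exmail.qq.com:25 reachable"
else
  echo "smtp.exmail.qq.com:25 NOT reachable from $(uname -n)"
fi
```

This only proves that the TCP port opens; it does not test SMTP authentication or the settings in alertmanager.yaml.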
5. Stop and delete
kubectl delete secret alertmanager-main -n monitoring
kubectl delete -f /app/prometheus-operator/manifests
kubectl delete -f /app/prometheus-operator/bundle.yaml
II. Access alert manager, prometheus-k8s, grafana
1. alert manager
http://x.x.x.x:30093
2. prometheus-k8s
http://x.x.x.x:30041
3. grafana
http://x.x.x.x:30000
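The three NodePorts can also be probed from the command line. This sketch relies on the health endpoints that Alertmanager and Prometheus (/-/healthy) and Grafana (/api/health) expose; check_ui is a hypothetical helper name, and the ports are the ones shown in the service listing above.

```shell
# Hypothetical helper: print the HTTP status of each UI's health endpoint.
# $1 is the node IP (the x.x.x.x used in the URLs above).
check_ui() {
  host="$1"
  for target in "30093/-/healthy" "30041/-/healthy" "30000/api/health"; do
    # -m 5 caps each request at 5 seconds; a failed connection reports code 000.
    code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "http://$host:$target")
    echo "$host:$target -> HTTP $code"
  done
}
```

Running check_ui x.x.x.x with the real node IP should report HTTP 200 for all three when the services are up.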
III. Import kubernetes-cluster-monitoring-via-prometheus_rev1.json to monitor k8s cluster resource usage
1. Log in to grafana
http://x.x.x.x:30000
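2. Import the kubernetes-cluster-monitoring-via-prometheus_rev1.json template
3. k8s 1.16 removed the container_name and pod_name metric labels and replaced them with container and pod. k8s dashboards downloaded from the grafana website therefore need pod_name replaced with pod and container_name replaced with container.
Until that is done, these k8s-related monitoring dashboards get no data.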
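The label rename can be applied to a downloaded dashboard file before importing it. This is a minimal sketch assuming GNU sed (for the \b word boundary); fix_dashboard_labels is a made-up helper name, and the output should be reviewed, since a dashboard might use pod_name in places other than queries.

```shell
# Hypothetical helper: print a dashboard JSON with the pre-1.16 label names
# rewritten to the new ones. \b keeps longer identifiers untouched.
fix_dashboard_labels() {
  sed -e 's/\bpod_name\b/pod/g' -e 's/\bcontainer_name\b/container/g' "$1"
}
```

For example, fix_dashboard_labels kubernetes-cluster-monitoring-via-prometheus_rev1.json > fixed.json writes a converted copy that can then be imported into grafana.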




