Installing prometheus-operator monitoring on Kubernetes 1.16.1

I. Installation
1. Download and unpack the installer
cd /app
wget https://zxytest.zhixueyun.com/installer/prometheus-operator.zip
unzip prometheus-operator.zip

2. Replace 10.80.154.143 with the IP of the new k8s master
sed -i 's/10\.80\.154\.143/x.x.x.x/g' /app/prometheus-operator/manifests/prometheus-rules.yaml
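
To sed, an unescaped dot in the pattern matches any character, so an IP address is safest with its dots escaped. A minimal sketch of a helper that does the escaping for you follows. The function name replace_master_ip and its arguments are hypothetical and not part of the installer; it assumes GNU sed (for -i).

```shell
# Hypothetical helper: replace one literal IP address with another in a file.
# Usage: replace_master_ip OLD_IP NEW_IP FILE
replace_master_ip() {
  # Escape each dot so sed treats it as a literal "." rather than "any character".
  rmi_pattern=$(printf '%s' "$1" | sed 's/\./\\./g')
  sed -i "s/${rmi_pattern}/$2/g" "$3"
  # Print the number of lines that now contain the new address.
  grep -cF -- "$2" "$3"
}
```

For example, replace_master_ip 10.80.154.143 x.x.x.x /app/prometheus-operator/manifests/prometheus-rules.yaml, with x.x.x.x replaced by the real master IP.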

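In prometheus-rules.yaml, every for value has been changed to for: 1m so that alert emails go out promptly: an alert fires, and the email is sent, once its condition has held for 1 minute. The rules look like this: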

- alert: KubeAPIErrorsHigh
  annotations:
    message: k8s-master-电信生产环境 API server is returning errors for {{ $value }}% of requests.
    runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubeapierrorshigh
  expr: |
    sum(rate(apiserver_request_count{job="apiserver",code=~"^(?:5..)$"}[5m])) without(instance, pod)
      /
    sum(rate(apiserver_request_count{job="apiserver"}[5m])) without(instance, pod) * 100 > 5
  for: 1m
  labels:
    severity: warning
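
The rules file in the downloaded package already carries for: 1m. If you need to apply the same change to another rules file, or pick a different duration, something like the following sketch could do it. The function name set_alert_for is hypothetical; it assumes GNU sed and that each for: key sits on its own line, as in the rule above.

```shell
# Hypothetical helper: set every "for:" duration in a rules file to one value.
# Usage: set_alert_for FILE [DURATION]   (DURATION defaults to 1m)
set_alert_for() {
  # Match only lines that start with an indented "for:" key,
  # keep the indentation, and replace the value.
  sed -i -E "s/^([[:space:]]*for:)[[:space:]]+.*/\1 ${2:-1m}/" "$1"
  # Print the resulting lines so the change can be reviewed.
  grep -nE '^[[:space:]]*for:' "$1"
}
```

Lines where "for:" only appears inside other text, such as an alert message, are left alone because the pattern is anchored to the start of the line.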

3. Start
kubectl create namespace monitoring
kubectl delete secret alertmanager-main -n monitoring --ignore-not-found
kubectl create secret generic alertmanager-main --from-file=/app/prometheus-operator/alertmanager.yaml -n monitoring
sleep 10
kubectl apply -f /app/prometheus-operator/manifests

kubectl apply -f /app/prometheus-operator/bundle.yaml
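
The start commands can be wrapped so that they are safe to run more than once, for example after editing alertmanager.yaml. The following is a sketch under assumptions, not the installer's own script: the function name deploy_monitoring is hypothetical, it keeps the same order of operations as the commands above, and it leaves out the sleep 10 pause, which you can add back if your environment needs it.

```shell
# Hypothetical wrapper around the start commands; safe to run more than once.
# Usage: deploy_monitoring [BASE_DIR]   (BASE_DIR defaults to /app/prometheus-operator)
deploy_monitoring() {
  dm_base="${1:-/app/prometheus-operator}"
  # Create the namespace only if it does not exist yet.
  kubectl get namespace monitoring >/dev/null 2>&1 \
    || kubectl create namespace monitoring || return 1
  # Recreate the Alertmanager config secret; a missing secret is not an error.
  kubectl delete secret alertmanager-main -n monitoring --ignore-not-found || return 1
  kubectl create secret generic alertmanager-main \
    --from-file="${dm_base}/alertmanager.yaml" -n monitoring || return 1
  # Apply the manifests, then the operator bundle.
  kubectl apply -f "${dm_base}/manifests" || return 1
  kubectl apply -f "${dm_base}/bundle.yaml"
}
```

The function stops at the first failing kubectl call and returns a non-zero status, so it can be chained with other commands.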

4. Confirm that everything started successfully

[root@iZbp1g2sdxrkqfpcvlo4qoZ ~]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          87m
grafana-cdcfb7675-pjtlx                1/1     Running   0          87m
kube-state-metrics-679b55565d-nfr2g    4/4     Running   0          87m
node-exporter-5k4wx                    2/2     Running   0          87m
prometheus-adapter-795bc54d5d-9ghgm    1/1     Running   0          87m
prometheus-k8s-0                       3/3     Running   1          87m
prometheus-k8s-1                       3/3     Running   1          87m
prometheus-operator-657bb7d47b-6hs4z   1/1     Running   0          113m
[root@iZbp1g2sdxrkqfpcvlo4qoZ ~]# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
alertmanager-main       NodePort    10.254.83.166    <none>        9093:30093/TCP      87m
alertmanager-operated   ClusterIP   None             <none>        9093/TCP,6783/TCP   87m
grafana                 NodePort    10.254.47.241    <none>        3000:30000/TCP      87m
kube-state-metrics      ClusterIP   None             <none>        8443/TCP,9443/TCP   87m
node-exporter           ClusterIP   None             <none>        9100/TCP            87m
prometheus-adapter      ClusterIP   10.254.124.228   <none>        443/TCP             87m
prometheus-k8s          NodePort    10.254.14.99     <none>        9090:30041/TCP      87m
prometheus-operated     ClusterIP   None             <none>        9090/TCP            87m

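Note: alertmanager-main must be scheduled onto a node that can send email. That node must be able to connect to smtp.exmail.qq.com on port 25 (check with telnet smtp.exmail.qq.com 25); otherwise alert emails cannot be sent.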
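
If telnet is not installed on the node, the same connectivity check can be approximated with bash alone. This is a sketch, assuming bash was built with /dev/tcp support and that the coreutils timeout command is available; the function name can_reach is hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to HOST:PORT opens within 5 seconds.
# Usage: can_reach HOST PORT
can_reach() {
  # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT;
  # inside the bash -c script, $0 is the host and $1 is the port.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

To find the node, run kubectl get pod alertmanager-main-0 -n monitoring -o wide, then run can_reach smtp.exmail.qq.com 25 on that node and check the exit status.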

5. Stop and remove
kubectl delete secret alertmanager-main -n monitoring
kubectl delete -f /app/prometheus-operator/manifests
kubectl delete -f /app/prometheus-operator/bundle.yaml

II. Accessing Alertmanager, prometheus-k8s, and Grafana
1. Alertmanager
http://x.x.x.x:30093

2.prometheus-k8s
http://x.x.x.x:30041

3. Grafana
http://x.x.x.x:30000
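
The three NodePort endpoints can also be probed from the command line. The sketch below assumes the ports shown in the kubectl get svc output above and uses the health paths that Alertmanager and Prometheus (/-/healthy) and Grafana (/api/health) expose; the function name check_monitoring_endpoints is hypothetical.

```shell
# Hypothetical helper: probe the three NodePort services on one node.
# Usage: check_monitoring_endpoints NODE_IP
check_monitoring_endpoints() {
  cme_failed=0
  # Port and health path for Alertmanager, Prometheus and Grafana, in that order.
  for cme_target in "30093/-/healthy" "30041/-/healthy" "30000/api/health"; do
    if curl -fsS -m 5 -o /dev/null "http://$1:${cme_target}"; then
      echo "OK   http://$1:${cme_target}"
    else
      echo "FAIL http://$1:${cme_target}"
      cme_failed=1
    fi
  done
  # Non-zero if at least one endpoint did not answer.
  return "$cme_failed"
}
```

Call it with the IP of any cluster node, for example check_monitoring_endpoints x.x.x.x.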

III. Importing kubernetes-cluster-monitoring-via-prometheus_rev1.json to monitor k8s cluster resource usage
1. Log in to Grafana
http://x.x.x.x:30000

2. Import the kubernetes-cluster-monitoring-via-prometheus_rev1.json template

kubernetes-cluster-monitoring-via-prometheus_rev1.json

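3. Kubernetes 1.16 removed the container_name and pod_name labels and replaced them with container and pod. In k8s dashboards downloaded from the Grafana website, replace pod_name with pod and container_name with container.

Without this change, these k8s dashboards return no data.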
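
The rename can be done on the dashboard JSON before importing it. A minimal sketch follows; the function name fix_dashboard_labels is hypothetical. It is a plain textual replacement, so any other identifier that happens to contain pod_name or container_name is rewritten too, and it is worth comparing the result with the .bak copy it leaves behind.

```shell
# Hypothetical helper: rename the pre-1.16 labels in a Grafana dashboard JSON file.
# Usage: fix_dashboard_labels FILE
fix_dashboard_labels() {
  # Keep a .bak copy of the original, then rewrite both label names in place.
  sed -i.bak -e 's/container_name/container/g' -e 's/pod_name/pod/g' "$1"
  # Print how many lines still mention an old label name (0 means none are left).
  grep -cE 'pod_name|container_name' "$1" || true
}
```

For example, fix_dashboard_labels kubernetes-cluster-monitoring-via-prometheus_rev1.json, then import the edited file into Grafana.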

posted @ 2020-06-03 00:06  $world