Kubernetes Learning Index
1. Querying Running Status
1.1. Pod running status
]# kubectl -n monitoring get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
alertmanager-main-0 2/2 Running 0 96m 10.244.3.7 node1 <none> <none>
alertmanager-main-1 2/2 Running 0 96m 10.244.4.8 node2 <none> <none>
alertmanager-main-2 2/2 Running 0 96m 10.244.4.9 node2 <none> <none>
blackbox-exporter-84bb6f6bd9-2tr2q 3/3 Running 0 95m 10.244.3.9 node1 <none> <none>
grafana-7bdbdbcb4b-67qsj 1/1 Running 0 74m 10.244.3.13 node1 <none> <none>
kube-state-metrics-c7c57885f-scxdh 3/3 Running 0 94m 10.244.3.10 node1 <none> <none>
node-exporter-27bgj 2/2 Running 0 93m 192.168.10.27 master2 <none> <none>
node-exporter-cnzhw 2/2 Running 0 93m 192.168.10.30 node2 <none> <none>
node-exporter-knqgv 2/2 Running 0 93m 192.168.10.29 node1 <none> <none>
node-exporter-qwbb6 2/2 Running 0 93m 192.168.10.26 master1 <none> <none>
prometheus-adapter-67d7695cb7-7wf9j 1/1 Running 0 95m 10.244.4.10 node2 <none> <none>
prometheus-adapter-67d7695cb7-vbdkr 1/1 Running 0 95m 10.244.3.8 node1 <none> <none>
prometheus-k8s-0 2/2 Running 0 93m 10.244.3.12 node1 <none> <none>
prometheus-k8s-1 2/2 Running 0 93m 10.244.4.11 node2 <none> <none>
prometheus-operator-ffcc9958-2dbgn 2/2 Running 0 94m 10.244.3.11 node1 <none> <none>
1.2. SVC Running Status
1.2.1. svc running status
]# kubectl -n monitoring get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
alertmanager-main NodePort 10.100.113.107 <none> 9093:30093/TCP,8080:30081/TCP 97m
alertmanager-operated ClusterIP None <none> 9093/TCP,9094/TCP,9094/UDP 97m
blackbox-exporter ClusterIP 10.105.55.97 <none> 9115/TCP,19115/TCP 96m
grafana NodePort 10.102.101.236 <none> 3000:30030/TCP 106m
kube-state-metrics ClusterIP None <none> 8443/TCP,9443/TCP 95m
node-exporter ClusterIP None <none> 9100/TCP 94m
prometheus-adapter ClusterIP 10.110.224.24 <none> 443/TCP 96m
prometheus-k8s NodePort 10.104.132.49 <none> 9090:30090/TCP,8080:30080/TCP 93m
prometheus-operated ClusterIP None <none> 9090/TCP 93m
prometheus-operator ClusterIP None <none> 8443/TCP 95m
Note: CLUSTER-IP = None means this is a headless Service.
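A headless Service gets no virtual IP; a DNS lookup of its name returns the backing Pod IPs directly. A quick way to see this, as a sketch (the throwaway Pod name test-dns and the busybox image tag are illustrative):
]# kubectl run test-dns --rm -it --image=busybox:1.36 --restart=Never -- nslookup alertmanager-operated.monitoring.svc.cluster.local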
1.2.2. Analysis of exposed svc ports
alertmanager-main
  type: NodePort
  9093:30093/TCP,8080:30081/TCP   (format: <service port>:<NodePort>/<protocol>)
  Alertmanager web port:
    Pod/Service port: 9093
    NodePort: 30093
  Alertmanager metrics port:
    Pod/Service port: 8080
    NodePort: 30081
--------------------------------
grafana
  type: NodePort
  3000:30030/TCP
    Pod/Service port: 3000
    NodePort: 30030
--------------------------------
prometheus-k8s
  type: NodePort
  9090:30090/TCP,8080:30080/TCP
  Prometheus web port:
    Pod/Service port: 9090
    NodePort: 30090
  Prometheus metrics port:
    Pod/Service port: 8080
    NodePort: 30080
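Because these Services are of type NodePort, each NodePort is reachable on any node's IP. A quick sanity check, as a sketch using node1's address 192.168.10.29 from the Pod listing above (both components expose a /-/healthy endpoint, which should return 200):
]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.10.29:30090/-/healthy    # Prometheus
]# curl -s -o /dev/null -w '%{http_code}\n' http://192.168.10.29:30093/-/healthy    # Alertmanager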
1.3. Endpoints Query
# Mainly for checking how each Service maps to its Endpoints
]# kubectl -n monitoring get endpoints
NAME ENDPOINTS AGE
alertmanager-main 10.244.3.7:8080,10.244.4.8:8080,10.244.4.9:8080 + 3 more... 112m
alertmanager-operated 10.244.3.7:9094,10.244.4.8:9094,10.244.4.9:9094 + 6 more... 112m
blackbox-exporter 10.244.3.9:9115,10.244.3.9:19115 110m
grafana 10.244.3.13:3000 120m
kube-state-metrics 10.244.3.10:8443,10.244.3.10:9443 110m
node-exporter 192.168.10.26:9100,192.168.10.27:9100,192.168.10.29:9100 + 1 more... 109m
prometheus-adapter 10.244.3.8:6443,10.244.4.10:6443 111m
prometheus-k8s 10.244.3.12:8080,10.244.4.11:8080,10.244.3.12:9090 + 1 more... 108m
prometheus-operated 10.244.3.12:9090,10.244.4.11:9090 108m
prometheus-operator 10.244.3.11:8443 110m
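To inspect the full address-to-port mapping behind a single Service, the Endpoints object can be dumped directly, e.g. for grafana:
]# kubectl -n monitoring get endpoints grafana -o yaml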
1.4. Querying the Prometheus resource status
]# kubectl -n monitoring get prometheus
NAME VERSION DESIRED READY RECONCILED AVAILABLE AGE
k8s 2.41.0 2 2 True True 109m
2. Prometheus Web UI Queries
2.1. Targets page

2.2. Graph page queries
2.2.1. Querying the collected data
For example, to query the CPU usage of every Pod in the K8s cluster, use the following expression:
Hint: the metric name is container_cpu_usage_seconds_total
sum(rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m])) by (pod)
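The same expression can also be run against the Prometheus HTTP API, which is handy for scripting. A minimal sketch, assuming the prometheus-k8s NodePort 30090 from section 1.2 is reachable via node1 (192.168.10.29):
]# curl -s 'http://192.168.10.29:30090/api/v1/query' \
     --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{image!="", pod!=""}[1m])) by (pod)'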

2.3. Rules page queries
Many rules have been added here automatically. They are worth studying further later on, and if you do not know how to write rules yourself, they make a good reference.

2.4. How to modify the Prometheus configuration file
2.4.1. Extracting the configuration file
# Decompress prometheus.yaml
]# kubectl -n monitoring get secrets prometheus-k8s -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gzip -d
# Using alertmanager as an example here
]# kubectl -n monitoring get secrets alertmanager-main -o jsonpath='{.data.alertmanager\.yaml}' | base64 -d
"global":
"resolve_timeout": "5m"
"inhibit_rules":
- "equal":
- "namespace"
- "alertname"
"source_matchers":
- "severity = critical"
"target_matchers":
- "severity =~ warning|info"
- "equal":
- "namespace"
- "alertname"
"source_matchers":
- "severity = warning"
"target_matchers":
- "severity = info"
- "equal":
- "namespace"
"source_matchers":
- "alertname = InfoInhibitor"
"target_matchers":
- "severity = info"
"receivers":
- "name": "Default"
- "name": "Watchdog"
- "name": "Critical"
- "name": "null"
"route":
"group_by":
- "namespace"
"group_interval": "5m"
"group_wait": "30s"
"receiver": "Default"
"repeat_interval": "12h"
"routes":
- "matchers":
- "alertname = Watchdog"
"receiver": "Watchdog"
- "matchers":
- "alertname = InfoInhibitor"
"receiver": "null"
- "matchers":
- "severity = critical"
"receiver": "Critical"
2.4.2. Writing the output to a file
]# kubectl -n monitoring get secrets prometheus-k8s -o jsonpath='{.data.prometheus\.yaml\.gz}' | base64 -d | gzip -d > prometheus.yaml.v1
2.4.3. After editing, re-compress with gzip and re-encode with base64 (-w 0 keeps the output on a single line, as needed inside the Secret)
]# gzip -c prometheus.yaml.v1 | base64 -w 0
2.4.4. Then write the new value back into the Secret with kubectl edit
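Instead of pasting the base64 string by hand inside kubectl edit, the whole round trip can be done in one step with kubectl patch. A minimal sketch, assuming the edited file is prometheus.yaml.v1:
]# kubectl -n monitoring patch secret prometheus-k8s --type merge \
     -p "{\"data\":{\"prometheus.yaml.gz\":\"$(gzip -c prometheus.yaml.v1 | base64 -w 0)\"}}"
Keep in mind that the prometheus-operator generates this Secret from the Prometheus custom resource, so hand edits may be overwritten on the next reconcile.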
3. Grafana Web UI Queries
3.1. Login account and password
The stock kube-prometheus Grafana keeps Grafana's built-in defaults: user admin, password admin (you are prompted to change the password on first login).
3.2. A data source is already configured by default
3.2.1. Checking the data source configuration

3.2.2. Why the data source is configured automatically
]# vi kube-prometheus-0.12.0/manifests/prom_adapter/prometheusAdapter-deployment.yaml
spec:
  automountServiceAccountToken: true
  containers:
  - args:
    - --cert-dir=/var/run/serving-cert
    - --config=/etc/adapter/config.yaml
    - --logtostderr=true
    - --metrics-relist-interval=1m
    - --prometheus-url=http://prometheus-k8s.monitoring.svc:9090/   # the Prometheus address is already configured here
    - --secure-port=6443
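Since prometheus-adapter proxies its queries to that Prometheus URL, a quick way to confirm the wiring is to hit the resource-metrics API the adapter serves (assuming the stock kube-prometheus setup, where the adapter backs metrics.k8s.io):
]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes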
3.2.3. Adding dashboards
Not repeated here; see the article: https://www.cnblogs.com/ygbh/p/17299339.html#_label3_2_1_2

4. AlertManager Web UI Queries

5. Custom Ingress-nginx for prometheus, grafana, and alertmanager
5.1. Creating the Ingress resource
5.1.1. Defining the resource manifest
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: ingress-prometheus
  namespace: monitoring
  annotations:
    kubernetes.io/ingress.class: "nginx"
    prometheus.io/http_probe: "true"
spec:
  rules:
  - host: alert.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: alertmanager-main
            port:
              number: 9093
  - host: grafana.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: grafana
            port:
              number: 3000
  - host: prom.localprom.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
EOF
5.1.2. Checking the Ingress resource status
]# kubectl get ingress -n monitoring
NAME CLASS HOSTS ADDRESS PORTS AGE
ingress-prometheus <none> alert.localprom.com,grafana.localprom.com,prom.localprom.com 80 13s
Note: CLASS is <none> because the class is set through the deprecated kubernetes.io/ingress.class annotation rather than spec.ingressClassName.
5.2. Configuring hosts
Add the following entries to the client's hosts file (192.168.10.222 is the address where ingress-nginx is reachable):
192.168.10.222 prom.localprom.com
192.168.10.222 grafana.localprom.com
192.168.10.222 alert.localprom.com
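If editing the hosts file is inconvenient, the Ingress routing can also be tested with curl's --resolve option, which pins a hostname to an address for a single request. A sketch, assuming ingress-nginx answers on 192.168.10.222:80:
]# curl -s -o /dev/null -w '%{http_code}\n' --resolve prom.localprom.com:80:192.168.10.222 http://prom.localprom.com/-/healthy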
5.3. Access test
5.3.1. prometheus

5.3.2. grafana

5.3.3. alertmanager

6. Adding controller and scheduler monitoring to Prometheus
6.1. Requirement
By default, Prometheus does not collect any metrics from the controller-manager or the scheduler.
6.1.1. Targets screenshot

6.1.2. Configuration workflow
For Prometheus to monitor these K8s components, two points matter:
1. Kubernetes must expose the controller-manager's and scheduler's listen addresses, and dedicated Endpoints and Service objects must be created for them.
2. Prometheus scrapes them through the ServiceMonitors defined in kubernetes-serviceMonitorKubeScheduler.yaml and kubernetes-serviceMonitorKubeControllerManager.yaml, so the labels exposed by the custom Services must match the values kube-prometheus expects.
6.2. Exposing the listen addresses (change this on every master node)
These are static Pod manifests, so the kubelet reloads them automatically once modified; no service restart is needed.
]# kubectl -n kube-system get pods -o wide | grep -E 'schedu|control'
calico-kube-controllers-74846594dd-76m7g 1/1 Running 0 9d 10.244.1.2 master2 <none> <none>
kube-controller-manager-master1 1/1 Running 0 48s 192.168.10.26 master1 <none> <none>
kube-controller-manager-master2 1/1 Running 0 2m2s 192.168.10.27 master2 <none> <none>
kube-scheduler-master1 1/1 Running 0 49s 192.168.10.26 master1 <none> <none>
kube-scheduler-master2 1/1 Running 0 2m42s 192.168.10.27 master2 <none> <none>
6.2.1. controller-manager changes
]# vi /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --authentication-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --authorization-kubeconfig=/etc/kubernetes/controller-manager.conf
    - --bind-address=0.0.0.0          # key change: kubeadm's default is 127.0.0.1
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --cluster-cidr=10.244.0.0/16
6.2.2. scheduler changes
]# vi /etc/kubernetes/manifests/kube-scheduler.yaml
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=0.0.0.0          # key change: kubeadm's default is 127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
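Once the kubelet has recreated both static Pods, each metrics port should answer on the node address. Both endpoints require authentication, so an unauthenticated probe returning HTTP 401/403 (rather than "connection refused") already shows the new bind address took effect. A sketch against master1 (192.168.10.26):
]# curl -sk -o /dev/null -w '%{http_code}\n' https://192.168.10.26:10257/metrics    # controller-manager
]# curl -sk -o /dev/null -w '%{http_code}\n' https://192.168.10.26:10259/metrics    # scheduler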
6.3. Collecting controller-manager metrics
6.3.1. Creating the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-controller-manager
subsets:
- addresses:
  - ip: 192.168.10.26
  - ip: 192.168.10.27
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
EOF
# Attribute notes: the addresses above are the master node IPs; configure every master node here
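Whether the kube-prometheus ServiceMonitor will actually pick this Service up can be verified by comparing the ServiceMonitor's selector with the Service's labels. A sketch, assuming the ServiceMonitor is named kube-controller-manager as in stock kube-prometheus:
]# kubectl -n monitoring get servicemonitor kube-controller-manager -o jsonpath='{.spec.selector}{"\n"}'
]# kubectl -n kube-system get svc kube-controller-manager --show-labels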
6.3.2. Checking the running status
]# kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-controller-manager ClusterIP None <none> 10257/TCP 114s
]# kubectl -n kube-system get endpoints
NAME ENDPOINTS AGE
kube-controller-manager 192.168.10.26:10257,192.168.10.27:10257 2m1s
6.4. Collecting scheduler metrics
6.4.1. Creating the resource manifest
kubectl apply -f - <<EOF
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-scheduler
subsets:
- addresses:
  - ip: 192.168.10.26
  - ip: 192.168.10.27
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP
EOF
# Attribute notes: the addresses above are the master node IPs; configure every master node here
6.4.2. Checking the running status
]# kubectl -n kube-system get endpoints
NAME ENDPOINTS AGE
kube-scheduler 192.168.10.26:10259,192.168.10.27:10259 53s
]# kubectl -n kube-system get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-scheduler ClusterIP None <none> 10259/TCP 57s
6.5. Check that the new targets appear in Prometheus
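Besides the web page, the active targets can be pulled from the Prometheus HTTP API. A sketch using the NodePort from section 1.2 via node1 (192.168.10.29), with jq assumed to be installed:
]# curl -s http://192.168.10.29:30090/api/v1/targets | \
     jq -r '.data.activeTargets[] | select(.labels.job | test("scheduler|controller")) | .labels.job + " " + .health'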
