Prometheus Operator

1. Introduction

Repository: https://github.com/prometheus-operator/kube-prometheus

Reference: https://blog.csdn.net/choerodon/article/details/98587027

The Prometheus Operator architecture consists of the following components:


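  • Operator: deploys and manages Prometheus Server according to custom resources (Custom Resource Definitions, CRDs), and watches for change events on those resources so it can react to them. It is the control center of the whole system.
  • Prometheus: declares the desired state of a Prometheus deployment. The Operator makes sure the running deployment always matches this definition.
  • Prometheus Server: the Prometheus Server cluster that the Operator deploys from the contents of a Prometheus custom resource. The Operator realizes each such resource as a StatefulSet that manages the Prometheus Server cluster.
  • ServiceMonitor: declares which services to monitor by describing a list of targets for Prometheus to scrape. It uses labels to select the matching Service Endpoints, so that Prometheus Server collects metrics through the selected Services.
  • Service: simply put, the object that Prometheus monitors.
  • Alertmanager: declares the desired state of an Alertmanager deployment. The Operator makes sure the running deployment always matches this definition.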
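
To make the ServiceMonitor description above more concrete, here is a minimal sketch that writes a ServiceMonitor manifest to a file. The application name example-app, its label app=example-app, the default namespace and the port name web are all hypothetical; only the apiVersion and kind come from the Prometheus Operator CRDs.

```shell
# Write a hypothetical ServiceMonitor manifest. "example-app", its label and
# the port name "web" are illustrative and not part of kube-prometheus.
out="${TMPDIR:-/tmp}/example-servicemonitor.yaml"
cat > "$out" <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  # Namespace in which to look for matching Services
  namespaceSelector:
    matchNames:
    - default
  # Select Services carrying this label
  selector:
    matchLabels:
      app: example-app
  # Scrape the Service port named "web" every 30 seconds
  endpoints:
  - port: web
    interval: 30s
EOF
echo "wrote $out"
# Apply it with: kubectl apply -f "$out"
```

The port is referenced by name rather than by number, so the Service being selected has to give its port a name.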

 

2. Deployment

Deploying Prometheus Operator is straightforward:

# Download
git clone https://github.com/prometheus-operator/kube-prometheus.git

cd kube-prometheus

# Install the operator
kubectl create -f manifests/setup

# Install Prometheus
kubectl create -f manifests/
  • The number of instances to start can be set with the replicas field in the manifests

Check the result:

# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   10         8d
blackbox-86b7486879-w6n22              1/1     Running   0          18h
grafana-5cb8d5c55b-wplg4               1/1     Running   5          8d
kafka-exporter-5cf8fdd8f8-c4j5t        1/1     Running   0          20h
kube-state-metrics-65f69f9759-spcr6    3/3     Running   27         8d
node-exporter-rdjl9                    2/2     Running   2          24h
prometheus-adapter-865cc8dbcd-bc7v6    1/1     Running   34         8d
prometheus-k8s-0                       2/2     Running   3          76m
prometheus-operator-56d44459f7-vt2l9   2/2     Running   15         8d
# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
alertmanager-main       ClusterIP   10.99.189.210   <none>        9093/TCP                     8d
alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   8d
blackbox                ClusterIP   10.108.47.141   <none>        9115/TCP                     18h
grafana                 ClusterIP   10.104.30.183   <none>        3000/TCP                     8d
kafka-exporter          ClusterIP   10.98.228.115   <none>        9308/TCP                     20h
kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            8d
node-exporter           ClusterIP   None            <none>        9100/TCP                     8d
prometheus-adapter      ClusterIP   10.108.67.0     <none>        443/TCP                      8d
prometheus-k8s          ClusterIP   10.96.50.138    <none>        9090/TCP                     8d
prometheus-operated     ClusterIP   None            <none>        9090/TCP                     16h
prometheus-operator     ClusterIP   None            <none>        8443/TCP                     8d

 

Define an Ingress for accessing Alertmanager, Grafana and Prometheus:

prom-monitor.yaml

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: prom-monitor
  namespace: monitoring
spec:
  rules:
  - host: alert.test.com
    http:
      paths:
      - backend:
          serviceName: alertmanager-main
          servicePort: 9093
        path: /
  - host: grafana.test.com
    http:
      paths:
      - backend:
          serviceName: grafana
          servicePort: 3000
        path: /
  - host: prom.test.com
    http:
      paths:
      - backend:
          serviceName: prometheus-k8s
          servicePort: 9090
        path: /
  • Hostnames used: grafana.test.com, prom.test.com, alert.test.com
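
The manifest above uses the extensions/v1beta1 Ingress API, which was removed in Kubernetes 1.22. On newer clusters the same rules would need the networking.k8s.io/v1 schema. The following is a sketch of that conversion, not the original author's manifest: the output file name prom-monitor-v1.yaml is made up, and depending on the ingress controller an ingressClassName field may also be required.

```shell
# Generate a networking.k8s.io/v1 version of the Ingress above.
# The hosts, Service names and ports are taken from the original manifest;
# the output file name is hypothetical.
out="${TMPDIR:-/tmp}/prom-monitor-v1.yaml"
{
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prom-monitor
  namespace: monitoring
spec:
  rules:
EOF
  # One host:service:port triple per rule
  for rule in alert.test.com:alertmanager-main:9093 \
              grafana.test.com:grafana:3000 \
              prom.test.com:prometheus-k8s:9090; do
    host=${rule%%:*}; rest=${rule#*:}
    svc=${rest%%:*}; port=${rest#*:}
    cat <<EOF
  - host: $host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: $svc
            port:
              number: $port
EOF
  done
} > "$out"
echo "wrote $out"
# Apply it with: kubectl apply -f "$out"
```

In the v1 schema, pathType is mandatory and the backend is expressed as service.name plus service.port.number instead of serviceName and servicePort.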

Add these hostnames to the hosts file on the local machine so that they resolve to the ingress controller.
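
As a small illustration of the hosts change, the helper below prints the line to add. The function name hosts_entry and the address 192.0.2.10 are placeholders; the real address is whatever node or load balancer exposes the ingress controller.

```shell
# INGRESS_IP is a placeholder (a documentation-only address); replace it with
# the address at which the ingress controller is reachable.
INGRESS_IP="${INGRESS_IP:-192.0.2.10}"

# Print one hosts-file line mapping the three hostnames to the given address.
hosts_entry() {
  printf '%s %s\n' "$1" "grafana.test.com prom.test.com alert.test.com"
}

hosts_entry "$INGRESS_IP"
# To append it on Linux or macOS:
#   hosts_entry "$INGRESS_IP" | sudo tee -a /etc/hosts
```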

Open grafana.test.com. Grafana already ships with many dashboards.

  

 

3. Fixing missing controller-manager monitoring

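  On a Kubernetes cluster installed from binaries, a Prometheus installed through the operator cannot monitor controller-manager and scheduler by default, so these two components need extra configuration. The reason is that a ServiceMonitor adds scrape targets by matching labels on Services, but in a binary-installed cluster the kube-system namespace contains no Service for controller-manager or scheduler.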

  Check:

# List the ServiceMonitors
# kubectl get servicemonitor -n monitoring
NAME                      AGE
alertmanager              7d2h
coredns                   7d2h
grafana                   7d2h
kube-apiserver            7d2h
kube-controller-manager   7d2h
kube-scheduler            7d2h
kube-state-metrics        7d2h
kubelet                   7d2h
node-exporter             7d2h
prometheus                7d2h
prometheus-adapter        7d2h
prometheus-operator       7d2h

  Inspect the kube-controller-manager ServiceMonitor:

# kubectl get servicemonitor kube-controller-manager -n monitoring -o yaml | tail -15
...
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: false
  jobLabel: k8s-app
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      k8s-app: kube-controller-manager
  • It needs to match a Service in kube-system that carries the label k8s-app=kube-controller-manager
  • Its scheme has been changed to http here; the default is https

  The kube-system namespace has no Service or Endpoints with the label k8s-app=kube-controller-manager, so Prometheus cannot collect controller-manager metrics. A Service and an Endpoints object therefore have to be created for controller-manager.

  controller-endpoint.yaml

apiVersion: v1
kind: Endpoints
metadata:
  name: kube-controller-manager-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
subsets:
  - addresses:
    - ip: 192.168.10.240
    - ip: 192.168.10.241
    - ip: 192.168.10.242
    ports:
    - name: http-metrics
      port: 10252
      protocol: TCP

  controller-service.yaml

apiVersion: v1
kind: Service
metadata:
  name: kube-controller-manager-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-controller-manager
spec:
  ports:
  - port: 10252
    name: http-metrics
    protocol: TCP
  type: ClusterIP

Create them:

# kubectl create -f .

Check:

# kubectl get svc,ep -n kube-system -l k8s-app=kube-controller-manager
NAME                                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)     AGE
service/kube-controller-manager-monitoring   ClusterIP   10.102.204.13   <none>        10252/TCP   44m

NAME                                           ENDPOINTS                                                        AGE
endpoints/kube-controller-manager-monitoring   192.168.10.240:10252,192.168.10.241:10252,192.168.10.242:10252   44m


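Also edit the controller-manager startup unit file, so that the metrics port listens on all interfaces and can be reached from Prometheus rather than only from localhost:

/usr/lib/systemd/system/kube-controller-manager.service

# Change the listen address
--address=0.0.0.0

Then restart controller-manager.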

 

Test:

# curl 127.0.0.1:10252
404 page not found

# curl 10.102.204.13:10252
404 page not found

The local port and the controller-manager Service port return the same response.
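
The 404 above only shows that the port answers. To confirm that metrics are actually served, the /metrics path can be checked as well. The helper below is a sketch: looks_like_metrics is a made-up name, and it only looks for the "# HELP" and "# TYPE" comment lines of the Prometheus text format.

```shell
# Succeeds when stdin looks like Prometheus text-format output, i.e. it
# contains at least one "# HELP" or "# TYPE" line.
looks_like_metrics() {
  grep -Eq '^# (HELP|TYPE) '
}

# Usage against the endpoints tested above (needs access to the cluster):
#   curl -s 127.0.0.1:10252/metrics | looks_like_metrics && echo "metrics OK"
#   curl -s 10.102.204.13:10252/metrics | looks_like_metrics && echo "metrics OK"
```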

 

Then check the Prometheus web UI to confirm that the controller-manager target is now being scraped.

  

Applying the same changes to the scheduler configuration makes the scheduler metrics available as well.
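
As a sketch of what the scheduler equivalent might look like, the script below writes a Service and an Endpoints object modeled on the controller-manager ones. Several things here are assumptions and should be verified first: that the kube-scheduler ServiceMonitor selects k8s-app=kube-scheduler with port name http-metrics (check with kubectl get servicemonitor kube-scheduler -n monitoring -o yaml), that the scheduler serves plain HTTP metrics on the legacy port 10251, and that it runs on the same three master addresses. The object name kube-scheduler-monitoring is made up.

```shell
# Write a Service plus Endpoints for kube-scheduler, mirroring the
# controller-manager objects. Label, port 10251 and IPs are assumptions.
out="${TMPDIR:-/tmp}/scheduler-monitoring.yaml"
cat > "$out" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: kube-scheduler-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  type: ClusterIP
  ports:
  - name: http-metrics
    port: 10251
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  # Must match the Service name so the two are associated
  name: kube-scheduler-monitoring
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
subsets:
  - addresses:
    - ip: 192.168.10.240
    - ip: 192.168.10.241
    - ip: 192.168.10.242
    ports:
    - name: http-metrics
      port: 10251
      protocol: TCP
EOF
echo "wrote $out"
# Apply it with: kubectl create -f "$out"
```

The Service has no selector on purpose, so Kubernetes leaves the manually maintained Endpoints alone. As with controller-manager, the scheduler also has to listen on an address that Prometheus can reach.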

posted @ 2020-11-20 11:16  Bigberg