The Alertmanager Component

I. Alertmanager configuration parameters:

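global and route settings:

global:                                  # global settings
  resolve_timeout: 1m         # an alert with no end time is treated as resolved if it is not refreshed within this period
route:                                   # top-level route; the grouping options below belong here, not under global
  group_by: ['alertname']    # labels Alertmanager uses to group alerts
  group_wait: 10s         # after the first alert of a new group arrives, wait 10s so other alerts in that group go out in the same notification
  group_interval: 10s    # how long to wait before notifying about new alerts added to a group that has already been notified
  repeat_interval: 1h    # re-send interval: if the alert is still firing after 1h, notify again (set to 1h here; Alertmanager's default is 4h)
  receiver: 'default-receiver'       # default receiver, used when an alert matches no child route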

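Alert routing (route) and label matching (match_re)

      The Alertmanager configuration defines a routing tree based on label-matching rules, which decides how Alertmanager handles each alert it receives. The route section defines the matching rules and which receiver a matched alert is sent to. For example, if the configuration file defines only a single route, every alert that Prometheus sends to Alertmanager is handled by the receiver named default-receiver, which here is an email address.
      In a real production environment, alerts of different severities may need entirely different handling. For that reason, route can contain further child routes, each of which uses label matching to decide how its alerts are handled.

     Further reading: https://yunlzheng.gitbook.io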

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxx@163.com'
      smtp_auth_username: 'xxx'
      smtp_auth_password: '1989317li'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: 'default-receiver'
      routes:                    # child routes
      - receiver: cluster1
        group_wait: 10s
        match_re:                 # regular-expression match on label values
            severity: critical     # alerts with severity critical go to the cluster1 receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1980570647@qq.com'
        send_resolved: true  
    - name: 'cluster1'
      webhook_configs:
      - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
        send_resolved: true
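
Since Alertmanager 0.22, match and match_re are deprecated in favour of matchers. As a minimal sketch, assuming a recent Alertmanager version, the route section above could be written as follows (only the matchers list differs; the receiver names are the ones used above):

```yaml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 10m
  receiver: 'default-receiver'
  routes:
  - receiver: 'cluster1'
    group_wait: 10s
    matchers:
    # Equality is enough here: the regex "critical" in match_re is anchored,
    # so it only ever matched the exact value "critical".
    - severity = "critical"
```

Use =~ instead of = when a real regular expression is needed, for example severity =~ "critical|warning".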

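Alert inhibition (inhibit): when both a warning and a critical alert are firing for the same alertname, only the critical notification is sent and the warning is suppressed.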

inhibit_rules:
- source_match:
      severity: 'critical'
  target_match:
      severity: 'warning'
  equal: ['alertname']
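
The equal list controls when the rule applies: a warning alert is suppressed only if a firing critical alert has the same value for every label listed there. The sketch below shows the same rule in the newer matchers syntax (Alertmanager 0.22+) and additionally compares the instance label; that label is an assumption for illustration and only makes sense if your alerts actually carry it:

```yaml
inhibit_rules:
- source_matchers:
  - severity = "critical"      # the alert that does the suppressing
  target_matchers:
  - severity = "warning"       # the alert that gets suppressed
  # Both alerts must agree on these labels; 'instance' is an assumed label
  # added so that a critical alert on one host does not mute warnings on another.
  equal: ['alertname', 'instance']
```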

II. Configuring Alertmanager notifications

Email notifications with Alertmanager:

1) ConfigMap configuration

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxx@163.com'
      smtp_auth_username: 'xx'
      smtp_auth_password: 'GRJGVYPOPMMWXJNX'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: default-receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'wushaoyu95@163.com'
        send_resolved: true

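DingTalk notifications with Alertmanager:

1) Create a DingTalk robot

Open the DingTalk desktop client, create a group, and add a custom robot to it, following the steps in:
https://ding-doc.dingtalk.com/doc#/serverapi2/qf2nxq

The robot I created was added as follows:
Group Settings --> Smart Group Assistant --> Add Robot --> Custom --> Add

Robot name: kube-event
Receiving group: 钉钉报警测试 (DingTalk alert test)

Security settings:
Custom keyword: cluster1 (DingTalk only accepts messages whose text contains this keyword)

After filling this in, click Finish to create an alert robot named kube-event. To look up its webhook once it has been created:

Click Smart Group Assistant, find the kube-event robot you just created, and click it to open its settings page.

The page shows the following:
Robot name: kube-event
Receiving group: 钉钉报警测试 (DingTalk alert test)
Message push: enabled
webhook:https://oapi.dingtalk.com/robot/send?access_token=9c03ff1f47b1d15a10d852398cafb84f8e81ceeb1ba557eddd8a79e5a5e5548e
Security settings:
Custom keyword: cluster1

2) Install the DingTalk webhook plugin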

tar zxvf prometheus-webhook-dingtalk-0.3.0.linux-amd64.tar.gz
cd prometheus-webhook-dingtalk-0.3.0.linux-amd64

nohup ./prometheus-webhook-dingtalk --web.listen-address="0.0.0.0:8060" --ding.profile="cluster1=https://oapi.dingtalk.com/robot/send?access_token=9c03ff1f47b1d15a10d852398cafb84f8e81ceeb1ba557eddd8a79e5a5e5548e" &
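
The --ding.profile flag belongs to the 0.3.0 release used here. As far as I know, the 1.x releases of prometheus-webhook-dingtalk read their targets from a YAML file passed with --config.file instead. A rough sketch of such a file, assuming the 1.x layout (check the config.example.yml shipped with the release you install; the token is a placeholder):

```yaml
# config.yml for prometheus-webhook-dingtalk 1.x (assumed layout, not verified against 0.3.0)
targets:
  # The target name becomes part of the URL that Alertmanager calls:
  # http://<host>:8060/dingtalk/cluster1/send
  cluster1:
    url: https://oapi.dingtalk.com/robot/send?access_token=<your-access-token>
```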

3) ConfigMap configuration

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxx@163.com'
      smtp_auth_username: 'xxx'
      smtp_auth_password: '1989317li'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: cluster1
    receivers:
    - name: cluster1
      webhook_configs:
      - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
        send_resolved: true

Email and DingTalk notifications together with Alertmanager:

kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: monitor-sa
data:
  alertmanager.yml: |-
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: 'xxx@163.com'
      smtp_auth_username: 'xxx'
      smtp_auth_password: '1989317li'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: 'default-receiver'
      routes:                    # child routes
      - receiver: cluster1
        group_wait: 10s
        match_re:                 # regular-expression match on label values
            severity: critical     # alerts with severity critical go to the cluster1 receiver
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: '1980570647@qq.com'
        send_resolved: true  
    - name: 'cluster1'
      webhook_configs:
      - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
        send_resolved: true
    inhibit_rules:
    - source_match:
          severity: 'critical'
      target_match:
          severity: 'warning'
      equal: ['alertname']
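
With the configuration above, critical alerts go to DingTalk only and every other alert goes to email only. If the same alert should reach both channels at once, one option is a single receiver that carries both integrations. A minimal sketch, where the receiver name email-and-dingtalk is hypothetical and the address and URL are the ones used above:

```yaml
receivers:
- name: 'email-and-dingtalk'     # hypothetical name; reference it from route.receiver
  email_configs:
  - to: '1980570647@qq.com'
    send_resolved: true
  webhook_configs:
  - url: 'http://192.168.124.16:8060/dingtalk/cluster1/send'
    send_resolved: true
```

Every alert routed to this receiver is sent through both the email and the webhook integration.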

 Prometheus book: https://yunlzheng.gitbook.io

posted @ 2022-03-24 11:35  wushaoyu