[Solved a small problem] How do you get Alertmanager to deliver alerts to WeCom (企业微信)?

Author: 张富春 (ahfuzhang). When reposting, please credit the author and include a link to the original. Thanks!


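07-15: Spent several hours and still could not get the whole flow working, so I am recording the intermediate results here.
07-16: Happy day, it works!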

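1. Deploying Alertmanager

The deployment architecture mainly follows this article: Alertmanager高可用 (Alertmanager high availability)
The architecture diagram is as follows:

To make it easy for Prometheus to reach Alertmanager, a CLB (cloud load balancer) is deployed in front of Alertmanager, so the Alertmanager cluster can be accessed through a VIP.
The configuration details are as follows: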

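  • --cluster.advertise-address="${POD_IP}:9094": tells Alertmanager what its own address is, so that it can exclude itself when the gossip protocol broadcasts.
  • --cluster.peer: set this to the addresses of the other Alertmanager instances in the cluster.
    • Typically the first node to start omits this flag, the second node points at the first node's ip:port, and the remaining nodes chain on in the same way.
    • The containers all start at the same time, so to guarantee that one node is up before all the others, I used etcdctl lock to enforce the startup order: 如何用etcdctl产生分布式环境中的递增ID (how to generate incrementing IDs in a distributed environment with etcdctl)
    • In the end, the gossip protocol lets every node learn about all the other nodes.
  • In the alertmanager.yaml configuration file:
    • Configure a webhook to deliver the alerts.
    • To get the webhook URL, open any WeCom group chat and add a bot to it; the bot comes with the URL.
  • The amtool command-line tool can also be used to query this information: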
# Install amtool
go install github.com/prometheus/alertmanager/cmd/amtool@latest

# Show the server-side configuration
amtool --alertmanager.url=http://9.220.xxx.xxx/ config show

# Show the cluster status
amtool --alertmanager.url=http://9.220.xxx.xxx/ cluster show
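
The peer rule from the list above (the first node starts without --cluster.peer, later nodes point at nodes that are already up) can be sketched as a small helper that builds the flag list for one node. This is only an illustration: the function name and the 10.0.0.x addresses are hypothetical, and the extra --cluster.listen-address flag is an assumption about how the node is started, not something taken from my actual deployment.

```python
from typing import List


def build_cluster_args(pod_ip: str, peers: List[str], port: int = 9094) -> List[str]:
    """Build the Alertmanager cluster flags for one node.

    pod_ip: this node's own IP, used for --cluster.advertise-address.
    peers:  "ip:port" addresses of nodes that are already running;
            empty for the first node of the cluster.
    """
    own = f"{pod_ip}:{port}"
    args = [
        f"--cluster.listen-address=0.0.0.0:{port}",
        f"--cluster.advertise-address={own}",
    ]
    for peer in peers:
        if peer != own:  # a node never lists itself as a peer
            args.append(f"--cluster.peer={peer}")
    return args


# Hypothetical addresses: the first node has no peers, the second joins the first.
first = build_cluster_args("10.0.0.1", [])
second = build_cluster_args("10.0.0.2", ["10.0.0.1:9094"])
print(" ".join(["alertmanager"] + second))
```

Once two nodes know each other, gossip takes care of the rest, so later nodes only need one reachable peer.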

2. Simulating an alert

An alert can be simulated from the command line:

curl -X POST "http://9.220.xxx.xxx/api/v2/alerts" -H "Content-type: application/json" -d '[{"status": "firing","labels": {"alertname": "name","service": "my-service","severity":"warning","instance": "name.example.net"},"annotations": {"summary": "High latency is high!"},"generatorURL": "http://prometheus.int.example.net/<generating_expression>"}]' -v

Open the Alertmanager web page and this alert shows up there.
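
The same test alert can also be sent from a script. The following is a minimal sketch using only the Python standard library; build_alert and post_alerts are names made up for this example, and the base URL passed to post_alerts is whatever address your Alertmanager (or the CLB in front of it) listens on.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_alert(alertname: str,
                labels: Optional[Dict[str, str]] = None,
                summary: str = "",
                generator_url: str = "") -> dict:
    """Build one alert object for POST /api/v2/alerts."""
    all_labels = {"alertname": alertname}
    all_labels.update(labels or {})
    alert = {"labels": all_labels, "annotations": {"summary": summary}}
    if generator_url:
        alert["generatorURL"] = generator_url
    return alert


def post_alerts(base_url: str, alerts: List[dict], timeout: float = 5.0) -> int:
    """POST a list of alerts to Alertmanager and return the HTTP status code."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


# Same labels as the curl example above; nothing is sent until post_alerts is called.
alert = build_alert(
    "name",
    {"service": "my-service", "severity": "warning", "instance": "name.example.net"},
    summary="High latency is high!",
)
```

Calling post_alerts with the Alertmanager base URL and [alert] then does the same job as the curl command.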

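3. What format does Alertmanager use for the data it sends to a webhook?

This post has an example: prometheus+alertmanager+webhook实现自定义监控报警系统 (building a custom monitoring and alerting system with prometheus+alertmanager+webhook)

The JSON format is as follows: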

{
    "receiver":"webhook",
    "status":"resolved",
    "alerts":[
        {
            "status":"resolved",
            "labels":{
                "alertname":"hostCpuUsageAlert",
                "instance":"192.168.199.24:9100",
                "severity":"page"
            },
            "annotations":{
                "description":"192.168.199.24:9100 CPU 使用率超过 85% (当前值为: 0.9973333333333395)",
                "summary":"机器 192.168.199.24:9100 CPU 使用率过高"
            },
            "startsAt":"2020-02-29T19:45:21.799548092+08:00",
            "endsAt":"2020-02-29T19:49:21.799548092+08:00",
            "generatorURL":"http://localhost.localdomain:9090/graph?g0.expr=sum+by%28instance%29+%28avg+without%28cpu%29+%28irate%28node_cpu_seconds_total%7Bmode%21%3D%22idle%22%7D%5B5m%5D%29%29%29+%3E+0.85&g0.tab=1",
            "fingerprint":"368e9616d542ab48"
        }
    ],
    "groupLabels":{
        "alertname":"hostCpuUsageAlert"
    },
    "commonLabels":{
        "alertname":"hostCpuUsageAlert",
        "instance":"192.168.199.24:9100",
        "severity":"page"
    },
    "commonAnnotations":{
        "description":"192.168.199.24:9100 CPU 使用率超过 85% (当前值为: 0.9973333333333395)",
        "summary":"机器 192.168.199.24:9100 CPU 使用率过高"
    },
    "externalURL":"http://localhost.localdomain:9093",
    "version":"4"
}
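// Copyright notice: this is an original article by CSDN blogger 「昵称2021」, licensed under CC 4.0 BY-SA. When reposting, please include the link to the original source and this notice.
// Original link: https://blog.csdn.net/bluuusea/article/details/104619235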

However, the WeCom webhook does not accept this format at all. See: 消息类型及数据格式 (message types and data formats)

{
    "chatid": "wrkSFfCgAAtMQKg4xqDatM5C9IDHFpTw|@all_subscriber|zhangsan",
    "post_id": "bpkSFfCgAAWeiHNo2p6lJbG3_F2xxxxx",
    "visible_to_user": "zhangsan|lisi",
    "msgtype": "text",
    "text": {
        "content": "<@zhangsan>广州今日天气:29度,大部分多云,降雨概率:60%",
        "mentioned_list": ["wangqing", "@all"],
        "mentioned_mobile_list": ["13800001111", "@all"]
    }
}

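Clearly, putting the WeCom webhook URL straight into the Alertmanager configuration will not work: the two sides expect different JSON bodies.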
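
To make the mismatch concrete, here is a minimal sketch of the conversion that has to happen somewhere between the two. It maps the Alertmanager payload shown above onto the msgtype/text.content shape that WeCom expects. The function name and the one-line-per-alert layout are my own choices for illustration, not the format used by any particular tool.

```python
from typing import Any, Dict


def to_wecom_text(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Convert an Alertmanager webhook payload into a WeCom text message."""
    # Header line: overall status and the receiver that fired.
    lines = [f"[{payload.get('status', 'unknown')}] receiver={payload.get('receiver', '')}"]
    # One line per alert in the group.
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        lines.append(
            f"- {labels.get('alertname', '?')} ({alert.get('status', '?')}) "
            f"on {labels.get('instance', '?')}: {annotations.get('summary', '')}"
        )
    return {"msgtype": "text", "text": {"content": "\n".join(lines)}}


# Trimmed-down version of the payload shown earlier.
sample = {
    "receiver": "webhook",
    "status": "resolved",
    "alerts": [{
        "status": "resolved",
        "labels": {"alertname": "hostCpuUsageAlert", "instance": "192.168.199.24:9100"},
        "annotations": {"summary": "cpu high"},
    }],
}
print(to_wecom_text(sample)["text"]["content"])
```

Missing fields fall back to placeholders instead of raising, because not every alert carries an instance label or a summary annotation.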

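4. Registering as a WeCom developer

Alertmanager itself has quite a few wechat configuration options (wechat_configs). They talk to the WeCom application API and need credentials such as corp_id, agent_id and api_secret, which is why using them would mean registering as a WeCom developer first.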

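5. Using an open-source component to convert the Alertmanager webhook into a WeCom webhook

It turns out such a component really exists: https://github.com/k8stech/alertmanager-wechatrobot-webhook
I forked the author's project and made it runnable in a container environment. The code is at: https://github.com/ahfuzhang/alertmanager-wechatrobot-webhook
The startup command line is: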

./wechat-webhook -RobotKey=xxxxx -addr=:8999

After deploying it, test from the command line whether a message reaches WeCom:

curl -X POST "http://9.220.xxx.xxx/webhook" -H "Content-type: application/json" -d '{"receiver":"webhook","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"hostCpuUsageAlert","instance":"192.168.199.24:9100","severity":"page"},"annotations":{"description":"192.168.199.24:9100 CPU 使用率超过 85% (当前值为: 0.9973333333333395)","summary":"机器 192.168.199.24:9100 CPU 使用率过高"},"startsAt":"2020-02-29T19:45:21.799548092+08:00","endsAt":"2020-02-29T19:49:21.799548092+08:00","generatorURL":"http://localhost.localdomain:9090/graph?g0.expr=sum+by%28instance%29+%28avg+without%28cpu%29+%28irate%28node_cpu_seconds_total%7Bmode%21%3D%22idle%22%7D%5B5m%5D%29%29%29+%3E+0.85&g0.tab=1","fingerprint":"368e9616d542ab48"}],"groupLabels":{"alertname":"hostCpuUsageAlert"},"commonLabels":{"alertname":"hostCpuUsageAlert","instance":"192.168.199.24:9100","severity":"page"},"commonAnnotations":{"description":"192.168.199.24:9100 CPU 使用率超过 85% (当前值为: 0.9973333333333395)","summary":"机器 192.168.199.24:9100 CPU 使用率过高"},"externalURL":"http://localhost.localdomain:9093","version":"4"}'

The alert posted above then shows up in the WeCom group.
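
For readers who want to see what such a relay does, here is a rough standard-library sketch of the idea: accept the Alertmanager POST on /webhook, turn it into a WeCom text message, and forward it to the group bot. This is not the code of the project above. The function names are invented for this example, the send URL is the group-bot endpoint as I understand it from the WeCom documentation, and only the port 8999 and the /webhook path are taken from the commands above. There are no retries and no handling of failed sends.

```python
import json
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable

# Group-bot endpoint (assumed from the WeCom docs); key is the bot's key.
WECOM_SEND_URL = "https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key={key}"


def summarize(payload: dict) -> dict:
    """Collapse an Alertmanager payload into a one-line WeCom text message."""
    names = sorted({a.get("labels", {}).get("alertname", "?")
                    for a in payload.get("alerts", [])})
    content = f"[{payload.get('status', 'unknown')}] {', '.join(names)}"
    return {"msgtype": "text", "text": {"content": content}}


def send_to_wecom(robot_key: str, message: dict, timeout: float = 5.0) -> None:
    """POST one message to the WeCom group bot."""
    req = urllib.request.Request(
        WECOM_SEND_URL.format(key=robot_key),
        data=json.dumps(message, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=timeout).close()


def make_handler(send: Callable[[dict], None]):
    """Build a request handler that forwards converted alerts via send()."""
    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/webhook":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            try:
                payload = json.loads(self.rfile.read(length))
            except ValueError:
                self.send_error(400, "invalid JSON")
                return
            send(summarize(payload))
            self.send_response(200)
            self.end_headers()

        def log_message(self, fmt, *args):  # keep the sketch quiet
            pass

    return Handler


def serve(robot_key: str, port: int = 8999) -> None:
    """Run the relay until interrupted."""
    handler = make_handler(lambda msg: send_to_wecom(robot_key, msg))
    HTTPServer(("", port), handler).serve_forever()
```

Calling serve with the bot key starts the relay on port 8999. Passing the sender into make_handler keeps the HTTP part testable without touching WeCom.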

Next, modify the Alertmanager YAML file so that its webhook receiver points at the converter:

global:
  resolve_timeout: 10m
receivers:
- name: 'webhook'
  webhook_configs:
  - url: 'http://9.220.xxx.xxx/webhook'
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 1m
  repeat_interval: 1m
  receiver: 'webhook'

Finally, test from the command line whether an alert sent to Alertmanager makes it all the way to WeCom:

curl -X POST "http://9.220.xxx.xxx/api/v2/alerts" -H "Content-type: application/json" -d '[{"status": "firing","labels": {"alertname": "name","service": "my-service","severity":"warning","instance": "name.example.net"},"annotations": {"summary": "High latency is high!"},"generatorURL": "http://prometheus.int.example.net/<generating_expression>"}]'


I triggered the alert several times, and every one arrived. Success!

posted on 2022-07-15 16:53  ahfuzhang