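Alertmanager notes
I. Alerts received by Alertmanager are marked [resolved]
1. Problem description
Prometheus and Alertmanager are deployed in different clusters, and resolve_timeout is set to 20m.
The Alertmanager log shows that every alert sent from Prometheus to Alertmanager is marked [resolved]: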
level=debug ts=2024-04-12T07:37:11.974Z caller=dispatch.go:165 component=dispatcher msg="Received alert" alert=POD内存使用率[f83426f][resolved]
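2. Resolution
Investigation showed that clock synchronization was not enabled on the cluster running Prometheus, so its clock differed from the clock of the cluster running Alertmanager.
Alertmanager decides whether an alert is resolved by comparing the alert's end time with its own clock. Prometheus stamps each alert it sends with an end time derived from its own clock, so if the Prometheus clock lags far enough behind, that end time is already in the past when the alert arrives and Alertmanager treats the alert as [resolved]. According to the Alertmanager documentation, resolve_timeout is only the default for alerts that carry no end time, so it most likely played no role for alerts coming from Prometheus.
After clock synchronization was enabled, the problem went away.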
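The effect of clock skew can be illustrated with a simplified model. This is only a sketch, not Alertmanager's code: it assumes the sender attaches an end time computed from its own clock, and the 10-minute skew and 4-minute validity window are made-up numbers.

```python
from datetime import datetime, timedelta, timezone


def is_resolved(ends_at: datetime, receiver_now: datetime) -> bool:
    """An alert counts as resolved once its end time is no longer in the future."""
    return ends_at <= receiver_now


# Hypothetical numbers: the sender's clock runs 10 minutes behind the receiver's.
true_now = datetime(2024, 4, 12, 7, 37, 11, tzinfo=timezone.utc)
skew = timedelta(minutes=10)
sender_now = true_now - skew   # what the Prometheus host believes
receiver_now = true_now        # what the Alertmanager host believes

# Assumed validity window the sender attaches to a firing alert.
validity = timedelta(minutes=4)

# With skew, the end time is already in the past on arrival.
resolved_with_skew = is_resolved(sender_now + validity, receiver_now)
# With synchronized clocks, the same alert is still firing.
resolved_in_sync = is_resolved(true_now + validity, receiver_now)

print(resolved_with_skew, resolved_in_sync)
```

Any skew larger than the validity window is enough to flip every incoming alert to resolved, which matches the log line above.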
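II. Explanation of selected Alertmanager configuration
global:
  resolve_timeout: 10m  # An alert that carries no end time is marked resolved if it is not received again within 10 minutes
route:
  receiver: pero-api  # Default receiver
  group_by:  # Group alerts by the following labels
    - alertTarget
    - alertLevel
    - rule_id
    - metricId
    - cluster
  group_wait: 5m  # When the first alert of a new group arrives, how long to wait before sending the first notification to the receiver
  group_interval: 5m  # For a group that has already been notified, how long to wait before sending a notification about alerts newly added to it
  repeat_interval: 2h  # For alerts that were already notified and are still not resolved, resend to the receiver every 2 hours
  ### The settings above are the defaults for all routes; the settings below are the individual routes ###
  routes:
    - receiver: pero-api
      group_by:
        - alertTarget
        - alertLevel
        - rule_id
        - metricId
        - cluster
      match:
        peroGroup: pero-api
      routes:
        - match:
            alertLevel: "4"
          group_wait: 5s
          group_interval: 5s
        - match:
            alertLevel: "3"
          group_wait: 30s
          group_interval: 30s
        - match:
            alertLevel: "2"
          group_wait: 2m
          group_interval: 2m
        - match:
            alertLevel: "1"
          group_wait: 5m
          group_interval: 5m
inhibit_rules:  # Inhibition rules
  - source_match:  # A level-4 alert in group pero-api inhibits a level-3 alert in group pero-api, provided the labels listed under equal have the same values on both
      alertLevel: "4"
      peroGroup: pero-api
    target_match:
      alertLevel: "3"
      peroGroup: pero-api
    equal: ['peroGroup','metricId', 'cluster', 'rule_id', 'instance', 'alertTarget']
receivers:  # Send alert notifications to a third-party application
  - name: pero-api
    webhook_configs:
      - send_resolved: true
        url: http://pero-api-svc.pcl:8080/pero/api/v3/send  # pero-api application endpoint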
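To make the inheritance between the default route and the nested routes concrete, here is a minimal sketch of how a label set could be resolved to its effective timings. It is an illustration under assumptions, not Alertmanager's implementation: ROUTE_TREE is a hand-copied subset of the config above, and the resolve and matches helpers are hypothetical names. It assumes the first matching child route wins and that a child inherits whatever it does not override.

```python
from typing import Optional

# Hand-copied subset of the route tree from the config above.
ROUTE_TREE = {
    "receiver": "pero-api",
    "group_wait": "5m",
    "group_interval": "5m",
    "repeat_interval": "2h",
    "routes": [
        {
            "receiver": "pero-api",
            "match": {"peroGroup": "pero-api"},
            "routes": [
                {"match": {"alertLevel": "4"}, "group_wait": "5s", "group_interval": "5s"},
                {"match": {"alertLevel": "3"}, "group_wait": "30s", "group_interval": "30s"},
                {"match": {"alertLevel": "2"}, "group_wait": "2m", "group_interval": "2m"},
                {"match": {"alertLevel": "1"}, "group_wait": "5m", "group_interval": "5m"},
            ],
        }
    ],
}

INHERITED = ("receiver", "group_wait", "group_interval", "repeat_interval")


def matches(route: dict, labels: dict) -> bool:
    """A route matches when every label in its match block has the same value."""
    return all(labels.get(k) == v for k, v in route.get("match", {}).items())


def resolve(route: dict, labels: dict, parent: Optional[dict] = None) -> dict:
    """Walk down the tree, letting each matching child override inherited settings."""
    settings = dict(parent or {})
    settings.update({k: route[k] for k in INHERITED if k in route})
    for child in route.get("routes", []):
        if matches(child, labels):
            return resolve(child, labels, settings)
    return settings


level4 = resolve(ROUTE_TREE, {"peroGroup": "pero-api", "alertLevel": "4"})
unrouted = resolve(ROUTE_TREE, {"alertLevel": "4"})
print(level4)
print(unrouted)
```

An alert without the peroGroup label never reaches the per-level routes, so it falls back to the 5m defaults even when its alertLevel is "4".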
III. Sending alerts to Alertmanager manually
1. Request body
[
  {
    "labels": {
      "alertTarget": "xdd666",
      "alertLevel": "1",
      "rule_id": "98",
      "instance": "xdd实例",
      "bizSystem": "xdd轻微系统",
      "alertname": "资源变更",
      "groupId": "",
      "log_info": "xdd轻微ddddddddddddddddddddddddddddddddddddddddddddd",
      "description": "资源对象:xdd轻微ddddd变更",
      "thirdStrategyId": "100",
      "clusterId": "sdsdfs-wewew-dsewe-sdcsd-ssdvdd",
      "log_time": "2024-11-04 16:16:16",
      "metricType": "Deployment",
      "rulesName": "xdd轻微告警名称",
      "clusterName": "cluster-pero",
      "namespace": "xdd轻微",
      "alertGroup": "pero-api",
      "alertModel": "resource"
    },
    "annotations": {
      "alertContent": "xdd轻微ddddddddddddddddddddddddddddddddddddddddddddd变更"
    }
  },
  {
    "labels": {
      "alertTarget": "xdd666",
      "alertLevel": "2",
      "rule_id": "97",
      "instance": "xdd实例",
      "bizSystem": "xdd中度系统",
      "alertname": "资源变更",
      "groupId": "",
      "log_info": "xdd中度ddddddddddddddddddddddddddddddddddddddddddddd",
      "description": "资源对象:xdd中度ddddd变更",
      "thirdStrategyId": "100",
      "clusterId": "sdsdfs-wewew-dsewe-sdcsd-ssdvdd",
      "log_time": "2024-11-04 16:16:16",
      "metricType": "Deployment",
      "rulesName": "xdd中度告警名称",
      "clusterName": "cluster-pero",
      "namespace": "xdd中度",
      "alertGroup": "pero-api",
      "alertModel": "resource"
    },
    "annotations": {
      "alertContent": "xdd中度ddddddddddddddddddddddddddddddddddddddddddddd变更"
    }
  }
]
Note: adding a field under labels, or changing the value of an existing one ("key": "value"), makes Alertmanager treat the alert as a new alert and send it. Adding a field whose value is empty does not count as a new alert.
Note: changing a value from non-empty to empty also causes the alert to be sent.
In practice the behavior was not always consistent. Changing values casually can easily trigger an alert storm.
These observations fit the fact that Alertmanager identifies an alert by its label set; it also appears to drop labels whose value is empty, which would explain the empty-value cases.
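The notes on label changes can be reproduced with a toy identity function. This is a sketch under two assumptions: an alert's identity depends only on its labels, and labels with empty values are ignored. The alert_identity helper and the SHA-256 hash are illustrative stand-ins, not Alertmanager's actual fingerprint function.

```python
import hashlib


def alert_identity(labels: dict) -> str:
    """Hash the sorted non-empty labels; annotations are deliberately not an input."""
    kept = sorted((k, v) for k, v in labels.items() if v != "")
    joined = "\x00".join(f"{k}\x01{v}" for k, v in kept)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:12]


base = {"alertTarget": "xdd666", "alertLevel": "1", "rule_id": "98", "groupId": ""}

# Adding a label with an empty value leaves the identity unchanged.
with_empty_label = {**base, "extra": ""}
# Changing a value, or emptying a previously non-empty one, yields a new identity.
changed_value = {**base, "alertLevel": "2"}
emptied_value = {**base, "rule_id": ""}

same_as_base = alert_identity(with_empty_label) == alert_identity(base)
changed_differs = alert_identity(changed_value) != alert_identity(base)
emptied_differs = alert_identity(emptied_value) != alert_identity(base)
print(same_as_base, changed_differs, emptied_differs)
```

Under this model, putting frequently changing data such as log_time or log_info into labels gives every submission a new identity, which is one way an alert storm can start.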
2. curl command
curl -v --request POST \
  --url http://alertmanager-svc.pcl:9093/api/v2/alerts \
  --header 'Authorization: Basic xxxxxxxxxxxxxxxxxxxxxxxxxx' \
  --header 'content-type: application/json' \
  --data '[
  {
    "labels": {
      "alertTarget": "xdd666",
      "alertLevel": "1",
      "rule_id": "98",
      "instance": "xdd实例",
      "bizSystem": "xdd轻微系统",
      "alertname": "资源变更",
      "groupId": "",
      "log_info": "xdd轻微ddddddddddddddddddddddddddddddddddddddddddddd",
      "description": "资源对象:xdd轻微ddddd变更",
      "thirdStrategyId": "100",
      "clusterId": "sdsdfs-wewew-dsewe-sdcsd-ssdvdd",
      "log_time": "2024-11-04 16:16:16",
      "metricType": "Deployment",
      "rulesName": "xdd轻微告警名称",
      "clusterName": "cluster-pero",
      "namespace": "xdd轻微",
      "alertGroup": "pero-api",
      "alertModel": "resource"
    },
    "annotations": {
      "alertContent": "xdd轻微ddddddddddddddddddddddddddddddddddddddddddddd变更"
    }
  },
  {
    "labels": {
      "alertTarget": "xdd666",
      "alertLevel": "2",
      "rule_id": "97",
      "instance": "xdd实例",
      "bizSystem": "xdd中度系统",
      "alertname": "资源变更",
      "groupId": "",
      "log_info": "xdd中度ddddddddddddddddddddddddddddddddddddddddddddd",
      "description": "资源对象:xdd中度ddddd变更",
      "thirdStrategyId": "100",
      "clusterId": "sdsdfs-wewew-dsewe-sdcsd-ssdvdd",
      "log_time": "2024-11-04 16:16:16",
      "metricType": "Deployment",
      "rulesName": "xdd中度告警名称",
      "clusterName": "cluster-pero",
      "namespace": "xdd中度",
      "alertGroup": "pero-api",
      "alertModel": "resource"
    },
    "annotations": {
      "alertContent": "xdd中度ddddddddddddddddddddddddddddddddddddddddddddd变更"
    }
  }
]'
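For scripted tests, the same request can be built with the Python standard library. This is a minimal sketch: build_alert_request is a hypothetical helper, the payload is shortened to a few labels, and the credentials string is a placeholder just like in the curl command. The actual send is left commented out.

```python
import json
import urllib.request


def build_alert_request(url: str, alerts: list, basic_credentials: str) -> urllib.request.Request:
    """Build the POST request that the curl command above sends."""
    body = json.dumps(alerts, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic_credentials}",
            "Content-Type": "application/json",
        },
    )


# Shortened payload; the full label set from the request body can be used instead.
alerts = [
    {
        "labels": {"alertTarget": "xdd666", "alertLevel": "1", "rule_id": "98"},
        "annotations": {"alertContent": "example content"},
    }
]

request = build_alert_request(
    "http://alertmanager-svc.pcl:9093/api/v2/alerts", alerts, "xxxxxxxxxxxxxxxxxxxxxxxxxx"
)
print(request.get_method(), request.full_url)

# Uncomment to actually send the alerts:
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```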
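IV. Alert inhibition not taking effect in an Alertmanager cluster
1. Scenario
Inhibition rule: a higher-level alert inhibits a lower-level alert.
Alertmanager deployment mode: an Alertmanager cluster, accessed through a load balancer address.
Because alerts are sent to a load balancer address, some of them may be delivered to one node of the Alertmanager cluster and the rest to other nodes.
Each node evaluates inhibition rules only against the alerts it has received itself, so inhibition only works between alerts that arrived at the same Alertmanager node. Alerts on different nodes do not inhibit each other.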
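The scenario can be simulated in a few lines. This is a simplified sketch of the inhibition rule from section II, not Alertmanager's implementation: inhibits and notified are hypothetical helpers, the sample label values are made up, and a missing label is treated the same as an empty one.

```python
EQUAL = ("peroGroup", "metricId", "cluster", "rule_id", "instance", "alertTarget")


def inhibits(source: dict, target: dict) -> bool:
    """Level 4 inhibits level 3 within pero-api when all EQUAL labels agree."""
    return (
        source.get("peroGroup") == "pero-api"
        and source.get("alertLevel") == "4"
        and target.get("peroGroup") == "pero-api"
        and target.get("alertLevel") == "3"
        and all(source.get(k, "") == target.get(k, "") for k in EQUAL)
    )


def notified(node_alerts: list) -> list:
    """Alerts a single node still notifies about, seeing only its own alerts."""
    return [t for t in node_alerts if not any(inhibits(s, t) for s in node_alerts)]


common = {"peroGroup": "pero-api", "rule_id": "98", "instance": "node-a", "alertTarget": "xdd666"}
high = {**common, "alertLevel": "4"}
low = {**common, "alertLevel": "3"}

# Both alerts land on the same node: the level-3 alert is inhibited.
same_node = notified([high, low])
# The load balancer splits them across two nodes: nothing is inhibited.
split_nodes = notified([high]) + notified([low])

print(len(same_node), len(split_nodes))
```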
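2. Solution
When sending alerts, send the same alerts to every node of the Alertmanager cluster instead of to the load balancer address. If Prometheus is the sender, this means listing every Alertmanager instance in its alerting configuration.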
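For alerts pushed by a custom script, the fan-out could look like the sketch below. The node URLs and the send_to_all helper are hypothetical, and the HTTP call is injected as a function so the logic can be exercised without a running cluster; a real sender would POST to /api/v2/alerts as in section III.

```python
from typing import Callable, Iterable


def send_to_all(nodes: Iterable[str], alerts: list, post: Callable[[str, list], int]) -> dict:
    """POST the same alerts to every node and collect the status code per node."""
    results = {}
    for node in nodes:
        results[node] = post(f"{node}/api/v2/alerts", alerts)
    return results


# Hypothetical per-node addresses; replace with the real node addresses.
NODES = [
    "http://alertmanager-0.example:9093",
    "http://alertmanager-1.example:9093",
]

calls = []


def fake_post(url: str, alerts: list) -> int:
    """Stand-in for a real HTTP POST that records the call and reports success."""
    calls.append((url, alerts))
    return 200


alerts = [{"labels": {"alertTarget": "xdd666", "alertLevel": "4"}}]
results = send_to_all(NODES, alerts, fake_post)
print(results)
```

Sending the same alert to several nodes is safe in the sense that each node then sees the full set of alerts and can apply the inhibition rules consistently.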
