I. Applications
1. Start Prometheus with Docker

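 Prometheus needs persistent mounts so that its files survive when the container is re-created. In the command below, the named volume prometheus-vol backs /etc/prometheus. The official image writes its time-series data to /prometheus, so mount a volume there as well if the metrics themselves must persist.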

docker run -dit --name=prometheus \
  --mount src=prometheus-vol,dst=/etc/prometheus \
  -p 9090:9090 \
  -v /root/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /root/prometheus/rules/:/usr/local/prometheus/rules/ \
  prom/prometheus:latest
docker volume inspect prometheus-vol
vim prometheus.yml

global:
  scrape_interval:     15s 
  evaluation_interval: 15s 
  # scrape_timeout is set to the global default (10s).
        
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 12.27.65.51:9093
rule_files:
  - "/usr/local/prometheus/rules/*.rules"
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
    - targets: ['12.27.65.51:9090']
      labels:
        instance: prometheus
        service: prometheus-service

  - job_name: 'node-exporter'

    static_configs:
    - targets: ['12.27.65.51:9100']
      labels:
        instance: node-exporter
        service: node-exporter-service
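Start Prometheus with the YAML file mounted into the container. If a container named prometheus already exists from the earlier command, remove it first with docker rm -f prometheus, because the name is already in use.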
docker run -d --name prometheus -p 9090:9090 -v /root/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus:latest
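Before relying on the container, it can help to validate the file and confirm that the server answers. The following is a minimal sketch, not part of the original setup. It assumes the host path and the address 12.27.65.51 used above, runs promtool (which ships in the prom/prometheus image) against the mounted file, and then calls the /-/healthy endpoint. The function names and the DOCKER and CURL override variables are hypothetical and exist only for illustration; setting them to echo prints the commands instead of running them.

```shell
# Hypothetical helpers; set DOCKER=echo or CURL=echo for a dry run.
DOCKER=${DOCKER:-docker}
CURL=${CURL:-curl}

# Validate prometheus.yml with promtool from the Prometheus image.
# $1: path to prometheus.yml on the host
check_prometheus_config() {
  $DOCKER run --rm \
    -v "$1:/etc/prometheus/prometheus.yml:ro" \
    --entrypoint promtool \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}

# Ask a running Prometheus whether it is healthy.
# $1: host, $2: port (defaults to 9090)
check_prometheus_health() {
  $CURL -fsS "http://$1:${2:-9090}/-/healthy"
}

# Example (commented out so that sourcing this file has no side effects):
# check_prometheus_config /root/prometheus/prometheus.yml
# check_prometheus_health 12.27.65.51
```

Running the check in a throwaway container (--rm) keeps the running Prometheus untouched, so a broken file is caught before the service is restarted with it.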
2. Start node-exporter with Docker
node-exporter is also started with Docker. The simplest start command is:
docker run --name node-exporter -d -p 9100:9100 prom/node-exporter:latest
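To confirm that the exporter is actually serving data, one option is to fetch its /metrics page and count the metric families it reports. The sketch below assumes the address 12.27.65.51 and port 9100 from the configuration above. The function names and the CURL override variable are hypothetical.

```shell
# Hypothetical helpers; set CURL=echo for a dry run.
CURL=${CURL:-curl}

# Fetch the raw metrics page from node-exporter.
# $1: host, $2: port (defaults to 9100)
fetch_node_metrics() {
  $CURL -fsS "http://$1:${2:-9100}/metrics"
}

# Count "# TYPE <name> <type>" lines in the text read from stdin.
# node-exporter emits one such line per metric family.
count_metric_families() {
  grep -c '^# TYPE '
}

# Example:
# fetch_node_metrics 12.27.65.51 | count_metric_families
```

A non-zero count means the endpoint is reachable and returns Prometheus text format; whether Prometheus itself scrapes it can then be checked on the Targets page of the Prometheus web UI.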
3. Start and configure AlertManager
docker run --name alertmanager -d -p 9093:9093 prom/alertmanager:latest
vim alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_from: 'xxxxxxxx@163.com'
  smtp_smarthost: 'smtp.163.com:465'
  smtp_auth_username: 'xxxxxxxx@163.com'
  smtp_auth_password: 'xxxxxxxxx'
  smtp_require_tls: false
  smtp_hello: '163.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'xxxxxxxx@163.com'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
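global: global settings, including the timeout after which an alert is treated as resolved, SMTP settings, and the API endpoints of the various notification channels.
route: sets the alert dispatch policy. It is a tree structure, matched depth-first from left to right.
receivers: configures the recipients of alert messages, for example the common email, wechat, slack, and webhook notification methods.
inhibit_rules: inhibition rules. While alerts matching one set of matchers (the source) are firing, the rule mutes alerts matching another set (the target), provided both carry the same values for the labels listed in equal.
smtp_smarthost: the SMTP server address of the mail provider. The configuration above uses 163 Mail (smtp.163.com:465). For QQ Mail, the official address is smtp.qq.com with port 465 or 587, and the POP3/SMTP service must be enabled in the mailbox settings.
smtp_auth_password: for QQ Mail, this is the authorization code issued for third-party clients, not the QQ account login password; using the login password causes an authentication error. The code is shown when the POP3/SMTP service is enabled in the QQ Mail settings.
smtp_require_tls: whether to require TLS (STARTTLS); enable or disable it depending on the environment. If the error email.loginAuth failed: 530 Must issue a STARTTLS command first appears, set it to true. Note that if TLS is enabled and the error starttls failed: x509: certificate signed by unknown authority appears, set insecure_skip_verify: true under tls_config in the email_configs entry to skip certificate verification.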
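The TLS notes above can be sketched as a receiver that uses STARTTLS on port 587. This is an illustrative variant, not the author's configuration: smarthost, require_tls, and tls_config are the per-receiver counterparts of the global smtp_* settings, the address is a placeholder, and the function name write_starttls_receiver is hypothetical. The sketch writes the fragment to a file so that it can be compared with or merged into alertmanager.yml.

```shell
# Write a sample receivers section that uses STARTTLS on port 587.
# $1: output file. The address and smarthost are placeholders.
write_starttls_receiver() {
  cat > "$1" <<'EOF'
receivers:
- name: 'email'
  email_configs:
  - to: 'xxxxxxxx@qq.com'
    smarthost: 'smtp.qq.com:587'
    require_tls: true
    tls_config:
      # Only for servers whose certificate cannot be verified.
      insecure_skip_verify: true
    send_resolved: true
EOF
}

# Example:
# write_starttls_receiver /tmp/receiver-starttls.yml
```

Skipping verification removes the protection that TLS provides against impersonated servers, so it is best treated as a temporary workaround rather than a default.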
4. Deploy alertmanager
docker rm -f alertmanager   # remove the container started in step 3, if it exists
docker run -d \
  --name alertmanager \
  -p 9093:9093 \
  -v /root/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  prom/alertmanager:latest
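A syntax error in alertmanager.yml makes the container exit right after it starts, so checking the file first can save a restart cycle. The sketch below assumes the host path used above and relies on amtool, which ships in the prom/alertmanager image, plus the /-/ready endpoint of a running instance. The function names and the DOCKER and CURL override variables are hypothetical.

```shell
# Hypothetical helpers; set DOCKER=echo or CURL=echo for a dry run.
DOCKER=${DOCKER:-docker}
CURL=${CURL:-curl}

# Validate alertmanager.yml with amtool from the Alertmanager image.
# $1: path to alertmanager.yml on the host
check_alertmanager_config() {
  $DOCKER run --rm \
    -v "$1:/etc/alertmanager/alertmanager.yml:ro" \
    --entrypoint amtool \
    prom/alertmanager:latest \
    check-config /etc/alertmanager/alertmanager.yml
}

# Ask a running Alertmanager whether it is ready to serve traffic.
# $1: host, $2: port (defaults to 9093)
check_alertmanager_ready() {
  $CURL -fsS "http://$1:${2:-9093}/-/ready"
}

# Example:
# check_alertmanager_config /root/prometheus/alertmanager.yml
# check_alertmanager_ready 12.27.65.51
```

If the container still stops unexpectedly, docker logs alertmanager shows the startup error.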
5. Configure AlertManager alerting rules in Prometheus
mkdir -p /root/prometheus/rules && cd /root/prometheus/rules/

vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
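Note: this rule checks whether the node is alive. expr is a PromQL expression that tests whether the target in job="node-exporter" is up. for means that the alert stays in the Pending state for 15s before it changes to Firing; once it is Firing, it is sent to AlertManager. labels and annotations attach additional identifying and descriptive information to the alert. All of these labels and annotations, together with the labels already added to the job in prometheus.yml, are automatically included in the email body.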
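A rule file with a YAML or PromQL error prevents Prometheus from loading its configuration, so it is worth checking node-up.rules before restarting. A minimal sketch, assuming the rules directory created above: it mounts the directory into a throwaway container and runs promtool check rules on one file. The function name check_rule_file, the /rules mount point, and the DOCKER override variable are hypothetical.

```shell
# Hypothetical helper; set DOCKER=echo for a dry run.
DOCKER=${DOCKER:-docker}

# Validate one rule file with promtool from the Prometheus image.
# $1: host directory holding the rule files
# $2: file name inside that directory
check_rule_file() {
  $DOCKER run --rm \
    -v "$1:/rules:ro" \
    --entrypoint promtool \
    prom/prometheus:latest \
    check rules "/rules/$2"
}

# Example:
# check_rule_file /root/prometheus/rules node-up.rules
```

promtool reports the number of rules it found in the file, which is a quick way to notice an indentation mistake that silently drops a rule.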
Next, edit the prometheus.yml configuration file and add the rules files:
...
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 12.27.65.51:9093

rule_files:
  - "/usr/local/prometheus/rules/*.rules"
Change the Prometheus start command as follows and restart the service:
docker rm -f prometheus   # remove the previous container so the name can be reused
docker run --name prometheus -d -p 9090:9090 \
  -v /root/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml \
  -v /root/prometheus/groups/:/usr/local/prometheus/groups/ \
  -v /root/prometheus/rules/:/usr/local/prometheus/rules/ \
  prom/prometheus:latest
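Once Prometheus is back up, the rule can be exercised end to end by stopping node-exporter and watching the alert move from Pending to Firing. The sketch below is one way to do that, assuming the container name node-exporter and the address 12.27.65.51 used earlier. It uses the Prometheus HTTP API endpoints /api/v1/rules and /api/v1/alerts. The function names and the DOCKER and CURL override variables are hypothetical.

```shell
# Hypothetical helpers; set DOCKER=echo or CURL=echo for a dry run.
DOCKER=${DOCKER:-docker}
CURL=${CURL:-curl}

# List the rule groups Prometheus has loaded. $1: host
list_loaded_rules() {
  $CURL -fsS "http://$1:9090/api/v1/rules"
}

# List alerts that are currently pending or firing. $1: host
list_active_alerts() {
  $CURL -fsS "http://$1:9090/api/v1/alerts"
}

# Stop node-exporter so that up{job="node-exporter"} drops to 0.
trigger_node_up_alert() {
  $DOCKER stop node-exporter
}

# Start it again so that the alert resolves.
resolve_node_up_alert() {
  $DOCKER start node-exporter
}

# Example:
# list_loaded_rules 12.27.65.51
# trigger_node_up_alert
# sleep 60 && list_active_alerts 12.27.65.51
# resolve_node_up_alert
```

The 60-second wait is only a rough allowance for one scrape, one rule evaluation, and the 15s for period; the exact delay depends on where in those intervals the exporter stops.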
6. Configure a custom email template for AlertManager
mkdir -p /root/prometheus/alertmanager-tmpl && cd /root/prometheus/alertmanager-tmpl
vim email.tmpl
{{ define "email.from" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to" }}xxxxxxxx@qq.com{{ end }}
{{ define "email.to.html" }}
{{ range .Alerts }}
=========start==========<br>
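Alert source: prometheus_alert <br>
Alert level: {{ .Labels.severity }} <br>
Alert type: {{ .Labels.alertname }} <br>
Affected host: {{ .Labels.instance }} <br>
Alert subject: {{ .Annotations.summary }} <br>
Alert details: {{ .Annotations.description }} <br>
{{/* Go time layouts must be written with the reference time 2006-01-02 15:04:05 */}}
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }} <br>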
=========end==========<br>
{{ end }}
{{ end }}
Update alertmanager.yml to load the template:
global:
  resolve_timeout: 5m
  smtp_from: '{{ template "email.from" . }}'
  smtp_smarthost: 'smtp.qq.com:465'
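  # smtp_auth_username is a plain string and is not expanded as a template, so write the address out
  smtp_auth_username: 'xxxxxxxx@qq.com'
  # placeholder: use the mailbox authorization code here
  smtp_auth_password: 'xxxxxxxxx'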
  smtp_require_tls: false
  smtp_hello: 'qq.com'
templates:
  - '/etc/alertmanager-tmpl/email.tmpl'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: '{{ template "email.to" . }}'
    html: '{{ template "email.to.html" . }}'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
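Restart AlertManager:
docker rm -f alertmanager   # remove the previous container so the name can be reused
docker run -d \
  --name alertmanager \
  -p 9093:9093 \
  -v /root/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  -v /root/prometheus/alertmanager-tmpl/:/etc/alertmanager-tmpl/ \
  prom/alertmanager:latest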
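To see the rendered email without waiting for a real outage, a test alert can be posted straight to Alertmanager's v2 API. The sketch below assumes the address 12.27.65.51 and port 9093 used earlier. The function names, the CURL override variable, and the label and annotation values are all made up for illustration. The payload builder does no JSON escaping, so the values must not contain quotes or backslashes.

```shell
# Hypothetical helpers; set CURL=echo for a dry run.
CURL=${CURL:-curl}

# Build a one-alert JSON payload for POST /api/v2/alerts.
# $1: alertname, $2: severity, $3: instance, $4: summary, $5: description
# No JSON escaping is done; keep the values free of quotes and backslashes.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"},"annotations":{"summary":"%s","description":"%s"}}]\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Post a fixed test alert to Alertmanager. $1: host
send_test_alert() {
  test_alert_payload template-test 1 test-host \
    "template test" "sent by hand to preview the email" |
    $CURL -fsS -X POST -H 'Content-Type: application/json' \
      --data @- "http://$1:9093/api/v2/alerts"
}

# Example:
# send_test_alert 12.27.65.51
```

Because the payload sets a description annotation, the details line of the template is filled in; alerts from the node-up rule above define only summary, so that line stays empty for them unless a description is added to the rule.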
Posted on 2021-12-27 17:34 by 属于我的梦,明明还在