20. Monitoring Docker containers: Prometheus

Copyright notice: original work; reproduction is not permitted and violations will be pursued legally. Author: kirin

zabbix ---- monitoring ---- k8s, docker
Prometheus ---- monitoring ---- k8s
exporter (the collector that exposes metrics)

1.## Upload the tarball to /opt/ and extract it (download it from the official site)

[root@docker03 /opt]#  tar xf prometheus-2.23.0.linux-amd64.tar.gz 
### Rename the directory if the name feels too long
[root@docker03 /opt]#  mv prometheus-2.23.0.linux-amd64 prometheus

## View the startup flags
[root@docker03 /opt/prometheus]#  ./prometheus --help

## Start the service
[root@docker03 /opt/prometheus]#  ./prometheus --config.file="prometheus.yml" 

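Once started, the service runs in the foreground by default and blocks the terminal, which makes it a good fit for running as a container.
## If it is not run as a container, how can it run in the background?
Append an & to the command: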
[root@docker03 /opt/prometheus]#  ./prometheus --config.file="prometheus.yml" &
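
A plain & keeps the process tied to the current shell and its output on the terminal. As a minimal sketch (assuming bash; the helper name start_bg and the pid/log file names are hypothetical, not part of Prometheus), the command can be wrapped with nohup so that output goes to a log file and the PID is saved for a later kill:

```shell
#!/usr/bin/env bash
# start_bg PIDFILE LOGFILE CMD [ARGS...]
# Launches CMD detached from the terminal with nohup, appends its output
# to LOGFILE, and records the PID of the background process in PIDFILE.
start_bg() {
  local pidfile=$1 logfile=$2
  shift 2
  nohup "$@" >>"$logfile" 2>&1 &
  echo $! >"$pidfile"
}

# Hypothetical usage on the Prometheus node (file names are assumptions):
#   cd /opt/prometheus
#   start_bg prometheus.pid prometheus.log ./prometheus --config.file="prometheus.yml"
#   kill "$(cat prometheus.pid)"   # stop it later without searching for the PID
```

With the PID stored in a file, the later "kill the process" steps no longer depend on reading it from netstat.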

## Prometheus listens on port 9090 by default; open it in a browser to check


# On the client node, upload the image archive and load it
[root@docker01 ~]#  docker load -i docker_monitor_node.tar.gz 

# Start node-exporter
[root@docker01 ~]# docker run -d   -p 9100:9100   -v "/:/host:ro,rslave"   --name=node_exporter   quay.io/prometheus/node-exporter   --path.rootfs /host

# Start cadvisor
[root@docker01 ~]# docker run --volume=/:/rootfs:ro  --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro  -p 8080:8080 -d --name=cadvisor google/cadvisor:latest

## Test in a browser
http://10.0.0.11:8080/metrics
http://10.0.0.11:9100/metrics
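
The same check can be done from the command line. The sketch below assumes bash and curl are available; the function names count_samples and check_exporter are made up for illustration. It fetches a /metrics endpoint and counts the sample lines, meaning every line that is neither blank nor a # comment:

```shell
#!/usr/bin/env bash
# count_samples: reads Prometheus text-format metrics on stdin and prints
# the number of sample lines (lines that are not blank and not # comments).
count_samples() {
  grep -v -e '^#' -e '^[[:space:]]*$' | wc -l | tr -d ' '
}

# check_exporter URL: fetches URL and reports whether it answered and how
# many samples it exposed. Returns 1 if the request failed.
check_exporter() {
  local url=$1 body
  body=$(curl -fsS --max-time 5 "$url") || { echo "DOWN $url"; return 1; }
  echo "UP $url samples=$(printf '%s\n' "$body" | count_samples)"
}

# Usage with the addresses from this post:
#   check_exporter http://10.0.0.11:8080/metrics
#   check_exporter http://10.0.0.11:9100/metrics
```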


2.# Prometheus node

1. Kill the running process first (look up its PID in the netstat output; 5018 in this run)
[root@docker03 /opt/prometheus]#  netstat -lntup
[root@docker03 /opt/prometheus]#  kill 5018
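
Reading the PID off the netstat output by eye can be scripted. This is a rough sketch, assuming bash and awk and the usual Linux `netstat -lntup` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); listener_pid is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# listener_pid PORT: reads `netstat -lntup` output on stdin and prints the
# PID of the first TCP listener bound to PORT.
listener_pid() {
  local port=$1
  awk -v port="$port" '
    $1 ~ /^tcp/ && $6 == "LISTEN" {
      n = split($4, a, ":")            # port is the last ":"-separated field
      if (a[n] == port) {
        split($7, p, "/")              # column 7 is "PID/program name"
        print p[1]
        exit
      }
    }'
}

# Usage on the Prometheus node:
#   kill "$(netstat -lntup 2>/dev/null | listener_pid 9090)"
```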

2. Edit the configuration file
[root@docker03 /opt/prometheus]#  vim  prometheus.yml 

3.# Start Prometheus

[root@docker03 /opt/prometheus]#  ./prometheus --config.file="prometheus.yml" &

4.# Open the browser and refresh the page

http://10.0.0.13:9090/targets

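5. The targets above are configured statically, so every change to prometheus.yml only takes effect after Prometheus is restarted (or told to reload its configuration, for example with SIGHUP). Next, switch to dynamic, file-based service discovery.

# Edit the configuration file again (kill the process first)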
[root@docker03 /opt/prometheus]#  netstat -lntup
[root@docker03 /opt/prometheus]#  kill 5115

## Edit the file
[root@docker03 /opt/prometheus]#  vim  prometheus.yml 
[root@docker03 /opt/prometheus]#  cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    file_sd_configs:
      - files:
        - /opt/prometheus/discovery/discovery_cadvisor.yml
        refresh_interval: 10s 
  - job_name: 'node-exporter'
    file_sd_configs:
      - files:
        - /opt/prometheus/discovery/discovery_node-exportre.yml
        refresh_interval: 10s

------------------------------------------------------------------------------

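## Create the directory
[root@docker03 ~]#  mkdir /opt/prometheus/discovery/ -p
## Create the target files (any name works, as long as it matches the path in the configuration above). The content is written as JSON, which is also valid YAML, so the .yml extension is fine.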
[root@docker03 ~]#  vim /opt/prometheus/discovery/discovery_cadvisor.yml
[root@docker03 ~]#  cat /opt/prometheus/discovery/discovery_cadvisor.yml
[
  {
     "targets":  ["10.0.0.11:8080"]
  }
]

[root@docker03 ~]#  vim /opt/prometheus/discovery/discovery_node-exportre.yml
[root@docker03 ~]#  cat /opt/prometheus/discovery/discovery_node-exportre.yml
[
  {
     "targets":  ["10.0.0.11:9100"]
  }
]

## Restart Prometheus
[root@docker03 /opt/prometheus]#  ./prometheus --config.file="prometheus.yml" &
## Check in the browser

## Next, demonstrate the dynamic behavior
## Edit the target files
[root@docker03 ~]#  vim /opt/prometheus/discovery/discovery_cadvisor.yml
[root@docker03 ~]#  cat /opt/prometheus/discovery/discovery_cadvisor.yml
[
  {
     "targets":  ["10.0.0.11:8080","10.0.0.12:8080"]
  }
]

[root@docker03 ~]#  vim /opt/prometheus/discovery/discovery_node-exportre.yml
[root@docker03 ~]#  cat /opt/prometheus/discovery/discovery_node-exportre.yml
[
  {
     "targets":  ["10.0.0.11:9100","10.0.0.12:9100"]
  }
]

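## Refresh the browser again

The new targets were picked up automatically, without restarting Prometheus. The containers on machine 02 (10.0.0.12) have not been started, so those targets show as down.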
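
Since Prometheus re-reads these target files on its own, editing them by hand with vim can be replaced by a small generator. The sketch below assumes bash; the function name write_targets is hypothetical. It emits the same JSON layout shown above and writes through a temporary file followed by mv, so Prometheus should never read a half-written file:

```shell
#!/usr/bin/env bash
# write_targets FILE PORT HOST [HOST...]
# Writes a file_sd target file listing HOST:PORT for every host given.
write_targets() {
  local file=$1 port=$2
  shift 2
  local list="" host tmp
  for host in "$@"; do
    # Append a quoted "host:port", comma-separated after the first entry.
    list+="${list:+,}\"${host}:${port}\""
  done
  tmp=$(mktemp "${file}.XXXXXX")
  printf '[\n  {\n     "targets":  [%s]\n  }\n]\n' "$list" >"$tmp"
  mv "$tmp" "$file"   # rename is atomic within one filesystem
}

# Usage matching the files in this post:
#   write_targets /opt/prometheus/discovery/discovery_cadvisor.yml 8080 10.0.0.11 10.0.0.12
#   write_targets /opt/prometheus/discovery/discovery_node-exportre.yml 9100 10.0.0.11 10.0.0.12
```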

6. Alertmanager email alerts

6.1. Upload the tarball to /opt and extract it
6.2. Rename the directory
[root@docker03 /opt]#  mv alertmanager-0.21.0.linux-amd64 alertmanager

## Enter the alertmanager directory and back up alertmanager.yml
[root@docker03 /opt]#  cd alertmanager
[root@docker03 /opt/alertmanager]#  cp alertmanager.yml alertmanager.yml.bak

## Edit alertmanager.yml
[root@docker03 /opt/alertmanager]#  vim alertmanager.yml
[root@docker03 /opt/alertmanager]#  cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '1224256589@qq.com' ## sender address
  smtp_smarthost: 'smtp.qq.com:465' ## QQ Mail SMTP server
  smtp_auth_username: '1224256589@qq.com' ## mailbox account
  smtp_auth_password: 'jgfcbjysohbliibb' ## QQ Mail authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m ## how often a still-firing alert is sent again
  receiver: 'email' ## which receiver gets the alerts; several can be defined
receivers:
- name: 'email' ## receiver definition
  email_configs:
  - to: '1224256589@qq.com' ## recipient address
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical' ## severity of the alert that suppresses others
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance'] ## labels that must be equal for inhibition to apply

## Start alertmanager
[root@docker03 /opt/alertmanager]#  ./alertmanager --config.file="alertmanager.yml" &

## Write the Prometheus alerting rule
[root@docker03 /opt/alertmanager]#  cd ../prometheus/
[root@docker03 /opt/prometheus]#  vim node-up.rules
[root@docker03 /opt/prometheus]#  cat node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0 
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"

## Edit the main Prometheus configuration file
[root@docker03 /opt/prometheus]#  vim prometheus.yml 
[root@docker03 /opt/prometheus]#  cat prometheus.yml
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
       - 10.0.0.13:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
   - "node-up.rules"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    file_sd_configs:
      - files:
        - /opt/prometheus/discovery/discovery_cadvisor.yml
        refresh_interval: 10s 
  - job_name: 'node-exporter'
    file_sd_configs:
      - files:
        - /opt/prometheus/discovery/discovery_node-exportre.yml
        refresh_interval: 10s
-------------------------------------------------------------------------------
## Next, kill the process and restart Prometheus
[root@docker03 /opt/prometheus]#  ./prometheus --config.file="prometheus.yml" &
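
A YAML mistake in any of these files only shows up when the service fails to start, so it can help to validate them first. The Prometheus tarball ships promtool and the Alertmanager tarball ships amtool. The sketch below assumes bash and the /opt/prometheus and /opt/alertmanager layout used in this post; run_if_present and validate_monitoring_configs are hypothetical helper names:

```shell
#!/usr/bin/env bash
# run_if_present TOOL [ARGS...]: runs TOOL when it exists and is executable,
# otherwise prints a skip notice and succeeds.
run_if_present() {
  local tool=$1
  if [ -x "$tool" ]; then
    "$@"
  else
    echo "skip: $tool not found"
  fi
}

# validate_monitoring_configs: checks prometheus.yml (including the rule
# files it references) and alertmanager.yml. Paths are assumptions taken
# from the directory names used earlier in this post.
validate_monitoring_configs() {
  ( cd /opt/prometheus && run_if_present ./promtool check config prometheus.yml ) &&
  ( cd /opt/alertmanager && run_if_present ./amtool check-config alertmanager.yml )
}

# Usage before restarting:
#   validate_monitoring_configs && echo "configs look valid"
```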

## Now trigger an alert
## Using machine 01 as the example: stop the exporter container, then open the browser and wait patiently
[root@docker01 ~]#  docker stop node_exporter

## Then open the browser and refresh
http://10.0.0.13:9090/alerts

Once the monitored target really becomes unreachable, the alert fires
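
The alert state can also be polled from the command line through the Prometheus HTTP API at /api/v1/alerts. An alert first shows as pending and moves to firing once the rule's `for: 15s` has elapsed. The following is a deliberately crude sketch, assuming bash and curl; count_firing is a hypothetical name, and it matches text in the compact JSON that Prometheus emits rather than parsing it properly:

```shell
#!/usr/bin/env bash
# count_firing: reads the JSON body of GET /api/v1/alerts on stdin and
# prints how many alerts are in the "firing" state.
# This is a plain text match, not a JSON parser.
count_firing() {
  grep -o '"state":"firing"' | wc -l | tr -d ' '
}

# Usage with the address from this post:
#   curl -s http://10.0.0.13:9090/api/v1/alerts | count_firing
```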

Open the mailbox to see the alert email

7. Connecting Grafana to Prometheus

1. Upload the downloaded package to machine 03
2. Install it with yum
[root@docker03 ~]#  yum localinstall -y grafana-6.3.3-1.x86_64.rpm 

3. Start it and enable it at boot
[root@docker03 ~]#  systemctl start grafana-server.service 
[root@docker03 ~]#  systemctl enable  grafana-server.service 

4. Open 10.0.0.13:3000 in a browser (Grafana's default port is 3000)
## Username and password are both admin

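The default theme is dark; change it if you prefer

Click Preferences (translate the page if you are not sure where it is)

Click through as shown in the screenshot

The result is the light theme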

5. Create a data source
## Grafana supports Prometheus out of the box, so no plugin needs to be installed as with zabbix
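
Instead of clicking through the UI, the data source can also be created through Grafana's HTTP API (POST /api/datasources). This is a minimal sketch, assuming bash and curl, the default admin:admin login mentioned above, and a data source name of "Prometheus"; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# datasource_payload NAME PROM_URL: prints the JSON body describing a
# Prometheus data source that Grafana reaches in proxy (server) mode.
datasource_payload() {
  local name=$1 url=$2
  printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy","isDefault":true}\n' \
    "$name" "$url"
}

# add_datasource GRAFANA_URL NAME PROM_URL
# Posts the payload to Grafana. Assumes the default admin:admin credentials.
add_datasource() {
  local grafana=$1
  datasource_payload "$2" "$3" |
    curl -fsS -u admin:admin -H 'Content-Type: application/json' \
         -X POST --data @- "$grafana/api/datasources"
}

# Usage with the addresses from this post:
#   add_datasource http://10.0.0.13:3000 Prometheus http://10.0.0.13:9090
```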





Next, search the Grafana website for a dashboard and copy its ID

Then return to the Grafana page and import the dashboard




posted @ 2022-06-03 11:58  kirin(麒麟)