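20. Monitoring Docker Containers with Prometheus
Copyright notice: original work; reproduction is not permitted and violations will be pursued legally. Author: kirin
zabbix: monitors k8s and docker
Prometheus: monitors k8s
exporter (the metrics collector)
1.## Upload the tarball to /opt/ and extract it (download it from the official website)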
[root@docker03 /opt]# tar xf prometheus-2.23.0.linux-amd64.tar.gz
### If the directory name feels too long, rename it
[root@docker03 /opt]# mv prometheus-2.23.0.linux-amd64 prometheus

## Show the startup options
[root@docker03 /opt/prometheus]# ./prometheus --help
## Start the service
[root@docker03 /opt/prometheus]# ./prometheus --config.file="prometheus.yml"

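Once started, the service runs in the foreground by default and blocks the terminal, which makes it a natural fit for running as a container.
## If you do not containerize it, how do you run it in the background?
Append an & to the command: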
[root@docker03 /opt/prometheus]# ./prometheus --config.file="prometheus.yml" &
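A job started with a bare & still writes its logs to the terminal and, depending on shell settings, may be stopped when the session ends. A minimal sketch of one common alternative, assuming `nohup` is available: the helper below detaches a command, sends its output to a log file, and prints the PID so it can be killed later. The function name `start_detached` and the log file name are illustrative and not from the original post.

```shell
#!/usr/bin/env bash
# start_detached LOGFILE CMD [ARGS...]
# Runs CMD under nohup with stdout and stderr appended to nothing but LOGFILE,
# then prints the PID of the background process.
# (Hypothetical helper; name and log path are assumptions.)
start_detached() {
    local logfile="$1"
    shift
    nohup "$@" >"$logfile" 2>&1 &
    echo "$!"
}

# Example, commented out so the sketch can be sourced safely:
# cd /opt/prometheus
# pid=$(start_detached prometheus.log ./prometheus --config.file=prometheus.yml)
# echo "prometheus started with PID $pid"
```

Keeping the PID around avoids having to look it up again before the next restart.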
## Prometheus listens on port 9090 by default; open it in a browser to check


# On the client node, upload the image archive and load it
[root@docker01 ~]# docker load -i docker_monitor_node.tar.gz
# Start node-exporter
[root@docker01 ~]# docker run -d -p 9100:9100 -v "/:/host:ro,rslave" --name=node_exporter quay.io/prometheus/node-exporter --path.rootfs /host
# Start cadvisor
[root@docker01 ~]# docker run --volume=/:/rootfs:ro --volume=/var/run:/var/run:rw --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro -p 8080:8080 -d --name=cadvisor google/cadvisor:latest

## Test in a browser
http://10.0.0.11:8080/metrics
http://10.0.0.11:9100/metrics
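The same endpoints can be checked from a terminal instead of a browser. The sketch below is a small filter over the plain-text metrics format: it reads the text on stdin and prints only the samples of one metric, skipping the `# HELP` and `# TYPE` comment lines. The function name `metric_samples` is an assumption made for this example; `node_load1` is used only as a sample metric name.

```shell
#!/usr/bin/env bash
# metric_samples NAME
# Reads Prometheus exposition text on stdin and prints every sample line
# whose metric name is exactly NAME (labels, if any, are kept in the output).
# (Hypothetical helper, not part of any Prometheus tool.)
metric_samples() {
    local name="$1"
    awk -v name="$name" '
        /^#/ { next }                      # skip HELP/TYPE comment lines
        {
            metric = $1
            sub(/[{].*/, "", metric)       # drop the {label="..."} part
            if (metric == name) print
        }'
}

# Example, commented out because it needs the exporter to be reachable:
# curl -s http://10.0.0.11:9100/metrics | metric_samples node_load1
```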


2.# On the Prometheus node
1. Kill the running process first (look up its PID with netstat)
[root@docker03 /opt/prometheus]# netstat -lntup
[root@docker03 /opt/prometheus]# kill 5018
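The PID (5018 here) changes on every start, so it has to be read from the netstat output each time. A minimal sketch that automates the lookup, assuming the usual `netstat -lntup` column layout where the last column looks like `PID/program`; the function name `pid_on_port` is made up for this example.

```shell
#!/usr/bin/env bash
# pid_on_port PORT
# Reads `netstat -lntup` output on stdin and prints the PID of the process
# listening on TCP port PORT.
# (Hypothetical helper; assumes the standard net-tools column layout.)
pid_on_port() {
    local port="$1"
    awk -v port="$port" '
        $1 ~ /^tcp/ && $6 == "LISTEN" && $4 ~ (":" port "$") {
            split($7, parts, "/")          # last column: "5018/prometheus"
            print parts[1]
        }' | sort -u
}

# Example, commented out because it would kill a real process:
# kill "$(netstat -lntup | pid_on_port 9090)"
```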
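2. Edit the configuration file
[root@docker03 /opt/prometheus]# vim prometheus.yml

3.# Start Prometheus
[root@docker03 /opt/prometheus]# ./prometheus --config.file="prometheus.yml" &
4.# Refresh the page in the browser

5. Because the targets above are configured statically, every change to the file requires restarting Prometheus (or signalling it to reload its configuration). Next, we switch to dynamic, file-based discovery so that targets can change without touching the main configuration.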
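As an aside, Prometheus re-reads its configuration when it receives SIGHUP, so a full kill and restart is not the only option after editing prometheus.yml. A minimal sketch, assuming `pgrep` is installed and the process is named `prometheus`; the function name `reload_prometheus` is illustrative.

```shell
#!/usr/bin/env bash
# reload_prometheus [NAME]
# Sends SIGHUP to every process whose name is exactly NAME
# (default: prometheus). Returns 1 if no such process is running.
# (Hypothetical helper; assumes pgrep from procps is available.)
reload_prometheus() {
    local name="${1:-prometheus}" pids
    pids=$(pgrep -x "$name") || {
        echo "no running process named $name" >&2
        return 1
    }
    # Intentionally unquoted: pgrep may print several PIDs, one per line.
    kill -HUP $pids
}

# Example, commented out because it signals a real process:
# reload_prometheus
```

If the new configuration is invalid, Prometheus logs an error and keeps running with the old one, so checking the log after a reload is worthwhile.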
# Edit the configuration file again (kill the process first)
[root@docker03 /opt/prometheus]# netstat -lntup
[root@docker03 /opt/prometheus]# kill 5115
## Edit the file
[root@docker03 /opt/prometheus]# vim prometheus.yml
[root@docker03 /opt/prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    file_sd_configs:
    - files:
      - /opt/prometheus/discovery/discovery_cadvisor.yml
      refresh_interval: 10s
  - job_name: 'node-exporter'
    file_sd_configs:
    - files:
      - /opt/prometheus/discovery/discovery_node-exportre.yml
      refresh_interval: 10s
------------------------------------------------------------------------------
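## Create the discovery directory
[root@docker03 ~]# mkdir /opt/prometheus/discovery/ -p
## Create the target files (any name works as long as it matches the path in the configuration above and ends in .json, .yml or .yaml)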
[root@docker03 ~]# vim /opt/prometheus/discovery/discovery_cadvisor.yml
[root@docker03 ~]# cat /opt/prometheus/discovery/discovery_cadvisor.yml
[
  {
    "targets": ["10.0.0.11:8080"]
  }
]
[root@docker03 ~]# vim /opt/prometheus/discovery/discovery_node-exportre.yml
[root@docker03 ~]# cat /opt/prometheus/discovery/discovery_node-exportre.yml
[
  {
    "targets": ["10.0.0.11:9100"]
  }
]
## Restart Prometheus
[root@docker03 /opt/prometheus]# ./prometheus --config.file="prometheus.yml" &
## Check the result in the browser

## Next, demonstrate the dynamic behavior
## Modify the target files
[root@docker03 ~]# vim /opt/prometheus/discovery/discovery_cadvisor.yml
[root@docker03 ~]# cat /opt/prometheus/discovery/discovery_cadvisor.yml
[
  {
    "targets": ["10.0.0.11:8080","10.0.0.12:8080"]
  }
]
[root@docker03 ~]# vim /opt/prometheus/discovery/discovery_node-exportre.yml
[root@docker03 ~]# cat /opt/prometheus/discovery/discovery_node-exportre.yml
[
  {
    "targets": ["10.0.0.11:9100","10.0.0.12:9100"]
  }
]
## Refresh the browser again

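The new targets were picked up automatically, without restarting Prometheus. The containers on the 02 machine (10.0.0.12) have not been started, so its targets are shown as down.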
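Since Prometheus re-reads these target files on its own, they are easy to generate from a script rather than by hand in vim. A minimal sketch under that assumption: the helper writes the same JSON layout used above into a temporary file and then renames it into place, so Prometheus is less likely to read a half-written file. The function name `write_targets` is hypothetical.

```shell
#!/usr/bin/env bash
# write_targets FILE HOST:PORT [HOST:PORT...]
# Writes a file_sd target file containing one target group, using a
# temporary file plus rename so the final file appears in one step.
# (Hypothetical helper; the layout mirrors the files shown above.)
write_targets() {
    local file="$1" tmp list="" t
    shift
    for t in "$@"; do
        list="${list:+$list, }\"$t\""      # build "a", "b", ...
    done
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    printf '[\n  {\n    "targets": [%s]\n  }\n]\n' "$list" >"$tmp"
    chmod 644 "$tmp"                       # mktemp creates the file as 0600
    mv "$tmp" "$file"
}

# Example, commented out so nothing under /opt is touched by accident:
# write_targets /opt/prometheus/discovery/discovery_cadvisor.yml \
#     10.0.0.11:8080 10.0.0.12:8080
```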
6. Email alerts with alertmanager
6.1. Upload the archive to /opt and extract it
6.2. Rename the directory
[root@docker03 /opt]# mv alertmanager-0.21.0.linux-amd64 alertmanager
## Enter the alertmanager directory and back up alertmanager.yml
[root@docker03 /opt]# cd alertmanager
[root@docker03 /opt/alertmanager]# cp alertmanager.yml alertmanager.yml.bak
## Edit alertmanager.yml
[root@docker03 /opt/alertmanager]# vim alertmanager.yml
[root@docker03 /opt/alertmanager]# cat alertmanager.yml
global:
  resolve_timeout: 5m
  smtp_from: '1224256589@qq.com' ## sender address
  smtp_smarthost: 'smtp.qq.com:465' ## QQ Mail SMTP server
  smtp_auth_username: '1224256589@qq.com' ## mailbox account name
  smtp_auth_password: 'jgfcbjysohbliibb' ## QQ Mail authorization code
  smtp_require_tls: false
  smtp_hello: 'qq.com'
route:
  group_by: ['alertname']
  group_wait: 5s
  group_interval: 5s
  repeat_interval: 5m ## how often a still-firing alert is sent again
  receiver: 'email' ## which receiver gets the alerts; several can be defined
receivers:
- name: 'email' ## receiver definition
  email_configs:
  - to: '1224256589@qq.com' ## recipient address
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical' ## severity of the alert that suppresses others
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance'] ## labels that must match for the inhibition to apply
## Start alertmanager
[root@docker03 /opt/alertmanager]# ./alertmanager --config.file="alertmanager.yml" &
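Before wiring Prometheus to it, the mail path can be exercised on its own by posting a hand-made alert to Alertmanager's v2 HTTP API (`/api/v2/alerts`). A minimal sketch: the helper only builds the JSON body, and the curl call is left commented out. The function name, the alert name `TestAlert`, and the summary text are all made up for this example, and the arguments must not contain double quotes because no JSON escaping is done.

```shell
#!/usr/bin/env bash
# test_alert_payload ALERTNAME INSTANCE
# Prints a JSON array holding one alert, in the shape accepted by
# POST /api/v2/alerts. No JSON escaping is performed on the arguments.
# (Hypothetical helper for a manual smoke test.)
test_alert_payload() {
    local name="$1" instance="$2"
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"warning"},' \
        "$name" "$instance"
    printf '"annotations":{"summary":"manual test alert"}}]\n'
}

# Example, commented out because it would send a real notification:
# curl -s -XPOST -H 'Content-Type: application/json' \
#     -d "$(test_alert_payload TestAlert 10.0.0.11:9100)" \
#     http://10.0.0.13:9093/api/v2/alerts
```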

## Write the Prometheus alerting rules
[root@docker03 /opt/alertmanager]# cd ../prometheus/
[root@docker03 /opt/prometheus]# vim node-up.rules
[root@docker03 /opt/prometheus]# cat node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
## Edit the main Prometheus configuration file
[root@docker03 /opt/prometheus]# vim prometheus.yml
[root@docker03 /opt/prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).
# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 10.0.0.13:9093
# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "node-up.rules"
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    static_configs:
    - targets: ['localhost:9090']
  - job_name: 'cadvisor'
    file_sd_configs:
    - files:
      - /opt/prometheus/discovery/discovery_cadvisor.yml
      refresh_interval: 10s
  - job_name: 'node-exporter'
    file_sd_configs:
    - files:
      - /opt/prometheus/discovery/discovery_node-exportre.yml
      refresh_interval: 10s
-------------------------------------------------------------------------------
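Indentation mistakes in these YAML files are easy to make, and the Prometheus tarball ships a `promtool` binary that can validate both files before a restart. A minimal sketch wrapping its `check config` and `check rules` subcommands; the function name and the `PROMTOOL` override variable are assumptions made for this example.

```shell
#!/usr/bin/env bash
# check_prometheus_files CONFIG [RULES_FILE...]
# Runs "promtool check config" on CONFIG and "promtool check rules" on each
# rules file, stopping at the first failure. The promtool command can be
# overridden through the PROMTOOL variable (default: ./promtool).
# (Hypothetical wrapper; PROMTOOL is not a variable promtool itself reads.)
check_prometheus_files() {
    local promtool="${PROMTOOL:-./promtool}" config="$1" rules
    shift
    "$promtool" check config "$config" || return 1
    for rules in "$@"; do
        "$promtool" check rules "$rules" || return 1
    done
}

# Example, commented out because it needs the real promtool binary:
# cd /opt/prometheus
# check_prometheus_files prometheus.yml node-up.rules && echo "config OK"
```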
## Next, kill the process and restart Prometheus
[root@docker03 /opt/prometheus]# ./prometheus --config.file="prometheus.yml" &

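## Now trigger an alert
## Using the 01 machine as the example: stop the node_exporter container, then open the browser and wait patiently
[root@docker01 ~]# docker stop node_exporter
## Then refresh the browser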
http://10.0.0.13:9090/alerts

Once the monitored target is really unreachable, the alert fires.

Check the mailbox.

7. Connecting grafana to Prometheus
1. Upload the downloaded package to the 03 machine
2. Install it with yum
[root@docker03 ~]# yum localinstall -y grafana-6.3.3-1.x86_64.rpm
3. Start the service and enable it at boot
[root@docker03 ~]# systemctl start grafana-server.service
[root@docker03 ~]# systemctl enable grafana-server.service
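4. Open 10.0.0.13:3000 in a browser (grafana's default port is 3000)
## The username and password are both admin

The default theme is dark; let's change it.

Click Preferences (translate the page if you cannot find it).

Follow the clicks shown in the screenshot.

The result is the light theme.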

5. Create a data source
## grafana supports Prometheus out of the box, so no plugin needs to be installed, unlike with zabbix
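The data source is created through the web UI here, but Grafana can also read data sources from provisioning files at startup, which is handy when the setup has to be repeated. A minimal sketch, assuming the rpm install's default provisioning directory `/etc/grafana/provisioning/datasources` and a grafana-server restart afterwards; the function name and the output file name `prometheus.yaml` are made up for this example.

```shell
#!/usr/bin/env bash
# write_grafana_datasource DIR URL
# Writes a Grafana data source provisioning file for Prometheus into DIR.
# (Hypothetical helper; file name and data source name are assumptions.)
write_grafana_datasource() {
    local dir="$1" url="$2"
    mkdir -p "$dir" || return 1
    cat >"$dir/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $url
    isDefault: true
EOF
}

# Example, commented out so nothing under /etc is touched by accident:
# write_grafana_datasource /etc/grafana/provisioning/datasources http://10.0.0.13:9090
# systemctl restart grafana-server.service
```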





Next, search the grafana website for a dashboard and copy its ID.

Then go back to the grafana page and import the dashboard.




This article comes from cnblogs (博客园), author: kirin (麒麟). When reproducing it, please cite the original link: https://www.cnblogs.com/kirin365/articles/16137830.html
