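Basic Setup of a Prometheus Monitoring System
Setting up the monitoring services
Downloading the packages
prometheus, alertmanager, blackbox_exporter, node_exporter
Installation and deployment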
After installation, each component sits in its own directory under /opt/prometheus.
Deploying the prometheus service
1. Prepare the scripts
2. prometheus.service, the systemd unit file:
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/opt/prometheus/prometheus/prometheus --config.file=/opt/prometheus/prometheus/prometheus.yml --web.enable-lifecycle --web.external-url= --enable-feature=new-service-discovery-manager
[Install]
WantedBy=multi-user.target
3. install-prometheus.sh, the install script (change the package version to match your own):
#!/bin/bash
port=9090

usage () {
    echo "USAGE: $0 --web.external-url 192.168.1.100"
    echo "[-url|--web.external-url] IP address of the current host"
}

if [[ $# -eq 0 ]]; then
    usage
    exit 0
fi

while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -url|--web.external-url)
            url=$2
            shift
            shift
            ;;
        -h|--help)
            help="true"
            shift
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done

if [[ $help ]]; then
    usage
    exit 0
fi

# An IP address is required to build the external URL.
if [[ -z $url ]]; then
    usage
    exit 1
fi

mkdir -p /opt/prometheus
tar -zxvf ./prometheus-2.31.1.linux-amd64.tar.gz -C /opt/prometheus/
# Remove any previous installation before moving the new one into place.
rm -rf /opt/prometheus/prometheus
mv /opt/prometheus/prometheus-2.31.1.linux-amd64 /opt/prometheus/prometheus
cp ./prometheus.service /etc/systemd/system/prometheus.service
# Use '|' as the sed delimiter because the replacement URL contains '/'.
sed -i "s|--web.external-url=|--web.external-url=http://${url}:${port}|g" /etc/systemd/system/prometheus.service
systemctl daemon-reload && systemctl enable prometheus && systemctl start prometheus && systemctl status prometheus
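4. Run the installation
Copy the prepared scripts, together with the downloaded package, to the server, for example to /usr/local/software/prometheus
Make the script executable: chmod +x install-prometheus.sh
Run the installer (replace the IP with your server's IP): ./install-prometheus.sh --web.external-url 192.168.1.100
5. Verify: open http://<server-IP>:9090 in a browser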
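The install script fills in the empty --web.external-url= flag with sed. Because the replacement is a URL that contains slashes, the substitution needs a delimiter other than /. A minimal sketch on a scratch copy of the unit line follows; the temporary file is only for illustration, and 192.168.1.100 is the example address from the script's usage text:

```shell
#!/bin/bash
# Sketch: fill in the external URL in a scratch file, using '|' as the sed
# delimiter because the replacement text itself contains '/'.
url=192.168.1.100   # illustrative address, taken from the script's usage text
port=9090

unit=$(mktemp)
echo 'ExecStart=/opt/prometheus/prometheus/prometheus --web.external-url= --web.enable-lifecycle' > "$unit"

sed -i "s|--web.external-url=|--web.external-url=http://${url}:${port}|" "$unit"

result=$(cat "$unit")
echo "$result"
rm -f "$unit"
```

With / as the delimiter, sed would treat the slashes in http:// as the end of the replacement and reject the expression.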
Deploying the alertmanager service
1. Prepare the scripts
2. alertmanager.service, the systemd unit file:
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/opt/prometheus/alertmanager/alertmanager --web.listen-address= --config.file=/opt/prometheus/alertmanager/alertmanager.yml
[Install]
WantedBy=multi-user.target
3. install-alertmanager.sh, the install script (change the package version to match your own):
#!/bin/bash
port=9093

usage () {
    echo "USAGE: $0 [--port 9093]"
    echo "[-p|--port 9093]"
}

while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -p|--port)
            # Store the value in port so that the sed command below uses it.
            port=$2
            shift
            shift
            ;;
        -h|--help)
            help="true"
            shift
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done

if [[ $help ]]; then
    usage
    exit 0
fi

mkdir -p /opt/prometheus
tar -zxvf ./alertmanager-0.23.0.linux-amd64.tar.gz -C /opt/prometheus/
# Remove any previous installation before moving the new one into place.
rm -rf /opt/prometheus/alertmanager
mv /opt/prometheus/alertmanager-0.23.0.linux-amd64 /opt/prometheus/alertmanager
cp ./alertmanager.service /etc/systemd/system/alertmanager.service
cp ./alertmanager.yml /opt/prometheus/alertmanager/
# Fill in the listen address left empty in the unit file.
sed -i "s/--web.listen-address=/--web.listen-address=:${port}/g" /etc/systemd/system/alertmanager.service
systemctl daemon-reload && systemctl enable alertmanager && systemctl start alertmanager && systemctl status alertmanager
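4. Run the installation
Copy the prepared scripts, together with the downloaded package and alertmanager.yml, to the server, for example to /usr/local/software/prometheus
Make the script executable: chmod +x install-alertmanager.sh
Run the installer: ./install-alertmanager.sh
5. Verify: open http://<server-IP>:9093 in a browser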
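The browser check can also be done from a terminal. Both Prometheus and Alertmanager serve a /-/healthy endpoint, so a small helper can build the URL and probe it with curl. This is a sketch: the function names are made up for illustration, and 192.168.226.200 is the Alertmanager address used in the prometheus.yml later in this article.

```shell
#!/bin/bash
# Sketch: build the health-check URL for a service and probe it with curl.
health_url() {
    local host=$1 port=$2
    echo "http://${host}:${port}/-/healthy"
}

check_health() {
    # -f makes curl return non-zero on HTTP errors; -s silences progress output.
    if curl -fs --max-time 5 "$(health_url "$1" "$2")" > /dev/null; then
        echo "healthy"
    else
        echo "unreachable"
    fi
}

# Example (not run here): check_health 192.168.226.200 9093
```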
Deploying the blackbox_exporter service
1. Prepare the scripts
2. blackbox_exporter.service, the systemd unit file:
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/opt/prometheus/blackbox_exporter/blackbox_exporter --config.file=/opt/prometheus/blackbox_exporter/blackbox.yml --web.listen-address=:9115
[Install]
WantedBy=multi-user.target
3. install-blackbox-exporter.sh, the install script (change the package version to match your own):
#!/bin/bash
port=9115

usage () {
    echo "USAGE: $0 [--port 9115]"
    echo "[-p|--port 9115]"
}

while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -p|--port)
            # Store the value in port so that the sed command below uses it.
            port=$2
            shift
            shift
            ;;
        -h|--help)
            help="true"
            shift
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done

if [[ $help ]]; then
    usage
    exit 0
fi

mkdir -p /opt/prometheus
tar -zxvf ./blackbox_exporter-0.19.0.linux-amd64.tar.gz -C /opt/prometheus/
# Remove any previous installation before moving the new one into place.
rm -rf /opt/prometheus/blackbox_exporter
mv /opt/prometheus/blackbox_exporter-0.19.0.linux-amd64 /opt/prometheus/blackbox_exporter
cp ./blackbox_exporter.service /etc/systemd/system/blackbox_exporter.service
# Apply the requested port (the unit file ships with :9115).
sed -i "s/--web.listen-address=:9115/--web.listen-address=:${port}/" /etc/systemd/system/blackbox_exporter.service
systemctl daemon-reload && systemctl enable blackbox_exporter && systemctl start blackbox_exporter && systemctl status blackbox_exporter
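4. Run the installation
Copy the prepared scripts, together with the downloaded package, to the server, for example to /usr/local/software/prometheus
Make the script executable: chmod +x install-blackbox-exporter.sh
Run the installer: ./install-blackbox-exporter.sh
5. Verify: open http://<server-IP>:9115 in a browser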
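Beyond the landing page, blackbox_exporter can be exercised by hand through its /probe endpoint, which takes the target and the module as query parameters. The sketch below only builds the URL. The function name is illustrative, and the addresses and the icmp module are the ones used in the scrape job later in this article.

```shell
#!/bin/bash
# Sketch: build the URL that asks blackbox_exporter to probe a target.
# The target is inserted as-is, so this suits plain hosts and IPs; a target
# containing '&' or '?' would need URL-encoding first.
probe_url() {
    local exporter=$1 target=$2 module=$3
    echo "http://${exporter}/probe?target=${target}&module=${module}"
}

# An ICMP probe of 192.168.24.198 through the exporter on port 9115:
probe_url 192.168.226.200:9115 192.168.24.198 icmp

# To run it for real (not executed here):
#   curl -s "$(probe_url 192.168.226.200:9115 192.168.24.198 icmp)" | grep probe_success
```

A probe_success value of 1 in the response means the target answered.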
Deploying the node_exporter service
1. Prepare the scripts
2. node_exporter.service, the systemd unit file:
[Unit]
Description=https://prometheus.io
[Service]
Restart=on-failure
ExecStart=/opt/prometheus/node_exporter/node_exporter --web.listen-address=:9100
[Install]
WantedBy=multi-user.target
3. install-node-exporter.sh, the install script (change the package version to match your own):
#!/bin/bash
port=9100

usage () {
    echo "USAGE: $0 [--port 9100]"
    echo "[-p|--port 9100]"
}

while [[ $# -gt 0 ]]; do
    key="$1"
    case $key in
        -p|--port)
            # Store the value in port so that the sed command below uses it.
            port=$2
            shift
            shift
            ;;
        -h|--help)
            help="true"
            shift
            ;;
        *)
            usage
            exit 1
            ;;
    esac
done

if [[ $help ]]; then
    usage
    exit 0
fi

mkdir -p /opt/prometheus
tar -zxvf ./node_exporter-1.3.1.linux-amd64.tar.gz -C /opt/prometheus/
# Remove any previous installation before moving the new one into place.
rm -rf /opt/prometheus/node_exporter
mv /opt/prometheus/node_exporter-1.3.1.linux-amd64 /opt/prometheus/node_exporter
cp ./node_exporter.service /etc/systemd/system/node_exporter.service
# Apply the requested port (the unit file ships with :9100).
sed -i "s/--web.listen-address=:9100/--web.listen-address=:${port}/" /etc/systemd/system/node_exporter.service
systemctl daemon-reload && systemctl enable node_exporter && systemctl start node_exporter && systemctl status node_exporter
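4. Run the installation
Copy the prepared scripts, together with the downloaded package, to the server, for example to /usr/local/software/prometheus
Make the script executable: chmod +x install-node-exporter.sh
Run the installer: ./install-node-exporter.sh
5. Verify: open http://<server-IP>:9100 in a browser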
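node_exporter serves its data as plain text at /metrics, so a quick sanity check is to count the samples whose names start with node_. The sketch below runs against a tiny hand-written sample rather than real exporter output; count_metrics is an illustrative name.

```shell
#!/bin/bash
# Sketch: count the samples in Prometheus text-format output whose metric
# name starts with a given prefix. Comment lines (# HELP / # TYPE) are skipped.
count_metrics() {
    local prefix=$1
    grep -v '^#' | grep -c "^${prefix}"
}

# Tiny hand-written sample in the exposition format (not real exporter output).
sample='# HELP node_load1 1m load average.
# TYPE node_load1 gauge
node_load1 0.21
node_memory_MemFree_bytes 1.234e+09
go_goroutines 8'

count=$(printf '%s\n' "$sample" | count_metrics node_)
echo "$count"
```

Against a live exporter the same helper would be fed from curl, for example: curl -s http://<server-IP>:9100/metrics | count_metrics node_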
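Deploying the grafana service
Grafana is deployed with Docker.
1. Pull the image
docker pull grafana/grafana:8.5.5
2. Create a directory on the server for the data mount
mkdir -p /opt/grafana
3. Set permissions on the new directory (the container runs as a non-root user and needs write access; 777 is the simplest setting but very permissive)
chmod 777 -R /opt/grafana
4. Start the container
docker run -d --restart=always -it --name=grafana -p 31787:3000 -e GF_SECURITY_ALLOW_EMBEDDING=true -e GF_AUTH_PROXY_ENABLED=true -e GF_AUTH_ANONYMOUS_ENABLED=true -v /opt/grafana/:/var/lib/grafana grafana/grafana:8.5.5
Allow embedding (for example in an iframe): GF_SECURITY_ALLOW_EMBEDDING=true
Enable auth proxy: GF_AUTH_PROXY_ENABLED=true
Allow anonymous access (no login): GF_AUTH_ANONYMOUS_ENABLED=true
Data mount: /opt/grafana/:/var/lib/grafana
5. Verify (no login required): open http://<server-IP>:31787 in a browser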
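The container takes a few seconds to start, so a scripted check has to retry. The sketch below is a generic retry helper; wait_until is an illustrative name, and the commented example assumes Grafana's /api/health endpoint on the port mapped above.

```shell
#!/bin/bash
# Sketch: retry a command until it succeeds or the attempts run out.
wait_until() {
    local attempts=$1 delay=$2
    shift 2
    local i
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            return 0
        fi
        sleep "$delay"
    done
    return 1
}

# Example (not run here): wait up to about 30 seconds for Grafana.
#   wait_until 15 2 curl -fs http://<server-IP>:31787/api/health
```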
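Configuration changes
Editing the blackbox_exporter configuration file (blackbox.yml)
Configuration reference (this article uses the default configuration):
https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md
Editing alertmanager.yml
Reference configuration: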
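global:
  resolve_timeout: 1m
  # SMTP settings
  smtp_from: 'xxx@xxx.com'            # sender address
  smtp_smarthost: 'smtp.163.com:465'  # outgoing SMTP server (see your mailbox settings)
  smtp_auth_username: 'xxx@xxx.com'   # login account of the sending mailbox
  smtp_auth_password: 'xxxxxx'        # login password of the sending mailbox
  smtp_require_tls: false             # default SMTP TLS requirement; note that Go does not support unencrypted connections to remote SMTP endpoints
route:
  group_by: ['hostname']  # grouping label; custom labels can be defined in the prometheus config file
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
  - name: 'email'
    email_configs:
      - to: 'xxx@qq.com'  # recipient address
        send_resolved: true
        headers:
          from: '警报中心'     # "Alert Center"
          subject: '报警邮件'  # "Alert email"
          to: '运维'           # "Operations"
inhibit_rules:  # inhibition rules
  - source_match:
      level: '严重'  # "critical"; must match the level label set in the alerting rules
    target_match:
      level: '严重'
    equal: ['hostname', 'alertname', 'dev', 'instance']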
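To check the email route end to end without waiting for a real alert, a test alert can be posted to Alertmanager's v2 API. The sketch below only builds the JSON payload. The function name and label values are made up, and the values must not contain double quotes because no JSON escaping is done.

```shell
#!/bin/bash
# Sketch: build a one-alert JSON payload for Alertmanager's v2 API.
alert_payload() {
    local alertname=$1 hostname=$2
    printf '[{"labels":{"alertname":"%s","hostname":"%s"}}]' "$alertname" "$hostname"
}

payload=$(alert_payload TestAlert test-host)
echo "$payload"

# To send it (not executed here; replace <server-IP>):
#   curl -s -X POST -H 'Content-Type: application/json' \
#        -d "$payload" http://<server-IP>:9093/api/v2/alerts
```

With the route above, the alert is grouped by its hostname label and the first email goes out after group_wait (30s).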
Editing prometheus.yml
Configuration reference:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - 192.168.226.200:9093 # address of the deployed alertmanager service

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/rules/*.yml" # custom alerting rule files

# Scrape configurations, one job per service.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
      - targets: ["192.168.226.200:9090"] # prometheus itself
  - job_name: "alertmanager"
    static_configs:
      - targets: ["192.168.226.200:9093"] # alertmanager service
  - job_name: "node_exporter_linux"
    static_configs:
      - targets: ["192.168.226.200:9100"] # node_exporter on the Linux host
  - job_name: "node_exporter_windows"
    static_configs:
      - targets: ["192.168.24.187:9182"] # windows_exporter on the Windows host
        labels: # custom labels and values
          hostname: '宿主机服务器' # "host server"
          ip: '192.168.24.187'
  - job_name: "http_icmp_1" # blackbox probe
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ["192.168.24.198"]
    relabel_configs:
      # Pass the original target to the exporter as the ?target= parameter.
      - source_labels: [__address__]
        target_label: __param_target
      # Keep the probed host as the instance label.
      - source_labels: [__param_target]
        target_label: instance
      # Scrape the blackbox_exporter itself instead of the target.
      - target_label: __address__
        replacement: 192.168.226.200:9115 # blackbox_exporter address
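After editing prometheus.yml, the file can be validated with promtool (shipped in the same tarball as prometheus) and then loaded without a restart, since the unit file enables --web.enable-lifecycle. The following is a sketch under those assumptions; apply_config and reload_url are illustrative names.

```shell
#!/bin/bash
# Sketch: validate prometheus.yml, then ask the running server to reload it.
PROM_DIR=/opt/prometheus/prometheus   # install path used by the install script

reload_url() {
    local host=$1 port=${2:-9090}
    echo "http://${host}:${port}/-/reload"
}

apply_config() {
    local host=$1
    # Stop here if the configuration does not pass validation.
    "${PROM_DIR}/promtool" check config "${PROM_DIR}/prometheus.yml" || return 1
    # The reload endpoint expects POST and requires --web.enable-lifecycle.
    curl -fs -X POST "$(reload_url "$host")"
}

# Example (not run here): apply_config 192.168.226.200
```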
Example custom alerting rules (a file matched by /opt/prometheus/rules/*.yml):
groups:
  - name: default
    rules:
      - alert: Windows主机内存占用过高 # "High memory usage on Windows host"
        expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 50
        for: 5m
        labels:
          level: '严重' # "critical"; referenced by inhibit_rules in alertmanager.yml
          metric: windows_os_physical_memory_free_bytes
        annotations:
          summary: "{{ $labels.hostname }} {{ $labels.ip }} memory usage is above 50%"
          description: "Memory usage is more than 50%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
      - alert: Windows主机磁盘空间占用过高 # "High disk usage on Windows host"
        # Label regexes are fully anchored, so this excludes every HarddiskVolumeN entry.
        expr: 100.0 - 100 * windows_logical_disk_free_bytes{volume!~"HarddiskVolume.+"} / windows_logical_disk_size_bytes{volume!~"HarddiskVolume.+"} > 75
        for: 5m
        labels:
          level: '严重' # "critical"
          metric: windows_logical_disk_free_bytes
        annotations:
          summary: "{{ $labels.hostname }} {{ $labels.ip }} disk {{ $labels.volume }} usage is above 75%"
          description: "Disk usage is more than 75%\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
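The memory rule computes usage as 100 - (free / total) * 100 and fires once the result stays above 50 for five minutes. A worked example of the same arithmetic, with made-up byte counts and an illustrative function name:

```shell
#!/bin/bash
# Sketch: the arithmetic behind the memory rule, 100 - (free / total) * 100.
mem_usage_pct() {
    local free_bytes=$1 total_bytes=$2
    awk -v f="$free_bytes" -v t="$total_bytes" 'BEGIN { printf "%.1f", 100 - (f / t) * 100 }'
}

# Made-up figures: 4 GiB free out of 16 GiB total.
pct=$(mem_usage_pct 4294967296 17179869184)
echo "$pct"   # 75.0, which is above the rule's threshold of 50
```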
Integration result
Grafana visualization
Grafana's initial login: admin/admin