Deploying Prometheus on an Intranet Server

I. Installing Prometheus
-------------------------------------------------------------------------------------------
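The required components and their roles are as follows:

- Prometheus server: the main Prometheus server (port 9090);
- Node Exporter: collects hardware and operating-system information from the host (port 9100);
- cAdvisor: collects information about the containers running on the host (port 8080);
- Grafana: displays the Prometheus monitoring dashboards (port 3000);
- Alertmanager: receives the alerts sent by Prometheus and delivers them using the configured notification method and content (also deployed on the dockerA host, port 9093).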

systemctl enable prometheus.service
systemctl enable alertmanager
systemctl enable node_exporter.service
systemctl enable grafana-server.service
systemctl enable PrometheusAlert

1. Download
https://prometheus.io/download

2. Create the user and set ownership

[root@ntp1 src]# groupadd prometheus
[root@ntp1 src]# useradd -g prometheus -s /sbin/nologin prometheus
[root@ntp1 src]# tar -zxvf prometheus-2.18.1.linux-amd64.tar.gz -C /usr/local/
[root@ntp1 local]# mv prometheus-2.18.1.linux-amd64/ prometheus
[root@ntp1 local]# cd prometheus/
[root@ntp1 prometheus]# mkdir {data,logs,conf,rules} -p
[root@ntp1 prometheus]# chown -R prometheus:prometheus *

3. Configure Prometheus as a systemd service
vim /etc/systemd/system/prometheus.service

[Unit]
Description=Prometheus
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/prometheus/prometheus --config.file=/usr/local/prometheus/prometheus.yml --storage.tsdb.path=/usr/local/prometheus/data

Restart=on-failure

[Install]
WantedBy=multi-user.target

4. Start the service

systemctl daemon-reload
systemctl enable prometheus.service
systemctl restart prometheus.service

II. Installing Alertmanager
-------------------------------------------------------------------------------------------
[root@ntp1 src]# tar -zxvf alertmanager-0.20.0.linux-amd64.tar.gz -C /usr/local/
[root@ntp1 local]# mv alertmanager-0.20.0.linux-amd64/ alertmanager

[root@ntp1 local]# vim /etc/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
After=network-online.target

[Service]
Restart=on-failure
ExecStart=/usr/local/alertmanager/alertmanager --config.file=/usr/local/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

[root@ntp1 local]# systemctl daemon-reload
[root@ntp1 local]# systemctl start alertmanager
[root@ntp1 local]# systemctl status alertmanager
[root@ntp1 local]# systemctl enable alertmanager
[root@ntp1 local]# netstat -nltup|grep 9093
tcp6 0 0 :::9093 :::* LISTEN 10597/alertmanager

[root@localhost alertmanager]# cat alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['instance']   # alerts can be grouped by the machine label
  group_wait: 1s           # how long to wait before a group's first notification
  group_interval: 10s      # interval between notifications for a group
  repeat_interval: 5m      # interval before an alert is sent again
  receiver: 'web.hook.prometheusalert'
  routes:
  - receiver: 'prometheusalert-dingding'
    # group_wait: 10m
    match:
      level: '2'
receivers:
- name: 'web.hook.prometheusalert'
  webhook_configs:
  - url: 'http://localhost:8080/prometheus/alert'
- name: 'prometheusalert-dingding'
  webhook_configs:
  - url: 'http://localhost:8080/prometheus/router?ddurl=https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxx'
-------------------------------------------------------------------------------------------

III. Installing and Configuring node_exporter
-------------------------------------------------------------------------------------------
1. Download and extract the package, set ownership

wget https://github.com/prometheus/node_exporter/releases/download/v0.17.0/node_exporter-0.17.0.linux-amd64.tar.gz

[root@localhost local]# tar -zxvf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/
[root@localhost local]# mv node_exporter-0.17.0.linux-amd64/ node_exporter

[root@localhost prometheus]# chown -R prometheus:prometheus /usr/local/node_exporter

2. Create the systemd unit file for node_exporter.service

# vim /usr/lib/systemd/system/node_exporter.service

[Unit]
Description=node_exporter
Documentation=https://prometheus.io/
After=network.target

[Service]
Type=simple
User=root
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

3. Start the service

systemctl daemon-reload
systemctl enable node_exporter.service
systemctl start node_exporter.service

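4. Metrics reported by the monitored host (Grafana reads its data from here)

Open http://192.168.100.205:9100/metrics to see exactly which metrics the exporter exposes.

5. Deploy the agent on client hosts and add them to monitoring

5.1 Install the agent on the client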

[root@dockerhome src]# tar -zxvf node_exporter-0.17.0.linux-amd64.tar.gz -C /usr/local/
[root@dockerhome local]# mv node_exporter-0.17.0.linux-amd64/ node_exporter
#vim /etc/systemd/system/node_exporter.service
[Unit]
Description=node_exporter
After=network.target
[Service]
Type=simple
User=prometheus
ExecStart=/usr/local/node_exporter/node_exporter
Restart=on-failure
[Install]
WantedBy=multi-user.target

Create the user and group, set ownership, restart the service, and open the firewall port:

[root@dockerhome node_exporter]# groupadd prometheus
[root@dockerhome node_exporter]# useradd -g prometheus -s /sbin/nologin prometheus
[root@dockerhome node_exporter]# chown -R prometheus:prometheus /usr/local/node_exporter/
[root@dockerhome node_exporter]# systemctl daemon-reload
[root@dockerhome node_exporter]# systemctl restart node_exporter
[root@dockerhome node_exporter]# firewall-cmd --add-port=9100/tcp --permanent
success
[root@dockerhome node_exporter]# firewall-cmd --reload
success

-------------------------------------------------------------------------------------------
IV. Installing and Configuring Grafana
-------------------------------------------------------------------------------------------
1. Download and install
wget https://dl.grafana.com/oss/release/grafana-6.7.3-1.x86_64.rpm
yum localinstall grafana-6.7.3-1.x86_64.rpm

2. Start the service
systemctl daemon-reload
systemctl enable grafana-server.service
systemctl start grafana-server.service

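3. Open the web interface at http://ip:3000

Default username/password: admin/admin

4. Add a data source in Grafana
On the home page after logging in, click "Configuration > Data Sources" to open the data source page, then configure it as follows:
Name: prometheus
Type: prometheus
URL: http://192.168.100.205:9090/
Access: Server
Uncheck Default, leave the remaining settings at their defaults, and click "Add".
5. Import dashboard templates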
-------------------------------------------------------------------------------------------

V. Connecting Prometheus to Alertmanager, Adding Monitored Hosts, and Configuring Alert Rules
5.1 Configuration files
[root@localhost prometheus]# cat prometheus.yml
# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  scrape_timeout: 10s
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093
      - 'localhost:9093' # connect Prometheus to Alertmanager

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "/usr/local/prometheus/rules/rule*.yml" # alert rule files
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - file_sd_configs:
    - files:
      - 'conf/host.yml' # list of hosts whose node_exporter is scraped
      refresh_interval: 10s
    job_name: Host
    metrics_path: /metrics
    relabel_configs:
    - source_labels: [__address__]
      regex: (.*)
      target_label: instance
      replacement: $1
    - source_labels: [__address__]
      regex: (.*)
      target_label: __address__
      replacement: $1:9100
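
The two relabel_configs entries in the Host job are easy to misread, so here is a minimal sketch of their effect on a single target. It is only an illustration: relabel_target is a hypothetical helper, not part of Prometheus, and sed stands in for Prometheus's fully anchored regex matching.

```shell
#!/bin/sh
# Hypothetical helper (not part of Prometheus): mimics the two relabel rules
# of the Host job for one bare address taken from conf/host.yml.
relabel_target() {
    address="$1"
    # Rule 1: regex (.*) captures the whole address; $1 is written to "instance".
    instance=$(printf '%s\n' "$address" | sed -E 's/^(.*)$/\1/')
    # Rule 2: the same capture with :9100 appended overwrites __address__,
    # so the scrape goes to the node_exporter port.
    scrape_address=$(printf '%s\n' "$address" | sed -E 's/^(.*)$/\1:9100/')
    printf 'instance=%s __address__=%s\n' "$instance" "$scrape_address"
}

relabel_target 172.18.240.18
```

The practical effect is that conf/host.yml can list bare IP addresses: the instance label stays port-free while the scrape itself is sent to port 9100.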

[root@localhost rules]# cat rule_host.yml
groups:
- name: Host status monitoring alerts
  rules:
  - alert: Host status
    expr: up == 0
    for: 5s
    labels:
      status: critical
    annotations:
      summary: "{{$labels.instance}} server is down"
      description: "{{$labels.instance}} server has been down for more than 5 seconds"

  - alert: CPU usage
    expr: (100 - (avg by (instance)(irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
    for: 3s
    labels:
      status: warning
    annotations:
      summary: "{{$labels.instance}} CPU usage is too high!"
      description: "{{$labels.instance}} CPU usage is above 80% (current: {{$value}}%)"
  - alert: Memory usage
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 80
    for: 3s
    labels:
      status: severe
    annotations:
      summary: "{{$labels.instance}} memory usage is too high!"
      description: "{{$labels.instance}} memory usage is above 80% (current: {{$value}}%)"
  - alert: Disk capacity
    expr: (1-(node_filesystem_free_bytes{fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"})) * 100 > 80
    for: 3s
    labels:
      status: severe
    annotations:
      summary: "{{$labels.instance}} {{$labels.mountpoint}} disk partition usage is too high!"
      description: "{{$labels.instance}} {{$labels.mountpoint}} disk partition usage is above 80% (current: {{$value}}%)"

  - alert: IO performance
    expr: avg(irate(node_disk_io_time_seconds_total[3m])) by(instance) * 100 > 80
    for: 3s
    labels:
      status: severe
    annotations:
      summary: "{{$labels.instance}} disk IO utilization is too high!"
      description: "{{$labels.instance}} disk IO utilization is above 80% (current: {{$value}}%)"
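
To make the threshold arithmetic concrete, the memory usage rule can be checked by hand. The sketch below evaluates the same expression, (MemTotal - MemAvailable) / MemTotal * 100, for one host. mem_used_percent is a hypothetical helper and the byte counts are made-up sample values, not measurements.

```shell
#!/bin/sh
# Hypothetical helper: evaluates the memory rule's expression for one host.
# $1 = MemTotal in bytes, $2 = MemAvailable in bytes.
mem_used_percent() {
    LC_ALL=C awk -v total="$1" -v avail="$2" \
        'BEGIN { printf "%.1f\n", (total - avail) / total * 100 }'
}

# Sample values: 8 GiB total, 1 GiB available.
# The result is 87.5, which is above the rule's threshold of 80.
mem_used_percent 8589934592 1073741824
```

With these sample values the expression stays above 80, so the alert would fire once the condition has held for the rule's "for" duration.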

[root@localhost prometheus]# cat conf/host.yml
- labels:
    service: autofind
  targets:
  - 172.18.240.18
  - 192.168.1.202
  - 192.168.1.203
  - 172.18.240.52
  - 172.18.240.99
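
When the host list grows, the target file can be generated rather than edited by hand. The following is a rough sketch only: render_host_yml is a hypothetical helper, and it assumes every host shares the single service: autofind label used in this file.

```shell
#!/bin/sh
# Hypothetical helper: prints one file_sd target group in the same shape as
# conf/host.yml, with one target per argument.
render_host_yml() {
    printf '%s\n' '- labels:' '    service: autofind' '  targets:'
    for ip in "$@"; do
        printf '  - %s\n' "$ip"
    done
}

render_host_yml 172.18.240.18 192.168.1.202
```

Redirecting the output to /usr/local/prometheus/conf/host.yml replaces the list. Because the job uses file_sd_configs, Prometheus re-reads the file on its own and does not need a restart.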

5.2 Add the IPs of monitored hosts on the server
[root@ops001 prometheus-2.4.3.linux-amd64]# vim conf/host.yml
- labels:
    service: autofind
  targets:
  - 172.18.240.18
  - 192.168.1.202
  - 172.18.240.52

5.3 Check the result
http://192.168.1.203:3000/

5.4 Configure monitoring rules
// Configure the alert rules so that an alert fires when a host goes down
[root@localhost prometheus]# vi rules/rule_host.yml

-------------------------------------------------------------------------------------------
VI. Alert Notification Setup
https://github.com/feiyu563/PrometheusAlert

6.1 Create the service

[root@localhost linux]# tar -zxvf PrometheusAlert.tar.gz -C /usr/local/
[root@localhost linux]# vim /etc/systemd/system/PrometheusAlert.service
[Unit]
Description=PrometheusAlert
After=network-online.target

[Service]
Type=simple
User=prometheus
Restart=on-failure
WorkingDirectory=/usr/local/PrometheusAlert/PrometheusAlert/example/linux/
ExecStart=/usr/local/PrometheusAlert/PrometheusAlert/example/linux/PrometheusAlert

[Install]
WantedBy=multi-user.target

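6.2 Edit the configuration file
#---------------------↓webhook-----------------------
# Enable the DingTalk alert channel; several channels can be enabled at once (0 = off, 1 = on)
open-dingding=1
# Default DingTalk robot URL
ddurl=https://oapi.dingtalk.com/robot/send?access_token=7dab8205a446c43f9xxxxxxxxxxxxxxx
# Whether to @ everyone (0 = off, 1 = on)
dd_isatall=0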

-----------------------------------------------------------------------------------------

Test
curl -H "Content-Type: application/json" -d '{"msgtype":"text","text":{"content":"prometheus alert test"}}' 'https://oapi.dingtalk.com/robot/send?access_token=7dab8205a446c43f9xxxxxxxxxxxxxxx'
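
For repeated tests with different messages, the JSON body can be built by a small function instead of being typed inline. This is a sketch under simple assumptions: dingtalk_text_payload is a hypothetical helper, and it escapes only backslashes and double quotes, which is enough for plain one-line text.

```shell
#!/bin/sh
# Hypothetical helper: builds the JSON body of a DingTalk text message.
# Only backslashes and double quotes are escaped, so keep the text to one line.
dingtalk_text_payload() {
    escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
    printf '{"msgtype":"text","text":{"content":"%s"}}\n' "$escaped"
}

dingtalk_text_payload 'prometheus alert test'
```

The output can be passed to the same curl call with -d "$(dingtalk_text_payload 'some text')", keeping the robot URL quoted so the shell does not interpret the ? character.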

VII. Summary
-------------------------------------------------------------------------------------------

[root@localhost local]# systemctl restart alertmanager
[root@localhost local]# systemctl restart PrometheusAlert
[root@localhost local]# systemctl restart prometheus
[root@localhost local]# systemctl restart grafana-server
[root@localhost local]# systemctl restart node_exporter
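
After restarting everything, it helps to confirm that each service actually answers on its port. The sketch below only builds the URLs to probe. health_url is a hypothetical helper, the ports are the ones listed at the top of this document, and the paths are the health endpoints those services normally expose (node_exporter has no dedicated one, so /metrics is used).

```shell
#!/bin/sh
# Hypothetical helper: maps a component name to a URL that should answer
# once the service is up. $2 is the host and defaults to localhost.
health_url() {
    host="${2:-localhost}"
    case "$1" in
        prometheus)    echo "http://$host:9090/-/healthy" ;;
        alertmanager)  echo "http://$host:9093/-/healthy" ;;
        node_exporter) echo "http://$host:9100/metrics" ;;
        grafana)       echo "http://$host:3000/api/health" ;;
        *)             echo "unknown component: $1" >&2; return 1 ;;
    esac
}

# Print the URL to probe for each component on this host.
for component in prometheus alertmanager node_exporter grafana; do
    health_url "$component"
done
```

Each URL can then be probed with curl -fsS -o /dev/null "$(health_url prometheus)", which exits non-zero if the service is unreachable or returns an HTTP error.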

posted @ 2020-06-01 16:57 db小白
