Setting Up a Basic Prometheus Monitoring System

Deploying the monitoring services

Downloading the packages

prometheus, alertmanager, blackbox_exporter, node_exporter

https://prometheus.io/download/

Installation and deployment

After installation is complete

Deploying the prometheus service

1. Prepare the scripts

2. The prometheus.service unit file that registers the systemd service:

[Unit]
Description=https://prometheus.io
  
[Service]
Restart=on-failure
ExecStart=/opt/prometheus/prometheus/prometheus --config.file=/opt/prometheus/prometheus/prometheus.yml --web.enable-lifecycle --web.external-url= --enable-feature=new-service-discovery-manager

[Install]                      
WantedBy=multi-user.target

3. The install-prometheus.sh installer script (change the package version to the one you downloaded):

#!/bin/bash
port=9090
usage () {
        echo "USAGE: $0 --web.external-url 192.168.1.100"
        echo "[-url|--web.external-url] IP address of this host"
}
if [[ $# -eq 0 ]]; then
        usage
        exit 0
fi 
while [[ $# -gt 0 ]]; do
        key="$1"
        case $key in
                -url|--web.external-url)
                url=$2
                shift
                shift
                ;;
                -h|--help)
                help="true"
                shift
                ;;
                *)
                usage
                exit 1
                ;;
        esac
done
if [[ $help ]]; then
    usage
    exit 0
fi
# abort if no IP was given after --web.external-url
if [[ -z $url ]]; then
    usage
    exit 1
fi
mkdir -p /opt/prometheus
tar -zxvf ./prometheus-2.31.1.linux-amd64.tar.gz -C /opt/prometheus/
rm -rf /opt/prometheus/prometheus
mv /opt/prometheus/prometheus-2.31.1.linux-amd64 /opt/prometheus/prometheus
cp ./prometheus.service /etc/systemd/system/prometheus.service
# "|" is the sed delimiter because the replacement contains "http://"
sed -i "s|--web.external-url=|--web.external-url=http://${url}:${port}|g" /etc/systemd/system/prometheus.service
systemctl daemon-reload && systemctl enable prometheus && systemctl start prometheus && systemctl status prometheus
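The original sed line used `/` as the delimiter, but its replacement contains `http://`. GNU sed then stops with an "unknown option to `s'" error and leaves the placeholder empty. Below is a minimal sketch of the corrected substitution run on a single line. It assumes the value passed to `--web.external-url` is a plain IP or hostname with no `|` or `&` characters. `fill_external_url` is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Fill the empty --web.external-url= placeholder in one unit-file line.
# "|" is the sed delimiter so that the "//" in "http://" is taken literally.
# Assumption: url is a plain IP/hostname (no "|" or "&", which sed would interpret).
fill_external_url() {
  local line=$1 url=$2 port=${3:-9090}
  printf '%s\n' "$line" | sed "s|--web.external-url=|--web.external-url=http://${url}:${port}|"
}

# Demonstration on (part of) the ExecStart line from prometheus.service
exec_line='ExecStart=/opt/prometheus/prometheus/prometheus --web.enable-lifecycle --web.external-url= --enable-feature=new-service-discovery-manager'
fill_external_url "$exec_line" 192.168.1.100
```

The same `|` delimiter works directly with `sed -i` on the unit file.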

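4. Run the installation
Upload the prepared files (install script, unit file and the downloaded package) to one directory on the server, e.g. /usr/local/software/prometheus, and run the script as root from that directory.
Make the script executable: chmod +x install-prometheus.sh
Run the installer, replacing the IP with the server's own IP: ./install-prometheus.sh --web.external-url 192.168.1.100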

5. Verify: open http://<server IP>:9090 in a browser.
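prometheus.service starts Prometheus with `--web.enable-lifecycle`, which enables the `POST /-/reload` endpoint. This means later edits to prometheus.yml or the rule files can be applied without restarting the service. The sketch below assumes that `promtool` (shipped in the release tarball) is at /opt/prometheus/prometheus/promtool and that `curl` is installed. `reload_prometheus` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Validate prometheus.yml, then ask the running server to re-read it.
# Assumption: promtool location below; override with PROMTOOL=/path/to/promtool.
PROMTOOL=${PROMTOOL:-/opt/prometheus/prometheus/promtool}

reload_prometheus() {
  local config=$1 base_url=${2:-http://127.0.0.1:9090}
  if [[ ! -f $config ]]; then
    echo "config file not found: $config" >&2
    return 1
  fi
  # Do not reload if the config (or a rule file it references) is invalid
  "$PROMTOOL" check config "$config" || return 1
  # Only available because of --web.enable-lifecycle
  curl -fsS -X POST "${base_url}/-/reload"
}
```

Usage: `reload_prometheus /opt/prometheus/prometheus/prometheus.yml http://192.168.226.200:9090`.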

Deploying the alertmanager service

1. Prepare the scripts

2. The alertmanager.service unit file that registers the systemd service:

[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/opt/prometheus/alertmanager/alertmanager --web.listen-address= --config.file=/opt/prometheus/alertmanager/alertmanager.yml

[Install]
WantedBy=multi-user.target

3. The install-alertmanager.sh installer script (change the package version to the one you downloaded):

#!/bin/bash
port=9093
usage () {
        echo "USAGE: $0 [--port 9093]"  
        echo "[-p|--port 9093]"
}
while [[ $# -gt 0 ]]; do
        key="$1"
        case $key in
                -p|--port)
                port=$2
                shift
                shift
                ;;
                -h|--help)
                help="true"
                shift
                ;;
                *)
                usage
                exit 1
                ;;
        esac
done
if [[ $help ]]; then
    usage
    exit 0
fi
mkdir -p /opt/prometheus
tar -zxvf ./alertmanager-0.23.0.linux-amd64.tar.gz -C /opt/prometheus/
rm -rf /opt/prometheus/alertmanager
mv /opt/prometheus/alertmanager-0.23.0.linux-amd64 /opt/prometheus/alertmanager
cp ./alertmanager.service /etc/systemd/system/alertmanager.service
cp ./alertmanager.yml /opt/prometheus/alertmanager/
sed -i "s/--web.listen-address=/--web.listen-address=:${port}/g" /etc/systemd/system/alertmanager.service
systemctl daemon-reload && systemctl enable alertmanager && systemctl start alertmanager && systemctl status alertmanager
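The `-p|--port` option writes its value straight into the unit file, so a typo only shows up when the service fails to start. Below is a sketch of a check that could run before the value is accepted. It assumes that only decimal ports 1 to 65535 without leading zeros are wanted. `valid_port` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Succeed only for a decimal port number in 1..65535 (no sign, no leading zero).
valid_port() {
  [[ $1 =~ ^[1-9][0-9]{0,4}$ ]] && (( $1 <= 65535 ))
}

# Possible use inside the argument loop of the install scripts:
#   -p|--port)
#       valid_port "$2" || { echo "invalid port: $2" >&2; exit 1; }
#       port=$2
#       shift 2
#       ;;
```

The regex rejects empty and non-numeric input before the arithmetic comparison runs, so `((...))` never sees a non-number.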

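4. Run the installation
Upload the prepared files (install script, unit file, alertmanager.yml and the downloaded package) to one directory on the server, e.g. /usr/local/software/prometheus, and run the script as root from that directory.
Make the script executable: chmod +x install-alertmanager.sh
Run the installer: ./install-alertmanager.sh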

5. Verify: open http://<server IP>:9093 in a browser.

Deploying the blackbox_exporter service

1. Prepare the scripts

2. The blackbox_exporter.service unit file that registers the systemd service:

[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/opt/prometheus/blackbox_exporter/blackbox_exporter --config.file=/opt/prometheus/blackbox_exporter/blackbox.yml --web.listen-address=:9115

[Install]
WantedBy=multi-user.target

3. The install-blackbox-exporter.sh installer script (change the package version to the one you downloaded):

#!/bin/bash
port=9115
usage () {
        echo "USAGE: $0 [--port 9115]"  
        echo "[-p|--port 9115]"
}
while [[ $# -gt 0 ]]; do
        key="$1"
        case $key in
                -p|--port)
                port=$2
                shift
                shift
                ;;
                -h|--help)
                help="true"
                shift
                ;;
                *)
                usage
                exit 1
                ;;
        esac
done
if [[ $help ]]; then
    usage
    exit 0
fi
mkdir -p /opt/prometheus
tar -zxvf ./blackbox_exporter-0.19.0.linux-amd64.tar.gz -C /opt/prometheus/
rm -rf /opt/prometheus/blackbox_exporter
mv /opt/prometheus/blackbox_exporter-0.19.0.linux-amd64 /opt/prometheus/blackbox_exporter
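cp ./blackbox_exporter.service /etc/systemd/system/blackbox_exporter.service
sed -i "s/--web.listen-address=:9115/--web.listen-address=:${port}/g" /etc/systemd/system/blackbox_exporter.service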
systemctl daemon-reload && systemctl enable blackbox_exporter && systemctl start blackbox_exporter && systemctl status blackbox_exporter

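4. Run the installation
Upload the prepared files (install script, unit file and the downloaded package) to one directory on the server, e.g. /usr/local/software/prometheus, and run the script as root from that directory.
Make the script executable: chmod +x install-blackbox-exporter.sh
Run the installer: ./install-blackbox-exporter.sh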

5. Verify: open http://<server IP>:9115 in a browser.
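The page on port 9115 only confirms that the exporter is up. You can trigger an actual probe by calling its `/probe` endpoint with a `module` (defined in blackbox.yml; the default file includes `icmp`) and a `target`, then reading the `probe_success` metric. The sketch below assumes `curl` is installed and the target is a plain IP or hostname that needs no URL encoding. `probe_target` and `probe_succeeded` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Read exporter output on stdin; succeed only if it contains the line "probe_success 1".
probe_succeeded() {
  grep -qx 'probe_success 1'
}

# Run one probe through blackbox_exporter (exporter given as host:port).
probe_target() {
  local exporter=$1 target=$2 module=${3:-icmp}
  curl -fsS "http://${exporter}/probe?module=${module}&target=${target}" | probe_succeeded
}
```

Usage: `probe_target 192.168.226.200:9115 192.168.24.198 && echo reachable`. ICMP probes need raw-socket privileges, which the unit file grants implicitly by running the exporter as root.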

Deploying the node_exporter service

1. Prepare the scripts

2. The node_exporter.service unit file that registers the systemd service:

[Unit]
Description=https://prometheus.io

[Service]
Restart=on-failure
ExecStart=/opt/prometheus/node_exporter/node_exporter --web.listen-address=:9100

[Install]
WantedBy=multi-user.target

3. The install-node-exporter.sh installer script (change the package version to the one you downloaded):

#!/bin/bash
port=9100
usage () {
        echo "USAGE: $0 [--port 9100]"  
        echo "[-p|--port 9100]"
}
while [[ $# -gt 0 ]]; do
        key="$1"
        case $key in
                -p|--port)
                port=$2
                shift
                shift
                ;;
                -h|--help)
                help="true"
                shift
                ;;
                *)
                usage
                exit 1
                ;;
        esac
done
if [[ $help ]]; then
    usage
    exit 0
fi
mkdir -p /opt/prometheus
tar -zxvf ./node_exporter-1.3.1.linux-amd64.tar.gz -C /opt/prometheus/
rm -rf /opt/prometheus/node_exporter
mv /opt/prometheus/node_exporter-1.3.1.linux-amd64 /opt/prometheus/node_exporter
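cp ./node_exporter.service /etc/systemd/system/node_exporter.service
sed -i "s/--web.listen-address=:9100/--web.listen-address=:${port}/g" /etc/systemd/system/node_exporter.service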
systemctl daemon-reload && systemctl enable node_exporter && systemctl start node_exporter && systemctl status node_exporter

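4. Run the installation
Upload the prepared files (install script, unit file and the downloaded package) to one directory on the server, e.g. /usr/local/software/prometheus, and run the script as root from that directory.
Make the script executable: chmod +x install-node-exporter.sh
Run the installer: ./install-node-exporter.sh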

5. Verify: open http://<server IP>:9100 in a browser.
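The metrics themselves are served at `/metrics` in the Prometheus text format, with one `name value` sample per line plus `# HELP`/`# TYPE` comments. The sketch below pulls a single unlabelled sample out of that output, which is handy for checking an exporter from the command line. It assumes the metric has no labels, since labelled series such as `node_filesystem_avail_bytes{...}` will not match. `metric_value` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Print the value of an unlabelled metric read from Prometheus text format on stdin.
# Exit status is 1 if the metric is not present.
metric_value() {
  local name=$1
  awk -v n="$name" '$1 == n { print $2; found = 1; exit } END { exit !found }'
}
```

Usage (requires curl): `curl -s http://192.168.226.200:9100/metrics | metric_value node_load1`.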

Deploying the grafana service

Grafana is deployed with Docker.
1. Pull the image

docker pull grafana/grafana:8.5.5

2. Create a directory on the server for the data volume

mkdir -p /opt/grafana

3. Set permissions on the new directory

chmod -R 777 /opt/grafana

4. Start the container

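docker run -d --restart=always --name=grafana -p 31787:3000 -e GF_SECURITY_ALLOW_EMBEDDING=true -e GF_AUTH_PROXY_ENABLED=true -e GF_AUTH_ANONYMOUS_ENABLED=true -v /opt/grafana/:/var/lib/grafana grafana/grafana:8.5.5
Allow embedding (e.g. dashboards in iframes): GF_SECURITY_ALLOW_EMBEDDING=true
Enable auth proxy (trust the user identity passed by a reverse proxy): GF_AUTH_PROXY_ENABLED=true
Allow anonymous access without login: GF_AUTH_ANONYMOUS_ENABLED=true
Data volume (host directory:container directory): /opt/grafana/:/var/lib/grafana
Port mapping: host port 31787 -> container port 3000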

5. Verify (anonymous access is enabled, so no login is needed): open http://<server IP>:31787 in a browser.
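Grafana also exposes a `/api/health` endpoint. Its JSON response includes `"database": "ok"` when Grafana can reach its database, so the check can be scripted instead of done in a browser. The sketch below assumes `curl` is available and uses a simple grep rather than a JSON parser. `grafana_db_ok` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Read the /api/health JSON body on stdin; succeed if it reports "database": "ok".
# A regex match is used instead of a JSON parser (assumption: flat response body).
grafana_db_ok() {
  grep -Eq '"database"[[:space:]]*:[[:space:]]*"ok"'
}
```

Usage: `curl -s http://192.168.226.200:31787/api/health | grafana_db_ok && echo healthy`.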

Configuration changes

Modifying blackbox.yml

Configuration reference (this article uses the default configuration):

https://github.com/prometheus/blackbox_exporter/blob/master/CONFIGURATION.md

Modifying alertmanager.yml

Configuration reference:

https://prometheus.io/docs/alerting/latest/configuration/

global:
  resolve_timeout: 1m
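  # SMTP settings
  smtp_from: 'xxx@xxx.com' # sender address
  smtp_smarthost: 'smtp.163.com:465' # outgoing SMTP server of the mail provider (shown in the mailbox settings)
  smtp_auth_username: 'xxx@xxx.com' # login account of the sending mailbox
  smtp_auth_password: 'xxxxxx' # login password of the sending mailbox
  smtp_require_tls: false # default SMTP TLS (STARTTLS) requirement; port 465 already uses implicit TLS. Note that Go does not support unencrypted connections to remote SMTP endpoints
route:
  group_by: ['hostname'] # grouping label; custom labels such as hostname can be defined in the Prometheus configuration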
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'email'
receivers:
- name: 'email'
  email_configs:
  - to: 'xxx@qq.com' # recipient address
    send_resolved: true
    headers:
      from: 'Alert Center'
      subject: 'Alert Email'
      to: 'Ops'
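inhibit_rules: # inhibition rules
  - source_match:
      level: '严重' # '严重' means "critical"; must match the level label set in the alert rules
    target_match:
      level: '严重'
    # Per the Alertmanager docs, alerts that match both source and target cannot inhibit each other,
    # so target_match needs a lower severity than source_match for this rule to take effect.
    equal: ['hostname', 'alertname', 'dev', 'instance']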
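To check that the `email` receiver actually delivers mail before any real alert fires, you can post a hand-made alert to Alertmanager's v2 API (`POST /api/v2/alerts`). The sketch below uses these assumptions: `curl` is installed, the hostname contains no characters that need JSON escaping, and the alert name `TestAlert` and label values are hypothetical. `hostname` is included because `route.group_by` uses it. `test_alert_payload` and `send_test_alert` are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Build a one-alert JSON array for Alertmanager's v2 API.
test_alert_payload() {
  local hostname=$1
  printf '[{"labels":{"alertname":"TestAlert","hostname":"%s"},"annotations":{"summary":"test alert from %s"}}]\n' \
    "$hostname" "$hostname"
}

# Send it (Alertmanager address given as host:port); --data makes curl use POST.
send_test_alert() {
  local am=${1:-127.0.0.1:9093} hostname=${2:-test-host}
  curl -fsS -H 'Content-Type: application/json' \
    --data "$(test_alert_payload "$hostname")" "http://${am}/api/v2/alerts"
}
```

With `group_wait: 30s`, the first email should arrive roughly half a minute after `send_test_alert 192.168.226.200:9093`. Because no end time is sent, Alertmanager resolves the alert after `resolve_timeout`, and with `send_resolved: true` a "resolved" email should follow later.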

Modifying prometheus.yml

Configuration reference:

https://prometheus.io/docs/prometheus/latest/configuration/configuration/

# my global config
global:
  scrape_interval: 15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
           - 192.168.226.200:9093 # address of the deployed alertmanager service

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"
  - "/opt/prometheus/rules/*.yml" # custom alert rule files

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: "prometheus"

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ["192.168.226.200:9090"] # the prometheus server itself
  - job_name: "alertmanager"
    static_configs:
      - targets: ["192.168.226.200:9093"] # alertmanager service
  - job_name: "node_exporter_linux"
    static_configs:
      - targets: ["192.168.226.200:9100"] # node_exporter on Linux
  - job_name: "node_exporter_windows"
    static_configs:
      - targets: ["192.168.24.187:9182"] # windows_exporter (the Windows counterpart of node_exporter, default port 9182)
        labels: # custom labels and values
          hostname: 'host-server'
          ip: '192.168.24.187'
  - job_name: "http_icmp_1" # blackbox probe (ICMP)
    metrics_path: /probe
    params:
      module: [icmp]
    static_configs:
      - targets: ["192.168.24.198"]
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 192.168.226.200:9115 # blackbox_exporter address
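The three `relabel_configs` rules of the `http_icmp_1` job turn the listed target into a query parameter and send the scrape itself to blackbox_exporter. The bash sketch below mimics what those rules do to one target, for illustration only. Prometheus performs this internally, and the order of the query parameters is not significant. `relabel_probe` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Mimic the http_icmp_1 relabeling for one target (illustration, not Prometheus code).
relabel_probe() {
  local address=$1 blackbox=$2 module=$3
  local param_target instance
  param_target=$address   # rule 1: __address__ -> __param_target
  instance=$param_target  # rule 2: __param_target -> instance label
  address=$blackbox       # rule 3: __address__ replaced by the blackbox_exporter address
  # Prometheus then scrapes http://<__address__><metrics_path>?<params>
  printf 'instance=%s scrape=http://%s/probe?module=%s&target=%s\n' \
    "$instance" "$address" "$module" "$param_target"
}

relabel_probe 192.168.24.198 192.168.226.200:9115 icmp
```

The resulting series therefore carry `instance="192.168.24.198"` (the probed host), not the exporter's address.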

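Example of a custom alert rule file (placed under /opt/prometheus/rules/ so that the rule_files pattern above loads it):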

groups:
- name: default
  rules:
  - alert: WindowsHighMemoryUsage
    expr: 100 - ((windows_os_physical_memory_free_bytes / windows_cs_physical_memory_bytes) * 100) > 50
    for: 5m
    labels:
      level: '严重' # "critical"; must match inhibit_rules in alertmanager.yml
      metric: windows_os_physical_memory_free_bytes
    annotations:
      summary: "{{ $labels.hostname }} {{ $labels.ip }}: memory usage above 50%"
      description: "Memory usage is more than 50%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: WindowsHighDiskUsage
    expr: 100.0 - 100 * windows_logical_disk_free_bytes{volume!~"^HarddiskVolum.."} / windows_logical_disk_size_bytes{volume!~"^HarddiskVolum.."} > 75
    for: 5m
    labels:
      level: '严重' # "critical"
      metric: windows_logical_disk_free_bytes
    annotations:
      summary: "{{ $labels.hostname }} {{ $labels.ip }}: disk {{ $labels.volume }} usage above 75%"
      description: "Disk usage is more than 75%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
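Both rule expressions compute "percent used" as 100 minus the free share. The alert fires only if the result stays above the threshold for the whole `for: 5m` window. The sketch below works through that arithmetic on hypothetical sample values. `used_pct` and `over_threshold` are hypothetical helpers, and awk does the floating-point math:

```shell
#!/usr/bin/env bash
# Percent used = 100 - free / total * 100 (same formula as both rule expressions).
used_pct() {
  awk -v f="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 - f / t * 100 }'
}

# Succeed if value > threshold (numeric comparison).
over_threshold() {
  awk -v v="$1" -v th="$2" 'BEGIN { exit !(v + 0 > th + 0) }'
}
```

For example, a host with 8 GiB of RAM and 3 GiB free gives `used_pct 3221225472 8589934592` = 62.5, which is above the 50% memory threshold. A disk with 20 of 100 units free gives 80.0, which is above the 75% disk threshold.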

Results

Grafana visualization

Grafana initial username/password: admin/admin

posted @ 2022-10-28 17:48  #码农9527#