Prometheus Black-Box Monitoring

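Deploying blackbox_exporter for black-box monitoring
	1. blackbox_exporter overview
blackbox_exporter probes targets over HTTP, HTTPS, DNS, TCP, ICMP, and gRPC.

With HTTP, it can probe a website and treat a 200 status code as a sign that the service is healthy.

With TCP, it can check whether a port on a host is listening.

With ICMP, it can ping a host to test connectivity.

With gRPC, it can call a service's health-check interface to verify that the service is working.

With DNS, it can check that a domain name resolves.

	2. Download blackbox_exporter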
wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.25.0/blackbox_exporter-0.25.0.linux-amd64.tar.gz



	3. Extract the package
[root@node-exporter42 ~]# tar xf blackbox_exporter-0.25.0.linux-amd64.tar.gz  -C /yanshier/softwares/


	4. Write the systemd unit file
[root@node-exporter42 ~]# cat > /lib/systemd/system/blackbox_exporter.service <<EOF
[Unit]
Description=blackbox service
Documentation=https://www.yanshier.com/
After=network.target

[Service]
ExecStart=/yanshier/softwares/blackbox_exporter-0.25.0.linux-amd64/blackbox_exporter --config.file="/yanshier/softwares/blackbox_exporter-0.25.0.linux-amd64/blackbox.yml" --web.listen-address=:9115

[Install]
WantedBy=multi-user.target
EOF


	
	5. Start blackbox_exporter
[root@node-exporter42 ~]# systemctl daemon-reload
[root@node-exporter42 ~]# 
[root@node-exporter42 ~]# systemctl enable --now blackbox_exporter
Created symlink /etc/systemd/system/multi-user.target.wants/blackbox_exporter.service → /lib/systemd/system/blackbox_exporter.service.
[root@node-exporter42 ~]# 
[root@node-exporter42 ~]# ss -ntl | grep 9115
LISTEN 0      4096               *:9115            *:*          
[root@node-exporter42 ~]# 

	
	6. Open the blackbox_exporter web UI
http://10.0.0.42:9115/


	7. View the loaded configuration, including the available modules
http://10.0.0.42:9115/config
	
	8. Manually probe baidu.com with the http_2xx module:
http://10.0.0.42:9115/probe?target=baidu.com&module=http_2xx&debug=true
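
The same probe can be driven from a script. The following is a minimal Python sketch, assuming the exporter address used in this post; the helper names `build_probe_url` and `probe_succeeded` are made up for illustration. It only builds the URL and reads `probe_success` from the returned text, and does not replace the web UI's debug output.

```python
from urllib.parse import urlencode


def build_probe_url(exporter: str, target: str, module: str = "http_2xx") -> str:
    """Return the /probe URL that asks blackbox_exporter to probe `target`."""
    query = urlencode({"target": target, "module": module})
    return f"http://{exporter}/probe?{query}"


def probe_succeeded(metrics_text: str) -> bool:
    """Read the probe_success sample from the text that /probe returns."""
    for line in metrics_text.splitlines():
        if line.startswith("probe_success "):
            return float(line.split()[1]) == 1.0
    return False


url = build_probe_url("10.0.0.42:9115", "baidu.com")
print(url)
```

To run a real probe, the URL can be passed to `urllib.request.urlopen` and the decoded response body handed to `probe_succeeded`.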




- Integrating Prometheus with the blackbox_exporter http_2xx module to monitor websites
	1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
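  - job_name: "yanshier-blackbox"
    # Path of the probe endpoint on blackbox_exporter
    metrics_path: /probe
    # Module passed as a URL parameter; if omitted, blackbox_exporter falls back to http_2xx.
    params:
      module: [http_2xx]
    static_configs:
        # Targets to be probed
      - targets: 
          - www.jd.com
          - www.yanshier.com
          - 10.0.0.51:3000
    # Prometheus does not scrape the targets directly; blackbox_exporter probes them on its behalf.
    relabel_configs:
        # Copy the target address into the "target" URL parameter
      - source_labels: [__address__]
        target_label: __param_target
        # Rewrite the scrape address (__address__) so that it points at blackbox_exporter
      - target_label: __address__
        replacement: 10.0.0.42:9115
        # instance defaults to __address__, which now holds the exporter address, so set it back to the probed target
      - source_labels: [__param_target]
        target_label: instance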

  - job_name: 'yanshier_blackbox_exporter'
    static_configs:
      - targets: ['10.0.0.42:9115']
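
To make the three relabel rules of the probe job easier to follow, here is a small Python sketch that imitates them on a plain dict. It is only an illustration of the label flow, not how Prometheus implements relabeling; the function name `relabel` is hypothetical.

```python
def relabel(target: str, exporter: str = "10.0.0.42:9115") -> dict:
    """Imitate the three relabel rules of the blackbox probe job."""
    labels = {"__address__": target}
    # Rule 1: copy the original address into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Rule 2: point the scrape at blackbox_exporter instead of the target.
    labels["__address__"] = exporter
    # Rule 3: keep the probed target visible as the instance label.
    labels["instance"] = labels["__param_target"]
    return labels


for t in ["www.jd.com", "10.0.0.51:3000"]:
    print(relabel(t))
```

After the rules run, Prometheus scrapes `10.0.0.42:9115/probe?target=<original address>&module=http_2xx`, while the time series still carry the original address as `instance`.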


	2. Hot-reload the configuration (requires Prometheus to be started with --web.enable-lifecycle)
[root@prometheus-server31 ~]# curl  -X POST 10.0.0.31:9090/-/reload 


	3. Verify that the targets are being scraped
http://10.0.0.31:9090/targets


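	4. Metrics exposed by the blackbox_exporter http_2xx probe
probe_http_ssl:
	1 means the final request of the probe used HTTPS; 0 means it used plain HTTP.
	
probe_http_status_code:
	The HTTP status code returned by the site; 0 means the probe failed to get a response.

probe_http_duration_seconds:
	Time spent in each phase of the request.
	
probe_duration_seconds:
	Total duration of the probe.
	
probe_success:
	Whether the probe succeeded: 1 for success, 0 for failure.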
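
As an illustration of how these metrics can be read together, the sketch below parses a hand-written sample of probe output and reports the slowest phase. The sample values are invented for the example, and `phase_durations` is a hypothetical helper; real output from `/probe` contains more lines.

```python
import re

# Invented sample in the text exposition format; not real measurements.
SAMPLE = """\
probe_duration_seconds 0.21
probe_http_duration_seconds{phase="connect"} 0.03
probe_http_duration_seconds{phase="processing"} 0.12
probe_http_duration_seconds{phase="resolve"} 0.01
probe_http_duration_seconds{phase="tls"} 0.04
probe_http_duration_seconds{phase="transfer"} 0.01
probe_http_ssl 1
probe_http_status_code 200
probe_success 1
"""

PHASE_RE = re.compile(r'^probe_http_duration_seconds\{phase="([^"]+)"\}\s+(\S+)$')


def phase_durations(text: str) -> dict:
    """Map each phase label to its duration in seconds."""
    phases = {}
    for line in text.splitlines():
        m = PHASE_RE.match(line)
        if m:
            phases[m.group(1)] = float(m.group(2))
    return phases


phases = phase_durations(SAMPLE)
slowest = max(phases, key=phases.get)
print(slowest, phases[slowest])
```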


	5. Import Grafana dashboard templates (dashboard IDs)
7587

13659


Using the blackbox_exporter ICMP module to check whether hosts are alive
	1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: 'yanshier-blackbox-exporter-icmp'
    metrics_path: /probe
    params:
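      # If no module is given, blackbox_exporter falls back to "http_2xx". The name must match a module defined in blackbox.yml, otherwise the probe fails.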
      module: [icmp]
    static_configs:
      - targets:
          - 10.0.0.41
          - 10.0.0.42
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.0.42:9115  


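	2. Reload the configuration
[root@prometheus-server31 ~]# curl  -X POST 10.0.0.31:9090/-/reload 

	3. Open the Prometheus web UI
http://10.0.0.31:9090/targets

	4. Open the blackbox_exporter web UI
http://10.0.0.42:9115/


	5. Filter by job in Grafana
Filter on the job label "yanshier-blackbox-exporter-icmp".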

	
- Using the blackbox_exporter TCP module to check whether ports are open
	1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: 'yanshier-blackbox-exporter-tcp'
    metrics_path: /probe
    params:
      module: [tcp_connect]
    static_configs:
      - targets:
          - 10.0.0.41:80
          - 10.0.0.42:22
          - 10.0.0.31:9090                  

    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.0.42:9115
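
Conceptually, a TCP connect probe just checks whether a TCP handshake with the target completes in time. The Python sketch below shows that idea under that assumption; it is not the exporter's own code, and `tcp_port_open` is a hypothetical helper. The demo uses a throwaway local listener so that it runs without access to the hosts in this post.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake with host:port completes in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Demo against a temporary listener on the loopback interface.
server = socket.socket()
server.bind(("127.0.0.1", 0))        # port 0 lets the OS pick a free port
server.listen(1)
port = server.getsockname()[1]

up = tcp_port_open("127.0.0.1", port)    # listener is running
server.close()
down = tcp_port_open("127.0.0.1", port)  # listener has been closed
print(up, down)
```

Against the targets configured above, the call would look like `tcp_port_open("10.0.0.41", 80)`.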
		

	2. Reload the configuration
[root@prometheus-server31 ~]# curl -X POST http://10.0.0.31:9090/-/reload
[root@prometheus-server31 ~]# 

	3. Open the Prometheus web UI
http://10.0.0.31:9090/targets

	4. Open the blackbox_exporter web UI
http://10.0.0.42:9115/

	5. View the data in Grafana
Filter on the job label "yanshier-blackbox-exporter-tcp".


Using the blackbox_exporter ssh_banner module to check whether the SSH service is alive
	1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: 'yanshier-blackbox-exporter-ssh'
    metrics_path: /probe
    params:
      module: [ssh_banner]
    static_configs:
      - targets:
          - 10.0.0.41:22
          - 10.0.0.43:22
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 10.0.0.42:9115


	2. Reload the configuration
[root@prometheus-server31 ~]# curl -X POST http://10.0.0.31:9090/-/reload
[root@prometheus-server31 ~]# 

	3. Open the Prometheus web UI
http://10.0.0.31:9090/targets

	4. Open the blackbox_exporter web UI
http://10.0.0.42:9115/

	5. View the data in Grafana
Filter on the job label "yanshier-blackbox-exporter-ssh".
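
The idea behind a banner check is that an SSH server announces itself with a line starting with `SSH-2.0-` right after the TCP connection opens. The Python sketch below tests a line against that pattern. The pattern is an assumption about what the ssh_banner module expects; the authoritative definition is the one shown on the exporter's /config page. `is_ssh_banner` is a hypothetical helper.

```python
import re

# Assumed pattern; compare it with the ssh_banner module on the /config page.
SSH_BANNER = re.compile(r"^SSH-2\.0-")


def is_ssh_banner(first_line: str) -> bool:
    """Return True if the first line sent by the server looks like an SSH banner."""
    return SSH_BANNER.match(first_line) is not None


print(is_ssh_banner("SSH-2.0-OpenSSH_8.9p1 Ubuntu-3ubuntu0.10"))
print(is_ssh_banner("HTTP/1.1 400 Bad Request"))
```

This is why the SSH check is more specific than a plain TCP connect probe on port 22: a port that is open but served by something other than an SSH daemon would fail the banner match.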



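- Deploying Pushgateway
	1. What Pushgateway is for
Pushgateway accepts custom metrics pushed by users, typically from short-lived jobs or scripts, and holds them temporarily until Prometheus scrapes them.

	2. Download Pushgateway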
wget https://github.com/prometheus/pushgateway/releases/download/v1.10.0/pushgateway-1.10.0.linux-amd64.tar.gz



	3. Extract the binary
[root@node-exporter42 ~]# tar xf pushgateway-1.10.0.linux-amd64.tar.gz -C /usr/local/bin/ pushgateway-1.10.0.linux-amd64/pushgateway --strip-components=1

	4. Create the data directory
[root@node-exporter42 ~]# mkdir -pv /yanshier/data/pushgateway


	5. Write the systemd unit file
[root@node-exporter42 ~]# cat > /lib/systemd/system/pushgateway.service <<EOF
[Unit]
Description=pushgateway services
Documentation=https://www.yanshier.com
After=network.target

[Service]
ExecStart=/usr/local/bin/pushgateway --web.telemetry-path="/metrics" --web.listen-address=:9091 --web.enable-lifecycle --persistence.file=/yanshier/data/pushgateway/pushgateway.data --persistence.interval=1m


[Install]
WantedBy=multi-user.target
EOF

	6. Start the service
[root@node-exporter42 ~]# systemctl daemon-reload
[root@node-exporter42 ~]# 
[root@node-exporter42 ~]# systemctl enable --now pushgateway.service 
Created symlink /etc/systemd/system/multi-user.target.wants/pushgateway.service → /lib/systemd/system/pushgateway.service.
[root@node-exporter42 ~]# 
[root@node-exporter42 ~]# ss -ntl | grep 9091
LISTEN 0      4096               *:9091            *:*          
[root@node-exporter42 ~]# 


	7. Open the Pushgateway web UI
http://10.0.0.42:9091/
	

- Integrating Prometheus with Pushgateway: a hands-on example
	1. Edit the Prometheus configuration file
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
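  - job_name: 'yanshier-pushgateway'
    # Keep the pushed job and instance labels instead of renaming them to exported_job and exported_instance
    honor_labels: true
    static_configs:
      - targets: ['10.0.0.42:9091']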

	2. Hot-reload the configuration
[root@prometheus-server31 ~]# curl  -X POST 10.0.0.31:9090/-/reload  


	3. Verify that the configuration took effect
http://10.0.0.31:9090/targets


	4. Send test data to Pushgateway
		4.1 Send a single sample
[root@node-exporter41 ~]# echo "students_online_count 78" | curl --data-binary @- http://10.0.0.42:9091/metrics/job/yanshier-student-online


		4.2 Send multiple samples
cat <<EOF | curl --data-binary @- http://10.0.0.42:9091/metrics/job/yanshier_hobby/instance/10.0.0.99
# TYPE xijiao_count counter
xijiao_count{name="wanghaonan",age="22"} 365
xijiao_count{name="songlpngyang",age="22"} 366
xijiao_count{name="wanghuifeng",age="21"} 98
xijiao_count{name="yuanshuhao",age="23"} 86
xijiao_count{name="libowen",age="23"} 66
xijiao_count{name="luozhiyang",age="24"} 32
# TYPE game_seconds gauge
# HELP game_seconds play game times.
game_seconds{name="yanbo"} 3600
EOF
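
The two curl examples follow the same pattern: the URL path carries the job name plus optional grouping labels, and the request body is plain text in the exposition format. The following Python sketch builds both pieces with the standard library only. `push_url` and `render` are hypothetical helpers written for this post, and the sketch stops short of sending anything.

```python
from urllib.parse import quote


def push_url(base: str, job: str, **grouping: str) -> str:
    """Build a Pushgateway URL of the form /metrics/job/<job>/<label>/<value>."""
    parts = [f"{base}/metrics/job/{quote(job, safe='')}"]
    for label, value in grouping.items():
        parts.append(f"{label}/{quote(value, safe='')}")
    return "/".join(parts)


def render(metric: str, metric_type: str, samples: list) -> str:
    """Render samples as exposition text; each sample is (labels_dict, value)."""
    lines = [f"# TYPE {metric} {metric_type}"]
    for labels, value in samples:
        label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
        lines.append(f"{metric}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


url = push_url("http://10.0.0.42:9091", "yanshier_hobby", instance="10.0.0.99")
body = render("game_seconds", "gauge", [({"name": "yanbo"}, 3600)])
print(url)
print(body, end="")
```

To actually push, the body can be sent with `urllib.request.Request(url, data=body.encode(), method="POST")` and `urllib.request.urlopen`. The trailing newline matters: Pushgateway expects the last line of the body to end with a line feed.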


	5. Display the data in Grafana
Build a custom dashboard. Details are omitted here; see the video.


Monitoring the 12 TCP connection states with Prometheus
	1. Write the collection script
[root@node-exporter41 ~]# cat > /usr/local/bin/tcp_status.sh  <<'EOF'
#!/bin/bash


# Initialize a counter for each of the 12 TCP states
ESTABLISHED_COUNT=0
SYN_SENT_COUNT=0
SYN_RECV_COUNT=0
FIN_WAIT1_COUNT=0
FIN_WAIT2_COUNT=0
TIME_WAIT_COUNT=0
CLOSE_COUNT=0
CLOSE_WAIT_COUNT=0
LAST_ACK_COUNT=0
LISTEN_COUNT=0
CLOSING_COUNT=0
UNKNOWN_COUNT=0

# Job name
JOB_NAME=tcp_status
# Instance name
INSTANCE_NAME=harbor250
# Pushgateway host
HOST=10.0.0.42
# Pushgateway port
PORT=9091

# The 12 TCP states
ALL_STATUS=(ESTABLISHED SYN_SENT SYN_RECV FIN_WAIT1 FIN_WAIT2 TIME_WAIT CLOSE CLOSE_WAIT LAST_ACK LISTEN CLOSING UNKNOWN)

# Declare an associative array (like a Python dict or a Go map)
declare -A tcp_status

# Count the connections in each state
for i in "${ALL_STATUS[@]}"
do
  # -w matches whole words only, so CLOSE does not also count CLOSE_WAIT
  temp=$(netstat -untalp | grep -cw "$i")
  tcp_status[${i}]=$temp
done

# Push the results to Pushgateway, one metric per state
for i in "${!tcp_status[@]}"
do 
   data="$i ${tcp_status[$i]}"
   # Note: using one metric name with a status label (the commented line below)
   # does not work when each state is sent in its own request. Every push replaces
   # the previously pushed metrics of the same name in the same group, so only the
   # last state survives. To use labels, send all lines in a single request.
   #data="yanshier_tcp_all_status{status=\"$i\"} ${tcp_status[$i]}"
   #echo $data
   echo "$data" | curl --data-binary @-  "http://${HOST}:${PORT}/metrics/job/${JOB_NAME}/instance/${INSTANCE_NAME}"
   # sleep 1
done
EOF
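
When all states should share one metric name and differ only by a label, every line has to go out in a single request, because a push replaces earlier pushes of the same metric name within the same group. The Python sketch below shows one way to build such a payload. It parses text in the shape of `ss -tan` output; the sample rows are invented, and `count_states` and `to_payload` are hypothetical helpers.

```python
from collections import Counter

# Invented sample shaped like `ss -tan` output.
SAMPLE_SS = """\
State      Recv-Q Send-Q Local Address:Port  Peer Address:Port
LISTEN     0      128    0.0.0.0:22          0.0.0.0:*
ESTAB      0      0      10.0.0.41:22        10.0.0.1:51234
ESTAB      0      0      10.0.0.41:9100      10.0.0.31:40412
CLOSE-WAIT 1      0      10.0.0.41:46012     10.0.0.42:9091
"""


def count_states(ss_output: str) -> Counter:
    """Count connections per state, looking at the first column only."""
    rows = ss_output.splitlines()[1:]  # skip the header row
    return Counter(row.split()[0] for row in rows if row.strip())


def to_payload(counts: Counter, metric: str = "tcp_connections") -> str:
    """Render all states as one metric with a state label."""
    lines = [f"# TYPE {metric} gauge"]
    for state in sorted(counts):
        lines.append(f'{metric}{{state="{state}"}} {counts[state]}')
    return "\n".join(lines) + "\n"


payload = to_payload(count_states(SAMPLE_SS))
print(payload, end="")
```

Comparing the first column exactly, rather than searching each line for a substring, also keeps CLOSE from being counted together with CLOSE-WAIT.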

	2. Schedule a cron job to push the data to Pushgateway
[root@node-exporter41 ~]# echo "*/5 * * * * /usr/local/bin/tcp_status.sh" >> /var/spool/cron/crontabs/root
[root@node-exporter41 ~]# 
[root@node-exporter41 ~]# crontab -l
*/5 * * * * /usr/local/bin/tcp_status.sh
[root@node-exporter41 ~]# 
[root@node-exporter41 ~]# chmod +x /usr/local/bin/tcp_status.sh 
[root@node-exporter41 ~]# 
[root@node-exporter41 ~]# /usr/local/bin/tcp_status.sh
[root@node-exporter41 ~]# 

	3. Check the Pushgateway web UI
http://10.0.0.42:9091/




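	4. Reference
		4.1 An alternative script based on ss (the sed command first replaces any non-breaking spaces in the file with regular spaces)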
[root@prometheus-server31 ~]# sed -i 's/\xc2\xa0/ /g' /usr/local/bin/tcp_status2.sh
[root@prometheus-server31 ~]# 
[root@prometheus-server31 ~]# cat -A /usr/local/bin/tcp_status2.sh
#!/bin/bash
pushgateway_url="http://10.0.0.42:9091/metrics/job/tcp_status"

state="SYN-SENT SYN-RECV FIN-WAIT-1 FIN-WAIT-2 TIME-WAIT CLOSE CLOSE-WAIT LAST-ACK LISTEN CLOSING ESTAB"
for i in $state
 do
 # Compare the state column exactly, so CLOSE does not also count CLOSE-WAIT
 t=$(ss -tan | awk -v s="$i" '$1 == s' | wc -l)
 echo "tcp_connections{state=\"$i\"} $t" >>/tmp/tcp.txt
done

# Send every state in a single request
cat /tmp/tcp.txt | curl --data-binary @- "$pushgateway_url"
rm -f /tmp/tcp.txt
[root@prometheus-server31 ~]# 


Writing a custom exporter in Python
	1 Install pip3
[root@prometheus-node42 ~]#  apt update
[root@prometheus-node42 ~]#  apt install -y python3-pip


	1.2 Configure a pip mirror to speed up downloads
[root@node-exporter41 ~]# mkdir ~/.pip
[root@node-exporter41 ~]# 
[root@node-exporter41 ~]# vim ~/.pip/pip.conf
[root@node-exporter41 ~]# 
[root@node-exporter41 ~]# cat ~/.pip/pip.conf
# [global]
# index-url=https://pypi.tuna.tsinghua.edu.cn/simple
# [install]
# trusted-host=pypi.douban.com
[global]
index-url=https://mirrors.aliyun.com/pypi/simple
[install]
trusted-host=mirrors.aliyun.com
[root@node-exporter41 ~]# 


	1.3 Install the required modules
[root@node-exporter41 ~]# pip3 install flask prometheus_client
[root@node-exporter41 ~]# pip3 list


 
	1.4 Write the code
[root@node-exporter41 ~]# cat flask_metric.py 

from prometheus_client import start_http_server, Counter, Summary
from flask import Flask, jsonify
from wsgiref.simple_server import make_server

app = Flask(__name__)

# Metrics that track the time spent on requests and the number of requests made
REQUEST_TIME = Summary('request_processing_seconds', 'Time spent processing request')
# Exposed as request_count_total, because the client library appends _total to counters
COUNTER_TIME  = Counter("request_count", "Total request count of the host")

@app.route("/apps")
@REQUEST_TIME.time()
def requests_count():
    COUNTER_TIME.inc()
    return jsonify({"office": "https://www.yanshier.com"},{"auther":"Jason Yin"})

if __name__ == "__main__":
    # Serve the metrics on port 8000 in a background thread
    start_http_server(8000)
    # Serve the Flask application on port 8001
    httpd = make_server( '0.0.0.0', 8001, app )
    httpd.serve_forever()
[root@node-exporter41 ~]# 


	1.5 Start the Python program
[root@node-exporter41 ~]# python3 flask_metric.py 


...
# Once the test client is running, log lines like the following appear
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64
10.0.0.43 - - [13/Nov/2024 17:31:57] "GET /apps HTTP/1.1" 200 64


	
	1.6 Test from a client
[root@node-exporter43 ~]# cat yanshier_curl_metrics.sh
#!/bin/bash

URL=http://10.0.0.41:8001/apps

while true;do
    curl_num=$(( $RANDOM%50+1 ))
    sleep_num=$(( $RANDOM%5+1 ))
    for c_num in `seq $curl_num`;do
        curl -s $URL &> /dev/null
    done
    sleep $sleep_num
done
[root@node-exporter43 ~]# 
[root@node-exporter43 ~]# bash yanshier_curl_metrics.sh



	1.7 Scrape the custom Python exporter with Prometheus
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: "yinzhengjie_python_custom_metrics"
    static_configs:
      - targets:
        - 10.0.0.41:8000



	1.8 Reload the configuration
curl -X POST http://10.0.0.31:9090/-/reload 


	1.9 Verify that Prometheus is collecting the data
http://10.0.0.31:9090/targets


	1.10 Graph the data in Grafana
request_count_total
	Total number of requests to /apps.
	
	
increase(request_count_total{job="yinzhengjie_python_custom_metrics"}[1m])
	Number of requests over the last minute.


irate(request_count_total{job="yinzhengjie_python_custom_metrics"}[1m])
	Instantaneous per-second request rate (QPS), computed from the last two samples in the 1-minute window.
	
request_processing_seconds_sum{job="yinzhengjie_python_custom_metrics"} / request_processing_seconds_count{job="yinzhengjie_python_custom_metrics"}
	Average time spent processing a request.
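
The arithmetic behind these expressions can be worked through by hand. The Python sketch below uses invented counter samples and deliberately simplifies: it ignores counter resets and the extrapolation that Prometheus applies at window boundaries. `per_second_rate` and `average_latency` are hypothetical helpers.

```python
def per_second_rate(first: tuple, last: tuple) -> float:
    """Rate between two (timestamp_seconds, counter_value) samples."""
    (t0, v0), (t1, v1) = first, last
    return (v1 - v0) / (t1 - t0)


def average_latency(sum_seconds: float, count: float) -> float:
    """Mean request duration, as in the _sum / _count expression."""
    return sum_seconds / count if count else 0.0


# Invented samples: the counter grew from 1200 to 1500 within 60 seconds.
qps = per_second_rate((0, 1200), (60, 1500))
per_minute = qps * 60
latency = average_latency(4.5, 1500)
print(qps, per_minute, latency)
```

With these numbers the per-second rate is 5 requests, the one-minute increase is 300 requests, and the average processing time is 3 ms per request.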


Prometheus federation
	1. Configure the Prometheus 32 node
[root@prometheus-server32 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: 'yanshier-prometheus32'
    static_configs:
      - targets: ["10.0.0.41:9100"]

[root@prometheus-server32 ~]# curl  -X POST 10.0.0.32:9090/-/reload
[root@prometheus-server32 ~]# 

	2. Configure the Prometheus 33 node
[root@prometheus-server33 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: 'yanshier-prometheus33'
    static_configs:
      - targets: ["10.0.0.42:9100","10.0.0.43:9100"]


[root@prometheus-server33 ~]# curl  -X POST 10.0.0.33:9090/-/reload
[root@prometheus-server33 ~]# 



	3. Verify that the configuration took effect on each node
http://10.0.0.32:9090/targets
http://10.0.0.33:9090/targets



Tip:
	Also run this PromQL query on both Prometheus servers: node_cpu_guest_seconds_total
	
	
	4. Configure federation on Prometheus 31
[root@prometheus-server31 ~]# vim /yanshier/softwares/prometheus-2.53.3.linux-amd64/prometheus.yml 
...
  - job_name: "prometheus-federate-32"
    metrics_path: "/federate"
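    # Controls how label conflicts are resolved. Valid values are true and false; the default is false.
    # true: keep the labels from the scraped data and ignore the conflicting server-side labels.
    # false: keep the server-side labels and rename the conflicting scraped labels with an "exported_" prefix.
    honor_labels: true
    params:
       "match[]":
       - '{job="prometheus"}'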
       - '{__name__=~"job:.*"}'
       - '{__name__=~"node.*"}'
    static_configs:
    - targets:
        - "10.0.0.32:9090"

  - job_name: "prometheus-federate-33"
    metrics_path: "/federate"
    honor_labels: true
    params:
       "match[]":
       - '{job="prometheus"}'
       - '{__name__=~"job:.*"}'
       - '{__name__=~"node.*"}'
    static_configs:
    - targets:
        - "10.0.0.33:9090"
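
Each `match[]` entry becomes a repeated URL parameter on the `/federate` endpoint of the downstream server. The Python sketch below builds such a URL with the standard library, which can help when testing a selector by hand before adding it to the configuration. `federate_url` is a hypothetical helper, and the selector list is only an example.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Example selectors in the style of the match[] list in the federation job.
matchers = ['{job="prometheus"}', '{__name__=~"job:.*"}', '{__name__=~"node.*"}']


def federate_url(server: str, match: list) -> str:
    """Build a /federate URL with one match[] parameter per selector."""
    query = urlencode([("match[]", m) for m in match])
    return f"http://{server}/federate?{query}"


url = federate_url("10.0.0.32:9090", matchers)
decoded = parse_qs(urlparse(url).query)["match[]"]
print(url)
print(decoded)
```

The printed URL can be fetched with curl or a browser to see exactly which series the upstream server would pull from 10.0.0.32.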


	
	5. Hot-reload the configuration
[root@prometheus-server31 ~]# curl  -X POST 10.0.0.31:9090/-/reload  

	6. Verify that the configuration took effect
http://10.0.0.31:9090/targets

Query with PromQL:
	node_cpu_guest_seconds_total{job=~"yanshier.*"}
	
	7. Import a Grafana dashboard (dashboard ID)
1860

Posted 2025-01-26 11:25 by 颜十二