Python in Practice: Writing a Custom Exporter for Prometheus

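In today's microservice architectures and containerized deployments, monitoring is indispensable. Prometheus, an open-source monitoring and alerting toolkit, is widely adopted for its powerful features and flexibility. Prometheus does not, however, monitor every kind of service or application out of the box: it only collects metrics that are exposed over HTTP in its own format, so for systems that do not provide such an endpoint we need to write a custom exporter. This article walks through building a custom Prometheus exporter in Python.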

1. Prometheus Exporter Basics

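In the Prometheus architecture, an exporter collects monitoring data from a target system and exposes it over an HTTP endpoint in a specific format (usually the plain-text exposition format). Prometheus scrapes these endpoints at a regular interval, then stores the data and uses it for querying and alerting.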
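To see what that text format looks like, here is a minimal sketch that renders a single gauge with `generate_latest()`, which produces the Prometheus text format that the client's HTTP server returns by default. The metric name `room_temperature_celsius` is a hypothetical example, and a private `CollectorRegistry` is used so the built-in Python runtime metrics are left out.

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

# A private registry holds only the metrics we add to it
# (the default REGISTRY also includes Python runtime metrics).
registry = CollectorRegistry()

# Hypothetical metric name, for illustration only.
temperature = Gauge('room_temperature_celsius', 'Current room temperature', registry=registry)
temperature.set(21.5)

# generate_latest() returns the text exposition format as bytes.
print(generate_latest(registry).decode('utf-8'))
# # HELP room_temperature_celsius Current room temperature
# # TYPE room_temperature_celsius gauge
# room_temperature_celsius 21.5
```

Each metric is described by a `# HELP` line and a `# TYPE` line, followed by one line per sample.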

2. Prerequisites

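Before writing any code, make sure Python and the required library are installed. We will use prometheus_client, the official Prometheus client library for Python, to produce metrics in a format Prometheus understands.

Install it with pip: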

pip install prometheus_client

3. Common Metric Types in prometheus_client

Gauge: a value that can go up and down arbitrarily, such as memory usage or the number of items in a queue.

from prometheus_client import Gauge

gauge = Gauge('example_gauge', 'An example gauge')

gauge.set(123.45)  # set the gauge to a specific value
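Besides `set()`, a gauge can be moved in both directions and can track work in progress. A minimal sketch with hypothetical metric names, using a private `CollectorRegistry` so the values can be read back with `get_sample_value()`:

```python
from prometheus_client import CollectorRegistry, Gauge

registry = CollectorRegistry()

# Hypothetical metric: number of items currently waiting in a work queue.
queue_items = Gauge('example_queue_items', 'Items currently in the queue', registry=registry)
queue_items.inc()     # one item enqueued        -> 1
queue_items.inc(3)    # three more enqueued      -> 4
queue_items.dec()     # one item processed       -> 3

# track_inprogress() increments on entry and decrements on exit.
in_progress = Gauge('example_jobs_in_progress', 'Jobs being processed', registry=registry)
with in_progress.track_inprogress():
    pass  # placeholder for the real work

print(registry.get_sample_value('example_queue_items'))       # 3.0
print(registry.get_sample_value('example_jobs_in_progress'))  # 0.0
```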

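Counter: a value that only increases (it resets to zero only when the process restarts), typically used to count requests handled or bytes processed.

from prometheus_client import Counter

counter = Counter('example_counter', 'An example counter')

counter.inc()  # increment by 1
counter.inc(10)  # increment by 10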
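Counters are often split by labels, and the client refuses to decrease them. A minimal sketch with a hypothetical `example_requests` counter labelled by HTTP method, registered in a private registry:

```python
from prometheus_client import CollectorRegistry, Counter

registry = CollectorRegistry()

# Hypothetical metric: requests handled, split by HTTP method.
request_counter = Counter('example_requests', 'Requests handled', ['method'], registry=registry)

request_counter.labels(method='GET').inc()
request_counter.labels(method='GET').inc()
request_counter.labels(method='POST').inc(5)

# Counters cannot go down: a negative increment raises ValueError.
try:
    request_counter.labels(method='GET').inc(-1)
except ValueError as err:
    print('rejected:', err)

# The exposed series gets a "_total" suffix.
print(registry.get_sample_value('example_requests_total', {'method': 'GET'}))   # 2.0
print(registry.get_sample_value('example_requests_total', {'method': 'POST'}))  # 5.0
```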

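Histogram: samples observations, such as request durations, and counts them in configurable buckets so you can analyze their distribution.

from prometheus_client import Histogram

histogram = Histogram('example_histogram', 'An example histogram')

with histogram.time():  # observe how long the block takes, in seconds
    pass  # placeholder for the work being timed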
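The default buckets suit latencies measured in seconds; you can pass your own upper bounds, and `+Inf` is appended automatically. A minimal sketch with a hypothetical `example_request_seconds` histogram showing that bucket counts are cumulative:

```python
from prometheus_client import CollectorRegistry, Histogram

registry = CollectorRegistry()

# Hypothetical metric with custom bucket upper bounds (in seconds).
latency = Histogram('example_request_seconds', 'Request latency',
                    buckets=(0.1, 0.5, 1.0, 5.0), registry=registry)

for seconds in (0.05, 0.3, 0.3, 2.0):
    latency.observe(seconds)

# Buckets are cumulative: le="0.5" counts every observation <= 0.5.
print(registry.get_sample_value('example_request_seconds_bucket', {'le': '0.5'}))   # 3.0
print(registry.get_sample_value('example_request_seconds_bucket', {'le': '+Inf'}))  # 4.0
print(registry.get_sample_value('example_request_seconds_count'))                   # 4.0
```

Cumulative buckets are what PromQL's `histogram_quantile()` expects, so quantiles are estimated on the Prometheus side rather than in the exporter.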

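Summary: tracks the number and the total of observations, such as response sizes. The Python client does not compute quantiles for a summary; it exposes only `_count` and `_sum`.

from prometheus_client import Summary

summary = Summary('example_summary', 'An example summary')

summary.observe(123.4)  # record one observation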
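Because only the count and the sum are exposed, the useful derived value is the average. A minimal sketch with hypothetical metric names, which also shows that `Summary.time()` works as a decorator:

```python
import time

from prometheus_client import CollectorRegistry, Summary

registry = CollectorRegistry()

# Hypothetical metric: response size in bytes.
response_size = Summary('example_response_bytes', 'Response size', registry=registry)
for size in (200, 400, 600):
    response_size.observe(size)

count = registry.get_sample_value('example_response_bytes_count')
total = registry.get_sample_value('example_response_bytes_sum')
print(count, total, total / count)  # 3.0 1200.0 400.0

# Hypothetical metric: time() used as a decorator observes each call's duration.
handler_seconds = Summary('example_handler_seconds', 'Handler duration', registry=registry)

@handler_seconds.time()
def handle():
    time.sleep(0.01)  # stand-in for real work

handle()
```

In PromQL, the average over the last five minutes would be `rate(example_response_bytes_sum[5m]) / rate(example_response_bytes_count[5m])`.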

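Info: key-value information about the target, such as a software version. It is not a measurement: the information is carried in labels and the series value is always 1.

from prometheus_client import Info

info = Info('example_info', 'An example info')

info.info({'version': '1.2.3'})  # record static build information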

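GaugeHistogram: a histogram whose bucket counts can go down as well as up, describing a current distribution (for example, how long the items currently in a queue have been waiting). prometheus_client has no GaugeHistogram class for direct instrumentation, so `from prometheus_client import GaugeHistogram` fails with ImportError; gauge histograms are produced by a custom collector that yields a `GaugeHistogramMetricFamily`:

from prometheus_client.core import GaugeHistogramMetricFamily, REGISTRY

class ExampleCollector:
    def collect(self):
        # Buckets are (upper bound, cumulative count) pairs ending with '+Inf'; the values are illustrative
        yield GaugeHistogramMetricFamily(
            'example_gauge_histogram', 'An example gauge histogram',
            buckets=[('1.0', 4), ('5.0', 9), ('+Inf', 10)], gsum_value=23.5)

REGISTRY.register(ExampleCollector())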

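Enum: tracks which one of a fixed set of states something is currently in. The allowed states must be passed with the `states` keyword argument.

from prometheus_client import Enum

enum = Enum('example_enum', 'An example enum', states=['value1', 'value2'])

enum.state('value1')  # set the current state to value1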
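An enum is exposed as one series per state, and only the current state has the value 1. A minimal sketch with a hypothetical `example_task_state` metric in a private registry:

```python
from prometheus_client import CollectorRegistry, Enum

registry = CollectorRegistry()

# Hypothetical metric: lifecycle state of a background task.
task_state = Enum('example_task_state', 'Current state of the task',
                  states=['starting', 'running', 'stopped'], registry=registry)

task_state.state('running')  # switch to "running"

# Each state is its own series, labelled with the metric name;
# the current state has the value 1 and all others 0.
for state in ('starting', 'running', 'stopped'):
    value = registry.get_sample_value('example_task_state', {'example_task_state': state})
    print(state, value)
```

Until `state()` is called, the enum reports the first state in the list.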

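For Prometheus to scrape these metrics, they must be served over HTTP. prometheus_client ships a simple HTTP server, `start_http_server`, which runs in a background daemon thread and exposes every registered metric at the /metrics endpoint.

import time

from prometheus_client import start_http_server

if __name__ == '__main__':
    start_http_server(8000)  # serve metrics at http://localhost:8000/metrics
    # Your metric collection and update logic goes here.
    # The server runs in a daemon thread, so keep the main thread alive.
    while True:
        time.sleep(1)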
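To check an exporter without running Prometheus, you can fetch the endpoint yourself. A minimal sketch using only the standard library's `urllib.request`, assuming port 18000 is free (an arbitrary choice) and using a hypothetical `example_heartbeat` gauge in a private registry:

```python
import urllib.request

from prometheus_client import CollectorRegistry, Gauge, start_http_server

registry = CollectorRegistry()

# Hypothetical metric, only so that there is something to read back.
heartbeat = Gauge('example_heartbeat', 'Set to 1 while the exporter is running', registry=registry)
heartbeat.set(1)

# Port 18000 is an arbitrary choice for this sketch. The server thread is a
# daemon, so it stops when this script exits.
start_http_server(18000, addr='127.0.0.1', registry=registry)

with urllib.request.urlopen('http://127.0.0.1:18000/metrics') as response:
    body = response.read().decode('utf-8')

print(body)  # the text you would also see with curl or a browser
```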

4. Example

from prometheus_client import start_http_server, Gauge, Counter, Summary, Histogram, Info
import time
import random



a = Gauge('a', 'Description of gauge')
b = Gauge('b', 'Description of gauge',['label1', 'label2'])
c = Counter('c', 'Description of counter')
s = Summary('request_latency_seconds', 'Description of summary')
h = Histogram('request_latency_seconds2', 'Description of histogram')
i = Info('my_build_version', 'Description of info')


def process_request():
    """Update every metric with dummy values, then sleep for one second."""
    a.set(random.randint(1, 100))
    b.labels(label1='k1', label2='f1').set(random.randint(100, 200))
    b.labels(label1='k1', label2='f2').set(random.randint(200, 300))
    b.labels(label1='k2', label2='f2').set(random.randint(400, 500))
    c.inc()
    s.observe(4.7)
    h.observe(8.8)
    i.info({'version': '1.2.3', 'buildhost': 'foo@bar'})
    time.sleep(1)



if __name__ == '__main__':
    # Start the HTTP server on port 8000 (metrics are served at /metrics)
    start_http_server(8000)

    # Simulate handling requests in an endless loop
    while True:
        process_request()

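After starting the script, open port 8000 (for example, http://localhost:8000/metrics). The output looks similar to the following. Besides the custom metrics, it includes the client's built-in Python runtime metrics (`python_gc_*`, `python_info`). Note also that the client appends `_total` to the counter name and `_info` to the Info name, and adds a `_created` timestamp series for the counter, summary, and histogram: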

# HELP python_gc_objects_collected_total Objects collected during gc
# TYPE python_gc_objects_collected_total counter
python_gc_objects_collected_total{generation="0"} 83.0
python_gc_objects_collected_total{generation="1"} 305.0
python_gc_objects_collected_total{generation="2"} 0.0
# HELP python_gc_objects_uncollectable_total Uncollectable objects found during GC
# TYPE python_gc_objects_uncollectable_total counter
python_gc_objects_uncollectable_total{generation="0"} 0.0
python_gc_objects_uncollectable_total{generation="1"} 0.0
python_gc_objects_uncollectable_total{generation="2"} 0.0
# HELP python_gc_collections_total Number of times this generation was collected
# TYPE python_gc_collections_total counter
python_gc_collections_total{generation="0"} 41.0
python_gc_collections_total{generation="1"} 3.0
python_gc_collections_total{generation="2"} 0.0
# HELP python_info Python platform information
# TYPE python_info gauge
python_info{implementation="CPython",major="3",minor="10",patchlevel="11",version="3.10.11"} 1.0
# HELP a Description of gauge
# TYPE a gauge
a 61.0
# HELP b Description of gauge
# TYPE b gauge
b{label1="k1",label2="f1"} 113.0
b{label1="k1",label2="f2"} 233.0
b{label1="k2",label2="f2"} 401.0
# HELP c_total Description of counter
# TYPE c_total counter
c_total 4.0
# HELP c_created Description of counter
# TYPE c_created gauge
c_created 1.7274026703582432e+09
# HELP request_latency_seconds Description of summary
# TYPE request_latency_seconds summary
request_latency_seconds_count 4.0
request_latency_seconds_sum 18.8
# HELP request_latency_seconds_created Description of summary
# TYPE request_latency_seconds_created gauge
request_latency_seconds_created 1.7274026703582432e+09
# HELP request_latency_seconds2 Description of histogram
# TYPE request_latency_seconds2 histogram
request_latency_seconds2_bucket{le="0.005"} 0.0
request_latency_seconds2_bucket{le="0.01"} 0.0
request_latency_seconds2_bucket{le="0.025"} 0.0
request_latency_seconds2_bucket{le="0.05"} 0.0
request_latency_seconds2_bucket{le="0.075"} 0.0
request_latency_seconds2_bucket{le="0.1"} 0.0
request_latency_seconds2_bucket{le="0.25"} 0.0
request_latency_seconds2_bucket{le="0.5"} 0.0
request_latency_seconds2_bucket{le="0.75"} 0.0
request_latency_seconds2_bucket{le="1.0"} 0.0
request_latency_seconds2_bucket{le="2.5"} 0.0
request_latency_seconds2_bucket{le="5.0"} 0.0
request_latency_seconds2_bucket{le="7.5"} 0.0
request_latency_seconds2_bucket{le="10.0"} 4.0
request_latency_seconds2_bucket{le="+Inf"} 4.0
request_latency_seconds2_count 4.0
request_latency_seconds2_sum 35.2
# HELP request_latency_seconds2_created Description of histogram
# TYPE request_latency_seconds2_created gauge
request_latency_seconds2_created 1.7274026703582432e+09
# HELP my_build_version_info Description of info
# TYPE my_build_version_info gauge
my_build_version_info{buildhost="foo@bar",version="1.2.3"} 1.0
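The loop in the example writes new values into the metrics once per second, regardless of when Prometheus scrapes. An exporter that reads data from another system often uses a custom collector instead: the registry calls its `collect()` method on every scrape, so values are fetched on demand. A minimal sketch using `GaugeMetricFamily` and `CounterMetricFamily` from `prometheus_client.core`; `fetch_queue_stats()` and the metric names are hypothetical stand-ins for your real data source:

```python
import random

from prometheus_client import CollectorRegistry, generate_latest
from prometheus_client.core import CounterMetricFamily, GaugeMetricFamily


def fetch_queue_stats():
    """Hypothetical stand-in for querying the monitored system (API, database, CLI, ...)."""
    return {
        'orders': {'depth': random.randint(0, 50), 'processed': 1234},
        'emails': {'depth': random.randint(0, 10), 'processed': 567},
    }


class QueueCollector:
    """Builds fresh metric families every time the registry is collected, i.e. on every scrape."""

    def collect(self):
        stats = fetch_queue_stats()
        depth = GaugeMetricFamily('example_queue_depth', 'Items waiting in the queue',
                                  labels=['queue'])
        processed = CounterMetricFamily('example_queue_processed', 'Items processed so far',
                                        labels=['queue'])
        for queue_name, s in stats.items():
            depth.add_metric([queue_name], s['depth'])
            processed.add_metric([queue_name], s['processed'])
        yield depth
        yield processed


registry = CollectorRegistry()
registry.register(QueueCollector())
print(generate_latest(registry).decode('utf-8'))
```

To serve it, register the collector with the default registry (`REGISTRY.register(QueueCollector())`) and call `start_http_server(8000)` as in the example above. Keep `collect()` fast, since it runs inside every scrape request.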


Posted @ 2024-09-23 17:47 by 测试小罡 (blog: 小罡测试笔记)
