Prometheus Source Code Series [左扬 Deep Dive]: Custom Prometheus Exporter Development, Collection Part: How Collectors Work

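1. Overview of Prometheus Exporter Collectors

Prometheus is a powerful open-source monitoring and alerting tool used in a wide range of scenarios. However, different systems, services, and applications produce metrics in very different formats and types, and Prometheus cannot fetch and process all of them directly. This is the gap that Prometheus Exporter collectors fill.

1.1 What Is a Collector

The collector (Collector) is the core component of a Prometheus Exporter. It acts as a data bridge: it gathers metric data from a data source and converts it into a format that Prometheus can understand and process. A collector can be custom-built, or it can be a predefined one for a specific system or service, such as goCollector for the Go runtime or processCollector for process metrics.

1.2 The Problem It Solves

Collectors solve the generality problem of data collection in Prometheus. Different systems and services may expose metrics through different protocols, interfaces, and data formats, which Prometheus cannot talk to directly. By implementing source-specific collection logic, a collector converts these varied metrics into the format Prometheus supports, so that Prometheus can scrape and monitor all of these systems and services in a uniform way.

1.3 Essence and Core Functions

At its core, a collector is a type (usually a struct) that implements the Collector interface, which has two methods: Describe and Collect. Describe sends the descriptors of the collector's metrics (metric name, help text, labels) to the registry. Collect gathers the actual metric values and sends them to the registry over a channel, from where they are exposed to Prometheus. Together, these two methods cover metric registration and collection.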

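2. How Prometheus Exporter Collectors Work

2.1 Registering Metrics

Registering metrics is the starting point of an Exporter and lays the groundwork for collecting and exposing them later. At application startup, we decide which metrics to monitor and register them with a Prometheus registry.

- Default registry: prometheus.DefaultRegisterer and prometheus.DefaultGatherer are the global registerer and gatherer (both backed by the same default registry); they suit most scenarios.
- Custom registry: prometheus.NewRegistry() creates an independent registry, which is useful for multiple instances, tests, or isolation.
- The registry's Gather method: the registry itself implements the Gatherer interface; Gather iterates over all registered Collectors and calls their Collect methods.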

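In code, client_golang lets us create different kinds of metrics, such as Counter, Gauge, and Histogram. Take Counter as an example: a Counter is a cumulative metric that only goes up, typically used to record how many times an event has occurred, such as the total number of HTTP requests or the number of task executions.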

package main

import (
    "log"
    "net/http"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    // Create a Counter metric
    requestsTotal := prometheus.NewCounter(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
    )

    // Register the metric with the default registry
    prometheus.MustRegister(requestsTotal)

    // Increment the counter on every handled HTTP request
    http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        requestsTotal.Inc()
        w.WriteHeader(http.StatusOK)
    })

    // Expose the metrics endpoint
    http.Handle("/metrics", promhttp.Handler())

    // Start the HTTP server; log.Fatal reports the error if it cannot start
    log.Fatal(http.ListenAndServe(":8080", nil))
}

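In the code above, prometheus.NewCounter creates a Counter metric from a prometheus.CounterOpts struct. In that struct, Name is the metric name, which must be unique within the registry, and Help is a description that makes the meaning of the metric clear. For example, a Counter can be created like this: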

requestsTotal := prometheus.NewCounter(prometheus.CounterOpts{
    Name: "http_requests_total",
    Help: "Total number of HTTP requests",
})

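Once the metric is created, it has to be registered with a registry so that it is included when Prometheus scrapes the Exporter. prometheus.MustRegister registers the metric with the default registry. If registration fails, prometheus.MustRegister panics, so that registration problems surface immediately. The requestsTotal metric created above is registered like this: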

prometheus.MustRegister(requestsTotal)

After this registration, the requestsTotal metric is included in every scrape of the Exporter.

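2.2 Collecting Metrics

When the Prometheus server sends a scrape request to the Exporter at the configured interval, the Exporter performs collection. In client_golang, this means the registry calls the Collect method of each registered Collector.

- Concurrency of Collect: the registry calls the Collect methods of the registered Collectors concurrently, and scrapes can overlap, so Collect must be safe for concurrent use (for example with locks or atomic operations).
- Avoid blocking: Collect should avoid long blocking operations (such as slow SQL or network requests), because they add to the latency of the whole scrape.
- Batch collection and caching: for expensive data sources, collect in batches on a timer outside of Collect and cache the result; Collect then only reads the cache.

Collect is part of the Collector interface, which every metric type and metric vector implements. When Prometheus scrapes the Exporter, the registry calls Collect on all registered Collectors. For a Counter, Collect sends the current count into the channel as a Metric.
For example, a custom Collector can implement Collect with its own collection logic: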

type MyCollector struct {
    myCounter prometheus.Counter
}

func (mc MyCollector) Collect(ch chan<- prometheus.Metric) {
    mc.myCounter.Collect(ch)
}

func (mc MyCollector) Describe(ch chan<- *prometheus.Desc) {
    mc.myCounter.Describe(ch)
}

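In the code above, the MyCollector struct wraps a Counter, and its Collect method forwards the Counter's collected result to the ch channel.

2.2.1 Why Write Your Own Collect Method

1. Collector is the foundation of all metrics: Counter, Gauge, Histogram, Summary, and custom collectors all ultimately implement the Collector interface.
2. Collectors can be nested: a Collector can hold other Collectors to build more complex aggregation and grouping. The Go runtime's goCollector, for example, is a combination of several sub-collectors.

2.2.1.1 Implementing Custom Collection Logic

The standard metric types (Counter, Gauge, and so on) cover many common monitoring needs, but in practice we often have to monitor business-specific metrics whose collection logic differs from the standard ones. In that case we write our own Collect method.
For example, an e-commerce system may need to monitor the number of sales per product, and the sales data has to be queried from a database. A custom Collector whose Collect method runs that query does the job: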

package main

import (
    "database/sql"
    "fmt"
    "log"

    _ "github.com/go-sql-driver/mysql"
    "github.com/prometheus/client_golang/prometheus"
)

type ProductSalesCollector struct {
    db          *sql.DB
    salesMetric *prometheus.Desc
}

func NewProductSalesCollector(db *sql.DB) *ProductSalesCollector {
    return &ProductSalesCollector{
        db: db,
        salesMetric: prometheus.NewDesc(
            "product_sales_count",
            "Number of sales for each product",
            []string{"product_id"},
            nil,
        ),
    }
}

func (psc *ProductSalesCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- psc.salesMetric
}

func (psc *ProductSalesCollector) Collect(ch chan<- prometheus.Metric) {
    rows, err := psc.db.Query("SELECT product_id, sales_count FROM products")
    if err != nil {
        log.Printf("Error querying product sales: %v", err)
        return
    }
    defer rows.Close()

    for rows.Next() {
        var productID string
        var salesCount int
        if err := rows.Scan(&productID, &salesCount); err != nil {
            log.Printf("Error scanning product sales row: %v", err)
            continue
        }
        ch <- prometheus.MustNewConstMetric(
            psc.salesMetric,
            prometheus.CounterValue,
            float64(salesCount),
            productID,
        )
    }
    if err := rows.Err(); err != nil {
        log.Printf("Error iterating over product sales rows: %v", err)
    }
}

func main() {
    db, err := sql.Open("mysql", "user:password@tcp(127.0.0.1:3306)/dbname")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    collector := NewProductSalesCollector(db)
    prometheus.MustRegister(collector)

    // An HTTP server can be added here to expose the metrics
    fmt.Println("Server started at :8080")
    // Add the following as needed
    // http.Handle("/metrics", promhttp.Handler())
    // log.Fatal(http.ListenAndServe(":8080", nil))
}

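In this example, ProductSalesCollector implements Collect to query the number of sales of each product from the database and emit it as a metric.

2.2.1.2 Implementing Metric Aggregation Logic

Sometimes several metrics need to be aggregated before they are exposed to Prometheus. A custom Collect method is a convenient place for that aggregation.
For example, suppose there are several service instances, each with its own request latency metric, and we want a single global request latency metric. A custom Collector can aggregate them in Collect: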

package main

import (
    "github.com/prometheus/client_golang/prometheus"
    dto "github.com/prometheus/client_model/go"
)

type AggregatedLatencyCollector struct {
    instanceLatencies []prometheus.Gauge
    aggregatedLatency *prometheus.GaugeVec
}

func NewAggregatedLatencyCollector(instanceLatencies []prometheus.Gauge) *AggregatedLatencyCollector {
    aggregatedLatency := prometheus.NewGaugeVec(
        prometheus.GaugeOpts{
            Name: "aggregated_request_latency",
            Help: "Aggregated request latency across all instances",
        },
        []string{"service"},
    )
    return &AggregatedLatencyCollector{
        instanceLatencies: instanceLatencies,
        aggregatedLatency: aggregatedLatency,
    }
}

func (alc *AggregatedLatencyCollector) Describe(ch chan<- *prometheus.Desc) {
    alc.aggregatedLatency.Describe(ch)
}

func (alc *AggregatedLatencyCollector) Collect(ch chan<- prometheus.Metric) {
    var totalLatency float64
    for _, latency := range alc.instanceLatencies {
        // A Gauge is itself a prometheus.Metric, so Write can
        // serialize its current value into a dto.Metric directly.
        var pb dto.Metric
        if err := latency.Write(&pb); err != nil {
            continue // skip instances whose value cannot be read
        }
        totalLatency += pb.GetGauge().GetValue()
    }
    alc.aggregatedLatency.WithLabelValues("my_service").Set(totalLatency)
    alc.aggregatedLatency.Collect(ch)
}

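In this example, AggregatedLatencyCollector implements Collect to aggregate the request latency metrics of several instances into one global request latency metric and emit it.

2.2.1.3 Notes on Implementing Collect (Metric Consistency: Matching Desc)

Describe must be implemented: Describe tells the registry the metadata of each metric (name, help text, labels). Every metric emitted by Collect must match a Desc declared in Describe.
Desc is immutable: once a Desc is created, its name, labels, and help text cannot change at runtime. Emitting a metric with a different Desc makes the registry report an error when it gathers.

Example (wrong approach):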

func (c *MyCollector) Describe(ch chan<- *prometheus.Desc) {
    ch <- prometheus.NewDesc("my_metric", "Help text", nil, nil)
}

func (c *MyCollector) Collect(ch chan<- prometheus.Metric) {
    // ❌ Wrong: uses a Desc with a different name
    ch <- prometheus.MustNewConstMetric(
        prometheus.NewDesc("another_metric", "Wrong help", nil, nil),
        prometheus.GaugeValue,
        42,
    )
}

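Name mismatch: Describe declares the metric name "my_metric", but Collect creates a metric named "another_metric". The registry requires the two to be identical.
Help text mismatch: the help text in Describe is "Help text", but in Collect it becomes "Wrong help", which also violates the requirement.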

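2.2.1.4 Notes on Implementing Collect (Error Handling)

Avoid panics: Collect should not panic, because an unrecovered panic crashes the whole Exporter process. Use recover() to catch panics, or return without emitting metrics.
Log errors but keep going: if part of the collection fails (for example one metric), log the error and still emit the other valid metrics.

Example (recommended approach):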

func (c *MyCollector) Collect(ch chan<- prometheus.Metric) {
    defer func() {
        if r := recover(); r != nil {
            log.Printf("Recovered from panic in Collect: %v", r)
        }
    }()

    // Handle operations that may fail
    data, err := fetchDataFromExternalSource()
    if err != nil {
        log.Printf("Failed to fetch data: %v", err)
        return // or emit a default metric
    }
    // ... build metrics from data and send them to ch
}

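2.2.1.5 Notes on Implementing Collect (Performance)

Batch operations: avoid running many separate database queries or network requests inside Collect; fetch the data in batches where possible.
Caching: for data that is expensive to compute, refresh a cache periodically and let Collect return the cached result directly.

Example (using a cache):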

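type MyCollector struct {
    desc       *prometheus.Desc // assumed to declare one variable label for the cache key
    cache      map[string]float64
    lastUpdate time.Time
    mu         sync.RWMutex
}

func (c *MyCollector) Collect(ch chan<- prometheus.Metric) {
    c.mu.RLock()
    defer c.mu.RUnlock()
    // Serve cached data; nothing expensive is computed during the scrape
    for k, v := range c.cache {
        ch <- prometheus.MustNewConstMetric(c.desc, prometheus.GaugeValue, v, k)
    }
}

// refreshCache replaces the cache; call it periodically from a background goroutine
func (c *MyCollector) refreshCache() {
    fresh := map[string]float64{} // load fresh values from the data source here
    c.mu.Lock()
    defer c.mu.Unlock()
    c.cache = fresh
    c.lastUpdate = time.Now()
}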

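2.2.1.6 Notes on Implementing Collect (Metric Type Consistency)

- Metric types do not change: the value type (such as CounterValue or GaugeValue) emitted for a given Desc in Collect must always be the same; otherwise gathering can fail with an inconsistency error or produce data that Prometheus cannot interpret reliably.
- Avoid dynamic types: do not emit different metric types depending on runtime conditions (for example a Counter in one scrape and a Gauge in the next).

2.2.1.7 Notes on Implementing Collect (Memory Management)

Avoid memory leaks: objects created in Collect should become unreachable once the scrape ends. Do not accumulate them in long-lived structures, or memory usage will keep growing.
Reuse objects: for scratch objects that are created frequently (such as buffers used while building metrics), consider an object pool (sync.Pool). Metrics that have already been sent to the channel are in the hands of the registry and should not be put back into a pool.

2.2.1.8 Notes on Implementing Collect (Monitoring the Collector's Own Health)

Add internal metrics: give the custom Collector its own health metrics (such as collection duration and error count), so the state of the Exporter itself can be monitored.

Example (adding internal metrics):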

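type MyCollector struct {
    collectionTime prometheus.Gauge
    errorCounter   prometheus.Counter
}

func (c *MyCollector) Collect(ch chan<- prometheus.Metric) {
    start := time.Now()
    // Use a closure: arguments of a deferred call are evaluated immediately, so
    // `defer c.collectionTime.Set(time.Since(start).Seconds())` would always record ~0.
    defer func() {
        c.collectionTime.Set(time.Since(start).Seconds())
        // Emit the internal metrics too (Describe must forward their Descs as well)
        c.collectionTime.Collect(ch)
        c.errorCounter.Collect(ch)
    }()

    // Collection logic...
    var err error // set by the collection logic
    if err != nil {
        c.errorCounter.Inc()
    }
}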

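2.3 Exposing Metrics

The Exporter exposes the collected metrics on an HTTP endpoint in a format Prometheus understands (usually the text format). The Prometheus server periodically pulls metric data from this endpoint.

promhttp.Handler() is the core handler for exposing metrics in the Prometheus Go client library. It returns an object that implements http.Handler and internally delegates all metric gathering and output to HandlerFor.

2.3.1 The Key Entry Point

In the official Prometheus Go client (client_golang), the code that exposes metrics is usually: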

http.Handle("/metrics", promhttp.Handler())

This line binds the /metrics path to the handler returned by promhttp.Handler().
Whenever an HTTP request hits /metrics, the handler's logic runs.
Note that this line is not part of the library (https://github.com/prometheus/client_golang/blob/v1.22.0/) itself; users write it in their own exporter's main.go. The library only provides the promhttp.Handler() handler, and how it is bound to an HTTP route is up to the user.
```
// Example: user code usually looks like this:

package main

import (
    "net/http"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

func main() {
    http.Handle("/metrics", promhttp.Handler())
    http.ListenAndServe(":8080", nil)
}
```

2.3.2 Reading the promhttp.Handler() Source

https://github.com/prometheus/client_golang/blob/v1.22.0/prometheus/promhttp/http.go#L96

func Handler() http.Handler {
	return InstrumentMetricHandler(
		prometheus.DefaultRegisterer, HandlerFor(prometheus.DefaultGatherer, HandlerOpts{}),
	)
}

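Handler() returns an object that implements http.Handler.
Internally it calls HandlerFor with the default gatherer (prometheus.DefaultGatherer) and default options, then wraps the result with InstrumentMetricHandler.

2.3.3 The HandlerFor Implementation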

https://github.com/prometheus/client_golang/blob/v1.22.0/prometheus/promhttp/http.go#L98

// Handler returns an http.Handler for the prometheus.DefaultGatherer, using
// default HandlerOpts, i.e. it reports the first error as an HTTP error, it has
// no error logging, and it applies compression if requested by the client.
//
// The returned http.Handler is already instrumented using the
// InstrumentMetricHandler function and the prometheus.DefaultRegisterer. If you
// create multiple http.Handlers by separate calls of the Handler function, the
// metrics used for instrumentation will be shared between them, providing
// global scrape counts.
//
// This function is meant to cover the bulk of basic use cases. If you are doing
// anything that requires more customization (including using a non-default
// Gatherer, different instrumentation, and non-default HandlerOpts), use the
// HandlerFor function. See there for details.
func Handler() http.Handler {
    // Instrument the handler returned by HandlerFor with InstrumentMetricHandler.
    // First argument: the default registerer, prometheus.DefaultRegisterer.
    // Second argument: the handler built by HandlerFor from prometheus.DefaultGatherer and default HandlerOpts.
    return InstrumentMetricHandler(
        prometheus.DefaultRegisterer, HandlerFor(prometheus.DefaultGatherer, HandlerOpts{}),
    )
}

// HandlerFor returns an uninstrumented http.Handler for the provided
// Gatherer. The behavior of the Handler is defined by the provided
// HandlerOpts. Thus, HandlerFor is useful to create http.Handlers for custom
// Gatherers, with non-default HandlerOpts, and/or with custom (or no)
// instrumentation. Use the InstrumentMetricHandler function to apply the same
// kind of instrumentation as it is used by the Handler function.
func HandlerFor(reg prometheus.Gatherer, opts HandlerOpts) http.Handler {
    // Wrap the Gatherer as a TransactionalGatherer and delegate to HandlerForTransactional.
    return HandlerForTransactional(prometheus.ToTransactionalGatherer(reg), opts)
}

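2.3.4 The Handler's Internal Processing Flow in Detail

Let us go deeper into promhttp.Handler() and walk through how, when a /metrics request arrives, the metrics are gathered, processed, and finally exposed, step by step.

The core logic lives in the http.HandlerFunc returned by HandlerForTransactional. The process breaks down as follows: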

https://github.com/prometheus/client_golang/blob/v1.22.0/prometheus/promhttp/http.go#L156

h := http.HandlerFunc(func(rsp http.ResponseWriter, req *http.Request) {
    // ...
})

2.3.5 Limiting Concurrent Requests (MaxRequestsInFlight)

// A non-nil inFlightSem means a maximum number of concurrent requests was configured.
if inFlightSem != nil {
    // select with a default branch makes the channel send non-blocking.
    select {
    // Try to send an empty struct into inFlightSem; this succeeds while the channel has free capacity.
    case inFlightSem <- struct{}{}:
        // On success, defer a receive so the slot is released when the handler returns.
        defer func() { <-inFlightSem }()
    // The channel is full: the concurrency limit has been reached.
    default:
        // Reply with 503 Service Unavailable, stating the limit and asking the client to retry later.
        http.Error(rsp, fmt.Sprintf(
            "Limit of concurrent requests reached (%d), try again later.", opts.MaxRequestsInFlight,
        ), http.StatusServiceUnavailable)
        // Stop processing this request.
        return
    }
}

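This code limits the number of concurrent requests: once the number of in-flight requests reaches the configured maximum, new requests are rejected with a 503 status code.

Purpose: if the user sets MaxRequestsInFlight in HandlerOpts, a channel (inFlightSem) is used as a semaphore to limit the number of scrape requests handled at the same time.
Flow:

- When a request arrives, the handler tries to send a value into the channel.
- If the channel is full (the concurrency limit is reached), the default branch of the select runs immediately and returns a 503 Service Unavailable error, protecting the system from overload.
- If the send succeeds, processing continues, and a defer ensures the semaphore slot is released when the request ends.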

2.3.6 Gathering All Metrics (Gather)

https://github.com/prometheus/client_golang/blob/v1.22.0/prometheus/promhttp/http.go#L171

mfs, done, err := reg.Gather()
defer done()

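- mfs, done, err := reg.Gather()
    - reg: an object implementing the prometheus.TransactionalGatherer interface, responsible for gathering all metrics registered with it. TransactionalGatherer is a variant of Gatherer whose Gather method additionally returns a done callback, which is what makes transactional gathering possible.
    - Gather(): calling reg.Gather() triggers the gathering process. It iterates over all registered collectors (Collector), calls their Collect methods, and returns the metrics organized as a slice of dto.MetricFamily.
    - mfs: a []*dto.MetricFamily holding the gathered metric families. Each dto.MetricFamily is a set of metrics sharing the same name and type.
    - done: a function that marks the end of the gathering transaction. With a TransactionalGatherer it must be called to complete the transaction, so that resources are released and state is updated correctly.
    - err: an error value holding any problems that occurred during gathering. Even when errors occur, Gather() tries to collect as many metrics as possible.
- defer done()
    - defer: postpones a function call until the surrounding function returns. Here, defer done() means done() runs before the handler function returns, whether it returns normally or because of a panic.
    - done(): completes the gathering transaction, releases the associated resources, and updates state. This step is required when using a TransactionalGatherer and keeps the transaction intact.

Purpose: this is the most important step in exposing metrics. It calls the Gather() method of the Gatherer (here a TransactionalGatherer).

Flow:

- Gather() iterates over all Collectors registered with the Gatherer.
- It calls Collect() on each Collector.
- Collect() sends the metric data into a channel.
- Gather() aggregates everything received from the channel into a []*dto.MetricFamily slice (mfs) and returns it. dto.MetricFamily is the standard in-memory data structure for metrics.
- done() performs cleanup for transactional gathering and keeps the data consistent.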

posted @ 2025-07-16 17:08 左扬
