Prometheus Source Code Series [左扬 In-Depth]: Prometheus 3.4.0 Source Analysis, Tracing the Full Remote Read Request Flow Across the Web API and Federation Architecture (with ProtoBuf/Snappy), plus a Hands-On Go Client Demo


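In the Prometheus monitoring stack, Remote Read is one of the core capabilities behind cross-instance data sharing, federated deployments, and long-term storage integration. Unlike Remote Write, which actively pushes data, Remote Read lets a Prometheus instance pull time series on demand from a remote endpoint (another Prometheus instance, or a long-term storage system such as Thanos or Cortex), which supports scenarios like cross-cluster monitoring and historical queries. Based on the Prometheus 3.4.0 source, this article walks through how Remote Read is implemented, covering the core concepts, the code layout, the request flow, and the key logic.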

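1. Remote Read: Core Concepts and Design Goals

Before diving into the source, it helps to pin down what Remote Read is for. It is not a plain "data pull" but a standardized data exchange scheme designed to be efficient, compatible, and extensible. Its main goals are:

- Cross-instance data merging: pull data from other Prometheus instances to overcome the limited scope of a single instance (for example, child nodes serving data to a parent node in a federated deployment);
- Long-term storage integration: connect to distributed storage systems such as Thanos and Cortex and pull historical data that has already expired from the local Prometheus TSDB, completing the query path;
- Standardized data format: define a unified request/response format in Protobuf so that different implementations (native Prometheus, third-party storage) stay compatible;
- On-demand query optimization: pull data by time range and label filters instead of transferring everything, reducing network and compute overhead.

From the source's point of view, Remote Read rests on two core components: the Remote Read API endpoint (which receives and serves HTTP requests) and the Remote Read client (which sends requests to the remote endpoint and encodes/decodes the data). The two are connected by the structured messages defined in the Protobuf protocol.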

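2. Source Layout for Remote Read

Core module	Description	Source path (relative to the prometheus root)
Remote API Protobuf definitions	Defines the Remote Read request/response Protobuf messages	prompb/remote.proto, prompb/remote.pb.go
Remote Read HTTP endpoint	Defines the HTTP API route and handler	web/api/v1/api.go (the /read route is registered on line 410)
Remote Read server-side handling	Handles incoming Remote Read requests (decode the request, query the data, encode the response)	storage/remote/read_handler.go
Remote Read client	Sends Read requests to a remote endpoint (HTTP calls, data encoding/decoding, retries)	storage/remote/client.go, storage/remote/read.go
Data encoding/decoding	Converts between Protobuf messages and internal data structures	storage/remote/codec.go
TSDB bridge	Connects to the local TSDB and merges local and remote data through Fanout Storage	storage/fanout.go, storage/remote/storage.go
Query engine	Executes PromQL queries and accesses data through the Storage interface	promql/engine.go
Configuration parsing	Parses the remote_read section of prometheus.yml	config/config.go (the RemoteReadConfig struct is on line 1428)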

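3. Protobuf Protocol Definitions

The Remote Read wire protocol is based on Protobuf 3 (proto3), with the core definitions in `prompb/remote.proto`. The key data structures are described below.

3.1 ReadRequest: the remote read request

message ReadRequest {
  repeated Query queries = 1;

  enum ResponseType {
    // The server returns a single ReadResponse message containing the matched series with their raw samples
    SAMPLES = 0;
    // The server streams ChunkedReadResponse messages containing XOR- or HISTOGRAM-encoded chunks
    STREAMED_XOR_CHUNKS = 1;
  }

  // Allows negotiating the content type of the response; types are taken from the list in FIFO order
  repeated ResponseType accepted_response_types = 2;
}

Key points:

- Multiple queries: the queries field lets one request carry several queries, reducing network round trips;
- Response type negotiation: accepted_response_types lets client and server agree on a response format (raw samples vs. encoded chunks); the server uses the first type in the list that it supports;
- Backward compatibility: if accepted_response_types is empty, the SAMPLES type is used.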
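The negotiation rule described above (first supported type wins, an empty list means SAMPLES) can be sketched in a few lines. This is an illustration only, not the Prometheus implementation: ResponseType, its constants, and negotiate are local stand-ins invented for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// ResponseType is a local stand-in for the ReadRequest.ResponseType enum.
type ResponseType int

const (
	Samples           ResponseType = 0
	StreamedXORChunks ResponseType = 1
)

// negotiate walks the accepted list in order and returns the first type the
// server supports. An empty list is treated as "SAMPLES only".
func negotiate(accepted []ResponseType, supported map[ResponseType]bool) (ResponseType, error) {
	if len(accepted) == 0 {
		accepted = []ResponseType{Samples}
	}
	for _, t := range accepted {
		if supported[t] {
			return t, nil
		}
	}
	return 0, errors.New("none of the accepted response types is supported")
}

func main() {
	supported := map[ResponseType]bool{Samples: true, StreamedXORChunks: true}

	// The client prefers streamed chunks and falls back to samples.
	t, err := negotiate([]ResponseType{StreamedXORChunks, Samples}, supported)
	fmt.Println(t, err)

	// An old client that sends no list at all gets SAMPLES.
	t, err = negotiate(nil, supported)
	fmt.Println(t, err)
}
```

Putting the preferred type first in the list is what makes the FIFO rule useful: a client can ask for streaming where it is available without breaking against servers that only know SAMPLES.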

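3.2 Query: query conditions

message Query {
  int64 start_timestamp_ms = 1;    // query start time (milliseconds)
  int64 end_timestamp_ms = 2;       // query end time (milliseconds)
  repeated prometheus.LabelMatcher matchers = 3;  // label matchers (e.g. job="prometheus")
  prometheus.ReadHints hints = 4;   // query hints (optional, used to optimize query performance)
}

Key points:

- matchers: label matchers (=, !=, =~, !~) select the time series, e.g. {job="prometheus", instance="localhost:9090"};
- hints: optional query hints carrying the surrounding function or aggregation (Func), the grouping labels (Grouping), the step (StepMs), and similar information that helps the server optimize the query.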
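To make the four matcher operators concrete, here is a minimal sketch of how one matcher could be evaluated against a label set. Matcher, MatchType, and matches are hypothetical names for this example, not the prompb or labels types. The sketch assumes that regular expressions are fully anchored and that a missing label behaves like an empty value, which is how PromQL matchers are documented to behave.

```go
package main

import (
	"fmt"
	"regexp"
)

// MatchType is a local stand-in for the four matcher operators.
type MatchType int

const (
	EQ  MatchType = iota // =
	NEQ                  // !=
	RE                   // =~
	NRE                  // !~
)

// Matcher is a hypothetical, simplified label matcher.
type Matcher struct {
	Type  MatchType
	Name  string
	Value string
}

// matches reports whether the label set satisfies the matcher.
// A label that is absent from the map is compared as the empty string.
func matches(m Matcher, lbls map[string]string) (bool, error) {
	v := lbls[m.Name]
	switch m.Type {
	case EQ:
		return v == m.Value, nil
	case NEQ:
		return v != m.Value, nil
	case RE, NRE:
		// Anchor the pattern so that it must match the whole value.
		re, err := regexp.Compile("^(?:" + m.Value + ")$")
		if err != nil {
			return false, err
		}
		return re.MatchString(v) == (m.Type == RE), nil
	default:
		return false, fmt.Errorf("unknown matcher type %d", m.Type)
	}
}

func main() {
	lbls := map[string]string{"job": "prometheus", "instance": "localhost:9090"}
	for _, m := range []Matcher{
		{Type: EQ, Name: "job", Value: "prometheus"},
		{Type: RE, Name: "instance", Value: "localhost:.*"},
		{Type: NRE, Name: "job", Value: "node.*"},
	} {
		ok, err := matches(m, lbls)
		fmt.Println(m.Name, ok, err)
	}
}
```

A series is selected only if every matcher in the query returns true for its label set.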

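3.3 ReadResponse: the non-streamed response

message ReadResponse {
  repeated QueryResult results = 1;  // in the same order as the queries in the request
}

message QueryResult {
  repeated prometheus.TimeSeries timeseries = 1;  // matched time series; samples within each series are ordered by time
}

Key points:

Response format: ReadResponse.results corresponds one-to-one with ReadRequest.queries, and each QueryResult holds the matched time series together with their samples.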

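3.4 ChunkedReadResponse: the streamed response

message ChunkedReadResponse {
  repeated prometheus.ChunkedSeries chunked_series = 1;  // series together with their encoded chunks
  int64 query_index = 2;  // index of the query in ReadRequest.queries that these chunks belong to
}

Key points:

- Memory friendly: streaming avoids loading all the data at once, lowering memory pressure on the server;
- Compact: time series data travels as XOR- or HISTOGRAM-encoded chunks, which is much smaller than raw samples;
- Split transfer: a single series may be split across several ChunkedReadResponse messages, which supports queries over large time ranges.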

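4. Request/Response Encoding: Snappy Compression and Protobuf Serialization

Remote Read requests and sampled (non-streamed) responses are serialized with Protobuf and then compressed with Snappy, which keeps the data intact while cutting network overhead considerably. Streamed responses carry already-encoded chunks and, per the comments in remote.proto, are sent without a Snappy Content-Encoding.

4.1 Server side: decoding the request

The server decodes the request in the DecodeReadRequest function in https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/codec.go:

// DecodeReadRequest reads a prompb.ReadRequest from an http.Request.
// It turns the Snappy-compressed Protocol Buffer payload in the HTTP request body into a Prometheus ReadRequest struct.
func DecodeReadRequest(r *http.Request) (*prompb.ReadRequest, error) {
	// 1. Read the request body, capped at decodeReadLimit bytes (guards against oversized request bodies).
	// io.LimitReader stops at the cap without reporting an error; io.ReadAll reads everything up to that cap.
	compressed, err := io.ReadAll(io.LimitReader(r.Body, decodeReadLimit))
	if err != nil {
		// 2. Reading the body failed (e.g. the connection dropped): return the error.
		// An oversized body does not fail here; it is truncated and fails during decoding below.
		return nil, err
	}

	// 3. Decompress the bytes with Snappy.
	// Prometheus uses Snappy for its Protobuf payloads; passing nil as the first argument lets snappy allocate the output buffer.
	reqBuf, err := snappy.Decode(nil, compressed)
	if err != nil {
		// 4. Decompression failed (corrupt, truncated, or non-Snappy data): return the error.
		return nil, err
	}

	// 5. Declare a prompb.ReadRequest to hold the decoded data.
	// prompb is the Go package generated from the Prometheus Protobuf definitions; ReadRequest is the standard read request message.
	var req prompb.ReadRequest
	// 6. Unmarshal the decompressed Protobuf bytes into the ReadRequest struct.
	if err := proto.Unmarshal(reqBuf, &req); err != nil {
		// 7. Unmarshalling failed (the bytes are not a valid ReadRequest encoding): return the error.
		return nil, err
	}

	// 8. All decoding steps succeeded: return a pointer to the parsed ReadRequest.
	return &req, nil
}

Key points:

- Read the request body: io.LimitReader caps the body at 32 MB (decodeReadLimit), so a malicious request cannot exhaust memory;
- Snappy decompression: the compressed body is decompressed into the raw byte stream;
- Protobuf unmarshalling: the byte stream is parsed into a prompb.ReadRequest struct.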

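4.2 Server side: encoding the response

https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/codec.go#L84

The server encodes the (non-streamed) response in the EncodeReadResponse function:

// EncodeReadResponse writes a prompb.ReadResponse to an http.ResponseWriter.
// It serializes the Prometheus ReadResponse struct to Protocol Buffer bytes, compresses them with Snappy, and sends them to the HTTP client.
func EncodeReadResponse(resp *prompb.ReadResponse, w http.ResponseWriter) error {
	// 1. Serialize the prompb.ReadResponse struct to raw Protocol Buffer bytes.
	// proto.Marshal is the standard Protobuf serialization call; it turns the Go struct into the binary wire format.
	data, err := proto.Marshal(resp)
	if err != nil {
		// 2. Serialization failed: return the error.
		return err
	}

	// 3. Compress the raw Protobuf bytes with Snappy.
	// This follows the Prometheus wire convention; the compressed payload is smaller and cheaper to transfer.
	// Passing nil as the first argument lets snappy allocate the output buffer and return the compressed slice.
	compressed := snappy.Encode(nil, data)

	// 4. Write the compressed bytes to the HTTP response body.
	// The first call to w.Write implicitly sends the status line and headers if WriteHeader has not been called yet.
	_, err = w.Write(compressed)
	// 5. Return any write error (connection dropped, response stream closed, ...), or nil on success.
	return err
}

Key points:

- Protobuf serialization: the prompb.ReadResponse struct is serialized to a byte stream;
- Snappy compression: the serialized data is compressed; the author cites a typical 60-80% reduction in transfer size, though the actual ratio depends on the data;
- Response headers: read_handler.go sets Content-Type: application/x-protobuf and Content-Encoding: snappy.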

4.3 Client side: encoding the request

The client encodes the request in the Read method at https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/client.go#L342:

// Read reads from a remote endpoint.
// Parameters:
//   - ctx: context carrying deadlines, cancellation signals, and other control information
//   - query: the read query, including the time range and the matchers
//   - sortSeries: whether to sort the returned series (only honored for sampled responses; chunked responses arrive already sorted by the server)
// Returns:
//   - storage.SeriesSet: the time series that were read
//   - error: any error encountered (network failure, serialization failure, ...)
func (c *Client) Read(ctx context.Context, query *prompb.Query, sortSeries bool) (storage.SeriesSet, error) {
	// 1. Increment the in-flight read query gauge (the number of read requests currently in progress)
	c.readQueries.Inc()
	// 2. Decrement it when the function exits, so the gauge stays accurate on both success and error paths
	defer c.readQueries.Dec()

	// 3. Build the Prometheus remote read request
	req := &prompb.ReadRequest{
		// TODO: support batching multiple queries into one read request
		// Only a single query is sent for now (the Protobuf interface itself allows several)
		Queries:               []*prompb.Query{query}, // queries to execute (currently just one)
		AcceptedResponseTypes: AcceptedResponseTypes,  // response types the client supports (package-level variable)
	}

	// 4. Serialize the ReadRequest struct to raw Protocol Buffer bytes
	data, err := proto.Marshal(req)
	if err != nil {
		// 5. On failure, return an error with context (%w wraps the original error and preserves the chain)
		return nil, fmt.Errorf("unable to marshal read request: %w", err)
	}

	// 6. Compress the raw Protobuf bytes with Snappy (the Prometheus wire convention; smaller payload, cheaper transfer)
	compressed := snappy.Encode(nil, data)

	// 7. Build the HTTP POST request: the target is the configured remote endpoint, the body is the compressed data
	httpReq, err := http.NewRequest(http.MethodPost, c.urlString, bytes.NewReader(compressed))
	if err != nil {
		// 8. Building the request failed (e.g. malformed URL): return an error with context
		return nil, fmt.Errorf("unable to create request: %w", err)
	}

	// 9. Header: the request body is Snappy-encoded
	httpReq.Header.Add("Content-Encoding", "snappy")
	// 10. Header: the client accepts Snappy-encoded responses
	httpReq.Header.Add("Accept-Encoding", "snappy")
	// 11. Header: the request body is Protocol Buffer data
	httpReq.Header.Set("Content-Type", "application/x-protobuf")
	// 12. Header: client identification (user agent)
	httpReq.Header.Set("User-Agent", UserAgent)
	// 13. Header: the Prometheus remote read protocol version
	httpReq.Header.Set("X-Prometheus-Remote-Read-Version", "0.1.0")

	// 14. Prepare the timeout error: it wraps context.DeadlineExceeded and states the timeout (more readable errors)
	errTimeout := fmt.Errorf("%w: request timed out after %s", context.DeadlineExceeded, c.timeout)
	// 15. Derive a child context with a timeout (the client's configured timeout)
	// and attach errTimeout as the cause, so later error checks can identify it
	ctx, cancel := context.WithTimeoutCause(ctx, c.timeout, errTimeout)

	// 16. Start an OpenTelemetry span (distributed tracing: marks the "Remote Read" operation, kind client)
	ctx, span := otel.Tracer("").Start(ctx, "Remote Read", trace.WithSpanKind(trace.SpanKindClient))
	// 17. End the span when the function exits, so the trace is reported completely
	defer span.End()

	// 18. Record the start time (used to measure request duration)
	start := time.Now()
	// 19. Send the HTTP request with the timeout context attached and wait for the response
	httpResp, err := c.Client.Do(httpReq.WithContext(ctx))
	if err != nil {
		// 20. Sending failed (network error, timeout): cancel the context and return the error
		cancel()
		return nil, fmt.Errorf("error sending request: %w", err)
	}

	// 21. Check the HTTP status code: anything other than 2xx (1xx/3xx/4xx/5xx) counts as a failure
	if httpResp.StatusCode/100 != 2 {
		// 22. Try to read the error message from the body (best effort, to surface the server's feedback)
		body, _ := io.ReadAll(httpResp.Body)
		// 23. Close the response body (required, otherwise resources leak)
		_ = httpResp.Body.Close()

		// 24. Cancel the context (the request failed; release the timeout resources)
		cancel()
		// 25. Trim leading and trailing newlines from the error message
		errStr := strings.Trim(string(body), "\n")
		err := errors.New(errStr)
		// 26. Return an error with context: remote URL, HTTP status, and the server's message
		return nil, fmt.Errorf("remote server %s returned http status %s: %w", c.urlString, httpResp.Status, err)
	}

	// 27. Read the Content-Type response header (it tells which response format the server chose)
	contentType := httpResp.Header.Get("Content-Type")

	// 28. Branch on the response type
	switch {
	// Branch 1: plain Protobuf (sampled response)
	case strings.HasPrefix(contentType, "application/x-protobuf"):
		// 29. Observe the request duration for sampled responses (metrics are labelled by response type)
		c.readQueriesDuration.WithLabelValues("sampled").Observe(time.Since(start).Seconds())
		// 30. Increment the request counter for sampled responses (labelled with the status code)
		c.readQueriesTotal.WithLabelValues("sampled", strconv.Itoa(httpResp.StatusCode)).Inc()
		// 31. Handle the sampled response: parse the Protobuf data and sort if sortSeries is set
		ss, err := c.handleSampledResponse(req, httpResp, sortSeries)
		// 32. Cancel the context (the sampled response is fully processed; release resources)
		cancel()
		// 33. Return the parsed series set and any error
		return ss, err

	// Branch 2: streamed Protobuf (chunked response; the proto parameter names prometheus.ChunkedReadResponse)
	case strings.HasPrefix(contentType, "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse"):
		// 34. Observe the request duration for chunked responses
		c.readQueriesDuration.WithLabelValues("chunked").Observe(time.Since(start).Seconds())

		// 35. Create the chunked reader over the response body; each frame is capped at chunkedReadLimit
		s := NewChunkedReader(httpResp.Body, c.chunkedReadLimit, nil)
		// 36. Create the chunked series set: it wraps the reader and handles chunk parsing and time range filtering
		// The callback runs when the series set is closed or fails; it updates the metrics and cancels the context
		return NewChunkedSeriesSet(s, httpResp.Body, query.StartTimestampMs, query.EndTimestampMs, func(err error) {
			// 37. Use the HTTP status code as the metric label by default
			code := strconv.Itoa(httpResp.StatusCode)
			// 38. If the error is not EOF (the normal end of stream), label it as an aborted stream
			if !errors.Is(err, io.EOF) {
				code = "aborted_stream"
			}
			// 39. Increment the request counter for chunked responses (labelled with status code or abort marker)
			c.readQueriesTotal.WithLabelValues("chunked", code).Inc()
			// 40. Cancel the context (chunk processing finished or aborted; release resources)
			cancel()
		}), nil

	// Branch 3: unsupported response type
	default:
		// 41. Observe the request duration for unsupported responses
		c.readQueriesDuration.WithLabelValues("unsupported").Observe(time.Since(start).Seconds())
		// 42. Increment the request counter for unsupported responses
		c.readQueriesTotal.WithLabelValues("unsupported", strconv.Itoa(httpResp.StatusCode)).Inc()
		// 43. Cancel the context
		cancel()
		// 44. Return an unsupported content type error
		return nil, fmt.Errorf("unsupported content type: %s", contentType)
	}
}

4.4 Client side: decoding the response

https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/client.go#L392

The client picks a decoding strategy from the Content-Type response header. This is the switch statement at the end of the Read method listed in section 4.3 (steps 27 to 44), which has three branches:

- Content-Type starts with application/x-protobuf (sampled response): record the "sampled" duration and counter metrics, decode the body with handleSampledResponse (sorting the series if sortSeries is set), then cancel the context;
- Content-Type starts with application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse (chunked response): record the "chunked" duration, wrap the body in NewChunkedReader (each frame capped at chunkedReadLimit), and return a NewChunkedSeriesSet. Its callback runs when the series set finishes or fails; it increments the counter with either the HTTP status code or "aborted_stream" (for any error other than io.EOF) and then cancels the context;
- Anything else: record the "unsupported" metrics, cancel the context, and return an "unsupported content type" error.
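The dispatch itself is a prefix match on the Content-Type header, which also tolerates trailing parameters such as a charset. A minimal sketch of just that decision follows; responseKind and classify are hypothetical names for this example, and only the two Content-Type strings come from the source.

```go
package main

import (
	"fmt"
	"strings"
)

// responseKind names the three outcomes of the Content-Type check.
type responseKind string

const (
	kindSampled     responseKind = "sampled"
	kindChunked     responseKind = "chunked"
	kindUnsupported responseKind = "unsupported"
)

// classify maps a Content-Type header value to a decoding strategy using
// the same prefix checks as the switch in Read.
func classify(contentType string) responseKind {
	switch {
	case strings.HasPrefix(contentType, "application/x-protobuf"):
		return kindSampled
	case strings.HasPrefix(contentType, "application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse"):
		return kindChunked
	default:
		return kindUnsupported
	}
}

func main() {
	for _, ct := range []string{
		"application/x-protobuf",
		"application/x-streamed-protobuf; proto=prometheus.ChunkedReadResponse",
		"text/html; charset=utf-8",
	} {
		fmt.Printf("%-75s -> %s\n", ct, classify(ct))
	}
}
```

An HTML error page from a proxy, for example, lands in the unsupported branch instead of being fed to the Snappy decoder.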

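Decoding the non-streamed response (handleSampledResponse): https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/client.go#L421

// handleSampledResponse handles a sampled remote read response (application/x-protobuf).
// It reads the HTTP response body, decompresses it, unmarshals it into the Protobuf struct, validates it, and converts it to a storage-layer SeriesSet.
// Parameters:
//   - req: the original read request (used to check that the number of results matches the number of queries)
//   - httpResp: the HTTP response returned by the remote server
//   - sortSeries: whether to sort the series in the response (passed down from Read)
// Returns:
//   - storage.SeriesSet: the converted time series
//   - error: any error encountered (read failure, decompression failure, mismatched response, ...)
func (c *Client) handleSampledResponse(req *prompb.ReadRequest, httpResp *http.Response, sortSeries bool) (storage.SeriesSet, error) {
	// 1. Read the whole HTTP response body (Snappy-compressed Protobuf data)
	compressed, err := io.ReadAll(httpResp.Body)
	if err != nil {
		// 2. Reading failed: return an error that includes the HTTP status
		return nil, fmt.Errorf("error reading response. HTTP status code: %s: %w", httpResp.Status, err)
	}

	// 3. Deferred cleanup: make sure the response body is fully released
	defer func() {
		// 3.1 Drain and discard whatever is left in the body so the underlying connection can be reused
		_, _ = io.Copy(io.Discard, httpResp.Body)
		// 3.2 Close the body (required, otherwise the TCP connection leaks)
		_ = httpResp.Body.Close()
	}()

	// 4. Decompress the body with Snappy (back to raw Protobuf bytes)
	uncompressed, err := snappy.Decode(nil, compressed)
	if err != nil {
		// 5. Decompression failed (corrupt or non-Snappy data): return the error
		return nil, fmt.Errorf("error reading response: %w", err)
	}

	// 6. Declare the Protobuf ReadResponse that will hold the decoded data
	var resp prompb.ReadResponse
	// 7. Unmarshal the decompressed Protobuf bytes into the ReadResponse struct
	err = proto.Unmarshal(uncompressed, &resp)
	if err != nil {
		// 8. Unmarshalling failed (the bytes are not a valid ReadResponse encoding): return the error
		return nil, fmt.Errorf("unable to unmarshal response body: %w", err)
	}

	// 9. Check that the number of results equals the number of queries (they must correspond one-to-one)
	if len(resp.Results) != len(req.Queries) {
		// 10. On a mismatch, return an error stating the expected and actual counts
		return nil, fmt.Errorf("responses: want %d, got %d", len(req.Queries), len(resp.Results))
	}

	// 11. This client does not batch queries (Read always sends exactly one),
	// so there is always exactly one result and the first element can be taken directly
	res := resp.Results[0]

	// 12. Convert the Protobuf query result to a storage-layer SeriesSet,
	// sorting the series if sortSeries is set, and return it
	return FromQueryResult(sortSeries, res), nil
}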

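4.5 Data conversion: internal structures to and from Protobuf

Internally Prometheus works with types such as labels.Matcher and storage.SeriesSet, which have to be converted to and from the Protobuf types prompb.LabelMatcher and prompb.TimeSeries.

Building the query, including label matcher conversion (ToQuery): https://github.com/prometheus/prometheus/blob/v3.4.0/storage/remote/codec.go#L96

// ToQuery builds the prompb.Query Protocol Buffer struct needed for a Prometheus remote read.
// It converts the storage layer's time range, label matchers, and select hints into the Query format defined by the remote read protocol.
// Parameters:
//   - from: query start timestamp (Unix time in milliseconds)
//   - to: query end timestamp (Unix time in milliseconds)
//   - matchers: label matchers (the storage layer's labels.Matcher type, used to select the target series)
//   - hints: query optimization hints (the storage layer's SelectHints with step, function, grouping, and so on; may be nil)
// Returns:
//   - *prompb.Query: the converted Protobuf query
//   - error: any conversion error (e.g. a label matcher that cannot be converted)
func ToQuery(from, to int64, matchers []*labels.Matcher, hints *storage.SelectHints) (*prompb.Query, error) {
	// 1. Convert the storage layer's labels.Matcher list to the protocol's prompb.LabelMatcher list
	// (remote communication has to use the Protobuf format, so the structures must be unified)
	ms, err := ToLabelMatchers(matchers)
	if err != nil {
		// 2. Matcher conversion failed: return the error as is
		return nil, err
	}

	// 3. Declare the Protobuf hints pointer (initially nil)
	var rp *prompb.ReadHints
	// 4. If hints were passed in (non-nil), convert them to the Protobuf ReadHints
	if hints != nil {
		rp = &prompb.ReadHints{
			StartMs:  hints.Start,    // hinted query start time (milliseconds)
			EndMs:    hints.End,      // hinted query end time (milliseconds)
			StepMs:   hints.Step,     // hinted query step (milliseconds)
			Func:     hints.Func,     // surrounding function or aggregation (e.g. sum, avg, rate)
			Grouping: hints.Grouping, // grouping labels
			By:       hints.By,       // true means group by the Grouping labels, false means group without them
			RangeMs:  hints.Range,    // range of the range vector selector in milliseconds (e.g. the [5m] in rate(x[5m]))
		}
	}

	// 5. Build and return the Protobuf Query
	// combining the time range, the converted matchers, and the hints (if any)
	return &prompb.Query{
		StartTimestampMs: from, // query start timestamp (milliseconds)
		EndTimestampMs:   to,   // query end timestamp (milliseconds)
		Matchers:         ms,   // converted Protobuf label matchers
		Hints:            rp,   // converted Protobuf hints (may be nil)
	}, nil
}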
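Step 1 of ToQuery delegates to ToLabelMatchers, which is not listed in this article. Conceptually it maps each internal matcher type to the corresponding wire enum value and rejects anything it does not recognise. The sketch below illustrates that mapping with local stand-in types; internalMatcher, wireMatcher, and toWireMatchers are hypothetical names, not the labels or prompb API.

```go
package main

import (
	"errors"
	"fmt"
)

// internalType stands in for the storage layer's matcher type.
type internalType int

const (
	matchEqual internalType = iota
	matchNotEqual
	matchRegexp
	matchNotRegexp
)

// wireType stands in for the Protobuf matcher enum.
type wireType int32

const (
	wireEQ wireType = iota
	wireNEQ
	wireRE
	wireNRE
)

type internalMatcher struct {
	Type        internalType
	Name, Value string
}

type wireMatcher struct {
	Type        wireType
	Name, Value string
}

// toWireMatchers converts every matcher and fails on an unknown type
// rather than silently sending a wrong operator over the wire.
func toWireMatchers(in []internalMatcher) ([]wireMatcher, error) {
	out := make([]wireMatcher, 0, len(in))
	for _, m := range in {
		var t wireType
		switch m.Type {
		case matchEqual:
			t = wireEQ
		case matchNotEqual:
			t = wireNEQ
		case matchRegexp:
			t = wireRE
		case matchNotRegexp:
			t = wireNRE
		default:
			return nil, errors.New("invalid matcher type")
		}
		out = append(out, wireMatcher{Type: t, Name: m.Name, Value: m.Value})
	}
	return out, nil
}

func main() {
	out, err := toWireMatchers([]internalMatcher{
		{Type: matchEqual, Name: "job", Value: "prometheus"},
		{Type: matchRegexp, Name: "instance", Value: "localhost:.*"},
	})
	fmt.Println(out, err)
}
```

An explicit switch with a failing default is a deliberate choice over casting one enum to the other: the two enums may not share numeric values, and a new internal type should produce an error, not a mislabelled matcher.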

posted @ 2025-11-03 14:55 左扬
