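Prometheus Storage

Local storage

Prometheus's local time series database stores data on local storage in a custom, highly efficient format.

By default, Prometheus writes scraped data to its local TSDB. The default path is data/, relative to the directory Prometheus is started from, which is usually the installation directory.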

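On-disk layout

Ingested samples are grouped into blocks of two hours. Each two-hour block is a directory that contains a chunks subdirectory holding all the time series samples for that time window, a metadata file, and an index file (which indexes metric names and labels to the time series in the chunks directory). The samples in the chunks directory are grouped into one or more segment files of up to 512MB each by default. When series are deleted via the API, the deletion records are stored in separate tombstone files (instead of deleting the data from the chunk segments immediately).

The current block for incoming samples is kept in memory and is not fully persisted. It is secured against crashes by a write-ahead log (WAL) that can be replayed when the Prometheus server restarts. Write-ahead log files are stored in the wal directory in 128MB segments. These files contain raw data that has not yet been compacted, so they are significantly larger than regular block files. Prometheus retains a minimum of three write-ahead log files. High-traffic servers may retain more than three WAL files in order to keep at least two hours of raw data.

A Prometheus server's data directory looks something like this: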

./data
├── 01BKGV7JBM69T2G1BGBGM6KB12     # block
│   └── meta.json               # metadata
├── 01BKGTZQ1SYQJTR4PB43C8PD98     # block
│   ├── chunks                  # sample data
│   │   └── 000001             # segment file, up to 512MB each; larger data is split across several files
│   ├── tombstones              # deletion records
│   ├── index                   # index file
│   └── meta.json               # metadata
├── 01BKGTZQ1HHWHV8FBJXW1Y3W0K     # block
│   └── meta.json               # metadata
├── 01BKGV7JC0RY8A6MACW02A2PJD     # block
│   ├── chunks                  # sample data
│   │   └── 000001
│   ├── tombstones              # deletion records
│   ├── index                   # index file
│   └── meta.json               # metadata
├── chunks_head
│   └── 000001
└── wal
    ├── 000000002
    └── checkpoint.00000001
        └── 00000000
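
To make the layout concrete, here is a minimal Python sketch that walks a data directory and reports the time range of each block. It assumes that every block directory contains a meta.json with minTime and maxTime fields (Unix timestamps in milliseconds). The function name list_blocks is made up for illustration and is not part of Prometheus.

```python
import json
from pathlib import Path


def list_blocks(data_dir):
    """Return (name, minTime, maxTime) for every block directory under data_dir.

    A directory is treated as a block only if it contains a meta.json file.
    wal/ and chunks_head/ have no meta.json, so they are skipped.
    """
    blocks = []
    for entry in sorted(Path(data_dir).iterdir()):
        meta_path = entry / "meta.json"
        if not entry.is_dir() or not meta_path.is_file():
            continue
        meta = json.loads(meta_path.read_text())
        # Assumption: minTime/maxTime are millisecond Unix timestamps.
        blocks.append((entry.name, meta.get("minTime"), meta.get("maxTime")))
    return blocks


if __name__ == "__main__":
    for name, min_time, max_time in list_blocks("./data"):
        print(name, min_time, max_time)
```

The sketch only reads meta.json, so it should be safe to run against a copy or snapshot of the data directory.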

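About blocks

Each block is a directory inside the data directory. Block directory names are ULIDs, which is why every name in the listing above starts with 01.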
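
Block directory names are ULIDs: 26 characters of Crockford base32, where the first 10 characters encode a 48-bit millisecond timestamp. The sketch below decodes that timestamp. The helper names are made up for illustration. The decoded value is the time the ID was generated, which is not necessarily the time range of the samples inside the block.

```python
from datetime import datetime, timezone

# Crockford base32 alphabet used by ULIDs (no I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def ulid_timestamp_ms(ulid):
    """Decode the millisecond timestamp stored in the first 10 ULID characters."""
    if len(ulid) != 26:
        raise ValueError("a ULID is exactly 26 characters long")
    value = 0
    for ch in ulid[:10]:
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 32 + CROCKFORD.index(ch)
    return value


def ulid_time(ulid):
    """Return the ULID timestamp as a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(ulid_timestamp_ms(ulid) / 1000, tz=timezone.utc)


if __name__ == "__main__":
    print(ulid_time("01BKGV7JBM69T2G1BGBGM6KB12"))
```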

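Block behavior

Prometheus compacts and merges historical blocks and deletes expired ones, so the number of blocks shrinks as compaction proceeds. Three things happen during compaction: it runs periodically, small blocks are merged into larger blocks, and expired blocks are cleaned up.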
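
The cleanup of expired blocks can be sketched in a few lines of Python. This is a simplified model, not the actual TSDB code: it assumes a block counts as expired once its newest sample is more than the retention period older than the newest sample of the most recent block. The function name and the (name, max_time_ms) tuples are hypothetical.

```python
def expired_blocks(blocks, retention_ms):
    """Return the names of blocks that fall outside the retention window.

    blocks: list of (name, max_time_ms) tuples.
    retention_ms: retention period in milliseconds; 0 or less disables cleanup.
    """
    if not blocks or retention_ms <= 0:
        return []
    # Measure age against the newest block rather than the wall clock
    # (an assumption of this sketch).
    newest = max(max_time for _, max_time in blocks)
    return [name for name, max_time in blocks if newest - max_time > retention_ms]


if __name__ == "__main__":
    day = 24 * 60 * 60 * 1000
    sample = [("old", 0), ("mid", 10 * day), ("new", 20 * day)]
    print(expired_blocks(sample, 15 * day))
```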

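Local storage configuration flags

--storage.tsdb.path: Where Prometheus writes its database. Defaults to data/.
--storage.tsdb.retention.time: When to remove old data. Defaults to 15d. Overrides storage.tsdb.retention if this flag is set to anything other than the default.
--storage.tsdb.retention.size: The maximum number of bytes of storage blocks to retain. The oldest data is removed first. Defaults to 0, which means disabled. Supported units: B, KB, MB, GB, TB, PB, EB. Example: "512MB". Units are based on powers of 2, so 1KB is 1024B. Although the WAL and m-mapped chunks are counted in the total size, only persistent blocks are deleted to honor this retention. The minimum disk requirement is therefore the peak space taken by the wal directory (WAL and checkpoint) and the chunks_head directory (m-mapped head chunks) combined, which peaks every 2 hours.
--storage.tsdb.retention: Deprecated in favor of storage.tsdb.retention.time.
--storage.tsdb.wal-compression: Enables compression of the write-ahead log (WAL). Depending on your data, you can expect the WAL size to be halved with little extra CPU load. This flag was introduced in 2.11.0 and enabled by default in 2.20.0. Note that once it is enabled, downgrading Prometheus to a version below 2.11.0 requires deleting the WAL.
--query.timeout=2m: Maximum time a query may take before it is aborted.
--query.max-concurrency=20: Maximum number of queries executed concurrently.
--web.read-timeout=5m: Maximum duration before timing out the read of a request and closing idle connections.
--web.max-connections=512: Maximum number of simultaneous connections.
--web.enable-lifecycle: Enables shutdown and configuration reload via HTTP request.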
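
Because the size units for --storage.tsdb.retention.size are powers of 2, it can help to convert a value to bytes when planning disk space. The following Python sketch does that for the units listed above. It only accepts a whole number followed by a unit; Prometheus's own parser may accept other forms, and the function name is made up for illustration.

```python
import re

# Exponent of 1024 for each unit listed for --storage.tsdb.retention.size.
UNIT_EXPONENT = {"B": 0, "KB": 1, "MB": 2, "GB": 3, "TB": 4, "PB": 5, "EB": 6}


def parse_retention_size(text):
    """Convert a size such as '512MB' to bytes, using powers of 2 (1KB = 1024B)."""
    match = re.fullmatch(r"(\d+)(B|KB|MB|GB|TB|PB|EB)", text.strip())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    number, unit = match.groups()
    return int(number) * 1024 ** UNIT_EXPONENT[unit]


if __name__ == "__main__":
    print(parse_retention_size("512MB"))
```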

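Remote storage

Prometheus's local storage is limited to a single node's scalability and durability. Instead of trying to solve clustered storage in Prometheus itself, Prometheus offers a set of interfaces that allow integrating with remote storage systems.

Overview

Prometheus integrates with remote storage systems in three ways:

  • Prometheus can write the samples it ingests to a remote URL in a standardized format.
  • Prometheus can receive samples from other Prometheus servers in a standardized format.
  • Prometheus can read (back) sample data from a remote URL in a standardized format.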

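Remote read and write architecture

The read and write protocols both use a snappy-compressed protocol buffer encoding over HTTP. The protocols are not considered stable APIs yet and may change to use gRPC over HTTP/2 in the future, once all hops between Prometheus and the remote storage can safely be assumed to support HTTP/2.

The built-in remote write receiver can be enabled by setting the --web.enable-remote-write-receiver command line flag. When enabled, the remote write receiver endpoint is /api/v1/write.

Configuration file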

<remote_write>

# The URL of the endpoint to send samples to.
url: <string>

# Timeout for requests to the remote write endpoint.
[ remote_timeout: <duration> | default = 30s ]

# Custom HTTP headers to be sent along with each remote write request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# List of remote write relabel configurations.
write_relabel_configs:
  [ - <relabel_config> ... ]

# Name of the remote write config, which if specified must be unique among remote write configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote write configs.
[ name: <string> ]

# Enables sending of exemplars over remote write. Note that exemplar storage itself must be enabled for exemplars to be scraped in the first place.
[ send_exemplars: <boolean> | default = false ]

# Sets the `Authorization` header on every remote write request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Optional `Authorization` header configuration.
authorization:
  # Sets the authentication type.
  [ type: <string> | default: Bearer ]
  # Sets the credentials. It is mutually exclusive with
  # `credentials_file`.
  [ credentials: <secret> ]
  # Sets the credentials to the credentials read from the configured file.
  # It is mutually exclusive with `credentials`.
  [ credentials_file: <filename> ]

# Optionally configures AWS's Signature Verification 4 signing process to
# sign requests. Cannot be set at the same time as basic_auth, authorization, or oauth2.
# To use the default credentials from the AWS SDK, use `sigv4: {}`.
sigv4:
  # The AWS region. If blank, the region from the default credentials chain
  # is used.
  [ region: <string> ]

  # The AWS API keys. If blank, the environment variables `AWS_ACCESS_KEY_ID`
  # and `AWS_SECRET_ACCESS_KEY` are used.
  [ access_key: <string> ]
  [ secret_key: <secret> ]

  # Named AWS profile used to authenticate.
  [ profile: <string> ]

  # AWS Role ARN, an alternative to using AWS API keys.
  [ role_arn: <string> ]

# Optional OAuth 2.0 configuration.
# Cannot be used at the same time as basic_auth, authorization, or sigv4.
oauth2:
  [ <oauth2> ]

# Configures the remote write request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configure whether HTTP requests follow HTTP 3xx redirects.
[ follow_redirects: <boolean> | default = true ]

# Configures the queue used to write to remote storage.
queue_config:
  # Number of samples to buffer per shard before we block reading of more
  # samples from the WAL. It is recommended to have enough capacity in each
  # shard to buffer several requests to keep throughput up while processing
  # occasional slow remote requests.
  [ capacity: <int> | default = 2500 ]
  # Maximum number of shards, i.e. amount of concurrency.
  [ max_shards: <int> | default = 200 ]
  # Minimum number of shards, i.e. amount of concurrency.
  [ min_shards: <int> | default = 1 ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 500]
  # Maximum time a sample will wait in buffer.
  [ batch_send_deadline: <duration> | default = 5s ]
  # Initial retry delay. Gets doubled for every retry.
  [ min_backoff: <duration> | default = 30ms ]
  # Maximum retry delay.
  [ max_backoff: <duration> | default = 5s ]
  # Retry upon receiving a 429 status code from the remote-write storage.
  # This is experimental and might change in the future.
  [ retry_on_http_429: <boolean> | default = false ]

# Configures the sending of series metadata to remote storage.
# Metadata configuration is subject to change at any point
# or be removed in future releases.
metadata_config:
  # Whether metric metadata is sent to remote storage or not.
  [ send: <boolean> | default = true ]
  # How frequently metric metadata is sent to remote storage.
  [ send_interval: <duration> | default = 1m ]
  # Maximum number of samples per send.
  [ max_samples_per_send: <int> | default = 500]
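
The retry settings in queue_config describe an exponential backoff: the delay starts at min_backoff, doubles on every retry, and is capped at max_backoff. The Python sketch below prints the resulting schedule for the defaults shown above (30ms and 5s). It models only the doubling and the cap as described in the comments; the function name is hypothetical and the real sender may behave differently in detail.

```python
def backoff_schedule_ms(retries, min_backoff_ms=30, max_backoff_ms=5000):
    """Return the delay in milliseconds before each of the given number of retries."""
    delays = []
    delay = min_backoff_ms
    for _ in range(retries):
        delays.append(delay)
        # Double for the next retry, but never exceed the configured maximum.
        delay = min(delay * 2, max_backoff_ms)
    return delays


if __name__ == "__main__":
    print(backoff_schedule_ms(10))
```

With the defaults, the delay reaches the 5s cap on the ninth retry, so a remote endpoint that stays unavailable is retried roughly every 5 seconds after that.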

<remote_read>

# The URL of the endpoint to query from.
url: <string>

# Name of the remote read config, which if specified must be unique among remote read configs.
# The name will be used in metrics and logging in place of a generated value to help users distinguish between
# remote read configs.
[ name: <string> ]

# An optional list of equality matchers which have to be
# present in a selector to query the remote read endpoint.
required_matchers:
  [ <labelname>: <labelvalue> ... ]

# Timeout for requests to the remote read endpoint.
[ remote_timeout: <duration> | default = 1m ]

# Custom HTTP headers to be sent along with each remote read request.
# Be aware that headers that are set by Prometheus itself can't be overwritten.
headers:
  [ <string>: <string> ... ]

# Whether reads should be made for queries for time ranges that
# the local storage should have complete data for.
[ read_recent: <boolean> | default = false ]

# Sets the `Authorization` header on every remote read request with the
# configured username and password.
# password and password_file are mutually exclusive.
basic_auth:
  [ username: <string> ]
  [ password: <secret> ]
  [ password_file: <string> ]

# Optional `Authorization` header configuration.
authorization:
  # Sets the authentication type.
  [ type: <string> | default: Bearer ]
  # Sets the credentials. It is mutually exclusive with
  # `credentials_file`.
  [ credentials: <secret> ]
  # Sets the credentials to the credentials read from the configured file.
  # It is mutually exclusive with `credentials`.
  [ credentials_file: <filename> ]

# Optional OAuth 2.0 configuration.
# Cannot be used at the same time as basic_auth or authorization.
oauth2:
  [ <oauth2> ]

# Configures the remote read request's TLS settings.
tls_config:
  [ <tls_config> ]

# Optional proxy URL.
[ proxy_url: <string> ]

# Configure whether HTTP requests follow HTTP 3xx redirects.
[ follow_redirects: <boolean> | default = true ]

# Whether to use the external labels as selectors for the remote read endpoint.
[ filter_external_labels: <boolean> | default = true ]
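
The required_matchers option limits which queries are sent to the remote read endpoint: a selector must contain every listed equality matcher. A minimal Python sketch of that check follows. It assumes the selector's equality matchers have already been extracted into a dict, and the function name is made up for illustration.

```python
def satisfies_required_matchers(selector_equalities, required_matchers):
    """Return True if the selector contains every required label=value matcher.

    selector_equalities: dict of label name -> value taken from the selector's
        equality matchers, e.g. {"__name__": "up", "job": "archive"}.
    required_matchers: dict of label name -> value from the remote_read config.
    """
    return all(
        selector_equalities.get(name) == value
        for name, value in required_matchers.items()
    )


if __name__ == "__main__":
    required = {"job": "archive"}
    print(satisfies_required_matchers({"__name__": "up", "job": "archive"}, required))
    print(satisfies_required_matchers({"__name__": "up"}, required))
```

An empty required_matchers dict lets every selector through, which matches the option being optional.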
posted @ 2025-04-03 15:06  小吉猫