[TSDB] OpenGemini Operations Guide

Overview: OpenGemini Operations

Gemix: the official all-in-one deployment and operations tool

Start a cluster: gemix cluster start {geminiClusterName}

[root@vmw-b ~]# gemix cluster start gemini-test
Starting cluster gemini-test...
+ [ Serial ] - SSHKeySet: privateKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa, publicKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=root, host=192.168.101.102
+ [Parallel] - UserSSH: user=root, host=192.168.101.103
+ [Parallel] - UserSSH: user=root, host=192.168.101.104
+ [Parallel] - UserSSH: user=root, host=192.168.101.102
+ [Parallel] - UserSSH: user=root, host=192.168.101.103
+ [Parallel] - UserSSH: user=root, host=192.168.101.104
+ [Parallel] - UserSSH: user=root, host=192.168.101.102
+ [Parallel] - UserSSH: user=root, host=192.168.101.103
+ [Parallel] - UserSSH: user=root, host=192.168.101.104
+ [Parallel] - UserSSH: user=root, host=192.168.101.105
+ [Parallel] - UserSSH: user=root, host=192.168.101.105
+ [ Serial ] - StartCluster
Starting component ts-meta
        Starting instance 192.168.101.104:8091
        Starting instance 192.168.101.102:8091
        Starting instance 192.168.101.103:8091
        Start instance 192.168.101.104:8091 success
        Start instance 192.168.101.103:8091 success
        Start instance 192.168.101.102:8091 success
Starting component ts-store
        Starting instance 192.168.101.104:8401
        Starting instance 192.168.101.102:8401
        Starting instance 192.168.101.103:8401
        Start instance 192.168.101.104:8401 success
        Start instance 192.168.101.103:8401 success
        Start instance 192.168.101.102:8401 success
Starting component ts-sql
        Starting instance 192.168.101.104:8086
        Starting instance 192.168.101.102:8086
        Starting instance 192.168.101.103:8086
        Start instance 192.168.101.104:8086 success
        Start instance 192.168.101.103:8086 success
        Start instance 192.168.101.102:8086 success
Starting component ts-server
        Starting instance 192.168.101.105:8186
        Start instance 192.168.101.105:8186 success
Starting component grafana
        Starting instance 192.168.101.105:3000
        Start instance 192.168.101.105:3000 success
Starting component ts-monitor
        Starting instance 192.168.101.103
        Starting instance 192.168.101.104
        Starting instance 192.168.101.105
        Starting instance 192.168.101.102
        Start 192.168.101.104 success
        Start 192.168.101.103 success
        Start 192.168.101.105 success
        Start 192.168.101.102 success
Started cluster `gemini-test` successfully
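
The start log above pairs every "Starting instance" line with a later "Start ... success" line. A minimal sketch, assuming the log has been captured as text, that lists instances which never reported success. The function name `unconfirmed_instances` and the regular expressions are illustrative and derived only from the output shown here, not from any Gemix API.

```python
import re

# "Starting component ts-meta"
COMPONENT = re.compile(r"^Starting component (\S+)\s*$")
# "        Starting instance 192.168.101.104:8091"
STARTING = re.compile(r"^\s*Starting instance (\S+)\s*$")
# "Start instance 192.168.101.104:8091 success" and the shorter
# ts-monitor form "Start 192.168.101.104 success"
STARTED = re.compile(r"^\s*Start (?:instance )?(\S+) success\s*$")


def unconfirmed_instances(log_text):
    """Return {component: sorted instances that never reported success}."""
    pending = {}
    component = None
    for line in log_text.splitlines():
        match = COMPONENT.match(line)
        if match:
            component = match.group(1)
            pending[component] = set()
            continue
        if component is None:
            continue  # still in the SSH preamble
        match = STARTING.match(line)
        if match:
            pending[component].add(match.group(1))
            continue
        match = STARTED.match(line)
        if match:
            pending[component].discard(match.group(1))
    return {name: sorted(hosts) for name, hosts in pending.items() if hosts}


# Shortened sample in which one ts-meta instance never confirms.
sample_log = """Starting component ts-meta
        Starting instance 192.168.101.104:8091
        Starting instance 192.168.101.102:8091
        Start instance 192.168.101.104:8091 success
Starting component ts-monitor
        Starting instance 192.168.101.103
        Start 192.168.101.103 success
"""
print(unconfirmed_instances(sample_log))
```

An empty result means every instance that was started also reported success.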

Stop a cluster: gemix cluster stop {geminiClusterName}

[root@vmw-b opengemini]# gemix cluster stop gemini-test
Will stop the cluster gemini-test with nodes: , roles: .
Do you want to continue? [y/N]:(default=N) y
+ [ Serial ] - SSHKeySet: privateKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa, publicKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-e.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-e.servers.com
+ [ Serial ] - StopCluster
Stopping component grafana
  Stopping instance vmw-e.servers.com
  Stop grafana vmw-e.servers.com:3000 success
Stopping component ts-server
  Stopping instance vmw-e.servers.com
  Stop ts-server vmw-e.servers.com:8186 success
Stopping component ts-sql
  Stopping instance vmw-d.servers.com
  Stopping instance vmw-b.servers.com
  Stopping instance vmw-c.servers.com
  Stop ts-sql vmw-d.servers.com:8086 success
  Stop ts-sql vmw-c.servers.com:8086 success
  Stop ts-sql vmw-b.servers.com:8086 success
Stopping component ts-store
  Stopping instance vmw-d.servers.com
  Stopping instance vmw-b.servers.com
  Stopping instance vmw-c.servers.com
  Stop ts-store vmw-d.servers.com:8401 success
  Stop ts-store vmw-c.servers.com:8401 success
  Stop ts-store vmw-b.servers.com:8401 success
Stopping component ts-meta
  Stopping instance vmw-d.servers.com
  Stopping instance vmw-b.servers.com
  Stopping instance vmw-c.servers.com
  Stop ts-meta vmw-d.servers.com:8091 success
  Stop ts-meta vmw-c.servers.com:8091 success
  Stop ts-meta vmw-b.servers.com:8091 success
Stopping component ts-monitor
  Stopping instance vmw-e.servers.com
  Stopping instance vmw-b.servers.com
  Stopping instance vmw-c.servers.com
  Stopping instance vmw-d.servers.com
  Stop vmw-e.servers.com success
  Stop vmw-d.servers.com success
  Stop vmw-c.servers.com success
  Stop vmw-b.servers.com success
Stopped cluster `gemini-test` successfully
[root@vmw-b opengemini]# 

Uninstall a cluster: gemix cluster uninstall {geminiClusterName}

[root@vmw-b go-study]# gemix cluster uninstall gemini-test

  ██     ██  █████  ██████  ███    ██ ██ ███    ██  ██████
  ██     ██ ██   ██ ██   ██ ████   ██ ██ ████   ██ ██
  ██  █  ██ ███████ ██████  ██ ██  ██ ██ ██ ██  ██ ██   ███
  ██ ███ ██ ██   ██ ██   ██ ██  ██ ██ ██ ██  ██ ██ ██    ██
   ███ ███  ██   ██ ██   ██ ██   ████ ██ ██   ████  ██████

This operation will destroy openGemini v1.2.0 cluster gemini-test and its data.
Are you sure to continue?
(Type "Yes, I know my cluster and data will be deleted." to continue)
: Yes, I know my cluster and data will be deleted.
Destroying cluster...
+ [ Serial ] - SSHKeySet: privateKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa, publicKey=/root/.gemix/storage/cluster/clusters/gemini-test/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-b.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-c.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-d.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-e.servers.com
+ [Parallel] - UserSSH: user=root, host=vmw-e.servers.com
+ [ Serial ] - StopCluster
Stopping component grafana
        Stopping instance vmw-e.servers.com
        Stop grafana vmw-e.servers.com:3000 success
Stopping component ts-server
        Stopping instance vmw-e.servers.com
        Stop ts-server vmw-e.servers.com:8186 success
Stopping component ts-sql
        Stopping instance vmw-d.servers.com
        Stopping instance vmw-b.servers.com
        Stopping instance vmw-c.servers.com
        Stop ts-sql vmw-d.servers.com:8086 success
        Stop ts-sql vmw-c.servers.com:8086 success
        Stop ts-sql vmw-b.servers.com:8086 success
Stopping component ts-store
        Stopping instance vmw-d.servers.com
        Stopping instance vmw-b.servers.com
        Stopping instance vmw-c.servers.com
        Stop ts-store vmw-c.servers.com:8401 success
        Stop ts-store vmw-d.servers.com:8401 success
        Stop ts-store vmw-b.servers.com:8401 success
Stopping component ts-meta
        Stopping instance vmw-d.servers.com
        Stopping instance vmw-b.servers.com
        Stopping instance vmw-c.servers.com
        Stop ts-meta vmw-d.servers.com:8091 success
        Stop ts-meta vmw-c.servers.com:8091 success
        Stop ts-meta vmw-b.servers.com:8091 success
Stopping component ts-monitor
        Stopping instance vmw-b.servers.com
        Stopping instance vmw-c.servers.com
        Stopping instance vmw-d.servers.com
        Stopping instance vmw-e.servers.com
        Stop vmw-c.servers.com success
        Stop vmw-d.servers.com success
        Stop vmw-e.servers.com success
        Stop vmw-b.servers.com success
+ [ Serial ] - UninstallCluster
Destroying component grafana

        Destroying instance vmw-e.servers.com

Destroy vmw-e.servers.com finished

- Destroy grafana paths: [/usr/local/opengemini/gemini-deploy/grafana-3000 /etc/systemd/system/grafana-3000.service]

Destroying component ts-server

        Destroying instance vmw-e.servers.com

Destroy vmw-e.servers.com finished

- Destroy ts-server paths: [/usr/local/opengemini/gemini-deploy/ts-server-8186/data /usr/local/opengemini/gemini-log/logs/ts-server-8186 /usr/local/opengemini/gemini-deploy/ts-server-8186 /etc/systemd/system/ts-server-8186.service]

Uninstalling monitored vmw-e.servers.com
        Uninstalling instance vmw-e.servers.com
Uninstalling monitored on vmw-e.servers.com success
Destroying component ts-sql

        Destroying instance vmw-b.servers.com

Destroy vmw-b.servers.com finished

- Destroy ts-sql paths: [/usr/local/opengemini/gemini-log/logs/ts-sql-8086 /usr/local/opengemini/gemini-deploy/ts-sql-8086 /etc/systemd/system/ts-sql-8086.service]

        Destroying instance vmw-c.servers.com

Destroy vmw-c.servers.com finished

- Destroy ts-sql paths: [/usr/local/opengemini/gemini-log/logs/ts-sql-8086 /usr/local/opengemini/gemini-deploy/ts-sql-8086 /etc/systemd/system/ts-sql-8086.service]

        Destroying instance vmw-d.servers.com

Destroy vmw-d.servers.com finished

- Destroy ts-sql paths: [/usr/local/opengemini/gemini-log/logs/ts-sql-8086 /usr/local/opengemini/gemini-deploy/ts-sql-8086 /etc/systemd/system/ts-sql-8086.service]

Destroying component ts-store

        Destroying instance vmw-b.servers.com

Destroy vmw-b.servers.com finished

- Destroy ts-store paths: [/data/gemini-data/data /usr/local/opengemini/gemini-log/logs/ts-store-8401 /usr/local/opengemini/gemini-deploy/ts-store-8401 /etc/systemd/system/ts-store-8401.service]

        Destroying instance vmw-c.servers.com

Destroy vmw-c.servers.com finished

- Destroy ts-store paths: [/data/gemini-data/data /usr/local/opengemini/gemini-log/logs/ts-store-8401 /usr/local/opengemini/gemini-deploy/ts-store-8401 /etc/systemd/system/ts-store-8401.service]

        Destroying instance vmw-d.servers.com

Destroy vmw-d.servers.com finished

- Destroy ts-store paths: [/data/gemini-data/data /usr/local/opengemini/gemini-log/logs/ts-store-8401 /usr/local/opengemini/gemini-deploy/ts-store-8401 /etc/systemd/system/ts-store-8401.service]

Destroying component ts-meta

        Destroying instance vmw-b.servers.com

Destroy vmw-b.servers.com finished

- Destroy ts-meta paths: [/data/gemini-data/meta /usr/local/opengemini/gemini-log/logs/ts-meta-8091 /usr/local/opengemini/gemini-deploy/ts-meta-8091 /etc/systemd/system/ts-meta-8091.service]

        Destroying instance vmw-c.servers.com

Destroy vmw-c.servers.com finished

- Destroy ts-meta paths: [/etc/systemd/system/ts-meta-8091.service /data/gemini-data/meta /usr/local/opengemini/gemini-log/logs/ts-meta-8091 /usr/local/opengemini/gemini-deploy/ts-meta-8091]

        Destroying instance vmw-d.servers.com

Destroy vmw-d.servers.com finished

- Destroy ts-meta paths: [/data/gemini-data/meta /usr/local/opengemini/gemini-log/logs/ts-meta-8091 /usr/local/opengemini/gemini-deploy/ts-meta-8091 /etc/systemd/system/ts-meta-8091.service]

Uninstalling monitored vmw-b.servers.com
        Uninstalling instance vmw-b.servers.com
Uninstalling monitored on vmw-b.servers.com success
Uninstalling monitored vmw-c.servers.com
        Uninstalling instance vmw-c.servers.com
Uninstalling monitored on vmw-c.servers.com success
Uninstalling monitored vmw-d.servers.com
        Uninstalling instance vmw-d.servers.com
Uninstalling monitored on vmw-d.servers.com success
Clean global directories vmw-b.servers.com
        Clean directory /usr/local/opengemini/gemini-log/logs on instance vmw-b.servers.com
        Clean directory /usr/local/opengemini/gemini-deploy on instance vmw-b.servers.com
        Clean directory /home/root/data on instance vmw-b.servers.com
Clean global directories vmw-b.servers.com success
Clean global directories vmw-c.servers.com
        Clean directory /usr/local/opengemini/gemini-log/logs on instance vmw-c.servers.com
        Clean directory /usr/local/opengemini/gemini-deploy on instance vmw-c.servers.com
        Clean directory /home/root/data on instance vmw-c.servers.com
Clean global directories vmw-c.servers.com success
Clean global directories vmw-d.servers.com
        Clean directory /usr/local/opengemini/gemini-log/logs on instance vmw-d.servers.com
        Clean directory /usr/local/opengemini/gemini-deploy on instance vmw-d.servers.com
        Clean directory /home/root/data on instance vmw-d.servers.com
Clean global directories vmw-d.servers.com success
Clean global directories vmw-e.servers.com
        Clean directory /usr/local/opengemini/gemini-log/logs on instance vmw-e.servers.com
        Clean directory /usr/local/opengemini/gemini-deploy on instance vmw-e.servers.com
        Clean directory /home/root/data on instance vmw-e.servers.com
Clean global directories vmw-e.servers.com success
Delete public key vmw-b.servers.com
Delete public key vmw-b.servers.com success
Delete public key vmw-c.servers.com
Delete public key vmw-c.servers.com success
Delete public key vmw-d.servers.com
Delete public key vmw-d.servers.com success
Delete public key vmw-e.servers.com
Delete public key vmw-e.servers.com success
Uninstall cluster `gemini-test` successfully
  • Comparison before and after uninstalling:

Before uninstalling

[root@vmw-d ~]# ll /data/gemini-data/meta/
总用量 24
-rw-------. 1 root root 32768 12月 23 14:28 raft.db
drwxr-xr-x. 2 root root     6 12月 23 10:24 snapshots

After uninstalling

[root@vmw-d ~]# ll /data/gemini-data/meta/
ls: 无法访问/data/gemini-data/meta/: 没有那个文件或目录

[root@vmw-d ~]# ll /data/gemini-data/data
ls: 无法访问/data/gemini-data/data: 没有那个文件或目录

[root@vmw-d ~]# ll /usr/local/opengemini/gemini-log/
总用量 0
[root@vmw-d ~]# ll /usr/local/opengemini/
总用量 0
drwxr-xr-x. 2 root root 6 12月 23 14:29 gemini-log
  # Note: the opengemini/gemini-deploy/* directory and its subdirectories were removed as well

Help

[root@vmw-b go-study]# gemix cluster help
Deploy an openGemini cluster for production

Usage:
  gemix cluster [command]

Available Commands:
  template    Print topology template
  install     Install an openGemini cluster for production
  start       Start an openGemini cluster
  stop        Stop an openGemini cluster
  uninstall   Uninstall a specified cluster
  status      check cluster status
  upgrade     upgrade cluster

Flags:
  -h, --help   help for cluster

Use "gemix cluster [command] --help" for more information about a command.

ts-cli: the official client

ts-cli: OpenGemini's native client

  • OpenGemini

https://github.com/openGemini/openGemini/releases/download/v1.2.0/openGemini-1.2.0-linux-amd64.tar.gz (recommended)
https://github.com/openGemini/openGemini/releases/download/v1.2.0/openGemini-1.2.0-windows-amd64.zip (recommended)
These packages ship precompiled binaries: ts-cli / ts-sql / ts-store / ...

git clone https://github.com/openGemini/openGemini.git
cd openGemini
python3 build.py --clean

Build output (standalone build):

> ls build
> ts-cli ts-meta ts-monitor ts-server ts-sql ts-store

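ts-server is the standalone (single-node) binary

  • [vmw-e] Installing ts-cli manually
Download the package openGemini-1.2.0-linux-amd64.tar.gz

Upload it to the server as /root/openGemini-1.2.0-linux-amd64.tar.gz

# Retire the old install under /root/test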
mv /root/test /root/test.bak

mkdir -p /root/gemini-test
tar -zxvf /root/openGemini-1.2.0-linux-amd64.tar.gz -C /root/gemini-test
	etc/monitor.conf
	etc/openGemini.conf
	etc/openGemini.singlenode.conf
	etc/weakpasswd.properties
	usr/bin/ts-cli
	usr/bin/ts-meta
	usr/bin/ts-monitor
	usr/bin/ts-server
	usr/bin/ts-sql
	usr/bin/ts-store
ls -la /root/gemini-test
chown -R root:root /root/gemini-test

rm -f /usr/local/bin/ts-cli
ln -s /root/gemini-test/usr/bin/ts-cli /usr/local/bin/ts-cli
# [x] ln -s /root/test/usr/bin/ts-cli /usr/local/bin/ts-cli

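Creating the administrator account on a fresh install

  • Step 1: start openGemini (single node or cluster)
  • Step 2: create the administrator user
  • Method 1: HTTP request with curl
> curl -i -XPOST "http://ip:8086/query" -k --data-urlencode "q=CREATE USER admin WITH PASSWORD 'admin-passwd' WITH ALL PRIVILEGES"
  • Method 2: connect to openGemini with ts-cli and create the user from the client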
[root@vmw-e ~]# ts-cli --host 192.168.101.102 --port 8086
openGemini CLI 0.1.0 (rev-revision)
Please use `quit`, `exit` or `Ctrl-D` to exit this program.
> CREATE USER admin WITH PASSWORD 'Xdddwwp@ssword' WITH ALL PRIVILEGES;
> 
> show databases; // incorrect usage
ERR: unable to parse authentication credentials
> show users // incorrect usage
ERR: unable to parse authentication credentials
> auth // incorrect usage
username: password: 
ERR: authorization failed
> quit

[root@vmw-e ~]# ts-cli --host 192.168.101.102 --port 8086 --username admin --password "Xdddwwp@ssword"
openGemini CLI 0.1.0 (rev-revision)
Please use `quit`, `exit` or `Ctrl-D` to exit this program.
> show users
+-------+-------+--------+
| user  | admin | rwuser |
+-------+-------+--------+
| admin | true  | false  |
+-------+-------+--------+
3 columns, 1 rows in set
>
> create database bdp_test;
> show databases;
name: databases
+----------+
|   name   |
+----------+
| bdp_test |
+----------+
1 columns, 1 rows in set
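
The curl call in Method 1 is a form-encoded POST to the ts-sql /query endpoint with the statement in the q parameter. A minimal sketch of the same request using only the Python standard library. It builds the request without sending it, and the helper name `build_query_request` is illustrative.

```python
import urllib.parse
import urllib.request


def build_query_request(host, port, statement):
    """Build (but do not send) a POST to the ts-sql /query endpoint."""
    # urlencode percent-encodes the quotes and turns spaces into "+",
    # which is valid for an application/x-www-form-urlencoded body.
    body = urllib.parse.urlencode({"q": statement}).encode("ascii")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query",
        data=body,
        method="POST",
    )


statement = "CREATE USER admin WITH PASSWORD 'admin-passwd' WITH ALL PRIVILEGES"
request = build_query_request("192.168.101.102", 8086, statement)
print(request.get_method(), request.full_url)
print(request.data.decode("ascii"))
```

Passing `request` to `urllib.request.urlopen` would actually send it. That step is left out here because it creates the one-time administrator account.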

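[Notes]

  • When running the command, replace the IP address and port with the actual ip and port of ts-sql in your environment, and set the administrator username and password.
  • The password must contain uppercase letters, lowercase letters, digits, and special characters, and be 8-256 characters long.
  • config/weakpasswd.properties is the weak-password file. Weak-password checking is enabled by default: a password that matches an entry in this file is treated as weak and rejected.
  • The password string must be wrapped in single quotes; include the single quotes when submitting authentication requests.
  • Avoid single quotes (') and backslashes (\) in passwords. If a password does contain them, escape each one with a backslash both when creating the password and when submitting authentication requests.

For security reasons, the openGemini administrator account can be created only once; it cannot be deleted or renamed.
Choose the username and password carefully before creating the administrator account, and store them safely.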
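
Because the administrator account can be created only once, it can help to check a candidate password before sending CREATE USER. A minimal sketch of the rules listed in the notes, read strictly. It assumes "special character" means ASCII punctuation, and the server's real check may differ. The function names are illustrative, and the weak-password file is not consulted.

```python
import string


def password_problems(password):
    """Return a list of rule violations; an empty list means it passes."""
    problems = []
    if not 8 <= len(password) <= 256:
        problems.append("length must be 8-256 characters")
    if not any(c in string.ascii_uppercase for c in password):
        problems.append("needs an uppercase letter")
    if not any(c in string.ascii_lowercase for c in password):
        problems.append("needs a lowercase letter")
    if not any(c in string.digits for c in password):
        problems.append("needs a digit")
    # Assumption: "special character" is taken to mean ASCII punctuation.
    if not any(c in string.punctuation for c in password):
        problems.append("needs a special character")
    return problems


def escape_for_statement(password):
    """Backslash-escape backslashes and single quotes for use inside '...'."""
    return password.replace("\\", "\\\\").replace("'", "\\'")


candidate = "OpenGemini#1658"
print(password_problems(candidate))
print(f"CREATE USER admin WITH PASSWORD '{escape_for_statement(candidate)}' WITH ALL PRIVILEGES")
```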

ts-cli --host 192.168.101.102 --port 8086 --password "OpenGemini#1658"

  • [vmw-e] Usage
# Help
[root@vmw-e ~]# ts-cli -h
ts-cli is a CLI tool of openGemini.

Usage:
  ts-cli [flags]
  ts-cli [command]

Available Commands:
  execute     Execute a query
  help        Help about any command
  import      Import data to openGemini
  interactive Work in interactive mode
  version     Display the version of openGemini CLI

Flags:
      --database string    Database to connect to openGemini.
  -h, --help               help for ts-cli
      --host string        ts-sql host to connect to. (default "localhost")
  -p, --password string    Password to connect to openGemini.
      --port int           ts-sql tcp port to connect to. (default 8086)
      --precision string   Specify the format of the timestamp: rfc3339, h, m, s, ms, u or ns. (default "ns")
      --socket string      openGemini unix domain socket to connect to.
      --ssl                Use https for connecting to openGemini.
      --unsafeSsl          Ignore ssl verification when connecting openGemini by https. (default true)
  -u, --username string    Username to connect to openGemini.

Use "ts-cli [command] --help" for more information about a command.


[root@vmw-e ~]# ts-cli --host 192.168.101.102 --port 8086

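Monitoring

Overall architecture

  • openGemini exposes more than 260 metrics that describe the state of the cluster. This section explains how to enable them and how to visualize the data with Grafana.

The overall deployment of the monitoring system is shown in the figure below:

The solution covers the production, collection, storage, analysis, alerting, and display of monitoring data, and consists of four parts:

  • openGemini cluster
    While serving workloads, openGemini continuously emits metrics about the kernel's runtime state. It supports two output modes:
  • The first writes the metrics to log files;
  • The second pushes them over HTTP in openGemini's data format; the receiver can be either InfluxDB or openGemini.
  • Metric collection
    With HTTP output no separate collection tool is needed, but some metrics are missing, such as disk utilization, the total number of tables created, the number of time series, and the total number of databases created.

With file output, ts-monitor is required to collect the metrics. Besides the kernel runtime metrics, ts-monitor also collects disk utilization, the total number of tables created, the number of time series, and the total number of databases created.
ts-monitor likewise converts the metrics to openGemini's data format before reporting them.

  • Data storage
    The monitoring system issues frequent queries. To keep it from competing with the business cluster for resources over the long run and hurting business throughput, store the metrics on a dedicated node. openGemini comes in standalone and cluster editions, and a standalone instance is usually sufficient for a cluster's own metrics. openGemini also works with Grafana, and its standalone performance is better than InfluxDB's, so a standalone openGemini is the recommended store for monitoring data.

  • Visualization and alerting

Grafana is a widely used open-source data visualization tool that supports monitoring, statistics, and alerting, which makes it a natural fit for this monitoring system.

In summary, the solution is simple to deploy and easy to obtain, since every component is open source. The rest of this section covers the deployment and configuration for each collection method.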
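
The "openGemini data format" mentioned above is assumed here to be the InfluxDB-style line protocol, which is consistent with InfluxDB being a valid receiver. A minimal sketch of one metric point rendered in that format. The measurement "system", the tag "host", and the field "CpuUsage" are borrowed from the Grafana queries later in this guide. The helper `to_line` is illustrative and handles only numeric fields.

```python
def _escape(text):
    """Escape the characters that delimit keys and tag values."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def to_line(measurement, tags, fields, timestamp_ns):
    """Render one point as: measurement,tag=value field=value timestamp."""
    name = measurement.replace(",", "\\,").replace(" ", "\\ ")
    tag_part = "".join(
        f",{_escape(key)}={_escape(value)}" for key, value in sorted(tags.items())
    )
    # Only float fields are handled in this sketch.
    field_part = ",".join(
        f"{_escape(key)}={float(value)!r}" for key, value in fields.items()
    )
    return f"{name}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "system",
    {"host": "192.168.101.102"},
    {"CpuUsage": 12.5},
    1703312880000000000,
)
print(line)
```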

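Method 1: collect metrics from log files with ts-monitor

As shown in the figure:

  • Metrics are written to log files. One ts-monitor is deployed on each node to collect the metrics of all openGemini components on that node,
  • then writes the data to openGemini on the remote monitoring node,
  • and Grafana serves as the monitoring and alerting dashboard.
  • This deployment involves two configuration files:

openGemini.conf or openGemini.singlenode.conf
monitor.conf

  • In openGemini.conf the relevant section is [monitor]. The required settings are:
Setting          Description
pushers          Output mode, either http or file; use file for this method
store-enabled    Set to true to enable monitoring, false to disable it
store-interval   Metric collection interval
store-path       Directory in which the metric files are saved
  • Configuration example

With this configuration, the openGemini components write their metrics to /tmp/openGemini/metric/metric.data every 10 seconds.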

[monitor]
  pushers = "file"
  store-enabled = true
  store-database = "monitor"
  store-interval = "10s"
  store-path = "/tmp/openGemini/metric/"
  compress = false
  # http-endpoint = ""
  # username = ""
  # password = ""

For a standalone openGemini, add the configuration above to openGemini.singlenode.conf.

  • Next, the configuration file of the ts-monitor component, monitor.conf:
[monitor]
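  # [Required] IP address of the current node, or anything that uniquely identifies it, such as the hostname
  host = "{{addr}}"
  # [Required] Directory from which ts-monitor collects metrics; must match openGemini.conf -> [monitor] -> store-path
  metric-path = "/tmp/openGemini/metric/"
  # [Required] ts-monitor collects the cluster's error logs; must match openGemini.conf -> [logging] -> path
  error-log-path = "/tmp/openGemini/logs"
  # May be commented out if ts-store is not deployed on this node
  # Used to compute disk space utilization; must match openGemini.conf -> [data] -> store-data-dir
  disk-path = "/tmp/openGemini/data"
  # May be commented out if ts-store is not deployed on this node
  # For safety, keep the openGemini WAL on its own disk partition so that the data disk and the WAL do not affect each other.
  # Used to compute WAL disk space utilization; must match openGemini.conf -> [data] -> store-wal-dir
  aux-disk-path = "/tmp/openGemini/data/wal"
  # [Required] openGemini components deployed on this node, comma-separated; used to monitor process status
  process = "ts-store,ts-sql,ts-meta"
  # [Default is fine] Records which metric.data files have been collected and the offset within each, so that ts-monitor does not re-collect data after a restart
  history-file = "history.txt"
  # Whether the metric data is compressed
  compress = false

[query]
  # If ts-monitor runs on several nodes, enable this section on only one of them
  # When true, ts-monitor periodically sends queries such as SHOW DATABASES and SHOW MEASUREMENTS to the business cluster to count the databases and tables created
  query-enable = false
  # Listen address and port of ts-sql in the business cluster
  http-endpoint = "{{query_addr}}:8086"
  # Interval at which ts-monitor sends queries to the business cluster
  query-interval = "5m"
  # If the business cluster has HTTPS and username/password authentication enabled, ts-monitor connects over HTTPS and must supply the username and password
  # Storing them here risks exposing the business cluster's credentials
  # username = ""
  # password = ""
  # https-enable = false

[report]
  # ts-monitor reports the collected metrics to a separate node for storage. This section configures the target database (openGemini): address, database, credentials, retention policy, and so on
  # Listen address and port of openGemini on the target node
  address = "{{report_addr}}:8086"
  # Name of the database to write to
  database = "monitor"
  # If openGemini on the target node has HTTPS and username/password authentication enabled, ts-monitor connects over HTTPS and must supply the username and password
  # username = ""
  # password = ""
  # https-enable = false
  # Retention policy name and duration; the defaults are fine. Monitoring data is deleted automatically after 168 hours
  rp = "autogen"
  rp-duration = "168h"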
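
Several monitor.conf paths have to match settings in openGemini.conf, and a mismatch silently yields missing metrics. A minimal sketch that compares the paired settings named in the comments above. It uses a deliberately small reader for `key = "value"` lines rather than a full TOML parser, and every function name is illustrative.

```python
import re

SECTION = re.compile(r"^\[([^\]]+)\]$")
SETTING = re.compile(r'^([A-Za-z0-9_-]+)\s*=\s*"([^"]*)"')

# (openGemini.conf section, key) -> monitor.conf [monitor] key
PAIRS = [
    (("monitor", "store-path"), "metric-path"),
    (("logging", "path"), "error-log-path"),
    (("data", "store-data-dir"), "disk-path"),
    (("data", "store-wal-dir"), "aux-disk-path"),
]


def read_string_settings(text):
    """Collect quoted string settings per [section]; comments are skipped."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        match = SECTION.match(line)
        if match:
            section = match.group(1)
            settings.setdefault(section, {})
            continue
        match = SETTING.match(line)
        if match and section is not None:
            settings[section][match.group(1)] = match.group(2)
    return settings


def path_mismatches(gemini_text, monitor_text):
    """Return (openGemini setting, monitor key, expected, actual) tuples."""
    gemini = read_string_settings(gemini_text)
    monitor = read_string_settings(monitor_text).get("monitor", {})
    mismatches = []
    for (section, key), monitor_key in PAIRS:
        expected = gemini.get(section, {}).get(key)
        actual = monitor.get(monitor_key)
        if expected is None or actual is None:
            continue  # e.g. disk-path commented out on a node without ts-store
        if expected.rstrip("/") != actual.rstrip("/"):
            mismatches.append((f"[{section}] {key}", monitor_key, expected, actual))
    return mismatches


gemini_conf = '[monitor]\n  store-path = "/tmp/openGemini/metric/"\n[logging]\n  path = "/var/log/gemini"\n'
monitor_conf = '[monitor]\n  metric-path = "/tmp/openGemini/metric"\n  error-log-path = "/tmp/openGemini/logs"\n'
print(path_mismatches(gemini_conf, monitor_conf))
```

In practice the two texts would be read from the deployed openGemini.conf and monitor.conf on each node.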

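Method 2: push metrics directly to the monitoring node

As shown in the figure, pushing metrics directly to the monitoring node requires no ts-monitor in the business cluster. However, the business cluster must be able to reach the monitoring node directly, and some metrics are missing, such as the CPU utilization of cluster nodes, disk utilization, the total number of tables created, the number of time series, and the total number of databases created.

With this method only openGemini.conf needs to be configured. The required settings are:

Setting          Description
pushers          Output mode, either http or file; use http for this method
store-enabled    Set to true to enable monitoring, false to disable it
store-database   Database on the monitoring node's standalone openGemini that receives the metrics; it must be created in advance
store-interval   Metric collection interval
http-endpoint    Listen address and port of the standalone openGemini on the monitoring node
  • Configuration example
    With this configuration, metrics are written every 10 seconds to the "monitor" database on the monitoring node "192.70.3.43:8086"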
[monitor]
  pushers = "http"
  store-enabled = true
  store-database = "monitor"
  store-interval = "10s"
  # store-path = "/tmp/openGemini/metric/ts-sql/metric.data"
  # compress = false
  http-endpoint = "192.70.3.43:8086"
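  # If openGemini on node 192.70.3.43 has HTTPS and username/password authentication enabled, the connection uses HTTPS and must supply the username and password
  # Storing them here may expose the username and password, which is a security risk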
  # username = ""
  # password = ""
  # http-endpoint = ""
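  • Grafana configuration
    Installing Grafana is not covered here. After starting Grafana, open http://192.70.3.43:3000 in a browser and add a data source of type InfluxDB (openGemini is compatible with InfluxDB)

On the data source creation page, name is the name of the new data source, URL is the address and port of openGemini on the monitoring node, and database is the database that holds the monitoring data

Once the data source exists, create dashboards in Grafana for whatever you want to monitor. As shown in the figure below, create a Panel, select the monitor data source just created, and build the panel's query with the graphical query editor

Besides the graphical editor, you can click "Query Inspector" and enter the query text directly. As shown in the figure below, select the "monitor" data source created earlier, then use the following query to get the average CPU usage stored in the database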

SELECT 
    mean("CpuUsage") 
FROM  $database.."system" 
WHERE $timeFilter 
GROUP BY time($__interval), "host" fill(null)

Besides "CpuUsage", the monitoring data contains other metrics that can be used to build panels. Some queries related to overall cluster health are listed below

# CPU utilization
SELECT mean("CpuUsage")
FROM $database.."system"
WHERE $timeFilter
GROUP BY time($__interval), "host" fill(null)

# Memory usage
SELECT max("MemUsage")
FROM $database.."system"
WHERE $timeFilter
GROUP BY time($__interval), "host" fill(null)

# Disk space utilization
SELECT max("DiskUsage")
FROM $database.."system"
WHERE $timeFilter
GROUP BY time($__interval), "host" fill(null)

# Process status
SELECT last("StoreStatus") as "store",
       last("SqlStatus") as "sql",
       last("MetaStatus") as "meta"
FROM $database.."system"
WHERE $timeFilter
GROUP BY "host"

# ts-meta leader/follower check
SELECT last("Status") as "status"
FROM $database.."metaRaft"
WHERE $timeFilter
GROUP BY "hostname"

# ts-store process health check
SELECT 
    last("Status") as "status" 
FROM $database.."meta" 
WHERE $timeFilter 
GROUP BY "Host"
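
The queries above contain Grafana placeholders ($database, $timeFilter, $__interval), so they cannot be pasted into ts-cli unchanged. A minimal sketch that substitutes concrete values so a query can be tried outside Grafana. It assumes $database is a dashboard variable holding the database name and that a relative time range is wanted. The function name `expand_grafana_macros` and its defaults are illustrative.

```python
def expand_grafana_macros(query, database, time_range="1h", interval="1m"):
    """Replace the Grafana placeholders with literal InfluxQL fragments."""
    return (
        query.replace("$database", f'"{database}"')
        .replace("$timeFilter", f"time >= now() - {time_range}")
        .replace("$__interval", interval)
    )


cpu_query = (
    'SELECT mean("CpuUsage") FROM $database.."system" '
    'WHERE $timeFilter GROUP BY time($__interval), "host" fill(null)'
)
expanded = expand_grafana_macros(cpu_query, "monitor")
print(expanded)
```

The expanded text can then be run in a ts-cli session against the monitoring node.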
  • Results

Grafana

Access URL

  • Once openGemini has been installed with Gemix, Grafana is available at:

http://{grafana_servers}:3000
Default username / password: admin / admin

Process / deployment path

[root@vmw-e ~]# ps -ef | grep -i grafana
root       2528      1  0 19:48 ?        00:00:07 bin/grafana-7.5.17/bin/grafana-server --homepath=/usr/local/opengemini/gemini-deploy/grafana-3000/bin/grafana-7.5.17 --config=/usr/local/opengemini/gemini-deploy/grafana-3000/conf/grafana.ini
root      89035  88722  0 21:01 pts/2    00:00:00 grep --color=auto -i grafana

[root@vmw-e ~]# pwdx 2528
2528: /usr/local/opengemini/gemini-deploy/grafana-3000

Core configuration: grafana.ini

vim /usr/local/opengemini/gemini-deploy/grafana-3000/conf/grafana.ini

# #################### Grafana Configuration Example #####################
# 
# Everything has defaults so you only need to uncomment things you want to
# change
# possible values : production, development
; app_mode = production
# instance name, defaults to HOSTNAME environment variable value or hostname if HOSTNAME var is empty
; instance_name = ${HOSTNAME}
# ################################### Paths ####################################
[paths]
# Path to where grafana can store temp files, sessions, and the sqlite3 db (if that is used)
# 
data         = /usr/local/opengemini/gemini-deploy/grafana-3000/data
# 
# Directory where grafana can store logs
# 
logs         = /usr/local/opengemini/gemini-deploy/grafana-3000/log
# 
# Directory where grafana will automatically scan and look for plugins
# 
plugins      = /usr/local/opengemini/gemini-deploy/grafana-3000/plugins
# 
# 
# folder that contains provisioning config files that grafana will apply on startup and while running.
provisioning = /usr/local/opengemini/gemini-deploy/grafana-3000/provisioning

# 
# ################################### Server ####################################
[server]
# Protocol (http or https)
; protocol = http
# The ip address to bind to, empty will bind to all interfaces
; http_addr =
# The http port  to use
http_port = 3000
# The public facing domain name used to access grafana from a browser
domain    = 192.168.101.105
# Redirect to correct domain if host header does not match domain
# Prevents DNS rebinding attacks
; enforce_domain = false
# The full public facing url
# Log web requests
; router_logging = false
# the path relative working path
; static_root_path = public
# enable gzip
; enable_gzip = false
# https certs & key file
; cert_file =
; cert_key =
# ################################### Database ####################################
[database]

# Either "mysql", "postgres" or "sqlite3", it's your choice
; type = sqlite3
; host = 127.0.0.1:3306
; name = grafana
; user = root
; password =
# For "postgres" only, either "disable", "require" or "verify-full"
; ssl_mode = disable
# For "sqlite3" only, path relative to data_path setting
; path = grafana.db
# ################################### Session ####################################
[session]

# Either "memory", "file", "redis", "mysql", "postgres", default is "file"
; provider = file
# Provider config options
# memory: not have any config yet
# file: session dir path, is relative to grafana data_path
# redis: config like redis server e.g. `addr=127.0.0.1:6379,pool_size=100,db=grafana`
# mysql: go-sql-driver/mysql dsn config string, e.g. `user:password@tcp(127.0.0.1:3306)/database_name`
# postgres: user=a password=b host=localhost port=5432 dbname=c sslmode=disable
; provider_config = sessions
# Session cookie name
; cookie_name = grafana_sess
# If you use session in https only, default is false
; cookie_secure = false
# Session life time, default is 86400
; session_life_time = 86400
# ################################### Analytics ####################################
[analytics]
# Server reporting, sends usage counters to stats.grafana.org every 24 hours.
# No ip addresses are being tracked, only simple counters to track
# running instances, dashboard and error counts. It is very helpful to us.
# Change this option to false to disable reporting.
; reporting_enabled = true
# Set to false to disable all checks to https://grafana.net
# for new vesions (grafana itself and plugins), check is used
# in some UI views to notify that grafana or plugin update exists
# This option does not cause any auto updates, nor send any information
# only a GET request to http://grafana.net to get latest versions
check_for_updates = true

# Google Analytics universal tracking code, only enabled if you specify an id here
; google_analytics_ua_id =
# ################################### Security ####################################
[security]
# default admin user, created on startup
admin_user     = admin
# default admin password, can be changed before first start of grafana,  or in profile settings
admin_password = admin

# used for signing
; secret_key = SW2YcwTIb9zpOOhoPsMm
# Auto-login remember days
; login_remember_days = 7
; cookie_username = grafana_user
; cookie_remember_name = grafana_remember
# disable gravatar profile images
; disable_gravatar = false
# data source proxy whitelist (ip_or_domain:port separated by spaces)
; data_source_proxy_whitelist =
[snapshots]

# snapshot sharing options
; external_enabled = true
; external_snapshot_url = https://snapshots-origin.raintank.io
; external_snapshot_name = Publish to snapshot.raintank.io
# ################################### Users ####################################
[users]

# disable user signup / registration
; allow_sign_up = true
# Allow non admin users to create organizations
; allow_org_create = true
# Set to true to automatically assign new users to the default organization (id 1)
; auto_assign_org = true
# Default role new users will be automatically assigned (if disabled above is set to true)
; auto_assign_org_role = Viewer
# Background text for the user field on the login page
; login_hint = email or username
# Default UI theme ("dark" or "light")
; default_theme = dark
# ############## Set Cookie Name for Multiple Instances #######################
[auth]
login_cookie_name = grafana_session_3000

# ################################### Anonymous Auth ##########################
[auth.anonymous]

# specify organization name that should be used for unauthenticated users
; org_name = Main Org.
# specify role for unauthenticated users
; org_role = Viewer
# ################################### Basic Auth ##########################
[auth.basic]

; enabled = true
# ################################### Auth LDAP ##########################
[auth.ldap]

; enabled = false
; config_file = /etc/grafana/ldap.toml
# ################################### SMTP / Emailing ##########################
[smtp]

; enabled = false
; host = localhost:25
; user =
; password =
; cert_file =
; key_file =
; skip_verify = false
; from_address = admin@grafana.localhost
[emails]

; welcome_email_on_sign_up = false
# ################################### Logging ##########################
[log]
# Either "console", "file", "syslog". Default is console and  file
# Use space to separate multiple modes, e.g. "console file"
mode = file

# Either "trace", "debug", "info", "warn", "error", "critical", default is "info"
; level = info
# For "console" mode only
[log.console]

; level =
# log line format, valid options are text, console and json
; format = console
# For "file" mode only
[log.file]
level  = info
# log line format, valid options are text, console and json
format = text

# This enables automated log rotate(switch of following options), default is true
; log_rotate = true
# Max line number of single file, default is 1000000
; max_lines = 1000000
# Max size shift of single file, default is 28 means 1 << 28, 256MB
; max_size_shift = 28
# Segment log daily, default is true
; daily_rotate = true
# Expired days of log file(delete after max days), default is 7
; max_days = 7
[log.syslog]
; level =
# log line format, valid options are text, console and json
; format = text
# Syslog network type and address. This can be udp, tcp, or unix. If left blank, the default unix endpoints will be used.
; network =
; address =
# Syslog facility. user, daemon and local0 through local7 are valid.
; facility =
# Syslog tag. By default, the process' argv[0] is used.
; tag =
# ################################### AMQP Event Publisher ##########################
[event_publisher]

; enabled = false
; rabbitmq_url = amqp://localhost/
; exchange = grafana_events
; #################################### Dashboard JSON files ##########################
[dashboards.json]
enabled = false
path    = /usr/local/opengemini/gemini-deploy/grafana-3000/dashboards

# ################################### Internal Grafana Metrics ##########################
# Metrics available at HTTP API Url /api/metrics
[metrics]

# Disable / Enable internal metrics
; enabled           = true
# Publish interval
; interval_seconds  = 10
# Send internal metrics to Graphite
; [metrics.graphite]
; address = localhost:2003
; prefix = prod.grafana.%(instance_name)s.
# ################################### Internal Grafana Metrics ##########################
# Url used to import dashboards directly from Grafana.net
[grafana_net]
url = https://grafana.net

Dashboard : Overview

Dashboard : Process Restarted

Dashboard : Inspection

Dashboard: Error Logs

Dashboard: Performance Write

Dashboard: Performance Read

ts-monitor

Process / deployment path

[root@vmw-b ~]# ps -ef | grep -i ts-monitor
root       2957      1  0 19:50 ?        00:00:24 bin/ts-monitor --config=conf/ts-monitor.toml
root     112060   2138  0 21:10 pts/0    00:00:00 grep --color=auto -i ts-monitor
[root@vmw-b ~]# pwdx 2957
2957: /usr/local/opengemini/gemini-deploy/ts-monitor

Core configuration : ts-monitor.toml

# vim /usr/local/opengemini/gemini-deploy/ts-monitor/conf/ts-monitor.toml
[monitor]
  # localhost ip
  host = "192.168.101.102"
  # Indicates the path of the metric file generated by the kernel. References openGemini.conf: [monitor] store-path
  # metric-path cannot have subdirectories
  metric-path = "/usr/local/opengemini/gemini-log/logs/ts-meta-8091/metric"
  # Indicates the path of the log file generated by the kernel. References openGemini.conf: [logging] path
  # error-log-path cannot have subdirectories
  error-log-path = "/usr/local/opengemini/gemini-log/logs/ts-meta-8091"
  # Data disk path. References openGemini.conf: [data] store-data-dir
  disk-path = "/data/gemini-data/data"
  # Wal disk path. References openGemini.conf: [data]  store-wal-dir
  aux-disk-path = "/data/gemini-data/data/wal"
  # Name of the process to be monitored. Optional Value: ts-store,ts-sql,ts-meta.
  # Determined based on the actual process running on the local node.
  process = "ts-store,ts-sql"
  # the history file reported error-log names.
  history-file = "history.txt"
  # Is the metric compressed.
  compress = false

[query]
  # query for some DDL. Report for these data to monitor cluster.
  # - SHOW DATABASES
  # - SHOW MEASUREMENTS
  # - SHOW SERIES CARDINALITY FROM mst
  query-enable = false
  http-endpoint = "127.0.0.x:8086"
  query-interval = "5m"
  # username = ""
  # password = ""
  # https-enable = false

[report]
  # Address for metric data to be reported.
  address = ""
  # Database name for metric data to be reported.
  database = "gemini_test"
  rp = "autogen"
  rp-duration = "168h"
  # username = ""
  # password = ""

[logging]
  format = "auto"
  level = "info"
  path = "/usr/local/opengemini/gemini-log/logs/ts-monitor"
  max-size = "64m"
  max-num = 30
  max-age = 7
  compress-enabled = true
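
Two values in this file are easy to get wrong by hand: the `process` list, whose comment names ts-store, ts-sql and ts-meta as the accepted values, and the `[report]` address, which is empty in the listing. The following is a minimal sketch of a pre-flight check, not part of ts-monitor itself. The function name `check_monitor_settings` is hypothetical, and it assumes that an empty address means no report target has been configured yet.

```python
# Accepted names, taken from the "Optional Value" comment in ts-monitor.toml.
ALLOWED_PROCESSES = {"ts-store", "ts-sql", "ts-meta"}


def check_monitor_settings(process, report_address):
    """Return a list of findings; an empty list means nothing was flagged."""
    findings = []
    # The process value is a comma-separated string such as "ts-store,ts-sql".
    names = [p.strip() for p in process.split(",") if p.strip()]
    if not names:
        findings.append("process list is empty")
    for name in names:
        if name not in ALLOWED_PROCESSES:
            findings.append(f"unknown process name: {name}")
    if not report_address.strip():
        # Assumption: an empty address means no report target is configured.
        findings.append("report.address is empty")
    return findings


# Values as they appear in the listing above.
print(check_monitor_settings("ts-store,ts-sql", ""))
```

Passing the values read from the real file gives a quick sanity check before the process is restarted.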

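Scenario: enable ts-monitor on the data nodes and report monitoring data to an OpenGemini monitoring database

  • step0 Install an OpenGemini database instance and create the monitoring database _monitor

Best practice: create it separately, isolated from the production OpenGemini databases, so that monitoring does not affect them.

Here, the single-node ts-server (192.168.101.105:8186) installed with Gemix in one step is used as the report address.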

create database _monitor;
  • step1 Edit the ts-monitor configuration file on the node

Original configuration

[root@vmw-b ~]# vim /usr/local/opengemini/gemini-deploy/ts-monitor/conf/ts-monitor.toml
...

[report]
  # Address for metric data to be reported.
  address = ""
  # Database name for metric data to be reported.
  database = "gemini_test"
  rp = "autogen"
  rp-duration = "168h"
  # username = ""
  # password = ""

...

New configuration

[root@vmw-b ~]# vim /usr/local/opengemini/gemini-deploy/ts-monitor/conf/ts-monitor.toml
...

[report]
  # Address for metric data to be reported.
  # address = "192.168.101.102:8086" : reports monitoring data to the production database; not recommended
  # address = "192.168.101.105:8186" : reports monitoring data to the single-node OpenGemini (ts-server); recommended
  address = "192.168.101.105:8186"
  # Database name for metric data to be reported.
  # database = "gemini_test"
  database = "_monitor"
  rp = "autogen"
  rp-duration = "168h"
  # username = ""
  # password = ""

...
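
When the same change has to be made on several nodes, editing each file in vim is error-prone. The sketch below shows one way to rewrite only the active `address` and `database` keys of the `[report]` table in the file's text, leaving comments and other sections untouched. The helper name `set_report_target` is hypothetical, and the sample text is a shortened copy of the listing, not the full file.

```python
import re


def set_report_target(toml_text, address, database):
    """Return toml_text with address/database replaced inside [report] only."""
    out, section = [], None
    for line in toml_text.splitlines():
        header = re.match(r"\s*\[([^\]]+)\]\s*$", line)
        if header:
            # Track which table the following keys belong to.
            section = header.group(1).strip()
        elif section == "report":
            # Match active keys only; commented-out lines start with '#'.
            m = re.match(r"(\s*)(address|database)\s*=", line)
            if m:
                value = address if m.group(2) == "address" else database
                line = f'{m.group(1)}{m.group(2)} = "{value}"'
        out.append(line)
    return "\n".join(out) + "\n"


sample = (
    '[monitor]\n'
    '  host = "192.168.101.102"\n'
    '\n'
    '[report]\n'
    '  # Address for metric data to be reported.\n'
    '  address = ""\n'
    '  database = "gemini_test"\n'
    '  rp = "autogen"\n'
)
print(set_report_target(sample, "192.168.101.105:8186", "_monitor"))
```

The function works on text rather than on a parsed structure so that the comments in the file survive the edit.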
  • step3 Restart the ts-monitor process

  • step4 After logging in to OpenGemini, new measurements appear under the _monitor database

# View the retention policies
# show retention policies on _monitor
name=autogen shardGroupDuration=168h0m0s ...

# show measurements on _monitor
err_log
meta
metaRaft
spdy
system

# select * from "_monitor"."autogen"."err_log"

err_log

system


meta
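
The same checks can be scripted instead of typed into an interactive session. The sketch below assumes that the ts-server at 192.168.101.105:8186 exposes the InfluxDB 1.x-compatible `/query` HTTP endpoint without authentication and returns the usual `results` / `series` / `values` JSON layout. The helper names are hypothetical, and `run_query` is defined but not called here because it needs a reachable server.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(endpoint, database, statement):
    """Build the /query URL for a statement against one database."""
    params = urlencode({"db": database, "q": statement})
    return f"http://{endpoint}/query?{params}"


def run_query(endpoint, database, statement, timeout=5.0):
    """Send the statement and return the decoded JSON response."""
    url = build_query_url(endpoint, database, statement)
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def measurement_names(payload):
    """Collect the first column of every row, e.g. from SHOW MEASUREMENTS."""
    names = []
    for result in payload.get("results", []):
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


print(build_query_url("192.168.101.105:8186", "_monitor", "SHOW MEASUREMENTS"))
```

Against a live instance, `measurement_names(run_query("192.168.101.105:8186", "_monitor", "SHOW MEASUREMENTS"))` would be expected to list the measurements shown above, provided the response has the assumed shape.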

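ts-server : single-node OpenGemini (monitoring storage database)

  • Single-node version (ts-server)

Mostly used as the monitoring database.
Data replicas are not supported; replication is currently at the database level, and table-level replication is not supported.

Core configuration : ts-server.toml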

[root@vmw-e ~]# vim /usr/local/opengemini/gemini-deploy/ts-server-8186/conf/ts-server.toml
# WARNING: This file is auto-generated. Do not edit! All your modification will be overwritten!
# You can use 'gemix cluster edit-config' and 'gemix cluster reload' to update the configuration
# All configuration items you want to change can be added to:
# server_configs:
#   ts-server:
#     aa.b1.c3: value
#     aa.b2.c4: value
[common]
meta-join = ["192.168.101.105:8192"]

[data]
store-data-dir = "/usr/local/opengemini/gemini-deploy/ts-server-8186/data"
store-ingest-addr = "192.168.101.105:8410"
store-select-addr = "192.168.101.105:8411"
store-wal-dir = "/usr/local/opengemini/gemini-deploy/ts-server-8186/data"

[http]
bind-address = "192.168.101.105:8186"

[logging]
path = "/usr/local/opengemini/gemini-log/logs/ts-server-8186"

[meta]
bind-address = "192.168.101.105:8188"
dir = "/usr/local/opengemini/gemini-deploy/ts-server-8186/data/meta"
http-bind-address = "192.168.101.105:8191"
rpc-bind-address = "192.168.101.105:8192"

[monitor]
pushers = "file"
store-enabled = false
store-path = "/usr/local/opengemini/gemini-log/logs/ts-server-8186/metric/server-metric.data"
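
In this listing, `[common] meta-join` points at the same address as `[meta] rpc-bind-address` (192.168.101.105:8192). The following sketch checks that relationship on an already-parsed configuration. It assumes that, for a single-node ts-server, the local meta RPC address is expected to appear in `meta-join`; the function name `check_meta_join` is hypothetical.

```python
def check_meta_join(cfg):
    """Return findings about meta-join versus the local meta RPC address."""
    findings = []
    meta_join = cfg.get("common", {}).get("meta-join", [])
    rpc_addr = cfg.get("meta", {}).get("rpc-bind-address", "")
    if not meta_join:
        findings.append("common.meta-join is empty")
    elif rpc_addr not in meta_join:
        findings.append(
            f"meta.rpc-bind-address {rpc_addr} is not listed in common.meta-join"
        )
    return findings


# Values copied from the ts-server.toml listing above.
cfg = {
    "common": {"meta-join": ["192.168.101.105:8192"]},
    "meta": {"rpc-bind-address": "192.168.101.105:8192"},
}
print(check_meta_join(cfg))
```

On Python 3.11 or later, `tomllib.load` on the file opened in binary mode returns a nested dict of this shape, so the real file can be passed in directly.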


posted @ 2024-12-27 01:07  千千寰宇