[InfluxDB/OpenGemini] TSDB reports "InfluxDBException: user is locked" when querying data

1 Problem description

  • A query against the InfluxDB database through the Query API fails, and the log shows the following InfluxDB error:
...
org.influxdb.InfluxDBException: user is locked
	at org.influxdb.InfluxDBException.buildExceptionFromErrorMessage(InfluxDBException.java:161) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.InfluxDBException.buildExceptionForErrorState(InfluxDBException.java:173) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.impl.InfluxDBImpl.execute(InfluxDBImpl.java:846) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.impl.InfluxDBImpl.executeQuery(InfluxDBImpl.java:833) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.impl.InfluxDBImpl.query$original$NqPAZts7(InfluxDBImpl.java:559) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.impl.InfluxDBImpl.query$original$NqPAZts7$accessor$6GQc4J6p(InfluxDBImpl.java) ~[influxdb-java-2.22.jar!/:?]
	at org.influxdb.impl.InfluxDBImpl$auxiliary$JDCxBf2K.call(Unknown Source) ~[influxdb-java-2.22.jar!/:?]
	at org.apache.skywalking.apm.agent.core.plugin.interceptor.enhance.InstMethodsInter.intercept(InstMethodsInter.java:86) ~[skywalking-agent.jar:8.9.0]
	at org.influxdb.impl.InfluxDBImpl.query(InfluxDBImpl.java) ~[influxdb-java-2.22.jar!/:?]
...

Or:

# curl -v http://127.0.0.1:8086/query -u Xxxuser:'xxxx' --data-urlencode "q=show users"
{"error":"user is locked"}
# curl -v http://127.0.0.1:8086/query -u Xxxuser:'xxxx' --data-urlencode "q=SET PASSWORD FOR otherUser = 'xxxx';"
{"error":"user is locked"}

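2 Problem analysis

InfluxDB and OpenGemini show the same symptoms and share the same database code logic for this problem, so they are analyzed together here and referred to as TSDB (time-series database).

  • Root cause: during the afternoon upgrade, the password in the NACOS configuration file used by the Flink job that writes the data was set incorrectly, and the job sent high-frequency requests with that wrong password, which caused the database user to be locked.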
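The failure mode above, a writer that keeps retrying with a bad password, can be avoided on the client side by treating authentication errors as non-retryable. The following is a minimal sketch in Go, not the actual Flink job: `isAuthError`, `writeWithRetry`, and `failingWrite` are hypothetical names, and the only server messages assumed are "user is locked" and "authorization failed", both of which appear in this article.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// isAuthError reports whether a TSDB error message points to a credential
// problem, which retrying with the same password cannot fix.
func isAuthError(msg string) bool {
	m := strings.ToLower(msg)
	return strings.Contains(m, "user is locked") || strings.Contains(m, "authorization failed")
}

// writeWithRetry calls write up to maxAttempts times, but gives up at once
// when the error is an authentication error. It returns the number of calls made.
func writeWithRetry(write func() error, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = write()
		if err == nil || isAuthError(err.Error()) {
			return attempt, err
		}
	}
	return maxAttempts, err
}

// failingWrite is a hypothetical stand-in for a write that always fails
// with the given message; it counts how often it was called.
func failingWrite(msg string, calls *int) func() error {
	return func() error {
		*calls++
		return errors.New(msg)
	}
}

func main() {
	calls := 0
	n, err := writeWithRetry(failingWrite("user is locked", &calls), 5)
	fmt.Println(n, err) // one attempt only, no further wrong-password requests
}
```

With this policy, a misconfigured password produces a single failed request per write instead of a stream of requests that keeps the user locked.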

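3 Solution

Step1 Stop the programs that send requests with the wrong password

  • Method 1: stop the program that writes data to InfluxDB

  • Method 2: start the firewall on the servers that run the TSDB cluster's externally facing read/write process, and allow only IPs inside the TSDB cluster to reach those machines

In OpenGemini, this process is the ts-sql component.

  • Start the firewall
# Start the firewall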
systemctl status firewalld
systemctl start firewalld
systemctl status firewalld

# Allow access only from IPs inside the TSDB cluster (a quick, temporary way to block the faulty external requests)
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='xx.xx.xx.01' accept"
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='xx.xx.xx.02' accept"
firewall-cmd --permanent --add-rich-rule="rule family='ipv4' source address='xx.xx.xx.03' accept"

# Reload the firewall configuration so the rules take effect
firewall-cmd --reload

# Check the firewall state and the configured rules
firewall-cmd --state
firewall-cmd --zone=public --list-rich-rules
firewall-cmd --list-all

# From another node, access the machine that runs ts-sql to verify that the firewall works
curl -v http://xx.xx.xx.xx:8086/ping
  • Change the password (optional step for OpenGemini)
# Temporarily disable authentication on the ts-sql component
vim /usr/local/opengemini/gemini-deploy/ts-sql-8086/conf/ts-sql.toml
[http]
auth-enabled = false

# Change the password (method 1)
curl -v http://127.0.0.1:8086/query -u Xxxuser:'xxxx' --data-urlencode "q=SET PASSWORD FOR otherUser = 'xxxx';"  # [X] wrong example
curl -v http://127.0.0.1:8086/query --data-urlencode "q=SET PASSWORD FOR otherUser = 'xxxx';"  # [√] correct example

# Change the password (method 2)
/root/go/src/opengemini/build/ts-cli --database monitor --host xx.xx.xx.xx --port 8086 -u admin --password {password}  # [X] wrong example
/root/go/src/opengemini/build/ts-cli --database monitor --host xx.xx.xx.xx --port 8086  # [√] correct example
> SET PASSWORD FOR admin = 'xxxxxx'

# Re-enable authentication on the ts-sql component
vim /usr/local/opengemini/gemini-deploy/ts-sql-8086/conf/ts-sql.toml
[http]
auth-enabled = true

# Verify that the new password works
curl -G http://xx.xx.xx.xx:8086/query -u admin:'{password}' --data-urlencode "q=show users"
  • Stop the firewall
systemctl stop firewalld

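Step2 After making sure every program that sends wrong-password requests has stopped, wait 30 seconds and the user is unlocked automatically

Once those requests stop, the database unlocks the user automatically after 30 s by default. Otherwise, while the wrong-password requests continue, even an attempt to reset the password may keep failing.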
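After the 30-second wait, the unlock can be confirmed with the same SHOW USERS request used in section 1. The sketch below does this check once from Go and runs against a local fake server so that it is self-contained. `isUserLocked` and `newFakeServer` are hypothetical names, and the fake server only imitates the `{"error":"user is locked"}` reply shown earlier. Whether requests made during the lock count toward the limit is not established here, so the check is meant to be run once, with the correct password, rather than in a tight loop.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// isUserLocked sends one SHOW USERS query with basic auth and reports
// whether the reply carries the "user is locked" error.
func isUserLocked(baseURL, user, password string) (bool, error) {
	target := baseURL + "/query?q=" + url.QueryEscape("SHOW USERS")
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return false, err
	}
	req.SetBasicAuth(user, password)
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return false, err
	}
	var reply struct {
		Error string `json:"error"`
	}
	// A body that is not JSON is treated as "not locked"; only the error text matters here.
	_ = json.Unmarshal(body, &reply)
	return reply.Error == "user is locked", nil
}

// newFakeServer is a hypothetical stand-in for ts-sql/InfluxDB, used only by this demo.
func newFakeServer(locked bool) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if locked {
			w.WriteHeader(http.StatusUnauthorized)
			fmt.Fprint(w, `{"error":"user is locked"}`)
			return
		}
		fmt.Fprint(w, `{"results":[{"statement_id":0}]}`)
	}))
}

func main() {
	srv := newFakeServer(true)
	defer srv.Close()
	locked, err := isUserLocked(srv.URL, "admin", "example-password")
	fmt.Println(locked, err)
}
```

Against a real cluster, `srv.URL` would be replaced by the ts-sql address, for example `http://xx.xx.xx.xx:8086`.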

4 Summary and related operations

How to observe ts-sql anomalies

  • Watch the live ts-sql log for wrong-password requests
tail -100f /usr/local/opengemini/gemini-log/ts-sql-8086/sql.error.log

Create a new user

# /root/go/src/opengemini/build/ts-cli --database monitor --host xx.xx.xx.xx --port 8086 -u admin --password {password}
CREATE USER rwuser WITH PASSWORD 'xxxx';
SHOW USERS;
GRANT ALL ON xxx_db TO rwuser;
SHOW GRANTS FOR rwuser;

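Update the ts-monitor component's password as well (optional for OpenGemini)

If the TSDB used by OpenGemini's ts-monitor is the affected TSDB cluster instance itself (rather than a separately created OpenGemini instance), the TSDB connection settings in ts-monitor must be updated at the same time.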

vim /usr/local/gemini-deploy/ts-monitor/conf/ts-monitor.toml
[query]
  # query for some DDL. Report for these data to monitor cluster.
  # - SHOW DATABASES
  # - SHOW MEASUREMENTS
  # - SHOW SERIES CARDINALITY FROM mst
  query-enable = true
  http-endpoint = "xx.xx.xx.xx:8086"
  query-interval = "5m"
  username = "XxxUser"
  password = "XxxPassword"
  # https-enable = false

[report]
  # Address for metric data to be reported.
  address = "xx.xx.xx.xx:8086"
  # Database name for metric data to be reported.
  database = "monitor_xxx"
  rp = "autogen"
  rp-duration = "168h"
  username = "XxxUser"
  password = "XxxPassword"

How the TSDB cluster locks and unlocks users

Using OpenGemini v1.2.0 as an example

  • lib/util/lifted/influx/meta/errors.go

lib/util/lifted/influx/meta/errors.go:187: ErrUserLocked = errors.New("user is locked")
https://github.com/openGemini/openGemini/blob/v1.2.0/lib/util/lifted/influx/meta/errors.go

ErrUserLocked is referenced in the following places:

$ grep -nR "ErrUserLocked"
app/ts-store/run/server_test.go:477:    return nil, meta2.ErrUserLocked
engine/shard_test.go:6616:      return nil, meta2.ErrUserLocked
lib/httpserver/handler.go:99:                                   if err == meta2.ErrUserLocked { [focus of this analysis]
lib/metaclient/meta_client.go:1969:             return nil, meta2.ErrUserLocked
lib/util/lifted/influx/httpd/handler.go:1993:                                   if err == meta2.ErrUserLocked {
lib/util/lifted/influx/meta/errors.go:186:      // ErrUserLocked is returned when a user that is locked.
lib/util/lifted/influx/meta/errors.go:187:      ErrUserLocked = errors.New("user is locked")
  • lib/httpserver/handler.go#Authenticate

lib/httpserver/handler.go:99: if err == meta2.ErrUserLocked {

func Authenticate(inner func(http.ResponseWriter, *http.Request), client meta.MetaClient, requireAuthentication bool) http.Handler {
    ...
	_, err = client.Authenticate(creds.Username, creds.Password) // client is the meta.MetaClient passed to Authenticate
	if err != nil {
		errMsg := "authorization failed"
		if err == meta2.ErrUserLocked {
			errMsg = err.Error()
		}
		err := errno.NewError(errno.HttpUnauthorized)
		log := logger.NewLogger(errno.ModuleHTTP)
		log.Error(errMsg, zap.Error(err))
		util.HttpError(w, err.Error(), http.StatusUnauthorized)
		return
	}
    ...
}
  • /lib/metaclient/meta_client.go#Authenticate(username, password string)

Internally, it calls the isLockedUser method.

  • /lib/metaclient/meta_client.go#isLockedUser
func (c *Client) isLockedUser(u string) bool {
	c.muAuthData.RLock()
	defer c.muAuthData.RUnlock()
	if v, ok := c.authFailRcds[u]; ok {
		if len(v.occurTimeLst) >= maxLoginLimit { // Key point: maxLoginLimit / occurTimeLst (the configuration and logic behind locking and unlocking)
			c.logger.Info("The user has been locked.", zap.String("user", u))
			return true
		}
	}
	return false
}
  • /lib/metaclient/meta_client.go#maxLoginLimit/authFailCacheLimit/lockUserTime/maxLoginValidTime

https://github.com/openGemini/openGemini/blob/v1.2.0/lib/metaclient/meta_client.go

File path: /lib/metaclient/meta_client.go

const (
    ...

	//for lock user
	maxLoginLimit      = 5    //Maximum number of login attempts
	authFailCacheLimit = 200  //Size of the channel for processing authentication failures.
	lockUserTime       = 30   //User Lock Duration, in seconds.
	maxLoginValidTime  = 3600 //Validity duration of login records, in seconds.

    ...
)
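The constants above, together with isLockedUser, suggest a model like the one sketched below. This is a simplified illustration and not the openGemini implementation: `lockTracker`, `prune`, `recordFailure`, and `demoStart` are hypothetical names. It assumes that failure records are cleared `lockUserTime` after the last recorded failure once the limit is reached, that attempts made while locked are not recorded, and that records older than `maxLoginValidTime` are dropped.

```go
package main

import (
	"fmt"
	"time"
)

// Values copied from the constants listed in meta_client.go.
const (
	maxLoginLimit     = 5
	lockUserTime      = 30 * time.Second
	maxLoginValidTime = 3600 * time.Second
)

// demoStart is an arbitrary reference time used by the demo.
var demoStart = time.Date(2023, 12, 13, 15, 0, 0, 0, time.UTC)

// lockTracker is a hypothetical, simplified stand-in for the client's
// authFailRcds bookkeeping.
type lockTracker struct {
	failures map[string][]time.Time
}

func newLockTracker() *lockTracker {
	return &lockTracker{failures: make(map[string][]time.Time)}
}

// prune drops expired records and (assumption) clears all records once
// lockUserTime has passed since the last failure of a locked user.
func (t *lockTracker) prune(user string, now time.Time) {
	recs := t.failures[user]
	kept := recs[:0]
	for _, ts := range recs {
		if now.Sub(ts) < maxLoginValidTime {
			kept = append(kept, ts)
		}
	}
	if len(kept) >= maxLoginLimit && now.Sub(kept[len(kept)-1]) >= lockUserTime {
		kept = kept[:0]
	}
	t.failures[user] = kept
}

// isLocked mirrors the check in isLockedUser: locked when the number of
// recorded failures has reached maxLoginLimit.
func (t *lockTracker) isLocked(user string, now time.Time) bool {
	t.prune(user, now)
	return len(t.failures[user]) >= maxLoginLimit
}

// recordFailure notes one failed login unless the user is already locked.
func (t *lockTracker) recordFailure(user string, now time.Time) {
	if t.isLocked(user, now) {
		return
	}
	t.failures[user] = append(t.failures[user], now)
}

func main() {
	tr := newLockTracker()
	for i := 0; i < maxLoginLimit; i++ {
		tr.recordFailure("admin", demoStart.Add(time.Duration(i)*time.Second))
	}
	fmt.Println(tr.isLocked("admin", demoStart.Add(5*time.Second)))  // true
	fmt.Println(tr.isLocked("admin", demoStart.Add(40*time.Second))) // false
}
```

Under this model, a client that keeps sending the wrong password locks the user again within five requests of every unlock, which matches the advice in Step2 to stop all such clients before waiting.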

Y Recommended reading

firewalld / iptables / chkconfig commands

  • OpenGemini

X References

  • Influxdb
posted @ 2023-12-13 18:43  千千寰宇