Notes: Pitfalls We Hit While Optimizing an Elasticsearch Cluster


Original architecture

New architecture

Great in theory, painful in practice, nice in the end

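After the architecture change above, the ES cluster runs "rock solid" and easily handles both aggregations and data ingestion. The cluster is now in a very safe position: even in the worst case, where aggregations cannot run, simple log queries are unaffected.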

Of course, we ran into a number of problems during the migration. Here is a brief record:

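Problem 1: We changed the cluster architecture by converting the original master nodes directly into data nodes and deploying three new, lower-spec machines as the new master nodes, then reconfiguring the original master and data nodes as data nodes and joining them to the new cluster. This causes a problem: the original data nodes already hold data, and that data carries a cluster UUID, namely the UUID of the original cluster. When such a node tries to join the new cluster, it fails with the following error: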

"Caused by: org.elasticsearch.transport.RemoteTransportException: [hot-8][172.18.0.2:9300][internal:cluster/coordination/join/validate]",
"Caused by: org.elasticsearch.cluster.coordination.CoordinationStateRejectedException: join validation on cluster state with a different cluster uuid yawFIpzSS-erlTvqNOLI_g than local cluster uuid IsAK0BSURZyZoVTI3eopfw, rejecting",
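When a node is rejected like this, it helps to know which UUID belongs to which side. Below is a small Python sketch (our own helper, not part of Elasticsearch) that pulls both UUIDs out of the rejection text. By our reading of the message, the first UUID is the one in the cluster state the node was asked to validate (the cluster it is trying to join) and the second is the one persisted on the rejecting node (its previous cluster).

```python
import re
from typing import Optional, Tuple

# Matches the rejection text quoted above. "\s*" also tolerates copies of the
# log where the space in "different cluster" has been lost.
_UUID_MISMATCH = re.compile(
    r"different\s*cluster uuid (?P<incoming>\S+) than local cluster uuid (?P<local>[^\s,]+)"
)


def parse_uuid_mismatch(log_text: str) -> Optional[Tuple[str, str]]:
    """Return (incoming_cluster_uuid, local_node_uuid), or None if no match.

    incoming: UUID in the cluster state sent by the cluster being joined
    local:    UUID persisted on the rejecting node (its old cluster)
    """
    match = _UUID_MISMATCH.search(log_text)
    if match is None:
        return None
    return match.group("incoming"), match.group("local")
```

Applied to the log above, this yields `yawFIpzSS-erlTvqNOLI_g` for the cluster being joined and `IsAK0BSURZyZoVTI3eopfw` for the UUID that hot-8 still carries from its old cluster.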

Solution: the simplest fix is to keep the original master nodes as master nodes.

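That raises a new question. In the new architecture the master nodes don't need much hardware, but the original masters were very high-spec machines, so keeping them as masters wastes resources (most of us run on the cloud, and prepaid monthly or yearly high-spec instances are, as everyone knows, expensive).

Our solution was as follows. First, start the original masters normally and make sure the data nodes can rejoin the original cluster, i.e. restore everything to how it was. Once the cluster is healthy again (this is the key point), join the new master nodes to the old cluster. When all of them have joined, the cluster has six master nodes; at that point, take the old masters offline. Because the cluster is never re-created, its UUID does not change. Finally, reconfigure the old masters as data nodes and join them to the reorganized cluster. This cleanly solved the first problem (which, to be honest, was a pitfall we dug for ourselves by not thinking the plan through).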
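The procedure above can be written out as an ordered checklist. The Python sketch below is our own illustration, not the author's script, and the node names are hypothetical. Two details are additions based on the Elasticsearch 7.x documentation rather than on this post: new master-eligible nodes joining an existing cluster should not have `cluster.initial_master_nodes` set (otherwise they may bootstrap a separate cluster with a new UUID), and when half or more of the master-eligible nodes are removed at once, the documented approach is to exclude them from the voting configuration first (the `node_names` query parameter shown is the 7.8+ form).

```python
from typing import List


def master_swap_plan(old_masters: List[str], new_masters: List[str]) -> List[str]:
    """Ordered checklist for replacing master nodes inside the SAME cluster,
    so the cluster UUID stored on the data nodes stays valid."""
    old, new = ", ".join(old_masters), ", ".join(new_masters)
    return [
        # 1. Go back to the original, working topology.
        f"Start the original masters ({old}) and let the data nodes rejoin",
        "Wait until GET /_cluster/health reports green",
        # 2. Grow the existing cluster instead of bootstrapping a new one.
        f"Start the new masters ({new}) with discovery.seed_hosts pointing at the "
        "running cluster and WITHOUT cluster.initial_master_nodes",
        f"Check GET /_cat/nodes: {len(old_masters) + len(new_masters)} master-eligible nodes",
        # 3. Retire the old masters (exclusion step is our assumption, 7.8+ syntax).
        "POST /_cluster/voting_config_exclusions?node_names=" + ",".join(old_masters),
        f"Stop the old masters ({old})",
        "DELETE /_cluster/voting_config_exclusions",
        # 4. Bring the old machines back as data-only nodes.
        f"Restart {old} as data nodes (node.roles: [data], or node.master: false before 7.9)",
    ]


for step_no, step in enumerate(master_swap_plan(["m1", "m2", "m3"], ["n1", "n2", "n3"]), 1):
    print(step_no, step)
```

Keeping the exclusion, stop, and clear steps adjacent makes it harder to forget the final `DELETE`, which would otherwise leave stale exclusions behind.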

Problem 2: Various issues when regenerating the new cluster's certificates and setting passwords

# Generate passwords
[root@28ef648fe6a6 elasticsearch]# bin/elasticsearch-setup-passwords interactive

Unexpected response code [500] from calling GET http://10.105.6.223:9200/_security/_authenticate?pretty
It doesn't look like the X-Pack security feature is enabled on this Elasticsearch node.
Please check if you have enabled X-Pack security in your elasticsearch.yml configuration file.

ERROR: X-Pack Security is disabled by configuration.
Fix: edit elasticsearch.yml and enable the X-Pack-related settings.
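The fix above comes down to making sure a few settings are present. As a sketch, here is a Python check over `elasticsearch.yml`, assuming the file uses flat dotted keys (`xpack.security.enabled: true`) rather than nested YAML; the helper names are hypothetical. The two keys checked are real Elasticsearch settings: `xpack.security.enabled`, and `xpack.security.transport.ssl.enabled`, which a multi-node 7.x cluster requires once security is on. The regenerated certificate is usually referenced through `xpack.security.transport.ssl.keystore.path` and `xpack.security.transport.ssl.truststore.path` (for example the `elastic-certificates.p12` file that `elasticsearch-certutil cert` writes by default); this sketch does not validate those paths.

```python
from typing import Dict, List

# Settings this sketch insists on. Both are real Elasticsearch settings; the
# choice of which ones to require is our assumption for a multi-node 7.x cluster.
REQUIRED_SECURITY_SETTINGS: Dict[str, str] = {
    "xpack.security.enabled": "true",
    "xpack.security.transport.ssl.enabled": "true",
}


def parse_flat_yml(text: str) -> Dict[str, str]:
    """Parse flat 'dotted.key: value' lines. Nested YAML is not handled."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        settings[key.strip()] = value.strip().strip("\"'")
    return settings


def missing_security_settings(yml_text: str) -> List[str]:
    """List required settings that are absent or not set to the expected value."""
    settings = parse_flat_yml(yml_text)
    return [key for key, want in REQUIRED_SECURITY_SETTINGS.items()
            if settings.get(key, "").lower() != want]
```

Running `missing_security_settings(open("config/elasticsearch.yml").read())` on each node before starting it would have flagged the "X-Pack Security is disabled by configuration" case up front.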
[root@79d23e3e6630 elasticsearch]# bin/elasticsearch-setup-passwords interactive
Your cluster health is currently RED.
This means that some cluster data is unavailable and your cluster is not fully functional.

It is recommended that you resolve the issues with your cluster before running elasticsearch-setup-passwords.
It is very likely that the password changes will fail when run against an unhealthy cluster.

Do you want to continue with the password setup process [y/N]y

Initiating the setup of passwords for reserved users elastic,apm_system,kibana,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y


Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana]:
Reenter password for [kibana]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:

Connection failure to: http://10.105.5.201:9200/_security/user/apm_system/_password?pretty failed: Read timed out

ERROR: Failed to set password for user [apm_system].

# Fix
# The cluster is RED and has not recovered yet, so setting passwords keeps failing. Wait for the cluster to recover, then set the passwords.
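The `Read timed out` above is most likely the password write failing because the data it needs is unavailable while the cluster is RED. A minimal Python sketch of "wait for recovery, then set passwords": poll `GET /_cluster/health` until the status is no longer red. The fetcher is injectable so the loop can be tested offline; the function names, credentials parameters, poll interval, and timeout are our assumptions. Elasticsearch can also do the waiting server-side with `GET /_cluster/health?wait_for_status=yellow&timeout=50s`.

```python
import base64
import json
import time
import urllib.request
from typing import Callable, Dict


def http_health_fetcher(base_url: str, user: str, password: str) -> Callable[[], Dict]:
    """Return a callable that performs GET <base_url>/_cluster/health with basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()

    def fetch() -> Dict:
        req = urllib.request.Request(
            f"{base_url}/_cluster/health",
            headers={"Authorization": f"Basic {token}"},
        )
        with urllib.request.urlopen(req, timeout=10) as resp:
            return json.load(resp)

    return fetch


def wait_until_not_red(fetch: Callable[[], Dict], timeout_s: float = 600,
                       interval_s: float = 10,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll cluster health until it is green or yellow; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout_s
    while True:
        # Treat a missing field as red so we keep waiting rather than proceed.
        status = fetch().get("status", "red")
        if status in ("green", "yellow"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster health still {status} after {timeout_s}s")
        sleep(interval_s)
```

For example, call `wait_until_not_red(http_health_fetcher("http://10.105.5.201:9200", "elastic", "<password>"))` and only then run `bin/elasticsearch-setup-passwords interactive`.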
[root@79d23e3e6630 elasticsearch]# bin/elasticsearch-setup-passwords interactive

Failed to authenticate user 'elastic' against http://10.105.5.201:9200/_security/_authenticate?pretty
Possible causes include:
 * The password for the 'elastic' user has already been changed on this cluster
 * Your elasticsearch node is running against a different keystore
   This tool used the keystore at /usr/share/elasticsearch/config/elasticsearch.keystore

ERROR: Failed to verify bootstrap password
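# Fix
# The passwords have already been set and cannot be set again with this tool. To set them again, delete the elasticsearch.keystore file and restart the cluster.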
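If the current `elastic` password is still known, there is an alternative to deleting the keystore, which also discards every other secure setting stored in it (for example `xpack.security.transport.ssl.keystore.secure_password`, if you set one): change passwords through the change password API, `POST /_security/user/<username>/_password`. The Python sketch below only builds the request with the standard library; the host and credentials are hypothetical, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import base64
import json
import urllib.request


def change_password_request(base_url: str, admin_user: str, admin_password: str,
                            target_user: str, new_password: str) -> urllib.request.Request:
    """Build (but do not send) a change-password call for target_user,
    authenticated as admin_user with HTTP basic auth."""
    token = base64.b64encode(f"{admin_user}:{admin_password}".encode()).decode()
    body = json.dumps({"password": new_password}).encode()
    return urllib.request.Request(
        f"{base_url}/_security/user/{target_user}/_password",
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )
```

Usage: `urllib.request.urlopen(change_password_request("http://10.105.5.201:9200", "elastic", "<current password>", "apm_system", "<new password>"))`, repeated for each reserved user whose password you want to change.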

The end result: the cluster is now very stable.

posted @ 2021-11-16 11:31  格桑梅朵儿