Elasticsearch Source Code — Gateway

Version: Elasticsearch 6.3.2

ES data storage

Data storage configuration

 

Viewing ES cluster state data:
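For reference, the recovered cluster state (its version, metadata and routing table) can be inspected over REST with GET _cluster/state, or from Java. The sketch below is only an illustration and assumes a 6.x org.elasticsearch.client.Client instance is already available:

// A minimal sketch, assuming a 6.x Client (e.g. a transport client) is already connected;
// it prints the pieces of cluster state that the gateway module recovers.
import org.elasticsearch.client.Client;
import org.elasticsearch.cluster.ClusterState;

public class ClusterStateViewer {

    public static void printClusterState(Client client) {
        ClusterState state = client.admin().cluster().prepareState().get().getState();
        System.out.println("cluster state version: " + state.version());        // bumped on every state change
        System.out.println("metadata version: " + state.metaData().version());  // bumped on every metadata change
        System.out.println("indices: " + state.metaData().indices().keys());    // index-level metadata entries
        System.out.println("shard routings: " + state.routingTable().allShards().size()); // shard -> node assignments
    }
}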

 

Directory layout

How is metadata recovered after a cluster restart?

The gateway module

 

1. Startup

injector.getInstance(GatewayService.class).start(); // started when the node starts

    @Override
    protected void doStart() {
        // use post applied so that the state will be visible to the background recovery thread we spawn in performStateRecovery
        clusterService.addListener(this); // GatewayService implements ClusterStateListener, so from startup it listens for cluster state changes
    }

2. The gateway-related settings in elasticsearch.yml decide whether the cluster may start recovery; once the conditions are met, recovery is performed. If all expected_* thresholds are reached, recovery starts immediately; otherwise the master waits recover_after_time once the recover_after_* thresholds are satisfied.

gateway.expected_nodes: 10
gateway.expected_master_nodes: 3
gateway.expected_data_nodes: 10
gateway.recover_after_time: 5m
gateway.recover_after_nodes: 9
gateway.recover_after_master_nodes: 2
gateway.recover_after_data_nodes: 9

3. When the cluster state changes, the following happens:

callClusterStateAppliers(clusterChangedEvent);  // 1. apply the new cluster state
callClusterStateListeners(clusterChangedEvent); // 2. run the post-update listeners
task.listener.clusterStateProcessed(task.source, previousClusterState, newClusterState); // update the cluster state stored on this node

    private void callClusterStateListeners(ClusterChangedEvent clusterChangedEvent) {
        Stream.concat(clusterStateListeners.stream(), timeoutClusterStateListeners.stream()).forEach(listener -> { // invoke clusterChanged on every listener
            try {
                logger.trace("calling [{}] with change to version [{}]", listener, clusterChangedEvent.state().version());
                listener.clusterChanged(clusterChangedEvent);
            } catch (Exception ex) {
                logger.warn("failed to notify ClusterStateListener", ex);
            }
        });
    }

When a ClusterStateListener observes a cluster state change, it runs:

GatewayService.clusterChanged
performStateRecovery(enforceRecoverAfterTime, reason); // the actual recovery entry point

DiscoveryNodes nodes = state.nodes(); // the nodes currently in the cluster
if (state.nodes().isLocalNodeElectedMaster() == false) { // if this node is not the elected master, it does not recover metadata
    // not our job to recover
    return;
}
if (state.nodes().getMasterNodeId() == null) { // preconditions for metadata recovery
    logger.debug("not recovering from gateway, no master elected yet"); // a master must have been elected
} else if (recoverAfterNodes != -1 && (nodes.getMasterAndDataNodes().size()) < recoverAfterNodes) {
    logger.debug("not recovering from gateway, nodes_size (data+master) [{}] < recover_after_nodes [{}]", // the thresholds from elasticsearch.yml must be met
        nodes.getMasterAndDataNodes().size(), recoverAfterNodes);
}
....
performStateRecovery(enforceRecoverAfterTime, reason); // cluster state recovery
threadPool.schedule(recoverAfterTime, ThreadPool.Names.GENERIC, () -> {
    if (recovered.compareAndSet(false, true)) {
        logger.info("recover_after_time [{}] elapsed. performing state recovery...", recoverAfterTime);
        gateway.performStateRecovery(recoveryListener); // the method that does the recovery; the listener runs the follow-up work once it completes
    }
});

Metadata recovery process:

  • First, two kinds of metadata are recovered: cluster-level and index-level
  • Then the shard-level metadata is recovered and shards are allocated

1) Get the master-eligible nodes in the cluster

String[] nodesIds = clusterService.state().nodes().getMasterNodes().keys().toArray(String.class);

2) Fetch the metadata from each of those nodes

TransportNodesListGatewayMetaState.NodesGatewayMetaState nodesState = listGatewayMetaState.list(nodesIds, null).actionGet();

3) Elect the metadata with the highest version across the nodes as the latest; the version number is incremented by one on every metadata update

if (electedGlobalState == null) {
    electedGlobalState = nodeState.metaData();
} else if (nodeState.metaData().version() > electedGlobalState.version()) {
    electedGlobalState = nodeState.metaData();
}

4) At the same time, collect every index found in each node's metadata into a map, to be used later when recovering the indices part of the metadata

for (ObjectCursor<IndexMetaData> cursor : nodeState.metaData().indices().values()) {
    indices.addTo(cursor.value.getIndex(), 1);
}

5) Build the latest metadata from the elected global state and remove its indices; the indices are elected in the next phase

// update the global state, and clean the indices, we elect them in the next phase
MetaData.Builder metaDataBuilder = MetaData.builder(electedGlobalState).removeAllIndices();

6) For each index found above, elect the latest version, run some validation, and add it to the metadata. For example, it checks whether the index can be opened; if it cannot, the index state is set to closed.
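A simplified sketch of this per-index election, modeled on the loop in Gateway.performStateRecovery (6.3.x); names are trimmed and the validation and error handling are omitted, so treat it as an illustration rather than the exact source:

// Simplified sketch: pick the IndexMetaData with the highest version among all nodes.
import java.util.List;

import org.elasticsearch.cluster.metadata.IndexMetaData;
import org.elasticsearch.gateway.TransportNodesListGatewayMetaState.NodeGatewayMetaState;
import org.elasticsearch.index.Index;

final class IndexElectionSketch {

    static IndexMetaData electIndex(Index index, List<NodeGatewayMetaState> nodeStates) {
        IndexMetaData elected = null;
        for (NodeGatewayMetaState nodeState : nodeStates) {
            if (nodeState.metaData() == null) {
                continue; // this node had no on-disk state
            }
            IndexMetaData candidate = nodeState.metaData().index(index);
            if (candidate == null) {
                continue; // this node never held the index
            }
            if (elected == null || candidate.getVersion() > elected.getVersion()) {
                elected = candidate; // highest version wins, same rule as for the global state
            }
        }
        return elected; // the caller validates it (e.g. can it be opened?) before adding it to the metadata
    }
}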

7) Build the ClusterState and run the listener's success callback, which submits the cluster state and builds the routing table.

The index information is added to the routing table; at this point every shard is unassigned, and which shards go to which nodes has not yet been decided.

// initialize all index routing tables as empty
RoutingTable.Builder routingTableBuilder = RoutingTable.builder(updatedState.routingTable());
for (ObjectCursor<IndexMetaData> cursor : updatedState.metaData().indices().values()) {
   routingTableBuilder.addAsRecovery(cursor.value);
}
// start with 0 based versions for routing table
routingTableBuilder.version(0);

// now, reroute
updatedState = ClusterState.builder(updatedState).routingTable(routingTableBuilder.build()).build();
return allocationService.reroute(updatedState, "state recovered");

Shard reallocation process:

Shard allocation decides two things:

  • which node each shard is assigned to
  • which copy becomes the primary and which become replicas

ES uses two components to allocate shards: allocators (find the best node given the current cluster state) and deciders (decide whether the shard may actually be placed on that node); see the sketch below.
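To make the allocator/decider split concrete, here is a hypothetical decider written against the 6.x AllocationDecider API; the class name, the "cold node" rule and the decider label are invented for illustration and are not part of the gateway module:

// Hypothetical example: refuse to allocate shards onto nodes whose name contains "cold".
import org.elasticsearch.cluster.routing.RoutingNode;
import org.elasticsearch.cluster.routing.ShardRouting;
import org.elasticsearch.cluster.routing.allocation.RoutingAllocation;
import org.elasticsearch.cluster.routing.allocation.decider.AllocationDecider;
import org.elasticsearch.cluster.routing.allocation.decider.Decision;
import org.elasticsearch.common.settings.Settings;

public class NoColdNodesDecider extends AllocationDecider {

    public NoColdNodesDecider(Settings settings) {
        super(settings); // 6.x deciders are constructed with the node settings
    }

    @Override
    public Decision canAllocate(ShardRouting shardRouting, RoutingNode node, RoutingAllocation allocation) {
        // The allocator proposes a node; the decider only answers YES / NO / THROTTLE.
        if (node.node().getName().contains("cold")) {
            return allocation.decision(Decision.NO, "no_cold_nodes", "node [%s] is marked cold", node.nodeId());
        }
        return allocation.decision(Decision.YES, "no_cold_nodes", "node is acceptable");
    }
}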

Allocation strategy for a newly created index: the allocators sort the nodes by shard count in ascending order (to spread shards as evenly as possible), and the deciders walk through the nodes proposed by the allocators and decide whether the shard may be placed there.

Allocation strategy for an existing index: for a primary shard, the allocators only assign it to a node that holds a complete copy of the shard's data; for a replica, the allocators look for a node that already holds a copy of the shard (even an incomplete one) so that only the difference needs to be synced from the primary, and if no such node exists the replica is rebuilt by copying all data from the primary.

What the gateway module performs here is the allocation strategy for existing indices.

// now, reroute
updatedState = ClusterState.builder(updatedState).routingTable(routingTableBuilder.build()).build();
return allocationService.reroute(updatedState, "state recovered");

1) Allocation deciders

When ClusterModule is initialized, it creates the allocation deciders (allocationDeciders):

this.deciderList = createAllocationDeciders(settings, clusterService.getClusterSettings(), clusterPlugins);
this.allocationDeciders = new AllocationDeciders(settings, deciderList);

2) AllocationService performs the shard allocation

RoutingAllocation allocation = new RoutingAllocation(allocationDeciders, routingNodes, clusterState, clusterInfoService.getClusterInfo(), currentNanoTime());
reroute(allocation);
gatewayAllocator.allocateUnassigned(allocation);
primaryShardAllocator.allocateUnassigned(allocation);        // allocate primary shards
replicaShardAllocator.processExistingRecoveries(allocation);
replicaShardAllocator.allocateUnassigned(allocation);        // allocate replica shards
shardsAllocator.allocate(allocation);                        // rebalance shards

During allocation, every unassigned shard is iterated over, and the deciders' verdict determines whether the shard can be initialized on a node, as sketched below.
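The control flow looks roughly like the following self-contained sketch. The real implementation is BaseGatewayShardAllocator.allocateUnassigned and works on ES internal types; Shard, Node, Deciders and Decision here are hypothetical stand-ins used only to show the loop:

// Simplified, self-contained sketch of iterating unassigned shards and asking the deciders.
import java.util.Iterator;
import java.util.List;

final class UnassignedLoopSketch {

    enum Decision { YES, NO, THROTTLE }

    static final class Shard { String index; int id; boolean primary; String assignedNode; }
    static final class Node { final String id; Node(String id) { this.id = id; } }

    interface Deciders {
        Decision canAllocate(Shard shard, Node node); // aggregate answer of all deciders
    }

    /** Walk every unassigned shard; initialize it on the first node the deciders accept. */
    static void allocateUnassigned(List<Shard> unassigned, List<Node> candidateNodes, Deciders deciders) {
        Iterator<Shard> it = unassigned.iterator();
        while (it.hasNext()) {
            Shard shard = it.next();
            for (Node node : candidateNodes) {
                if (deciders.canAllocate(shard, node) == Decision.YES) {
                    shard.assignedNode = node.id; // "initialize": the shard moves to INITIALIZING on this node
                    it.remove();                  // no longer unassigned
                    break;
                }
                // NO or THROTTLE: leave the shard unassigned; it will be retried on a later reroute
            }
        }
    }
}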

Data recovery

After the shard metadata is recovered, a cluster state update task is submitted:

clusterService.submitStateUpdateTask

When a node receives the broadcast from the master, it starts applying the new ClusterState:

public synchronized void applyClusterState(final ClusterChangedEvent event) {
    createOrUpdateShards(state);
}
        indexShard.startRecovery(recoveryState, recoveryTargetService, recoveryListener, repositoriesService, // start recovering the shard
            (type, mapping) -> {
                assert recoveryState.getRecoverySource().getType() == RecoverySource.Type.LOCAL_SHARDS:
                    "mapping update consumer only required by local shards recovery";
                try {
                    client.admin().indices().preparePutMapping()
                        .setConcreteIndex(shardRouting.index()) // concrete index - no name clash, it uses uuid
                        .setType(type)
                        .setSource(mapping.source().string(), XContentType.JSON)
                        .get();
                } catch (IOException ex) {
                    throw new ElasticsearchException("failed to stringify mapping source", ex);
                }
            }, this);
        return indexShard;

After a shard is recovered, it is started and begins serving reads and writes.

        @Override
        public void onRecoveryDone(RecoveryState state) {
            shardStateAction.shardStarted(shardRouting, "after " + state.getRecoverySource(), SHARD_STATE_ACTION_LISTENER);
        }

At this point, data recovery after an ES cluster restart is complete.

A cluster may have tens of thousands of shards, and shard recovery runs asynchronously; the gateway module itself finishes (onSuccess) once the cluster-level and index-level metadata have been recovered.

References:

  • https://blog.csdn.net/thomas0yang/article/details/52535604
  • https://blog.csdn.net/codingapple/article/details/17308433
  • https://www.easyice.cn/archives/226
  • https://www.easyice.cn/archives/248 (a fairly complete walkthrough of the gateway + allocation flow)
