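Yarn Container Launch Flow: Source Code Analysis
In "ApplicationMaster Startup and Resource Requests: Source Code Analysis", we saw that after the AM registers with the RM, it periodically communicates with the RM through the RPC function ApplicationMasterProtocol#allocate(). That call serves three purposes: requesting resources, retrieving newly allocated resources, and acting as a periodic heartbeat. This article focuses on how a Container is launched on an NM node once the AM has obtained Container resources from the RM. It walks through the source code for the whole process, from requesting resources and launching the Container through to resource cleanup.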
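1. Overview of the Container Launch Flow
A Container launch is initiated by the ApplicationMaster, which calls the RPC function ContainerManagementProtocol#startContainers() on the NM. The ContainerManagerImpl component in the NM receives and handles the request. The launch goes through three phases: resource localization, launching and running the Container, and resource cleanup.
- Resource localization is the work done by the distributed cache mechanism. It includes initializing the required service components, creating working directories, and downloading the resources needed at runtime from HDFS (for example text files, JARs, and executables). It has two parts: application initialization and Container localization. Application initialization sets up the service components that later Containers depend on (such as the log handling component LogHandler and the resource state tracker LocalResourcesTrackerImpl) and is normally triggered by the first Container of an Application. Container localization creates the working directories and downloads the file resources from HDFS.
- The Container launch itself is carried out by the ContainersLauncher service, which in turn calls the pluggable ContainerExecutor component. Yarn provides three ContainerExecutor implementations: DefaultContainerExecutor, LinuxContainerExecutor, and DockerContainerExecutor. The parameter yarn.nodemanager.container-executor.class selects which one is used.
- Resource cleanup is the inverse of resource localization. It removes the various resources, and all of this work is done by the ResourceLocalizationService.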
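2. Source Code Analysis of Container Launch
2.1 The AM calls the API to request a Container launch
Before looking at the Container launch itself, let's see how the AM uses the resources it obtains during heartbeats to request Container launches. The AM periodically requests resources from the RM through the RPC function ApplicationMasterProtocol#allocate(), and stores each response in the blocking queue responseQueue.
// Location: org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java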
private class HeartbeatThread extends Thread {
public HeartbeatThread() {
super("AMRM Heartbeater thread");
}
public void run() {
while (true) { // the heartbeat thread runs in an endless loop
AllocateResponse response = null;
// synchronization ensures we don't send heartbeats after unregistering
synchronized (unregisterHeartbeatLock) {
if (!keepRunning) {
return;
}
try {
// Key point: the heartbeat thread simply calls allocate() periodically; newly allocated Containers come back in the AllocateResponse
response = client.allocate(progress);
} catch (ApplicationAttemptNotFoundException e) {
handler.onShutdownRequest();
LOG.info("Shutdown requested. Stopping callback.");
return;
} catch (Throwable ex) {
LOG.error("Exception on heartbeat", ex);
savedException = ex;
// interrupt handler thread in case it waiting on the queue
handlerThread.interrupt();
return;
}
if (response != null) {
while (true) {
try {
// Put the AllocateResponse returned by the RM onto the blocking queue responseQueue
responseQueue.put(response);
break;
} catch (InterruptedException ex) {
LOG.debug("Interrupted while waiting to put on response queue", ex);
}
}
}
}
try {
Thread.sleep(heartbeatIntervalMs.get());
} catch (InterruptedException ex) {
LOG.debug("Heartbeater interrupted", ex);
}
}
}
}
How are the Container resources stored in responseQueue consumed? Searching for calls to responseQueue.take() shows that a separate thread in AMRMClientAsyncImpl, CallbackHandlerThread, continuously takes AllocateResponse objects off the queue and processes them.
// Location: org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
private class CallbackHandlerThread extends Thread {
public CallbackHandlerThread() {
super("AMRM Callback Handler Thread");
}
public void run() {
while (true) { // endless loop: take each response off the queue and process it
if (!keepRunning) {
return;
}
try {
AllocateResponse response;
if(savedException != null) {
LOG.error("Stopping callback due to: ", savedException);
handler.onError(savedException);
return;
}
try {
// Take the next AllocateResponse from the blocking queue responseQueue
response = responseQueue.take();
} catch (InterruptedException ex) {
LOG.info("Interrupted while waiting for queue", ex);
continue;
}
List<NodeReport> updatedNodes = response.getUpdatedNodes();
if (!updatedNodes.isEmpty()) {
handler.onNodesUpdated(updatedNodes);
}
List<ContainerStatus> completed =
response.getCompletedContainersStatuses();
if (!completed.isEmpty()) {
handler.onContainersCompleted(completed);
}
List<Container> allocated = response.getAllocatedContainers();
if (!allocated.isEmpty()) {
// Key point: handle the newly allocated Containers
handler.onContainersAllocated(allocated);
}
// Refresh the progress value that the next heartbeat reports to the RM
progress = handler.getProgress();
} catch (Throwable ex) {
handler.onError(ex);
// re-throw exception to end the thread
throw new YarnRuntimeException(ex);
}
}
}
}
}
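The handoff between the two threads above is a standard producer/consumer arrangement over a blocking queue. The following is a minimal sketch of that pattern only, not Hadoop code: the class name HeartbeatHandoffSketch and the string "responses" are hypothetical stand-ins for AMRMClientAsyncImpl and AllocateResponse.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for the heartbeat/callback thread pair; not Hadoop code.
class HeartbeatHandoffSketch {

  /** Runs one producer and one consumer thread and returns what the consumer handled. */
  static List<String> run(int heartbeats) {
    BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
    List<String> handled = new ArrayList<>();

    // Producer: plays the role of HeartbeatThread, one "response" per heartbeat.
    Thread heartbeater = new Thread(() -> {
      for (int i = 0; i < heartbeats; i++) {
        try {
          responseQueue.put("response-" + i);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    });

    // Consumer: plays the role of CallbackHandlerThread, blocking on take().
    Thread callbackHandler = new Thread(() -> {
      for (int i = 0; i < heartbeats; i++) {
        try {
          handled.add(responseQueue.take());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          return;
        }
      }
    });

    heartbeater.start();
    callbackHandler.start();
    try {
      heartbeater.join();
      callbackHandler.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for sketch threads", e);
    }
    return handled;
  }

  public static void main(String[] args) {
    System.out.println(run(3));
  }
}
```

Because the consumer blocks in take(), the callback side never busy-waits, and user callbacks run off the heartbeat thread, so a slow callback does not delay the next heartbeat.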
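The handler.onContainersAllocated(allocated) method processes the allocated Container resources. The implementation shown here is the one in the DistributedShell ApplicationMaster.
// Location: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java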
@Override
public void onContainersAllocated(List<Container> allocatedContainers) {
LOG.info("Got response from RM for container ask, allocatedCnt="
+ allocatedContainers.size());
numAllocatedContainers.addAndGet(allocatedContainers.size());
for (Container allocatedContainer : allocatedContainers) {
String yarnShellId = Integer.toString(yarnShellIdCounter);
yarnShellIdCounter++;
LOG.info("Launching shell command on a new container."
+ ", containerId=" + allocatedContainer.getId()
+ ", yarnShellId=" + yarnShellId
+ ", containerNode=" + allocatedContainer.getNodeId().getHost()
+ ":" + allocatedContainer.getNodeId().getPort()
+ ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
+ ", containerResourceMemory"
+ allocatedContainer.getResource().getMemory()
+ ", containerResourceVirtualCores"
+ allocatedContainer.getResource().getVirtualCores());
// + ", containerToken"
// +allocatedContainer.getContainerToken().getIdentifier().toString());
// Create the thread that runs a LaunchContainerRunnable for this Container
Thread launchThread = createLaunchContainerThread(allocatedContainer,
yarnShellId);
// launch and start the container on a separate thread to keep
// the main thread unblocked
// as all containers may not be allocated at one go.
launchThreads.add(launchThread);
launchedContainers.add(allocatedContainer.getId());
// Start the LaunchContainerRunnable thread
launchThread.start();
}
}
@VisibleForTesting
Thread createLaunchContainerThread(Container allocatedContainer,
String shellId) {
LaunchContainerRunnable runnableLaunchContainer =
new LaunchContainerRunnable(allocatedContainer, containerListener,
shellId);
return new Thread(runnableLaunchContainer);
}
The logic above starts a LaunchContainerRunnable thread. LaunchContainerRunnable is an inner class of ApplicationMaster that implements the Runnable interface. Its run() method shows that it does two main things:
- It initializes the Container's local resources and builds the Container's launch script.
- It calls the NMClientAsync#startContainerAsync() API to launch the Container.
// Location: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
public void run() {
LOG.info("Setting up container launch container for containerid="
+ container.getId() + " with shellid=" + shellId);
// Logic that builds the Container launch script is omitted
// Set up ContainerLaunchContext, setting local resource, environment,
// command and token for constructor.
// Note for tokens: Set up tokens for the container too. Today, for normal
// shell commands, the container in distribute-shell doesn't need any
// tokens. We are populating them mainly for NodeManagers to be able to
// download anyfiles in the distributed file-system. The tokens are
// otherwise also useful in cases, for e.g., when one is running a
// "hadoop dfs" command inside the distributed shell.
Map<String, String> myShellEnv = new HashMap<String, String>(shellEnv);
myShellEnv.put(YARN_SHELL_ID, shellId);
ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
localResources, myShellEnv, commands, null, allTokens.duplicate(),
null);
containerListener.addContainer(container.getId(), container);
// 2. Key point: launch the allocated Container through the NMClientAsync API
nmClientAsync.startContainerAsync(container, ctx);
}
}
As you can see, nmClientAsync.startContainerAsync() does not actually launch the Container. It wraps the request in a StartContainerEvent (a subclass of ContainerEvent whose type is ContainerEventType.START_CONTAINER) and adds it to the blocking queue events used for Container event handling. The actual processing is done by the consumer of the events queue.
// Location: org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
public void startContainerAsync(
Container container, ContainerLaunchContext containerLaunchContext) {
if (containers.putIfAbsent(container.getId(),
new StatefulContainer(this, container.getId())) != null) {
callbackHandler.onStartContainerError(container.getId(),
RPCUtil.getRemoteException("Container " + container.getId() +
" is already started or scheduled to start"));
}
try {
events.put(new StartContainerEvent(container, containerLaunchContext));
} catch (InterruptedException e) {
LOG.warn("Exception when scheduling the event of starting Container " +
container.getId());
callbackHandler.onStartContainerError(container.getId(), e);
}
}
So how is the blocking queue events processed? Looking for events.take() shows that when NMClientAsyncImpl runs serviceStart(), it starts a thread that consumes events from the queue. Each event taken from the queue in this flow is a ContainerEvent of type ContainerEventType.START_CONTAINER. getContainerEventProcessor(event) returns the ContainerEventProcessor for that event, and the processor is run on a thread pool.
// Location: org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
protected void serviceStart() throws Exception {
client.start();
ThreadFactory tf = new ThreadFactoryBuilder().setNameFormat(
this.getClass().getName() + " #%d").setDaemon(true).build();
// Start with a default core-pool size and change it dynamically.
int initSize = Math.min(INITIAL_THREAD_POOL_SIZE, maxThreadPoolSize);
threadPool = new ThreadPoolExecutor(initSize, Integer.MAX_VALUE, 1,
TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>(), tf);
eventDispatcherThread = new Thread() {
@Override
public void run() {
ContainerEvent event = null;
Set<String> allNodes = new HashSet<String>();
while (!stopped.get() && !Thread.currentThread().isInterrupted()) {
try {
// Take the next ContainerEvent from the blocking queue events
event = events.take();
} catch (InterruptedException e) {
if (!stopped.get()) {
LOG.error("Returning, thread interrupted", e);
}
return;
}
allNodes.add(event.getNodeId().toString());
int threadPoolSize = threadPool.getCorePoolSize();
// We can increase the pool size only if haven't reached the maximum
// limit yet.
if (threadPoolSize != maxThreadPoolSize) {
// nodes where containers will run at *this* point of time. This is
// *not* the cluster size and doesn't need to be.
int nodeNum = allNodes.size();
int idealThreadPoolSize = Math.min(maxThreadPoolSize, nodeNum);
if (threadPoolSize < idealThreadPoolSize) {
// Bump up the pool size to idealThreadPoolSize +
// INITIAL_POOL_SIZE, the later is just a buffer so we are not
// always increasing the pool-size
int newThreadPoolSize = Math.min(maxThreadPoolSize,
idealThreadPoolSize + INITIAL_THREAD_POOL_SIZE);
LOG.info("Set NMClientAsync thread pool size to " +
newThreadPoolSize + " as the number of nodes to talk to is "
+ nodeNum);
threadPool.setCorePoolSize(newThreadPoolSize);
}
}
// Key point: the event taken here has type ContainerEventType.START_CONTAINER;
// getContainerEventProcessor(event) returns a ContainerEventProcessor (a Runnable), which is executed on the thread pool
threadPool.execute(getContainerEventProcessor(event));
}
}
};
// Start the dispatcher thread
eventDispatcherThread.setName("Container Event Dispatcher");
eventDispatcherThread.setDaemon(false);
eventDispatcherThread.start();
super.serviceStart();
}
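The pool resizing rule inside the dispatcher loop is easy to miss among the logging. The following is a small sketch that isolates it as a pure function. The class and method names are hypothetical, and the constant value 10 is an assumption chosen for illustration rather than a value taken from the excerpt.

```java
// Hypothetical extraction of the resizing rule in serviceStart(); not Hadoop code.
class PoolSizingSketch {
  // Assumed value for illustration; the excerpt only shows the constant's name.
  static final int INITIAL_THREAD_POOL_SIZE = 10;

  /**
   * Returns the core pool size to use after seeing nodeCount distinct nodes.
   * The pool only grows, and never beyond maxPoolSize.
   */
  static int nextCorePoolSize(int current, int maxPoolSize, int nodeCount) {
    if (current == maxPoolSize) {
      return current; // already at the limit, nothing to do
    }
    int ideal = Math.min(maxPoolSize, nodeCount);
    if (current < ideal) {
      // Grow past the ideal size by a buffer so the pool is not resized on every new node.
      return Math.min(maxPoolSize, ideal + INITIAL_THREAD_POOL_SIZE);
    }
    return current;
  }

  public static void main(String[] args) {
    System.out.println(nextCorePoolSize(10, 500, 11));
  }
}
```

The buffer means that one resize covers the next several nodes the client starts talking to, at the cost of a few idle core threads.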
ContainerEventProcessor is an inner class of NMClientAsyncImpl that implements the Runnable interface. In its run() method, an event of type ContainerEventType.START_CONTAINER takes the else branch shown below and is passed through handle() to the Container's state machine.
// Location: org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
public void run() {
ContainerId containerId = event.getContainerId();
LOG.info("Processing Event " + event + " for Container " + containerId);
// ContainerEventType.QUERY_CONTAINER events are handled separately
if (event.getType() == ContainerEventType.QUERY_CONTAINER) {
try {
ContainerStatus containerStatus = client.getContainerStatus(
containerId, event.getNodeId());
try {
callbackHandler.onContainerStatusReceived(
containerId, containerStatus);
} catch (Throwable thr) {
// Don't process user created unchecked exception
LOG.info(
"Unchecked exception is thrown from onContainerStatusReceived" +
" for Container " + event.getContainerId(), thr);
}
} catch (YarnException e) {
onExceptionRaised(containerId, e);
} catch (IOException e) {
onExceptionRaised(containerId, e);
} catch (Throwable t) {
onExceptionRaised(containerId, t);
}
} else {
// Handling for ContainerEventType.START_CONTAINER and ContainerEventType.STOP_CONTAINER events
StatefulContainer container = containers.get(containerId);
if (container == null) {
LOG.info("Container " + containerId + " is already stopped or failed");
} else {
// Hand the event to the state machine, which picks the transition by event type
container.handle(event);
if (isCompletelyDone(container)) {
containers.remove(containerId);
}
}
}
}
The transition registered for the ContainerEventType.START_CONTAINER event is StartContainerTransition.
// Location: org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
// Transitions from PREP state
.addTransition(ContainerState.PREP,
EnumSet.of(ContainerState.RUNNING, ContainerState.FAILED),
ContainerEventType.START_CONTAINER,
new StartContainerTransition())
The transition() method of StartContainerTransition:
// Location: org/apache/hadoop/yarn/client/api/async/impl/NMClientAsyncImpl.java
@Override
public ContainerState transition(
StatefulContainer container, ContainerEvent event) {
ContainerId containerId = event.getContainerId();
try {
StartContainerEvent scEvent = null;
if (event instanceof StartContainerEvent) {
scEvent = (StartContainerEvent) event;
}
assert scEvent != null;
// Key point: call NMClient#startContainer() to launch the Container
Map<String, ByteBuffer> allServiceResponse =
container.nmClientAsync.getClient().startContainer(
scEvent.getContainer(), scEvent.getContainerLaunchContext());
try {
// Notify the caller through the callback that the Container has started
container.nmClientAsync.getCallbackHandler().onContainerStarted(
containerId, allServiceResponse);
} catch (Throwable thr) {
// Don't process user created unchecked exception
LOG.info("Unchecked exception is thrown from onContainerStarted for "
+ "Container " + containerId, thr);
}
// Return RUNNING as the Container's next state
return ContainerState.RUNNING;
} catch (YarnException e) {
return onExceptionRaised(container, event, e);
} catch (IOException e) {
return onExceptionRaised(container, event, e);
} catch (Throwable t) {
return onExceptionRaised(container, event, t);
}
}
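The registration above names two possible target states (RUNNING and FAILED) for one event, and transition() picks between them based on whether the start call succeeds. The following sketch shows that shape in isolation. StartTransitionSketch and its State enum are hypothetical names, and the sketch catches Exception where the original catches Throwable.

```java
import java.util.concurrent.Callable;

// Hypothetical miniature of a transition with two possible target states; not Hadoop code.
class StartTransitionSketch {
  enum State { PREP, RUNNING, FAILED }

  /** Applies a START_CONTAINER-like event: the outcome of startCall decides the next state. */
  static State onStart(State current, Callable<Object> startCall) {
    if (current != State.PREP) {
      throw new IllegalStateException("start is only valid in PREP, not " + current);
    }
    try {
      startCall.call();       // stands in for the remote start request
      return State.RUNNING;   // success arc
    } catch (Exception e) {
      return State.FAILED;    // failure arc
    }
  }

  public static void main(String[] args) {
    System.out.println(onStart(State.PREP, () -> "started"));
  }
}
```

Returning the next state from the transition keeps the success and failure handling in one place instead of spreading it across separate events.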
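Here we finally see the exciting startContainer() method, but not so fast: the Container is still not actually launched at this point. The code first obtains NMClient, the client the AM really uses to talk to the NM, and calls startContainer() on its implementation class NMClientImpl. That method obtains a proxy for ContainerManagementProtocol, the RPC protocol used to talk to the NM, and calls the protocol's startContainers() method as a remote call to launch the Container.
// Location: org/apache/hadoop/yarn/client/api/impl/NMClientImpl.java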
public Map<String, ByteBuffer> startContainer(
Container container, ContainerLaunchContext containerLaunchContext)
throws YarnException, IOException {
// Build the StartedContainer object
StartedContainer startingContainer =
new StartedContainer(container.getId(), container.getNodeId());
synchronized (startingContainer) {
addStartingContainer(startingContainer);
Map<String, ByteBuffer> allServiceResponse;
ContainerManagementProtocolProxyData proxy = null;
try {
proxy =
cmProxy.getProxy(container.getNodeId().toString(),
container.getId());
StartContainerRequest scRequest =
StartContainerRequest.newInstance(containerLaunchContext,
container.getContainerToken());
List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
list.add(scRequest);
StartContainersRequest allRequests =
StartContainersRequest.newInstance(list);
// Key point: obtain the RPC protocol ContainerManagementProtocol and launch the Container through its startContainers() RPC
StartContainersResponse response =
proxy
.getContainerManagementProtocol().startContainers(allRequests);
if (response.getFailedRequests() != null
&& response.getFailedRequests().containsKey(container.getId())) {
Throwable t =
response.getFailedRequests().get(container.getId()).deSerialize();
parseAndThrowException(t);
}
allServiceResponse = response.getAllServicesMetaData();
startingContainer.state = ContainerState.RUNNING;
} catch (YarnException e) {
// Exception handling (state update and rethrow) omitted
} finally {
if (proxy != null) {
cmProxy.mayBeCloseProxy(proxy);
}
}
return allServiceResponse;
}
}
NMClient calls the RPC function ContainerManagementProtocol#startContainers() to launch the Container. On the NM side, the call is served by ContainerManagerImpl.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
@Override
public StartContainersResponse
startContainers(StartContainersRequest requests) throws YarnException,
IOException {
if (blockNewContainerRequests.get()) {
throw new NMNotYetReadyException(
"Rejecting new containers as NodeManager has not"
+ " yet connected with ResourceManager");
}
UserGroupInformation remoteUgi = getRemoteUgi();
NMTokenIdentifier nmTokenIdentifier = selectNMTokenIdentifier(remoteUgi);
authorizeUser(remoteUgi,nmTokenIdentifier);
List<ContainerId> succeededContainers = new ArrayList<ContainerId>();
Map<ContainerId, SerializedException> failedContainers =
new HashMap<ContainerId, SerializedException>();
for (StartContainerRequest request : requests.getStartContainerRequests()) {
ContainerId containerId = null;
try {
ContainerTokenIdentifier containerTokenIdentifier =
BuilderUtils.newContainerTokenIdentifier(request.getContainerToken());
verifyAndGetContainerTokenIdentifier(request.getContainerToken(),
containerTokenIdentifier);
containerId = containerTokenIdentifier.getContainerID();
// Internal logic for launching the Container
startContainerInternal(nmTokenIdentifier, containerTokenIdentifier,
request);
succeededContainers.add(containerId);
} catch (YarnException e) {
failedContainers.put(containerId, SerializedException.newInstance(e));
} catch (InvalidToken ie) {
failedContainers.put(containerId, SerializedException.newInstance(ie));
throw ie;
} catch (IOException e) {
throw RPCUtil.getRemoteException(e);
}
}
return StartContainersResponse.newInstance(getAuxServiceMetaData(),
succeededContainers, failedContainers);
}
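startContainers() handles a batch: each request is attempted on its own, and the response reports which Containers succeeded and which failed. The following sketch shows only that bookkeeping. It is a simplification under stated assumptions: BatchStartSketch, Result, and Starter are hypothetical names, and every exception is recorded per Container, whereas the original rethrows InvalidToken and IOException and aborts the whole call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of per-request success/failure bookkeeping; not Hadoop code.
class BatchStartSketch {
  /** Outcome of one batch: ids that started, and an error message per id that did not. */
  static final class Result {
    final List<String> succeeded = new ArrayList<>();
    final Map<String, String> failed = new LinkedHashMap<>();
  }

  /** Stand-in for the per-container start logic. */
  interface Starter {
    void start(String containerId) throws Exception;
  }

  static Result startAll(List<String> containerIds, Starter starter) {
    Result result = new Result();
    for (String id : containerIds) {
      try {
        starter.start(id);
        result.succeeded.add(id);
      } catch (Exception e) {
        // One failing container does not stop the remaining requests in this sketch.
        result.failed.put(id, e.getMessage());
      }
    }
    return result;
  }
}
```

On the client side, NMClientImpl#startContainer() shown earlier checks getFailedRequests() for its own Container id, which is how a per-Container failure in the batch response turns back into an exception for the caller.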
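At this point the interaction between the AM and the NM is complete: the AM launches a Container through the RPC function ContainerManagementProtocol#startContainers(). How the Container is then started on the NM is covered in the following sections.
2.2 Container resource localization
In the flow above, the AM starts the Container launch by calling the RPC function ContainerManagementProtocol#startContainers(). This section looks at the launch logic on the NM side, namely the startContainerInternal() method. It does two things:
- It sends an ApplicationEventType.INIT_APPLICATION event, which initializes the application's resources. This mainly means initializing the required service components (such as the log handling component LogHandler and the resource state tracking component LocalResourcesTrackerImpl) for later Containers to use. It is normally triggered by the first Container coming from the ApplicationMaster, and later Containers skip this application initialization.
- It sends an ApplicationEventType.INIT_CONTAINER event, which initializes the Container. (This event is covered in the section on launching the Container.)
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java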
private void startContainerInternal(NMTokenIdentifier nmTokenIdentifier,
ContainerTokenIdentifier containerTokenIdentifier,
StartContainerRequest request) throws YarnException, IOException {
// Token authentication and ContainerLaunchContext initialization omitted
this.readLock.lock();
try {
if (!serviceStopped) {
// Create the application
Application application =
new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
// Application initialization for later Containers; this runs only once, normally triggered by the first Container from the ApplicationMaster
if (null == context.getApplications().putIfAbsent(applicationID,
application)) {
LOG.info("Creating a new application reference for app " + applicationID);
LogAggregationContext logAggregationContext =
containerTokenIdentifier.getLogAggregationContext();
Map<ApplicationAccessType, String> appAcls =
container.getLaunchContext().getApplicationACLs();
context.getNMStateStore().storeApplication(applicationID,
buildAppProto(applicationID, user, credentials, appAcls,
logAggregationContext));
// 1. Send the ApplicationEventType.INIT_APPLICATION event to ApplicationImpl
dispatcher.getEventHandler().handle(
new ApplicationInitEvent(applicationID, appAcls,
logAggregationContext));
}
// 2. Send the ApplicationEventType.INIT_CONTAINER event to ApplicationImpl
this.context.getNMStateStore().storeContainer(containerId, request);
dispatcher.getEventHandler().handle(
new ApplicationContainerInitEvent(container));
this.context.getContainerTokenSecretManager().startContainerSuccessful(
containerTokenIdentifier);
NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
"ContainerManageImpl", applicationID, containerId);
// TODO launchedContainer misplaced -> doesn't necessarily mean a container
// launch. A finished Application will not launch containers.
metrics.launchedContainer();
metrics.allocateContainer(containerTokenIdentifier.getResource());
} else {
throw new YarnException(
"Container start failed as the NodeManager is " +
"in the process of shutting down");
}
} finally {
this.readLock.unlock();
}
}
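The "only the first Container initializes the application" behavior comes from the putIfAbsent() check in the code above: only the call that actually inserts the application sees null and sends INIT_APPLICATION. The following sketch shows that idiom, with hypothetical names and with counters standing in for the two dispatched events.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical miniature of the putIfAbsent() gate in startContainerInternal(); not Hadoop code.
class FirstContainerInitSketch {
  private final ConcurrentMap<String, Object> applications = new ConcurrentHashMap<>();
  final AtomicInteger appInitEvents = new AtomicInteger();        // stands in for INIT_APPLICATION
  final AtomicInteger containerInitEvents = new AtomicInteger();  // stands in for INIT_CONTAINER

  void startContainer(String appId, String containerId) {
    Object candidate = new Object(); // created on every call, kept only if it wins the insert
    if (applications.putIfAbsent(appId, candidate) == null) {
      // Only the first container of this application reaches this branch.
      appInitEvents.incrementAndGet();
    }
    // Every container, first or not, gets its own init event.
    containerInitEvents.incrementAndGet();
  }

  public static void main(String[] args) {
    FirstContainerInitSketch nm = new FirstContainerInitSketch();
    nm.startContainer("app_1", "container_1");
    nm.startContainer("app_1", "container_2");
    System.out.println(nm.appInitEvents.get() + " app init, "
        + nm.containerInitEvents.get() + " container inits");
  }
}
```

Because putIfAbsent() is atomic, two Containers of the same application arriving concurrently still produce exactly one application init.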
For the ApplicationEventType.INIT_APPLICATION event, the application state changes from NEW to INITING, and the corresponding transition is AppInitTransition.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
// Transitions from NEW state
.addTransition(ApplicationState.NEW, ApplicationState.INITING,
ApplicationEventType.INIT_APPLICATION, new AppInitTransition())
AppInitTransition sets the ACLs and then sends a LogHandlerEventType.APPLICATION_STARTED event to the LogHandler (which currently has two implementations, LogAggregationService and NonAggregatingLogHandler).
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
static class AppInitTransition implements
SingleArcTransition<ApplicationImpl, ApplicationEvent> {
@Override
public void transition(ApplicationImpl app, ApplicationEvent event) {
ApplicationInitEvent initEvent = (ApplicationInitEvent)event;
// Set the ACLs
app.applicationACLs = initEvent.getApplicationACLs();
app.aclsManager.addApplication(app.getAppId(), app.applicationACLs);
// Inform the logAggregator
app.logAggregationContext = initEvent.getLogAggregationContext();
// Send the LogHandlerEventType.APPLICATION_STARTED event to the LogHandler
app.dispatcher.getEventHandler().handle(
new LogHandlerAppStartedEvent(app.appId, app.user,
app.credentials, ContainerLogsRetentionPolicy.ALL_CONTAINERS,
app.applicationACLs, app.logAggregationContext));
}
}
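Taking LogAggregationService as an example: when the LogHandler receives the LogHandlerEventType.APPLICATION_STARTED event, it creates the application log directory, sets directory permissions, and so on. It then sends an ApplicationEventType.APPLICATION_LOG_HANDLING_INITED event to ApplicationImpl, or APPLICATION_LOG_HANDLING_FAILED if initialization throws, as the catch branch below shows.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/logaggregation/LogAggregationService.java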
@Override
public void handle(LogHandlerEvent event) {
switch (event.getType()) {
case APPLICATION_STARTED:
LogHandlerAppStartedEvent appStartEvent =
(LogHandlerAppStartedEvent) event;
// Event handling logic
initApp(appStartEvent.getApplicationId(), appStartEvent.getUser(),
appStartEvent.getCredentials(),
appStartEvent.getLogRetentionPolicy(),
appStartEvent.getApplicationAcls(),
appStartEvent.getLogAggregationContext());
break;
case CONTAINER_FINISHED: // omitted
case APPLICATION_FINISHED: // omitted
default:
; // Ignore
}
}
private void initApp(final ApplicationId appId, String user,
Credentials credentials, ContainerLogsRetentionPolicy logRetentionPolicy,
Map<ApplicationAccessType, String> appAcls,
LogAggregationContext logAggregationContext) {
ApplicationEvent eventResponse;
try {
// Create the application log directory, set directory permissions, and so on
verifyAndCreateRemoteLogDir(getConfig());
initAppAggregator(appId, user, credentials, logRetentionPolicy, appAcls,
logAggregationContext);
eventResponse = new ApplicationEvent(appId,
ApplicationEventType.APPLICATION_LOG_HANDLING_INITED);
} catch (YarnRuntimeException e) {
LOG.warn("Application failed to init aggregation", e);
eventResponse = new ApplicationEvent(appId,
ApplicationEventType.APPLICATION_LOG_HANDLING_FAILED);
}
// Send the resulting event (APPLICATION_LOG_HANDLING_INITED or APPLICATION_LOG_HANDLING_FAILED) to ApplicationImpl
this.dispatcher.getEventHandler().handle(eventResponse);
}
When ApplicationImpl receives the ApplicationEventType.APPLICATION_LOG_HANDLING_INITED event, it immediately sends a LocalizationEventType.INIT_APPLICATION_RESOURCES event to ResourceLocalizationService. ApplicationImpl is still in the INITING state at this point. ResourceLocalizationService handles the event in handle(): it creates the LocalResourcesTrackerImpl objects in preparation for the resource downloads that follow, and sends an ApplicationEventType.APPLICATION_INITED event back to ApplicationImpl.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
public void handle(LocalizationEvent event) {
// TODO: create log dir as $logdir/$user/$appId
switch (event.getType()) {
case INIT_APPLICATION_RESOURCES: // handle the LocalizationEventType.INIT_APPLICATION_RESOURCES event
handleInitApplicationResources(
((ApplicationLocalizationEvent)event).getApplication());
break;
case INIT_CONTAINER_RESOURCES: // omitted
case CONTAINER_RESOURCES_LOCALIZED: // omitted
case CACHE_CLEANUP: // omitted
case CLEANUP_CONTAINER_RESOURCES: // omitted
case DESTROY_APPLICATION_RESOURCES: // omitted
default:
throw new YarnRuntimeException("Unknown localization event: " + event);
}
}
private void handleInitApplicationResources(Application app) {
String userName = app.getUser();
// Create the LocalResourcesTrackerImpl objects in preparation for the resource downloads that follow
privateRsrc.putIfAbsent(userName, new LocalResourcesTrackerImpl(userName,
null, dispatcher, true, super.getConfig(), stateStore));
String appIdStr = ConverterUtils.toString(app.getAppId());
appRsrc.putIfAbsent(appIdStr, new LocalResourcesTrackerImpl(app.getUser(),
app.getAppId(), dispatcher, false, super.getConfig(), stateStore));
// Send the ApplicationEventType.APPLICATION_INITED event to ApplicationImpl
dispatcher.getEventHandler().handle(new ApplicationInitedEvent(
app.getAppId()));
}
When ApplicationImpl receives the ApplicationEventType.APPLICATION_INITED event, it sends an INIT_CONTAINER event to every Container the application currently holds, telling each of them to initialize. ApplicationImpl then moves from INITING to RUNNING.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
static class AppInitDoneTransition implements
SingleArcTransition<ApplicationImpl, ApplicationEvent> {
@Override
public void transition(ApplicationImpl app, ApplicationEvent event) {
// Start all the containers waiting for ApplicationInit
for (Container container : app.containers.values()) {
// Send an INIT_CONTAINER event to each Container held by the application
app.dispatcher.getEventHandler().handle(new ContainerInitEvent(
container.getContainerId()));
}
}
}
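The transition above releases every Container that was registered while the application was still initializing. The following sketch models that gate. It rests on an assumption not shown in the excerpts: that a Container arriving after the application is already RUNNING is initialized immediately rather than waiting. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical miniature of "containers wait for application init"; not Hadoop code.
class AppInitGateSketch {
  enum AppState { INITING, RUNNING }

  private AppState state = AppState.INITING;
  private final List<String> containers = new ArrayList<>();
  final List<String> initEventsSent = new ArrayList<>(); // stands in for dispatched INIT_CONTAINER events

  /** A new container registers with the application. */
  void addContainer(String containerId) {
    containers.add(containerId);
    if (state == AppState.RUNNING) {
      // Assumption: once the application is initialized, new containers start initializing at once.
      initEventsSent.add(containerId);
    }
    // While INITING, the container simply waits in the list.
  }

  /** Application initialization finished: release every waiting container. */
  void onApplicationInited() {
    for (String containerId : containers) {
      initEventsSent.add(containerId);
    }
    state = AppState.RUNNING;
  }
}
```

In this model, calling addContainer() twice before onApplicationInited() produces no init events until the application is ready, after which both are released in registration order.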
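When ContainerImpl receives the INIT_CONTAINER event, it first notifies the auxiliary services (AuxServices) that a new application Container is starting: it sends a CONTAINER_INIT event and, for each service data entry in the launch context, an APPLICATION_INIT event. It then reads the resources of each visibility from the ContainerLaunchContext and stores them in dedicated data structures in ContainerImpl. Finally it sends a LocalizationEventType.INIT_CONTAINER_RESOURCES event to ResourceLocalizationService, and ContainerImpl moves from NEW to LOCALIZING.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java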
static class RequestResourcesTransition implements
MultipleArcTransition<ContainerImpl,ContainerEvent,ContainerState> {
@Override
public ContainerState transition(ContainerImpl container,
ContainerEvent event) {
// (Elided in this excerpt: ctxt is the Container's ContainerLaunchContext.)
// Send the AuxServicesEventType.CONTAINER_INIT event to AuxServices
container.dispatcher.getEventHandler().handle(new AuxServicesEvent
(AuxServicesEventType.CONTAINER_INIT, container));
// Inform the AuxServices about the opaque serviceData
Map<String,ByteBuffer> csd = ctxt.getServiceData();
if (csd != null) {
// This can happen more than once per Application as each container may
// have distinct service data
for (Map.Entry<String,ByteBuffer> service : csd.entrySet()) {
container.dispatcher.getEventHandler().handle(
new AuxServicesEvent(AuxServicesEventType.APPLICATION_INIT,
container.user, container.containerId
.getApplicationAttemptId().getApplicationId(),
service.getKey().toString(), service.getValue()));
}
}
container.containerLocalizationStartTime = clock.getTime();
// Read the resources from the ContainerLaunchContext and record them in the Container's data structures
Map<String,LocalResource> cntrRsrc = ctxt.getLocalResources();
if (!cntrRsrc.isEmpty()) {
try {
for (Map.Entry<String,LocalResource> rsrc : cntrRsrc.entrySet()) {
try {
LocalResourceRequest req =
new LocalResourceRequest(rsrc.getValue());
List<String> links = container.pendingResources.get(req);
if (links == null) {
links = new ArrayList<String>();
container.pendingResources.put(req, links);
}
links.add(rsrc.getKey());
switch (rsrc.getValue().getVisibility()) {
case PUBLIC:
container.publicRsrcs.add(req);
break;
case PRIVATE:
container.privateRsrcs.add(req);
break;
case APPLICATION:
container.appRsrcs.add(req);
break;
}
} catch (URISyntaxException e) {
LOG.info("Got exception parsing " + rsrc.getKey()
+ " and value " + rsrc.getValue());
throw e;
}
}
}
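// (Elided in this excerpt: the catch block for the try above, and the construction of req from the requests collected per visibility.)
// Send the LocalizationEventType.INIT_CONTAINER_RESOURCES event to ResourceLocalizationService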
container.dispatcher.getEventHandler().handle(
new ContainerLocalizationRequestEvent(container, req));
return ContainerState.LOCALIZING;
} else {
// The Container declares no local resources, so there is nothing to localize and it can be launched directly
container.sendLaunchEvent();
container.metrics.endInitingContainer();
return ContainerState.LOCALIZED;
}
}
}
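The switch in the transition above sorts each requested resource into one of three buckets by visibility. The following sketch shows the same grouping with plain strings. VisibilityGroupingSketch and its Visibility enum are hypothetical stand-ins for ContainerImpl's fields and LocalResourceVisibility.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of grouping resources by visibility; not Hadoop code.
class VisibilityGroupingSketch {
  enum Visibility { PUBLIC, PRIVATE, APPLICATION }

  /** Groups resource names by their visibility, keeping the input order inside each group. */
  static Map<Visibility, List<String>> group(Map<String, Visibility> resources) {
    Map<Visibility, List<String>> grouped = new EnumMap<>(Visibility.class);
    for (Map.Entry<String, Visibility> entry : resources.entrySet()) {
      grouped.computeIfAbsent(entry.getValue(), v -> new ArrayList<>()).add(entry.getKey());
    }
    return grouped;
  }
}
```

Grouping up front lets one localization request carry all of a Container's resources while still letting the receiver route each group differently. A visibility with no resources is simply absent from the result.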
When ResourceLocalizationService receives the LocalizationEventType.INIT_CONTAINER_RESOURCES event, it wraps each resource the Container needs in its own REQUEST event and sends it to the matching resource state tracker, LocalResourcesTrackerImpl.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
public void handle(LocalizationEvent event) {
// TODO: create log dir as $logdir/$user/$appId
switch (event.getType()) {
case INIT_APPLICATION_RESOURCES: // omitted
case INIT_CONTAINER_RESOURCES:
// Wrap each resource the Container needs in its own REQUEST event and send it to the matching tracker, LocalResourcesTrackerImpl
handleInitContainerResources((ContainerLocalizationRequestEvent) event);
break;
case CONTAINER_RESOURCES_LOCALIZED: // omitted
case CACHE_CLEANUP: // omitted
case CLEANUP_CONTAINER_RESOURCES: // omitted
case DESTROY_APPLICATION_RESOURCES: // omitted
default:
throw new YarnRuntimeException("Unknown localization event: " + event);
}
}
When LocalResourcesTrackerImpl receives a REQUEST event, it creates a LocalizedResource state machine object for the resource to track its lifecycle, and forwards the REQUEST event to that LocalizedResource.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
// From INIT (ref == 0, awaiting req)
.addTransition(ResourceState.INIT, ResourceState.DOWNLOADING,
ResourceEventType.REQUEST, new FetchResourceTransition())
When LocalizedResource receives the REQUEST event, it sends the details of the resource to download to the download service ResourceLocalizationService in a LocalizerEventType.REQUEST_RESOURCE_LOCALIZATION event. The LocalizedResource state then changes from INIT to DOWNLOADING.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
private static class FetchResourceTransition extends ResourceTransition {
@Override
public void transition(LocalizedResource rsrc, ResourceEvent event) {
ResourceRequestEvent req = (ResourceRequestEvent) event;
LocalizerContext ctxt = req.getContext();
ContainerId container = ctxt.getContainerId();
rsrc.ref.add(container);
rsrc.dispatcher.getEventHandler().handle(
new LocalizerResourceRequestEvent(rsrc, req.getVisibility(), ctxt,
req.getLocalResourceRequest().getPattern()));
}
}
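When ResourceLocalizationService receives the LocalizerEventType.REQUEST_RESOURCE_LOCALIZATION event, it hands it to the LocalizerTracker service. PUBLIC resources are all handled by the PublicLocalizer. For other resources, the tracker checks whether a LocalizerRunner thread already exists for the Container. If not, it creates one. In either case the request is added to that thread's download queue. The thread calls ContainerExecutor#startLocalizer() to download the resources. That function talks to ResourceLocalizationService over the LocalizationProtocol protocol to obtain, one after another, the locations of the resources to download. Once a resource has been downloaded, the PublicLocalizer or the LocalizerRunner sends a LOCALIZED event to the LocalizedResource.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java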
public void handle(LocalizerEvent event) {
String locId = event.getLocalizerId();
switch (event.getType()) {
case REQUEST_RESOURCE_LOCALIZATION:
// 0) find running localizer or start new thread
LocalizerResourceRequestEvent req =
(LocalizerResourceRequestEvent)event;
// Determine the visibility of the requested resource
switch (req.getVisibility()) {
// PUBLIC resources are handed to the PublicLocalizer thread
case PUBLIC:
publicLocalizer.addResource(req);
break;
case PRIVATE:
case APPLICATION:
synchronized (privLocalizers) {
LocalizerRunner localizer = privLocalizers.get(locId);
// Check whether a LocalizerRunner thread already exists for this localizer id
if (null == localizer) {
LOG.info("Created localizer for " + locId);
localizer = new LocalizerRunner(req.getContext(), locId);
privLocalizers.put(locId, localizer);
localizer.start();
}
// 1) propagate event
localizer.addResource(req);
}
break;
}
break;
}
}
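The routing in handle() above sends PUBLIC resources to one shared localizer and creates per-Container localizers lazily for everything else. The following sketch shows that routing with plain lists in place of threads. The class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical miniature of routing localization requests; lists stand in for localizer threads.
class LocalizerRoutingSketch {
  final List<String> publicQueue = new ArrayList<>();              // shared by all containers
  final Map<String, List<String>> privLocalizers = new HashMap<>(); // one queue per localizer id

  /** Routes one resource request; creates the per-container queue on first use. */
  synchronized void request(String localizerId, boolean isPublic, String resource) {
    if (isPublic) {
      publicQueue.add(resource);
      return;
    }
    // Lazily create the localizer for this id, then enqueue the request on it.
    privLocalizers.computeIfAbsent(localizerId, id -> new ArrayList<>()).add(resource);
  }
}
```

Creating the per-Container localizer only on the first non-public request means a Container that needs only PUBLIC resources never gets one.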
When LocalizedResource receives the LOCALIZED event, it sends a ContainerEventType.RESOURCE_LOCALIZED event to ContainerImpl and changes its own state from DOWNLOADING to LOCALIZED. When ContainerImpl receives that event, it checks whether all the resources it depends on have been downloaded. If they have, it sends a LAUNCH_CONTAINER event to the ContainersLauncher service to launch the Container.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/LocalizedResource.java
private static class FetchSuccessTransition extends ResourceTransition {
@Override
public void transition(LocalizedResource rsrc, ResourceEvent event) {
ResourceLocalizedEvent locEvent = (ResourceLocalizedEvent) event;
rsrc.localPath =
Path.getPathWithoutSchemeAndAuthority(locEvent.getLocation());
rsrc.size = locEvent.getSize();
for (ContainerId container : rsrc.ref) {
// Send the ContainerEventType.RESOURCE_LOCALIZED event to ContainerImpl
rsrc.dispatcher.getEventHandler().handle(
new ContainerResourceLocalizedEvent(
container, rsrc.rsrc, rsrc.localPath));
}
}
}
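The "launch only once every dependency is localized" check can be pictured as a set of pending resources that shrinks with each notification. This is a sketch under stated assumptions: `PendingResourcesSketch` and its methods are hypothetical, resources are identified by plain strings, and the container is assumed to need at least one resource.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a container that asks to be launched once every
// required resource has been reported as localized.
class PendingResourcesSketch {
    private final Set<String> pending = new HashSet<>();
    private boolean launchRequested = false;

    PendingResourcesSketch(Set<String> required) {
        pending.addAll(required);
    }

    // Called once per RESOURCE_LOCALIZED-style notification.
    // Returns true only for the notification that completes the set.
    boolean onResourceLocalized(String resource) {
        if (!pending.remove(resource)) {
            return false;            // unknown or duplicate notification: ignore
        }
        if (pending.isEmpty() && !launchRequested) {
            launchRequested = true;  // the real code would send LAUNCH_CONTAINER here
            return true;
        }
        return false;
    }

    boolean isLaunchRequested() {
        return launchRequested;
    }
}
```

With two required resources, the first notification returns false and the second returns true, so the launch request is issued exactly once.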
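At this point all of the Container's resources have been localized, and the Container can be launched and run.
2.3 Launching and running the Container
Running a Container is the job of the ContainersLauncher service. In outline: the environment and command of the Container to be run are written to the shell script launch_container.sh, the command that starts that script is written to default_container_executor.sh, and the Container is then started through the latter script. The run command is written to a script and executed through it, rather than executed directly, mainly because executing it directly could mangle some special characters through escaping.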
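To make the escaping concern concrete, here is a minimal sketch of rendering an environment and a command into a script body with every value quoted. The class `LaunchScriptSketch` and its single-quote strategy are assumptions made for illustration; they do not reproduce the actual contents or format of launch_container.sh.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: renders environment variables and a command into a
// shell script body. Not the format NodeManager actually writes.
class LaunchScriptSketch {
    // Wraps a value in single quotes so the shell treats it literally;
    // an embedded single quote is written as '\'' (close, escaped quote, reopen).
    static String quote(String value) {
        return "'" + value.replace("'", "'\\''") + "'";
    }

    // One "export NAME='value'" line per variable, then an exec line.
    static String render(Map<String, String> env, List<String> command) {
        StringBuilder sb = new StringBuilder("#!/bin/bash\n");
        for (Map.Entry<String, String> e : env.entrySet()) {
            sb.append("export ").append(e.getKey()).append('=')
              .append(quote(e.getValue())).append('\n');
        }
        sb.append("exec");
        for (String arg : command) {
            sb.append(' ').append(quote(arg));
        }
        return sb.append('\n').toString();
    }
}
```

Because the text is written to a file once and then interpreted by a single shell, characters such as spaces, `$` or quotes pass through one well-defined layer of quoting instead of being reinterpreted at each hop.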
The earlier section covered the first event sent by startContainerInternal(). Next we look at the second event, and at how the Container is launched and run.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java
private void startContainerInternal(NMTokenIdentifier nmTokenIdentifier,
ContainerTokenIdentifier containerTokenIdentifier,
StartContainerRequest request) throws YarnException, IOException {
// Token authentication and ContainerLaunchContext initialization omitted
this.readLock.lock();
try {
if (!serviceStopped) {
// Create the application
Application application =
new ApplicationImpl(dispatcher, user, applicationID, credentials, context);
// Application initialization, for use by later Containers; this runs only once, usually for the first Container, which comes from the ApplicationMaster
if (null == context.getApplications().putIfAbsent(applicationID,
application)) {
LOG.info("Creating a new application reference for app " + applicationID);
LogAggregationContext logAggregationContext =
containerTokenIdentifier.getLogAggregationContext();
Map<ApplicationAccessType, String> appAcls =
container.getLaunchContext().getApplicationACLs();
context.getNMStateStore().storeApplication(applicationID,
buildAppProto(applicationID, user, credentials, appAcls,
logAggregationContext));
// 1. Send an ApplicationEventType.INIT_APPLICATION event to ApplicationImpl
dispatcher.getEventHandler().handle(
new ApplicationInitEvent(applicationID, appAcls,
logAggregationContext));
}
// 2. Send an ApplicationEventType.INIT_CONTAINER event to ApplicationImpl
this.context.getNMStateStore().storeContainer(containerId, request);
dispatcher.getEventHandler().handle(
new ApplicationContainerInitEvent(container));
this.context.getContainerTokenSecretManager().startContainerSuccessful(
containerTokenIdentifier);
NMAuditLogger.logSuccess(user, AuditConstants.START_CONTAINER,
"ContainerManageImpl", applicationID, containerId);
// TODO launchedContainer misplaced -> doesn't necessarily mean a container
// launch. A finished Application will not launch containers.
metrics.launchedContainer();
metrics.allocateContainer(containerTokenIdentifier.getResource());
} else {
throw new YarnException(
"Container start failed as the NodeManager is " +
"in the process of shutting down");
}
} finally {
this.readLock.unlock();
}
}
This triggers the Application event ApplicationEventType.INIT_CONTAINER. Below are the state transition registered for this event and its transition handler.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
// Transitions from NEW state
.addTransition(ApplicationState.NEW, ApplicationState.NEW,
ApplicationEventType.INIT_CONTAINER,
new InitContainerTransition())
The handling logic of InitContainerTransition:
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/application/ApplicationImpl.java
static class InitContainerTransition implements
SingleArcTransition<ApplicationImpl, ApplicationEvent> {
@Override
public void transition(ApplicationImpl app, ApplicationEvent event) {
ApplicationContainerInitEvent initEvent =
(ApplicationContainerInitEvent) event;
Container container = initEvent.getContainer();
app.containers.put(container.getContainerId(), container);
LOG.info("Adding " + container.getContainerId()
+ " to application " + app.toString());
switch (app.getApplicationState()) {
case RUNNING:
// The app is already RUNNING, so send a ContainerEventType.INIT_CONTAINER event through the dispatcher
app.dispatcher.getEventHandler().handle(new ContainerInitEvent(
container.getContainerId()));
break;
case INITING:
case NEW:
// these get queued up and sent out in AppInitDoneTransition
break;
default:
assert false : "Invalid state for InitContainerTransition: " +
app.getApplicationState();
}
}
}
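The transition above records every Container but forwards the init event only when the application is RUNNING; Containers that arrive earlier wait until initialization finishes. A minimal sketch of that deferral, with hypothetical names (`DeferredInitSketch`, `onInitContainer`, `onAppInitDone`) and string ids instead of real Container objects:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of deferring container init until the application is RUNNING.
class DeferredInitSketch {
    enum AppState { NEW, INITING, RUNNING }

    private AppState state = AppState.NEW;
    // Every container seen so far, in arrival order.
    private final List<String> known = new ArrayList<>();
    // Containers for which the init event has been sent.
    final List<String> initSent = new ArrayList<>();

    // Counterpart of InitContainerTransition: always record, send only if RUNNING.
    void onInitContainer(String containerId) {
        known.add(containerId);
        if (state == AppState.RUNNING) {
            initSent.add(containerId);
        }
        // NEW / INITING: nothing is sent yet; see onAppInitDone()
    }

    // Counterpart of the AppInitDoneTransition named in the source comment:
    // flush every container that was recorded while the app was not RUNNING.
    void onAppInitDone() {
        state = AppState.RUNNING;
        for (String id : known) {
            if (!initSent.contains(id)) {
                initSent.add(id);
            }
        }
    }
}
```

Containers registered before `onAppInitDone()` are initialized in arrival order when it runs; later ones are initialized immediately.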
The state transition registered for the ContainerEventType.INIT_CONTAINER event and its transition handler:
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
// From NEW State
.addTransition(ContainerState.NEW,
EnumSet.of(ContainerState.LOCALIZING,
ContainerState.LOCALIZED,
ContainerState.LOCALIZATION_FAILED,
ContainerState.DONE),
ContainerEventType.INIT_CONTAINER, new RequestResourcesTransition())
The key step in RequestResourcesTransition is the call to sendLaunchEvent(), which requests the launch of the Container by sending a ContainersLauncherEventType.LAUNCH_CONTAINER event to the dispatcher.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
static class RequestResourcesTransition implements
MultipleArcTransition<ContainerImpl,ContainerEvent,ContainerState> {
@Override
public ContainerState transition(ContainerImpl container,
ContainerEvent event) {
// Some checks omitted
// Key step: request the launch of the Container
container.sendLaunchEvent();
container.metrics.endInitingContainer();
return ContainerState.LOCALIZED;
}
}
private void sendLaunchEvent() {
ContainersLauncherEventType launcherEvent =
ContainersLauncherEventType.LAUNCH_CONTAINER;
if (recoveredStatus == RecoveredContainerStatus.LAUNCHED) {
// try to recover a container that was previously launched
launcherEvent = ContainersLauncherEventType.RECOVER_CONTAINER;
}
containerLaunchStartTime = clock.getTime();
// Send the launcher event (LAUNCH_CONTAINER unless recovering) to the dispatcher
dispatcher.getEventHandler().handle(
new ContainersLauncherEvent(this, launcherEvent));
}
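A ContainersLauncherEventType.LAUNCH_CONTAINER event is sent to the dispatcher here. Unlike the events seen so far, this one has no state transition or transition handler registered for it in the code, so who handles it? To find out, we need to look at where the ContainersLauncherEventType event class is registered.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/ContainerManagerImpl.java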
public ContainerManagerImpl(Context context, ContainerExecutor exec,
DeletionService deletionContext, NodeStatusUpdater nodeStatusUpdater,
NodeManagerMetrics metrics, ApplicationACLsManager aclsManager,
LocalDirsHandlerService dirsHandler) {
dispatcher.register(ContainerEventType.class,
new ContainerEventDispatcher());
dispatcher.register(ApplicationEventType.class,
new ApplicationEventDispatcher());
dispatcher.register(LocalizationEventType.class, rsrcLocalizationSrvc);
dispatcher.register(AuxServicesEventType.class, auxiliaryServices);
dispatcher.register(ContainersMonitorEventType.class, containersMonitor);
// Registration of the ContainersLauncherEventType event class
dispatcher.register(ContainersLauncherEventType.class, containersLauncher);
addService(dispatcher);
ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
this.readLock = lock.readLock();
this.writeLock = lock.writeLock();
}
The handler registered for the ContainersLauncherEventType event class is ContainersLauncher. How does that class handle a ContainersLauncherEventType.LAUNCH_CONTAINER event?
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
public void handle(ContainersLauncherEvent event) {
// TODO: ContainersLauncher launches containers one by one!!
Container container = event.getContainer();
ContainerId containerId = container.getContainerId();
switch (event.getType()) {
case LAUNCH_CONTAINER:
Application app =
context.getApplications().get(
containerId.getApplicationAttemptId().getApplicationId());
// Handling of LAUNCH_CONTAINER: create a ContainerLaunch task and submit it for execution
ContainerLaunch launch =
new ContainerLaunch(context, getConfig(), dispatcher, exec, app,
event.getContainer(), dirsHandler, containerManager);
containerLauncher.submit(launch);
// Record it in running, the map of Containers being run
running.put(containerId, launch);
break;
case RECOVER_CONTAINER: // omitted
case CLEANUP_CONTAINER: // omitted
}
}
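The path traced above, from registering a handler per event-type class to that handler receiving the event, can be sketched with a map keyed by the enum class. The names `TypeDispatchSketch`, `ContainerEvt` and `LauncherEvt` are hypothetical, and the sketch dispatches synchronously on the caller's thread, which is a simplification rather than a description of how the NodeManager dispatcher is implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch of a dispatcher that routes events by the class of
// their type enum, in the spirit of dispatcher.register(XxxEventType.class, handler).
class TypeDispatchSketch {
    enum ContainerEvt { INIT_CONTAINER }
    enum LauncherEvt { LAUNCH_CONTAINER, CLEANUP_CONTAINER }

    private final Map<Class<?>, Consumer<Enum<?>>> handlers = new HashMap<>();

    // One handler per enum class.
    void register(Class<? extends Enum<?>> eventTypeClass, Consumer<Enum<?>> handler) {
        handlers.put(eventTypeClass, handler);
    }

    // The handler is looked up by the enum's declaring class, not by the constant,
    // so every constant of one enum reaches the same handler.
    void dispatch(Enum<?> eventType) {
        Consumer<Enum<?>> handler = handlers.get(eventType.getDeclaringClass());
        if (handler == null) {
            throw new IllegalStateException("No handler registered for " + eventType);
        }
        handler.accept(eventType);
    }
}
```

This is why no state-machine transition is needed for LAUNCH_CONTAINER: the event type's class alone decides which component's `handle()` runs.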
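ContainerLaunch is the class that actually launches the Container. It implements the Callable interface: the task is handed to an executor through submit(), and the work itself runs in its call() implementation. The launch does three things:
- Prepare the Container's execution environment;
- Update the Container's state from LOCALIZED to RUNNING;
- Call the ContainerExecutor object to launch the Container on the NM node.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainerLaunch.java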
public Integer call() {
final ContainerLaunchContext launchContext = container.getLaunchContext();
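/**
* Preparation before launching the Container (omitted):
* 1. Wrap and extend the shell launch script (add custom scripts)
* 2. Create the local working directories
* 3. Set the path where the tokens are stored
*/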
try {
// The launch below is a blocking call, so the ContainerEventType.CONTAINER_LAUNCHED event is sent first to move the Container from LOCALIZED to RUNNING
dispatcher.getEventHandler().handle(new ContainerEvent(
containerID,
ContainerEventType.CONTAINER_LAUNCHED));
context.getNMStateStore().storeContainerLaunched(containerID);
// Check if the container is signalled to be killed.
if (!shouldLaunchContainer.compareAndSet(false, true)) {
LOG.info("Container " + containerIdStr + " not launched as "
+ "cleanup already called");
ret = ExitCode.TERMINATED.getExitCode();
}
else {
// Key step: call the ContainerExecutor object to launch the Container
exec.activateContainer(containerID, pidFilePath);
ret = exec.launchContainer(container, nmPrivateContainerScriptPath,
nmPrivateTokensPath, user, appIdStr, containerWorkDir,
localDirs, logDirs);
}
}
// The Container has returned; check whether it ran successfully (omitted)
LOG.info("Container " + containerIdStr + " succeeded ");
dispatcher.getEventHandler().handle(
new ContainerEvent(containerID,
ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS));
return 0;
}
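Two mechanisms in call() are worth isolating: the task is a Callable submitted to an executor, and an atomic flag decides whether the launch or a concurrent cleanup gets there first. The sketch below shows both under simplifying assumptions; `GuardedLaunchSketch`, its `TERMINATED` value and `submitAndWait()` are hypothetical, and returning 0 stands in for running the Container to completion.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of a launch task guarded against a concurrent cleanup.
class GuardedLaunchSketch implements Callable<Integer> {
    static final int TERMINATED = -1;   // made-up exit code for "never launched"

    // false = untouched; whichever of call()/cleanup() flips it first wins.
    private final AtomicBoolean claimed = new AtomicBoolean(false);

    @Override
    public Integer call() {
        if (!claimed.compareAndSet(false, true)) {
            return TERMINATED;           // cleanup already ran: do not launch
        }
        return 0;                        // stand-in for running the container
    }

    // Returns true if cleanup got there before the launch started.
    boolean cleanup() {
        return claimed.compareAndSet(false, true);
    }

    // Submits the task to a single-thread executor and blocks for its exit code.
    static int submitAndWait(GuardedLaunchSketch task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            return pool.submit(task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If `cleanup()` is called before the task is submitted, `call()` returns the terminated code without launching anything, which is the situation the "not launched as cleanup already called" log message in the listing describes.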
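With the Container's runtime environment ready, the next step is to actually launch the Container on the NM, which is done by calling ContainerExecutor#launchContainer(). Running the Container is handled by the pluggable ContainerExecutor component. Yarn provides three ContainerExecutor implementations: DefaultContainerExecutor, LinuxContainerExecutor and DockerContainerExecutor. The parameter yarn.nodemanager.container-executor.class selects which one is used.
2.4 Container resource cleanup
Container resource cleanup means that once a Container has finished running (whether it succeeded or failed), the NM must reclaim the resources it occupied. These are mainly temporary files used while the Container ran, and they come from two services/components: ResourceLocalizationService and ContainerExecutor. ResourceLocalizationService downloads files from HDFS to the local disk, while ContainerExecutor creates a private working directory for the Container and keeps some temporary files there (for example the Container's process pid file). Cleanup therefore mostly consists of notifying these two components to delete their temporary directories.
At the end of ContainerLaunch#call(), when the Container has run to completion successfully, a ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS event is sent to the dispatcher. The transition registered for this event is shown below: it moves the Container from RUNNING to EXITED_WITH_SUCCESS and triggers ExitedWithSuccessTransition.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java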
// From RUNNING State
.addTransition(ContainerState.RUNNING,
ContainerState.EXITED_WITH_SUCCESS,
ContainerEventType.CONTAINER_EXITED_WITH_SUCCESS,
new ExitedWithSuccessTransition(true))
ExitedWithSuccessTransition sends two events:
- a ContainersLauncherEventType.CLEANUP_CONTAINER cleanup event to ContainersLauncher;
- a LocalizationEventType.CLEANUP_CONTAINER_RESOURCES cleanup event to ResourceLocalizationService.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/container/ContainerImpl.java
static class ExitedWithSuccessTransition extends ContainerTransition {
boolean clCleanupRequired;
public ExitedWithSuccessTransition(boolean clCleanupRequired) {
this.clCleanupRequired = clCleanupRequired;
}
@Override
public void transition(ContainerImpl container, ContainerEvent event) {
// Set exit code to 0 on success
container.exitCode = 0;
// TODO: Add containerWorkDir to the deletion service.
if (clCleanupRequired) {
// Send a ContainersLauncherEventType.CLEANUP_CONTAINER cleanup event to ContainersLauncher
container.dispatcher.getEventHandler().handle(
new ContainersLauncherEvent(container,
ContainersLauncherEventType.CLEANUP_CONTAINER));
}
// Send a LocalizationEventType.CLEANUP_CONTAINER_RESOURCES cleanup event to ResourceLocalizationService
container.cleanup();
}
}
First, how ContainersLauncher cleans up temporary directories. The ContainersLauncherEventType.CLEANUP_CONTAINER event ends up in ContainersLauncher's handle() method, which removes the Container from the list of running Containers and calls ContainerLaunch#cleanupContainer() to clear the temporary directories the Container occupied.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/launcher/ContainersLauncher.java
public void handle(ContainersLauncherEvent event) {
// TODO: ContainersLauncher launches containers one by one!!
Container container = event.getContainer();
ContainerId containerId = container.getContainerId();
switch (event.getType()) {
case LAUNCH_CONTAINER: // omitted
case RECOVER_CONTAINER: // omitted
case CLEANUP_CONTAINER:
// Remove the Container from the list of running Containers
ContainerLaunch launcher = running.remove(containerId);
if (launcher == null) {
// Container not launched. So nothing needs to be done.
return;
}
// Cleanup a container whether it is running/killed/completed, so that
// no sub-processes are alive.
try {
// Clean up the temporary directories occupied by the Container
launcher.cleanupContainer();
} catch (IOException e) {
LOG.warn("Got exception while cleaning container " + containerId
+ ". Ignoring.");
}
break;
}
}
Next, how ResourceLocalizationService removes the Container's directory under the user working directory and under the NM private directory. The LocalizationEventType.CLEANUP_CONTAINER_RESOURCES event leads to handleCleanupContainerResources(), which performs the actual cleanup. It deletes the data in the user working directory ${yarn.nodemanager.local-dirs}/usercache/<user>/appcache/${appid}/${containerid} (that is, the data downloaded from HDFS) and in the private directory ${yarn.nodemanager.local-dirs}/nmPrivate/${appid}/${containerid}. Both directories hold the Tokens file and the shell launch script.
// Location: org/apache/hadoop/yarn/server/nodemanager/containermanager/localizer/ResourceLocalizationService.java
public void handle(LocalizationEvent event) {
switch (event.getType()) {
case INIT_APPLICATION_RESOURCES: // omitted
case INIT_CONTAINER_RESOURCES: // omitted
case CONTAINER_RESOURCES_LOCALIZED: // omitted
case CACHE_CLEANUP: // omitted
case CLEANUP_CONTAINER_RESOURCES:
handleCleanupContainerResources((ContainerLocalizationCleanupEvent)event);
break;
case DESTROY_APPLICATION_RESOURCES: // omitted
default:
throw new YarnRuntimeException("Unknown localization event: " + event);
}
}
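The body of handleCleanupContainerResources() is not shown in this article. As a rough illustration of what removing the two per-Container directories involves, here is a sketch that deletes them for a single local dir. Everything in it is an assumption: the class `ContainerDirCleanupSketch` is hypothetical, and it deletes synchronously on the calling thread, which may differ from how the NodeManager schedules deletions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical sketch: removes a container's two directories under one local dir.
class ContainerDirCleanupSketch {
    // Deletes a directory tree bottom-up; a missing directory is not an error.
    static void deleteTree(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(root)) {
            // Reverse order visits children before their parent directory.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Directory layout follows the two paths named in the text above.
    static void cleanupContainer(Path localDir, String user, String appId, String containerId) {
        deleteTree(localDir.resolve("usercache").resolve(user)
            .resolve("appcache").resolve(appId).resolve(containerId));
        deleteTree(localDir.resolve("nmPrivate").resolve(appId).resolve(containerId));
    }
}
```

Only the directories of the given Container are touched; sibling Containers of the same application keep their files, and calling the method again after the directories are gone is a no-op.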
This completes the Container resource cleanup flow.
Reposted from: https://www.cnblogs.com/lemonu/p/13901050.html

