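Yarn ApplicationMaster Source Code Analysis
This article walks through how the ApplicationMaster runs and analyzes the relevant source code from three angles: ApplicationMaster startup, registration and heartbeats, and Container resource requests and allocation. Much of the article is devoted to startup, covering application submission, the App/Attempt state transitions, and the launch of the ApplicationMaster itself, so that readers can follow the whole path from submitting an application to a running ApplicationMaster and gain a deeper understanding of Yarn's submission flow.
1. Overall ApplicationMaster workflow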

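Figure: ApplicationMaster lifecycle
ApplicationMaster management is built from three services: ApplicationMasterLauncher, AMLivelinessMonitor, and ApplicationMasterService. Together they manage the lifecycle of an application's ApplicationMaster, which runs from creation to teardown as follows:
- A user submits an application to the ResourceManager. On receiving the request, the ResourceManager first asks the resource scheduler for the resources needed to start the ApplicationMaster; once they are granted, ApplicationMasterLauncher contacts the corresponding NodeManager to start the application's ApplicationMaster.
- Once the ApplicationMaster has been launched, ApplicationMasterLauncher registers it with AMLivelinessMonitor by way of an event, which starts heartbeat monitoring.
- After starting, the ApplicationMaster first registers with ApplicationMasterService, reporting its host, port, and related information.
- While running, the ApplicationMaster periodically sends heartbeats to ApplicationMasterService (each heartbeat carries a description of the resources it wants to request).
- Each time ApplicationMasterService receives a heartbeat, it notifies AMLivelinessMonitor to update the application's last heartbeat time.
- When the application finishes, the ApplicationMaster sends a request to ApplicationMasterService to unregister itself.
- On receiving the unregister request, ApplicationMasterService marks the application as finished and tells AMLivelinessMonitor to stop monitoring its heartbeats.
With this lifecycle in mind, the rest of the article examines the source code from three angles: ApplicationMaster startup, registration and heartbeats, and resource requests.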
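The monitoring side of this lifecycle (registering a launched AM, refreshing its last heartbeat time, and removing it on unregistration) can be sketched in a few lines. This is only an illustration and not Hadoop's AMLivelinessMonitor: the class name, the string attempt IDs, and the explicit clock values passed as arguments are assumptions made to keep the example deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; not the Hadoop AMLivelinessMonitor implementation.
class LivelinessMonitorSketch {
    private final Map<String, Long> lastPing = new ConcurrentHashMap<>();
    private final long expiryMs;

    LivelinessMonitorSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Start monitoring an attempt once its AM has been launched.
    void register(String attemptId, long nowMs) {
        lastPing.put(attemptId, nowMs);
    }

    // Refresh the last heartbeat time; unknown attempts are ignored.
    void receivedPing(String attemptId, long nowMs) {
        lastPing.computeIfPresent(attemptId, (id, previous) -> nowMs);
    }

    // Stop monitoring an attempt after it unregisters.
    void unregister(String attemptId) {
        lastPing.remove(attemptId);
    }

    // Return and drop every attempt whose last heartbeat is older than expiryMs.
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastPing.entrySet()) {
            if (nowMs - entry.getValue() > expiryMs) {
                expired.add(entry.getKey());
            }
        }
        lastPing.keySet().removeAll(expired);
        return expired;
    }

    public static void main(String[] args) {
        LivelinessMonitorSketch monitor = new LivelinessMonitorSketch(1000);
        monitor.register("attempt_1", 0);
        monitor.register("attempt_2", 0);
        monitor.receivedPing("attempt_1", 900);
        // attempt_2 never sent a heartbeat, so only it has expired at t=1500.
        System.out.println(monitor.expire(1500));
    }
}
```

Passing the clock in as a parameter is a choice made for the sketch; a real monitor would read the system clock and run the expiry check on its own thread.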
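2. ApplicationMaster startup
This part covers the first step of the lifecycle: starting the ApplicationMaster. To make the whole execution flow easier to follow, we do not jump straight to the classes that launch the ApplicationMaster. Instead we start from application submission, move on to the App/Attempt state transitions (the state changes an application goes through before its ApplicationMaster starts), and only then reach the actual launch, which gives a fuller picture of Yarn's submission flow.
2.1 Application submission
Whatever its type, an application enters Yarn through the YarnClient API; the method that submits it is submitApplication().
// Location: org/apache/hadoop/yarn/client/api/YarnClient.java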
public abstract ApplicationId submitApplication(
ApplicationSubmissionContext appContext) throws YarnException,
IOException;
Here is the submission entry point in its implementation class:
// Location: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl.java
@Override
public ApplicationId
submitApplication(ApplicationSubmissionContext appContext)
throws YarnException, IOException {
ApplicationId applicationId = appContext.getApplicationId();
if (applicationId == null) {
throw new ApplicationIdNotProvidedException(
"ApplicationId is not provided in ApplicationSubmissionContext");
}
// Build the request that carries the application's submission context
SubmitApplicationRequest request =
Records.newRecord(SubmitApplicationRequest.class);
request.setApplicationSubmissionContext(appContext);
// Automatically add the timeline DT into the CLC
// Only when the security and the timeline service are both enabled
if (isSecurityEnabled() && timelineServiceEnabled) {
addTimelineDelegationToken(appContext.getAMContainerSpec());
}
// The client actually submits the application here
rmClient.submitApplication(request);
while (true) {
// Loop body elided: keeps retrying applications that were not submitted in time
}
return applicationId;
}
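The Yarn client talks to the RM over RPC through the ClientRMService service. Once the submission reaches the server side, ClientRMService calls the corresponding method of RMAppManager to handle the application.
// Location: org/apache/hadoop/yarn/server/resourcemanager/ClientRMService.java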
@Override
public SubmitApplicationResponse submitApplication(
SubmitApplicationRequest request) throws YarnException {
ApplicationSubmissionContext submissionContext = request
.getApplicationSubmissionContext();
ApplicationId applicationId = submissionContext.getApplicationId();
// Validation checks elided
try {
// Key step: hand the application to RMAppManager
rmAppManager.submitApplication(submissionContext,
System.currentTimeMillis(), user);
LOG.info("Application with id " + applicationId.getId() +
" submitted by user " + user);
RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
"ClientRMService", applicationId);
} catch (YarnException e) {
LOG.info("Exception in submitting application with id " +
applicationId.getId(), e);
RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
e.getMessage(), "ClientRMService",
"Exception in submitting application", applicationId);
throw e;
}
SubmitApplicationResponse response = recordFactory
.newRecordInstance(SubmitApplicationResponse.class);
return response;
}
2.2 App/AppAttempt state transitions
RMAppManager#submitApplication() shows that it sends an RMAppEventType.START event to the RM's event dispatcher.
// Location: org/apache/hadoop/yarn/server/resourcemanager/RMAppManager.java
protected void submitApplication(
ApplicationSubmissionContext submissionContext, long submitTime,
String user) throws YarnException {
ApplicationId applicationId = submissionContext.getApplicationId();
RMAppImpl application =
createAndPopulateNewRMApp(submissionContext, submitTime, user, false);
ApplicationId appId = submissionContext.getApplicationId();
Credentials credentials = null;
try {
credentials = parseCredentials(submissionContext);
if (UserGroupInformation.isSecurityEnabled()) {
this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
credentials, submissionContext.getCancelTokensWhenComplete(),
application.getUser());
} else {
// Key step: send the RMAppEventType.START event through the dispatcher
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.START));
}
} catch (Exception e) {
LOG.warn("Unable to parse credentials.", e);
// Sending APP_REJECTED is fine, since we assume that the
// RMApp is in NEW state and thus we haven't yet informed the
// scheduler about the existence of the application
assert application.getState() == RMAppState.NEW;
this.rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
throw RPCUtil.getRemoteException(e);
}
}
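The call chain rmContext.getDispatcher().getEventHandler().handle(...) recurs throughout the rest of this walkthrough, so a small model of it may help. The sketch below is an assumption-laden simplification and not Hadoop's dispatcher: all names are hypothetical, handlers are looked up by the enum class of the event type, handle() only enqueues, and the queue is drained on the caller's thread instead of a dedicated one.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Hypothetical sketch of an event dispatcher; names are illustrative only.
class DispatcherSketch {
    interface Event { Enum<?> getType(); }
    interface EventHandler { void handle(Event event); }

    enum AppEventType { START, APP_NEW_SAVED }

    static final class AppEvent implements Event {
        final String appId;
        final AppEventType type;
        AppEvent(String appId, AppEventType type) {
            this.appId = appId;
            this.type = type;
        }
        @Override
        public Enum<?> getType() { return type; }
    }

    private final Map<Class<?>, EventHandler> handlers = new HashMap<>();
    private final Queue<Event> queue = new ArrayDeque<>();

    // One handler per event-type enum class.
    void register(Class<? extends Enum<?>> eventTypeClass, EventHandler handler) {
        handlers.put(eventTypeClass, handler);
    }

    // Senders only enqueue; the event is processed later by drain().
    void handle(Event event) {
        queue.add(event);
    }

    // Deliver queued events in order, including events enqueued by handlers.
    void drain() {
        Event event;
        while ((event = queue.poll()) != null) {
            EventHandler handler = handlers.get(event.getType().getDeclaringClass());
            if (handler == null) {
                throw new IllegalStateException("No handler for " + event.getType());
            }
            handler.handle(event);
        }
    }

    static List<String> demo() {
        final DispatcherSketch dispatcher = new DispatcherSketch();
        final List<String> trace = new ArrayList<>();
        dispatcher.register(AppEventType.class, event -> {
            AppEvent appEvent = (AppEvent) event;
            trace.add(appEvent.appId + ":" + appEvent.type);
            if (appEvent.type == AppEventType.START) {
                // A handler may emit a follow-up event, as the transitions in this section do.
                dispatcher.handle(new AppEvent(appEvent.appId, AppEventType.APP_NEW_SAVED));
            }
        });
        dispatcher.handle(new AppEvent("app_1", AppEventType.START));
        dispatcher.drain();
        return trace;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the pattern is that the sender returns immediately after enqueuing; each follow-up event is simply another entry in the same queue.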
RMAppImpl registers a transition for RMAppEventType.START that moves the app from NEW to NEW_SAVING.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
// Transitions from NEW state
.addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
RMAppEventType.START, new RMAppNewlySavingTransition())
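To make the addTransition(preState, postState, eventType, hook) pattern concrete, here is a minimal table-driven state machine. It is a sketch under stated assumptions, not Hadoop's state machine classes: the class, the Hook interface, and the printed messages are hypothetical, and it registers only the three RMApp transitions that this section walks through.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a table-driven state machine; not Hadoop code.
class StateMachineSketch {
    enum State { NEW, NEW_SAVING, SUBMITTED, ACCEPTED }
    enum EventType { START, APP_NEW_SAVED, APP_ACCEPTED }

    interface Hook { void run(); }

    private static final class Arc {
        final State post;
        final Hook hook;
        Arc(State post, Hook hook) {
            this.post = post;
            this.hook = hook;
        }
    }

    private final Map<State, Map<EventType, Arc>> table = new HashMap<>();
    private State current = State.NEW;

    // Register: in state `pre`, event `on` runs `hook` and moves to `post`.
    StateMachineSketch addTransition(State pre, State post, EventType on, Hook hook) {
        table.computeIfAbsent(pre, k -> new HashMap<>()).put(on, new Arc(post, hook));
        return this;
    }

    // Look up the arc for (current state, event), run its hook, then switch state.
    State handle(EventType event) {
        Map<EventType, Arc> arcs = table.get(current);
        Arc arc = (arcs == null) ? null : arcs.get(event);
        if (arc == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        arc.hook.run();
        current = arc.post;
        return current;
    }

    static List<String> replay() {
        StateMachineSketch app = new StateMachineSketch()
            .addTransition(State.NEW, State.NEW_SAVING, EventType.START,
                () -> System.out.println("store the new application"))
            .addTransition(State.NEW_SAVING, State.SUBMITTED, EventType.APP_NEW_SAVED,
                () -> System.out.println("add the application to the scheduler"))
            .addTransition(State.SUBMITTED, State.ACCEPTED, EventType.APP_ACCEPTED,
                () -> System.out.println("start the first attempt"));
        List<String> visited = new ArrayList<>();
        for (EventType event : EventType.values()) {
            visited.add(app.handle(event).name());
        }
        return visited;
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

An event that has no arc in the current state is rejected, which is why the order of events matters in the traces that follow.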
What does the registered RMAppNewlySavingTransition do?
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
private static final class RMAppNewlySavingTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
// Persist the app's state
LOG.info("Storing application with id " + app.applicationId);
app.rmContext.getStateStore().storeNewApplication(app);
}
}
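The transition saves the app's state, persisting its metadata through the RM state store (ZooKeeper, when the ZK-based store is configured).
// Location: org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java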
public void storeNewApplication(RMApp app) {
ApplicationSubmissionContext context = app
.getApplicationSubmissionContext();
assert context instanceof ApplicationSubmissionContextPBImpl;
ApplicationStateData appState =
ApplicationStateData.newInstance(
app.getSubmitTime(), app.getStartTime(), context, app.getUser());
// Send the RMStateStoreEventType.STORE_APP event through the dispatcher
dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));
}
This sends an RMStateStoreEventType.STORE_APP event, for which RMStateStore registers the StoreAppTransition.
// Location: org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
.addTransition(RMStateStoreState.ACTIVE,
EnumSet.of(RMStateStoreState.ACTIVE, RMStateStoreState.FENCED),
RMStateStoreEventType.STORE_APP, new StoreAppTransition())
StoreAppTransition stores the application state and then sends an RMAppEventType.APP_NEW_SAVED event, which triggers the app's move from NEW_SAVING to SUBMITTED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/recovery/RMStateStore.java
private static class StoreAppTransition
implements MultipleArcTransition<RMStateStore, RMStateStoreEvent,
RMStateStoreState> {
@Override
public RMStateStoreState transition(RMStateStore store,
RMStateStoreEvent event) {
if (!(event instanceof RMStateStoreAppEvent)) {
// should never happen
LOG.error("Illegal event type: " + event.getClass());
return RMStateStoreState.ACTIVE;
}
boolean isFenced = false;
ApplicationStateData appState =
((RMStateStoreAppEvent) event).getAppState();
ApplicationId appId =
appState.getApplicationSubmissionContext().getApplicationId();
LOG.info("Storing info for app: " + appId);
try {
store.storeApplicationStateInternal(appId, appState);
// Key step: send the RMAppEventType.APP_NEW_SAVED event
store.notifyApplication(new RMAppEvent(appId,
RMAppEventType.APP_NEW_SAVED));
} catch (Exception e) {
LOG.error("Error storing app: " + appId, e);
isFenced = store.notifyStoreOperationFailedInternal(e);
}
return finalState(isFenced);
};
}
In RMAppImpl, the APP_NEW_SAVED event moves the app from NEW_SAVING to SUBMITTED and invokes AddApplicationToSchedulerTransition.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
.addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition())
AddApplicationToSchedulerTransition raises a SchedulerEventType.APP_ADDED event by creating an AppAddedSchedulerEvent.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
private static final class AddApplicationToSchedulerTransition extends
RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
// Send the SchedulerEventType.APP_ADDED event to the scheduler
app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
app.submissionContext.getQueue(), app.user,
app.submissionContext.getReservationID()));
}
}
AppAddedSchedulerEvent extends SchedulerEvent, so the event is handled by the scheduler, which is FairScheduler in this walkthrough. Here is the relevant part of its handle() method.
// Location: org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
@Override
public void handle(SchedulerEvent event) {
switch (event.getType()) {
case NODE_ADDED: // elided
case NODE_REMOVED: // elided
case NODE_UPDATE: // elided
case APP_ADDED:
if (!(event instanceof AppAddedSchedulerEvent)) {
throw new RuntimeException("Unexpected event type: " + event);
}
AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
// Handling logic for the APP_ADDED event
addApplication(appAddedEvent.getApplicationId(),
appAddedEvent.getQueue(), appAddedEvent.getUser(),
appAddedEvent.getIsAppRecovering());
break;
case APP_REMOVED: // elided
case NODE_RESOURCE_UPDATE: // elided
case APP_ATTEMPT_ADDED: // elided
case APP_ATTEMPT_REMOVED: // elided
case CONTAINER_EXPIRED: // elided
case CONTAINER_RESCHEDULED: // elided
default:
LOG.error("Unknown event arrived at FairScheduler: " + event.toString());
}
}
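addApplication() runs preliminary checks on the submission, such as whether the queue name is valid and whether the user is allowed to access the queue. If they pass, it sends an RMAppEventType.APP_ACCEPTED event to the dispatcher.
// Location: org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java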
protected synchronized void addApplication(ApplicationId applicationId,
String queueName, String user, boolean isAppRecovering) {
// Validate the submitted queue name
if (queueName == null || queueName.isEmpty()) {
String message = "Reject application " + applicationId +
" submitted by user " + user + " with an empty queue name.";
LOG.info(message);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, message));
return;
}
if (queueName.startsWith(".") || queueName.endsWith(".")) {
String message = "Reject application " + applicationId
+ " submitted by user " + user + " with an illegal queue name "
+ queueName + ". "
+ "The queue name cannot start/end with period.";
LOG.info(message);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, message));
return;
}
RMApp rmApp = rmContext.getRMApps().get(applicationId);
FSLeafQueue queue = assignToQueue(rmApp, queueName, user);
if (queue == null) {
return;
}
// Check the queue's ACLs
UserGroupInformation userUgi = UserGroupInformation.createRemoteUser(user);
if (!queue.hasAccess(QueueACL.SUBMIT_APPLICATIONS, userUgi)
&& !queue.hasAccess(QueueACL.ADMINISTER_QUEUE, userUgi)) {
String msg = "User " + userUgi.getUserName() +
" cannot submit applications to queue " + queue.getName();
LOG.info(msg);
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppRejectedEvent(applicationId, msg));
return;
}
SchedulerApplication<FSAppAttempt> application =
new SchedulerApplication<FSAppAttempt>(queue, user);
applications.put(applicationId, application);
queue.getMetrics().submitApp(user);
LOG.info("Accepted application " + applicationId + " from user: " + user
+ ", in queue: " + queue.getName()
+ ", currently num of applications: " + applications.size());
if (isAppRecovering) {
// The app is being recovered (recovery is not covered here)
if (LOG.isDebugEnabled()) {
LOG.debug(applicationId
+ " is recovering. Skip notifying APP_ACCEPTED");
}
} else {
// Key step: send the RMAppEventType.APP_ACCEPTED event
rmContext.getDispatcher().getEventHandler()
.handle(new RMAppEvent(applicationId, RMAppEventType.APP_ACCEPTED));
}
}
The RMAppEventType.APP_ACCEPTED event triggers StartAppAttemptTransition and moves the app from SUBMITTED to ACCEPTED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
// Transitions from SUBMITTED state
.addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())
StartAppAttemptTransition sends an RMAppAttemptEventType.START event to begin starting the AppAttempt.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/RMAppImpl.java
private static final class StartAppAttemptTransition extends RMAppTransition {
@Override
public void transition(RMAppImpl app, RMAppEvent event) {
app.createAndStartNewAttempt(false);
};
}
// Create and start the AppAttempt
private void createAndStartNewAttempt(boolean transferStateFromPreviousAttempt) {
createNewAttempt();
// Send the RMAppAttemptEventType.START event
handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(),
transferStateFromPreviousAttempt));
}
The RMAppAttemptEventType.START event invokes AttemptStartedTransition and moves the AppAttempt from NEW to SUBMITTED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
// Transitions from NEW State
.addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
RMAppAttemptEventType.START, new AttemptStartedTransition())
AttemptStartedTransition creates an AppAttemptAddedSchedulerEvent, which carries the SchedulerEventType.APP_ATTEMPT_ADDED event to the scheduler.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
private static final class AttemptStartedTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
// Checks elided
// Send the SchedulerEventType.APP_ATTEMPT_ADDED event to the scheduler
appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
}
}
AppAttemptAddedSchedulerEvent also extends SchedulerEvent, so SchedulerEventType.APP_ATTEMPT_ADDED is handled in the same handle() method.
// Location: org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
public void handle(SchedulerEvent event) {
switch (event.getType()) {
case NODE_ADDED: // elided
case NODE_REMOVED: // elided
case NODE_UPDATE: // elided
case APP_ADDED: // elided
case APP_REMOVED: // elided
case NODE_RESOURCE_UPDATE: // elided
case APP_ATTEMPT_ADDED:
if (!(event instanceof AppAttemptAddedSchedulerEvent)) {
throw new RuntimeException("Unexpected event type: " + event);
}
AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
(AppAttemptAddedSchedulerEvent) event;
addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
appAttemptAddedEvent.getTransferStateFromPreviousAttempt(),
appAttemptAddedEvent.getIsAttemptRecovering());
break;
case APP_ATTEMPT_REMOVED: // elided
case CONTAINER_EXPIRED: // elided
case CONTAINER_RESCHEDULED: // elided
default:
LOG.error("Unknown event arrived at FairScheduler: " + event.toString());
}
}
addApplicationAttempt() sends an RMAppAttemptEventType.ATTEMPT_ADDED event to the dispatcher.
// Location: org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
protected synchronized void addApplicationAttempt(
ApplicationAttemptId applicationAttemptId,
boolean transferStateFromPreviousAttempt,
boolean isAttemptRecovering) {
// Preliminary checks and initialization elided
if (isAttemptRecovering) {
if (LOG.isDebugEnabled()) {
LOG.debug(applicationAttemptId
+ " is recovering. Skipping notifying ATTEMPT_ADDED");
}
} else {
// Send the RMAppAttemptEventType.ATTEMPT_ADDED event
rmContext.getDispatcher().getEventHandler().handle(
new RMAppAttemptEvent(applicationAttemptId,
RMAppAttemptEventType.ATTEMPT_ADDED));
}
}
RMAppAttemptEventType.ATTEMPT_ADDED triggers ScheduleTransition, which moves the AppAttempt from SUBMITTED to either LAUNCHED_UNMANAGED_SAVING or SCHEDULED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
// Transitions from SUBMITTED state
.addTransition(RMAppAttemptState.SUBMITTED,
EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.ATTEMPT_ADDED,
new ScheduleTransition())
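In ScheduleTransition, the if statement checks getUnmanagedAM(), which says whether the AM is unmanaged, that is, run outside the RM's control. If it returns true, the RM does not allocate and launch a container for the AM. The default is false, so the state returned here is SCHEDULED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java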
public static final class ScheduleTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
ApplicationSubmissionContext subCtx = appAttempt.submissionContext;
if (!subCtx.getUnmanagedAM()) {
// Some steps elided
// Request the AM Container from the scheduler; not explained further here
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(
appAttempt.applicationAttemptId,
appAttempt.amReqs,
EMPTY_CONTAINER_RELEASE_LIST,
amBlacklist.getAdditions(),
amBlacklist.getRemovals());
if (amContainerAllocation != null
&& amContainerAllocation.getContainers() != null) {
assert (amContainerAllocation.getContainers().size() == 0);
}
// The returned state becomes the attempt's next state
return RMAppAttemptState.SCHEDULED;
} else {
// save state and then go to LAUNCHED state
appAttempt.storeAttempt();
return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
}
}
}
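ScheduleTransition differs from the earlier transitions in that it is registered with a set of possible post-states and chooses one at run time by returning it. A minimal sketch of that idea follows; the class, interface, and method names are hypothetical and the unmanaged-AM flag is passed in directly rather than read from a submission context.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical sketch of a multiple-arc transition; not Hadoop code.
class MultipleArcSketch {
    enum AttemptState { SUBMITTED, SCHEDULED, LAUNCHED_UNMANAGED_SAVING }

    // The hook decides the next state instead of having it fixed at registration.
    interface MultiArcHook {
        AttemptState transition(boolean unmanagedAm);
    }

    // Run the hook and verify that its result is one of the declared post-states.
    static AttemptState fire(Set<AttemptState> allowedPostStates, MultiArcHook hook,
                             boolean unmanagedAm) {
        AttemptState next = hook.transition(unmanagedAm);
        if (!allowedPostStates.contains(next)) {
            throw new IllegalStateException("Undeclared post-state " + next);
        }
        return next;
    }

    // Mirrors the branch in ScheduleTransition: managed AMs wait for a container.
    static AttemptState onAttemptAdded(boolean unmanagedAm) {
        return fire(
            EnumSet.of(AttemptState.SCHEDULED, AttemptState.LAUNCHED_UNMANAGED_SAVING),
            unmanaged -> unmanaged
                ? AttemptState.LAUNCHED_UNMANAGED_SAVING
                : AttemptState.SCHEDULED,
            unmanagedAm);
    }

    public static void main(String[] args) {
        System.out.println(onAttemptAdded(false));
        System.out.println(onAttemptAdded(true));
    }
}
```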
While the attempt is in SCHEDULED, an RMAppAttemptEventType.CONTAINER_ALLOCATED event (raised once a container has been allocated for the AM) moves the AppAttempt from SCHEDULED to ALLOCATED_SAVING. The transition that handles it is AMContainerAllocatedTransition.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
// Transitions from SCHEDULED State
.addTransition(RMAppAttemptState.SCHEDULED,
EnumSet.of(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.SCHEDULED),
RMAppAttemptEventType.CONTAINER_ALLOCATED,
new AMContainerAllocatedTransition())
AMContainerAllocatedTransition fetches the Container allocated for the AM and returns RMAppAttemptState.ALLOCATED_SAVING as the next state.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
private static final class AMContainerAllocatedTransition
implements
MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
@Override
public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
RMAppAttemptEvent event) {
// Fetch the AM Container from the scheduler. No AM request is passed to allocate() here, so this call only tries to pick up a Container that has already been allocated
Allocation amContainerAllocation =
appAttempt.scheduler.allocate(appAttempt.applicationAttemptId,
EMPTY_CONTAINER_REQUEST_LIST, EMPTY_CONTAINER_RELEASE_LIST, null,
null);
// If the previously allocated Container could not be fetched, retry and stay in SCHEDULED
if (amContainerAllocation.getContainers().size() == 0) {
appAttempt.retryFetchingAMContainer(appAttempt);
return RMAppAttemptState.SCHEDULED;
}
// Set the masterContainer
appAttempt.setMasterContainer(amContainerAllocation.getContainers()
.get(0));
RMContainerImpl rmMasterContainer = (RMContainerImpl)appAttempt.scheduler
.getRMContainer(appAttempt.getMasterContainer().getId());
rmMasterContainer.setAMContainer(true);
appAttempt.rmContext.getNMTokenSecretManager()
.clearNodeSetForAttempt(appAttempt.applicationAttemptId);
appAttempt.getSubmissionContext().setResource(
appAttempt.getMasterContainer().getResource());
appAttempt.storeAttempt();
// Move to the RMAppAttemptState.ALLOCATED_SAVING state
return RMAppAttemptState.ALLOCATED_SAVING;
}
}
From ALLOCATED_SAVING, the RMAppAttemptEventType.ATTEMPT_NEW_SAVED event runs AttemptStoredTransition and moves the AppAttempt to ALLOCATED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
// Transitions from ALLOCATED_SAVING State
.addTransition(RMAppAttemptState.ALLOCATED_SAVING,
RMAppAttemptState.ALLOCATED,
RMAppAttemptEventType.ATTEMPT_NEW_SAVED, new AttemptStoredTransition())
Next, what does AttemptStoredTransition do?
// Location: org/apache/hadoop/yarn/server/resourcemanager/rmapp/attempt/RMAppAttemptImpl.java
private static final class AttemptStoredTransition extends BaseTransition {
@Override
public void transition(RMAppAttemptImpl appAttempt, RMAppAttemptEvent event) {
// Launch the AppAttempt
appAttempt.launchAttempt();
}
}
private void launchAttempt(){
launchAMStartTime = System.currentTimeMillis();
// Key step: send the AMLauncherEventType.LAUNCH event to start the AM Container
eventHandler.handle(new AMLauncherEvent(AMLauncherEventType.LAUNCH, this));
}
At last the AM Container launch is in sight. How exactly is it started? Let's continue.
2.3 Launching the AM
The AMLauncherEventType.LAUNCH event sent above is the entry point for launching the AM. Who handles it? That takes us to the ApplicationMasterLauncher class. First, its basic fields.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/ApplicationMasterLauncher.java
public class ApplicationMasterLauncher extends AbstractService implements
EventHandler<AMLauncherEvent> {
// Thread pool; each AM event is run on one of its threads
private ThreadPoolExecutor launcherPool;
// Dedicated thread that dispatches the AM LAUNCH and CLEANUP events
private LauncherThread launcherHandlingThread;
// Queue that receives the events to process
private final BlockingQueue<Runnable> masterEvents
= new LinkedBlockingQueue<Runnable>();
// ResourceManager context
protected final RMContext context;
public ApplicationMasterLauncher(RMContext context) {
super(ApplicationMasterLauncher.class.getName());
this.context = context;
// Create the event-handling thread
this.launcherHandlingThread = new LauncherThread();
}
@Override
protected void serviceInit(Configuration conf) throws Exception {
int threadCount = conf.getInt(
YarnConfiguration.RM_AMLAUNCHER_THREAD_COUNT,
YarnConfiguration.DEFAULT_RM_AMLAUNCHER_THREAD_COUNT);
ThreadFactory tf = new ThreadFactoryBuilder()
.setNameFormat("ApplicationMasterLauncher #%d")
.build();
// Initialize the thread pool
launcherPool = new ThreadPoolExecutor(threadCount, threadCount, 1,
TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
launcherPool.setThreadFactory(tf);
// Some configuration initialization elided
}
}
This sets up the execution environment: the dedicated event-handling thread launcherHandlingThread, the thread pool launcherPool, and the masterEvents queue that receives AM events. ApplicationMasterLauncher handles two kinds of AM events: LAUNCH and CLEANUP.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/ApplicationMasterLauncher.java
@Override
public synchronized void handle(AMLauncherEvent appEvent) {
// Get the type of the AMLauncherEvent
AMLauncherEventType event = appEvent.getType();
RMAppAttempt application = appEvent.getAppAttempt();
switch (event) {
// Handle the AM LAUNCH event
case LAUNCH:
launch(application);
break;
// Handle the AM CLEANUP event
case CLEANUP:
cleanup(application);
default:
break;
}
}
The AMLauncherEventType.LAUNCH event sent at the end of section 2.2 is handled here. Taking LAUNCH as the example, here is the handling logic.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/ApplicationMasterLauncher.java
private void launch(RMAppAttempt application) {
// Create an AMLauncher instance; AMLauncher implements Runnable
Runnable launcher = createRunnableLauncher(application,
AMLauncherEventType.LAUNCH);
// Add the launcher to the masterEvents queue
masterEvents.add(launcher);
}
protected Runnable createRunnableLauncher(RMAppAttempt application,
AMLauncherEventType event) {
Runnable launcher =
new AMLauncher(context, application, event, getConfig());
return launcher;
}
How is an event processed once it is in the queue? That is the job of the dedicated thread launcherHandlingThread, which consumes the queue one item at a time.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/ApplicationMasterLauncher.java
private class LauncherThread extends Thread {
public LauncherThread() {
super("ApplicationMaster Launcher");
}
@Override
public void run() {
while (!this.isInterrupted()) { // loop until interrupted, handling events
Runnable toLaunch;
try {
// Take the next event from the queue
toLaunch = masterEvents.take();
// Hand it to a thread from the pool
launcherPool.execute(toLaunch);
} catch (InterruptedException e) {
LOG.warn(this.getClass().getName() + " interrupted. Returning.");
return;
}
}
}
}
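The arrangement above (handlers enqueue Runnables, one dedicated thread takes them off the queue, a fixed pool executes them) can be reproduced with plain java.util.concurrent classes. The sketch below is a simplified stand-in and not the Hadoop class: the names, the string attempt IDs, and the latch used to wait for completion are assumptions added for the demonstration.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the queue-plus-pool launcher pattern; not Hadoop code.
class LauncherQueueSketch {
    static List<String> runDemo(int eventCount, int poolSize) {
        final BlockingQueue<Runnable> masterEvents = new LinkedBlockingQueue<>();
        final ExecutorService launcherPool = Executors.newFixedThreadPool(poolSize);
        final List<String> launched = new CopyOnWriteArrayList<>();
        final CountDownLatch done = new CountDownLatch(eventCount);

        // Dedicated thread: blocks on the queue and forwards each item to the pool.
        Thread handlingThread = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    launcherPool.execute(masterEvents.take());
                } catch (InterruptedException e) {
                    return;
                }
            }
        }, "launcher-handling-thread");
        handlingThread.start();

        // Producer side: each "launch" is wrapped in a Runnable and queued.
        for (int i = 1; i <= eventCount; i++) {
            final String attemptId = "attempt_" + i;
            masterEvents.add(() -> {
                launched.add(attemptId);
                done.countDown();
            });
        }

        try {
            if (!done.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("launchers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            handlingThread.interrupt();
            launcherPool.shutdown();
        }
        return launched;
    }

    public static void main(String[] args) {
        // Completion order may vary because the pool runs launches concurrently.
        System.out.println(runDemo(5, 2));
    }
}
```

Separating the taking thread from the pool means a slow launch occupies one pool thread without blocking the intake of later events.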
After an event is taken off the queue, execution is handed to the AMLauncher class. AMLauncher is itself a Runnable, so we go straight to its run() method.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
public void run() {
switch (eventType) {
case LAUNCH:
try {
LOG.info("Launching master" + application.getAppAttemptId());
// Call launch()
launch();
// Send the RMAppAttemptEventType.LAUNCHED event
handler.handle(new RMAppAttemptEvent(application.getAppAttemptId(),
RMAppAttemptEventType.LAUNCHED));
} catch(Exception ie) {
String message = "Error launching " + application.getAppAttemptId()
+ ". Got exception: " + StringUtils.stringifyException(ie);
LOG.info(message);
handler.handle(new RMAppAttemptLaunchFailedEvent(application
.getAppAttemptId(), message));
}
break;
case CLEANUP: // elided
default:
LOG.warn("Received unknown event-type " + eventType + ". Ignoring.");
break;
}
}
AMLauncher#launch() makes an RPC call to the NodeManager to start the AM Container; the AMLauncher talks to the NM through the ContainerManagementProtocol protocol. When launch() returns, an RMAppAttemptEventType.LAUNCHED event is sent to the dispatcher and the AppAttempt moves from ALLOCATED to LAUNCHED.
// Location: org/apache/hadoop/yarn/server/resourcemanager/amlauncher/AMLauncher.java
private void launch() throws IOException, YarnException {
connect();
ContainerId masterContainerID = masterContainer.getId();
ApplicationSubmissionContext applicationContext =
application.getSubmissionContext();
LOG.info("Setting up container " + masterContainer
+ " for AM " + application.getAppAttemptId());
ContainerLaunchContext launchContext =
createAMContainerLaunchContext(applicationContext, masterContainerID);
// Build the Container start request
StartContainerRequest scRequest =
StartContainerRequest.newInstance(launchContext,
masterContainer.getContainerToken());
List<StartContainerRequest> list = new ArrayList<StartContainerRequest>();
list.add(scRequest);
StartContainersRequest allRequests =
StartContainersRequest.newInstance(list);
// Key step: start the Container through an RPC call
StartContainersResponse response =
containerMgrProxy.startContainers(allRequests);
if (response.getFailedRequests() != null
&& response.getFailedRequests().containsKey(masterContainerID)) {
Throwable t =
response.getFailedRequests().get(masterContainerID).deSerialize();
parseAndThrowException(t);
} else {
LOG.info("Done launching container " + masterContainer + " for AM "
+ application.getAppAttemptId());
}
}
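At this point the Container that will run the ApplicationMaster has been started. The Container launch logic itself is not analyzed here. Once the AM Container is up on its NodeManager, it starts the ApplicationMaster process according to the launch context. This completes the first step of the ApplicationMaster lifecycle, ApplicationMaster startup.
3. ApplicationMaster registration, heartbeats, and resource requests
This part covers what the ApplicationMaster does when it starts, how it registers with and heartbeats to the ResourceManager, and how it requests Container resources.
3.1 ApplicationMaster registration and heartbeats
Starting the Container starts the ApplicationMaster process. We therefore take the main() method of the ApplicationMaster class (here, the one from the distributedshell example application) as the entry point and look at what it does at startup.
// Location: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java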
public static void main(String[] args) {
boolean result = false;
try {
ApplicationMaster appMaster = new ApplicationMaster();
LOG.info("Initializing ApplicationMaster");
boolean doRun = appMaster.init(args);
if (!doRun) {
System.exit(0);
}
// run() is the core of AM startup
appMaster.run();
result = appMaster.finish();
} catch (Throwable t) {
LOG.fatal("Error running ApplicationMaster", t);
LogManager.shutdown();
ExitUtil.terminate(1, t);
}
if (result) {
LOG.info("Application Master completed successfully. exiting");
System.exit(0);
} else {
LOG.info("Application Master failed. exiting");
System.exit(2);
}
}
run() is the core entry point of AM startup. It initializes the RPC client instances and starts registering with and heartbeating to the RM.
// Location: org/apache/hadoop/yarn/applications/distributedshell/ApplicationMaster.java
public void run() throws YarnException, IOException {
LOG.info("Starting ApplicationMaster");
// Token checks elided
// Initialize the AMRMClientAsync instance used for AM-RM interaction
AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
amRMClient.init(conf);
amRMClient.start();
// Initialize the NMClientAsync instance used for AM-NM interaction
containerListener = createNMCallbackHandler();
nmClientAsync = new NMClientAsyncImpl(containerListener);
nmClientAsync.init(conf);
nmClientAsync.start();
// Key step: register the AM with the RM; this also starts the heartbeats to the RM
appMasterHostname = NetUtils.getHostname();
RegisterApplicationMasterResponse response = amRMClient
.registerApplicationMaster(appMasterHostname, appMasterRpcPort,
appMasterTrackingUrl);
// Resource limit checks and Container status bookkeeping elided
}
The AM talks to the RM over RPC through the ApplicationMasterService service. Before the server-side registerApplicationMaster method, look at the client-side registration method.
// Location: org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
public RegisterApplicationMasterResponse registerApplicationMaster(
String appHostName, int appHostPort, String appTrackingUrl)
throws YarnException, IOException {
// Register the AM
RegisterApplicationMasterResponse response = client
.registerApplicationMaster(appHostName, appHostPort, appTrackingUrl);
// Start the AM heartbeat thread
heartbeatThread.start();
return response;
}
First, AM registration. The server does two things: it updates the AM's last heartbeat time in AMLivelinessMonitor, and it sends an RMAppAttemptEventType.REGISTERED event, which triggers AMRegisteredTransition and moves the AppAttempt from LAUNCHED to RUNNING.
// Location: org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
@Override
public RegisterApplicationMasterResponse registerApplicationMaster(
RegisterApplicationMasterRequest request) throws YarnException,
IOException {
// Elided
// Allow only one thread in AM to do registerApp at a time.
synchronized (lock) {
// Elided
// Update the AM's last heartbeat time in AMLivelinessMonitor
this.amLivelinessMonitor.receivedPing(applicationAttemptId);
RMApp app = this.rmContext.getRMApps().get(appID);
// Setting the response id to 0 to identify if the
// application master is register for the respective attemptid
lastResponse.setResponseId(0);
lock.setAllocateResponse(lastResponse);
// Core of AM registration: send the RMAppAttemptEventType.REGISTERED event
LOG.info("AM registration " + applicationAttemptId);
this.rmContext
.getDispatcher()
.getEventHandler()
.handle(
new RMAppAttemptRegistrationEvent(applicationAttemptId, request
.getHost(), request.getRpcPort(), request.getTrackingUrl()));
RMAuditLogger.logSuccess(app.getUser(), AuditConstants.REGISTER_AM,
"ApplicationMasterService", appID, applicationAttemptId);
// Elided
}
}
Next, the AM heartbeat flow. heartbeatThread is the dedicated thread that sends the AM's heartbeats; it is initialized as follows.
// Location: org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java
public class AMRMClientAsyncImpl<T extends ContainerRequest>
extends AMRMClientAsync<T> {
// The AM heartbeat thread
private final HeartbeatThread heartbeatThread;
@Private
@VisibleForTesting
public AMRMClientAsyncImpl(AMRMClient<T> client, int intervalMs,
CallbackHandler callbackHandler) {
super(client, intervalMs, callbackHandler);
// Create the AM heartbeat thread instance
heartbeatThread = new HeartbeatThread();
}
// Remaining members elided
}
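After registering with the RM, the AM periodically talks to it through the RPC method ApplicationMasterProtocol#allocate(), which serves three purposes:
- requesting resources;
- picking up newly allocated resources;
- acting as a periodic heartbeat that tells the RM the AM is still alive.
// Location: org/apache/hadoop/yarn/client/api/async/impl/AMRMClientAsyncImpl.java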
private class HeartbeatThread extends Thread {
public HeartbeatThread() {
super("AMRM Heartbeater thread");
}
public void run() {
while (true) { // the heartbeat thread loops until told to stop
AllocateResponse response = null;
// synchronization ensures we don't send heartbeats after unregistering
synchronized (unregisterHeartbeatLock) {
if (!keepRunning) {
return;
}
try {
// Key step: the heartbeat is simply a periodic call to allocate()
response = client.allocate(progress);
} catch (ApplicationAttemptNotFoundException e) {
handler.onShutdownRequest();
LOG.info("Shutdown requested. Stopping callback.");
return;
} catch (Throwable ex) {
LOG.error("Exception on heartbeat", ex);
savedException = ex;
// interrupt handler thread in case it waiting on the queue
handlerThread.interrupt();
return;
}
if (response != null) {
while (true) {
try {
responseQueue.put(response);
break;
} catch (InterruptedException ex) {
LOG.debug("Interrupted while waiting to put on response queue", ex);
}
}
}
}
try {
Thread.sleep(heartbeatIntervalMs.get());
} catch (InterruptedException ex) {
LOG.debug("Heartbeater interrupted", ex);
}
}
}
}
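Stripped of the RPC details, the heartbeat thread is a loop guarded by a flag and a lock, with a sleep between iterations. The following sketch illustrates only that control flow; it is not AMRMClientAsyncImpl. The counter standing in for client.allocate(progress), the warm-up latch, and all names are assumptions made so the example can run on its own.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a stoppable heartbeat loop; not Hadoop code.
class HeartbeatSketch {
    private final Object unregisterHeartbeatLock = new Object();
    private final AtomicInteger heartbeats = new AtomicInteger();
    private final CountDownLatch warmup;
    private final long intervalMs;
    private final Thread thread;
    private volatile boolean keepRunning = true;

    HeartbeatSketch(long intervalMs, int warmupBeats) {
        this.intervalMs = intervalMs;
        this.warmup = new CountDownLatch(warmupBeats);
        this.thread = new Thread(this::loop, "heartbeater");
    }

    private void loop() {
        while (true) {
            // Holding the lock ensures no heartbeat is sent after stop() flips the flag.
            synchronized (unregisterHeartbeatLock) {
                if (!keepRunning) {
                    return;
                }
                heartbeats.incrementAndGet(); // stands in for client.allocate(progress)
                warmup.countDown();
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                // Woken early by the stop request; the flag is re-checked next iteration.
            }
        }
    }

    void start() {
        thread.start();
    }

    // Wait for the warm-up heartbeats, stop the loop, and return how many were sent.
    int awaitWarmupThenStop() {
        boolean warmedUp;
        try {
            warmedUp = warmup.await(5, TimeUnit.SECONDS);
            synchronized (unregisterHeartbeatLock) {
                keepRunning = false;
            }
            thread.interrupt();
            thread.join();
        } catch (InterruptedException e) {
            keepRunning = false;
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!warmedUp) {
            throw new IllegalStateException("heartbeats did not start in time");
        }
        return heartbeats.get();
    }

    static int demo() {
        HeartbeatSketch heartbeat = new HeartbeatSketch(10, 3);
        heartbeat.start();
        return heartbeat.awaitWarmupThenStop();
    }

    public static void main(String[] args) {
        System.out.println("heartbeats sent: " + demo());
    }
}
```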
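The AM has now registered with the RM and is heartbeating periodically, with the heartbeat implemented as a periodic call to ApplicationMasterProtocol#allocate(). Once heartbeats begin, the AM uses them to request resources from the RM so that it can start Container processes on the corresponding NodeManagers, as described in the next part.
3.2 ApplicationMaster resource requests and allocation
The ApplicationMaster requests and is allocated resources in units of Containers, so the discussion below is in terms of Container requests and allocation.
(1) Container request and allocation flow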

Container 分配与申请流程
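As the figure above shows, the path from the application's ApplicationMaster starting on an NM and registering with the RM, through requesting resources (Containers) from the RM, to actually obtaining them has two phases:
Phase 1: the ApplicationMaster reports its resource requirements and collects the resources already allocated to it.
- The ApplicationMaster reports its resource requirements to the RM through the RPC method ApplicationMasterProtocol#allocate (because it is called periodically, it is also referred to as the "heartbeat"). The request includes the new resource requirements, the list of Containers to release, the list of nodes to add to the blacklist, and the list of nodes to remove from the blacklist.
- The ApplicationMasterService in the RM handles requests from the ApplicationMaster. Once it receives a request, it sends an RMAppAttemptEventType.STATUS_UPDATE event to RMAppAttemptImpl. On receiving the event, RMAppAttemptImpl updates the application's progress and the application's last-update time recorded in AMLivelinessMonitor.
- ApplicationMasterService calls ResourceScheduler#allocate to pass the ApplicationMaster's resource requirements on to the ResourceScheduler.
- The ResourceScheduler first reads the list of Containers to release and sends an RMContainerEventType.RELEASED event to each corresponding RMContainerImpl in turn, killing the running Containers. It then records the new resource requirements in the corresponding data structures and returns the resources that have already been allocated to the application.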
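Phase 2: each NM reports the status of its Containers to the RM. If the RM finds idle resources on that node, it performs an allocation round and stores the result in RM data structures, where it waits for the ApplicationMaster's next heartbeat.
- The NM reports the status of its Containers to the RM through the RPC method ResourceTracker#nodeHeartbeat.
- The ResourceTrackerService in the RM handles requests from the NM. Once it receives a request, it sends an RMNodeEventType.STATUS_UPDATE event to RMNodeImpl. On receiving the event, RMNodeImpl updates the status of each Container and then sends a SchedulerEventType.NODE_UPDATE event to the ResourceScheduler.
- When the ResourceScheduler receives the event and the node has idle resources available, it allocates them to applications. The allocated resources are only recorded in the corresponding data structures and wait to be collected by the ApplicationMaster on its next heartbeat.
(2) Source code analysis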
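The key property of the two phases is that they are decoupled: the AM heartbeat only records asks and collects what is already buffered, while the NM heartbeat is what actually triggers allocation. The toy model below is a sketch under simplifying assumptions (a single application, memory as the only resource, FIFO matching); `TwoPhaseAllocationSketch`, `amAllocate`, and `nodeUpdate` are illustrative names, not Hadoop APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Toy model of the two-phase flow; all names here are illustrative, not Hadoop APIs.
class TwoPhaseAllocationSketch {
    private final Deque<Integer> pendingAsksMb = new ArrayDeque<>(); // outstanding asks (memory in MB)
    private final List<String> newlyAllocated = new ArrayList<>();  // allocated, not yet collected by the AM
    private int nextContainerId = 1;

    // Phase 1: AM heartbeat. Record new asks, then hand back whatever was allocated earlier.
    synchronized List<String> amAllocate(List<Integer> asksMb) {
        pendingAsksMb.addAll(asksMb);
        List<String> pulled = new ArrayList<>(newlyAllocated);
        newlyAllocated.clear();
        return pulled;
    }

    // Phase 2: NM heartbeat. Satisfy pending asks in FIFO order from the node's
    // free memory and buffer the result until the next AM heartbeat.
    synchronized void nodeUpdate(String node, int freeMb) {
        while (!pendingAsksMb.isEmpty() && pendingAsksMb.peekFirst() <= freeMb) {
            int mb = pendingAsksMb.pollFirst();
            freeMb -= mb;
            newlyAllocated.add("container_" + (nextContainerId++) + "@" + node + ":" + mb + "MB");
        }
    }

    public static void main(String[] args) {
        TwoPhaseAllocationSketch rm = new TwoPhaseAllocationSketch();
        // First AM heartbeat: asks are recorded, but nothing has been allocated yet.
        System.out.println(rm.amAllocate(Arrays.asList(1024, 2048)));
        // An NM heartbeat with enough free memory triggers the actual allocation.
        rm.nodeUpdate("nm1", 3072);
        // The next AM heartbeat collects the buffered containers.
        System.out.println(rm.amAllocate(Collections.<Integer>emptyList()));
    }
}
```

In this model the first AM heartbeat always returns an empty list, which matches the description above: a request can only be satisfied after some node has reported in.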
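On the client side, the heartbeat thread in AMRMClientAsyncImpl calls client.allocate(), which reports resource requirements to the RM over RPC. The RPC interface is defined by the ApplicationMasterProtocol protocol. Let's look at how the RM side of that protocol serves the request.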
// Location: org/apache/hadoop/yarn/server/resourcemanager/ApplicationMasterService.java
@Override
public AllocateResponse allocate(AllocateRequest request)
    throws YarnException, IOException {
  synchronized (lock) {
    AllocateResponse lastResponse = lock.getAllocateResponse();
    // Send STATUS_UPDATE to update the AppAttempt state
    this.rmContext.getDispatcher().getEventHandler().handle(
        new RMAppAttemptStatusupdateEvent(appAttemptId, request
            .getProgress()));
    // Sanity check: normalize the asks and validate them (memory, vcores)
    // against the scheduler's maximum resource capability
    try {
      RMServerUtils.normalizeAndValidateRequests(ask,
          rScheduler.getMaximumResourceCapability(), app.getQueue(),
          rScheduler);
    } catch (InvalidResourceRequestException e) {
      LOG.warn("Invalid resource ask by application " + appAttemptId, e);
      throw e;
    }
    // Key point: call the scheduler's allocate() to hand over the resource requirements
    Allocation allocation =
        this.rScheduler.allocate(appAttemptId, ask, release,
            blacklistAdditions, blacklistRemovals);
    // Update the response and the AMRMToken state (details omitted)
    return allocateResponse;
  }
}
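The STATUS_UPDATE event sent at the top of allocate() is what, as described in phase 1, refreshes the application's last-update time in AMLivelinessMonitor. The bookkeeping behind such a monitor can be sketched as follows. This is an illustration only: `LivenessMonitorSketch` and its methods are hypothetical, and the clock is passed in explicitly so the behaviour is deterministic, whereas a real monitor would read a clock and run the expiry check on a background thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical liveness bookkeeping; not the AMLivelinessMonitor implementation.
class LivenessMonitorSketch {
    private final Map<String, Long> lastPingMs = new ConcurrentHashMap<>();
    private final long expiryMs;

    LivenessMonitorSketch(long expiryMs) {
        this.expiryMs = expiryMs;
    }

    // Start monitoring an attempt (done when the AM is launched).
    void register(String attemptId, long nowMs) {
        lastPingMs.put(attemptId, nowMs);
    }

    // Refresh the timestamp on every heartbeat; unknown ids are ignored.
    void ping(String attemptId, long nowMs) {
        lastPingMs.computeIfPresent(attemptId, (id, old) -> nowMs);
    }

    // Stop monitoring (done when the AM unregisters).
    void unregister(String attemptId) {
        lastPingMs.remove(attemptId);
    }

    // Return and drop every attempt whose last heartbeat is older than expiryMs.
    List<String> expireStale(long nowMs) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastPingMs.entrySet()) {
            if (nowMs - e.getValue() > expiryMs) {
                expired.add(e.getKey());
            }
        }
        lastPingMs.keySet().removeAll(expired);
        return expired;
    }

    public static void main(String[] args) {
        LivenessMonitorSketch monitor = new LivenessMonitorSketch(600);
        monitor.register("attempt_1", 0);
        monitor.ping("attempt_1", 500);                // heartbeat arrives at t=500
        System.out.println(monitor.expireStale(1000)); // 500 ms since last ping: still alive
        System.out.println(monitor.expireStale(1200)); // 700 ms since last ping: expired
    }
}
```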
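ApplicationMasterService#allocate() calls the allocate() method of YarnScheduler. Since the FairScheduler is the scheduler in use here, let's look at FairScheduler#allocate(). The core of the allocation is the pullNewlyAllocatedContainersAndNMTokens() method, which takes Containers out of the List named newlyAllocatedContainers. Where do those Containers come from? They are produced by the resource allocation logic that runs when a NodeManager heartbeat arrives, and they are kept in the RM's in-memory data structure newlyAllocatedContainers. The AM simply collects its Containers from that structure.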
// Location: org/apache/hadoop/yarn/server/resourcemanager/scheduler/fair/FairScheduler.java
@Override
public Allocation allocate(ApplicationAttemptId appAttemptId,
    List<ResourceRequest> ask, List<ContainerId> release,
    List<String> blacklistAdditions, List<String> blacklistRemovals) {
  // Normalize the resource requests
  SchedulerUtils.normalizeRequests(ask, DOMINANT_RESOURCE_CALCULATOR,
      getClusterResource(), minimumAllocation, getMaximumResourceCapability(),
      incrAllocation);
  // Record the time at which the Container request started
  application.recordContainerRequestTime(getClock().getTime());
  // Release containers
  releaseContainers(release, application);
  synchronized (application) {
    if (!ask.isEmpty()) {
      if (LOG.isDebugEnabled()) {
        LOG.debug("allocate: pre-update" +
            " applicationAttemptId=" + appAttemptId +
            " application=" + application.getApplicationId());
      }
      application.showRequests();
      // Update application requests
      application.updateResourceRequests(ask);
      application.showRequests();
    }
    ... // omitted
    // Key point: issue tokens for the Containers and fetch the Containers previously allocated to this AppAttempt.
    // They are held in an in-memory RM data structure and were produced by assignContainer(); see Yarn's resource allocation logic for details.
    ContainersAndNMTokensAllocation allocation =
        application.pullNewlyAllocatedContainersAndNMTokens();
    // Record container allocation time
    if (!(allocation.getContainerList().isEmpty())) {
      application.recordContainerAllocationTime(getClock().getTime());
    }
    // Return the allocated Containers to the client (the AM)
    return new Allocation(allocation.getContainerList(),
        application.getHeadroom(), preemptionContainerIds, null, null,
        allocation.getNMTokenList());
  }
}
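The "normalize the resource requests" step at the top of FairScheduler#allocate() takes a minimum allocation, a maximum capability, and an increment (minimumAllocation, getMaximumResourceCapability(), and incrAllocation in the excerpt). The sketch below illustrates what such a normalization typically does for a single dimension such as memory. The exact rounding rule is an assumption made for illustration and is not copied from Hadoop; `NormalizeSketch` and `normalize` are hypothetical names.

```java
// Single-dimension normalization sketch; the rounding rule is an assumption, not Hadoop's code.
class NormalizeSketch {
    // Raise the request to at least min, round it up to a multiple of increment, then cap it at max.
    static int normalize(int requested, int min, int max, int increment) {
        int atLeastMin = Math.max(requested, min);
        int roundedUp = ((atLeastMin + increment - 1) / increment) * increment;
        return Math.min(roundedUp, max);
    }

    public static void main(String[] args) {
        int min = 1024, max = 8192, increment = 512; // illustrative values in MB
        System.out.println(normalize(1500, min, max, increment)); // rounded up to the next 512 MB step
        System.out.println(normalize(100, min, max, increment));  // raised to the minimum
        System.out.println(normalize(9000, min, max, increment)); // capped at the maximum
    }
}
```

Normalizing before matching means the scheduler only ever deals with a small set of request sizes, which keeps its accounting in whole increments.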
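At this point the AM's periodic heartbeat has obtained its Containers. How those Containers are then launched differs from one application type to another, so it is not covered in detail here.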
Reposted from: https://www.cnblogs.com/lemonu/p/13566381.html

