Flink源码阅读(一)——Per-job之Yarn的作业调度(一)

1、前言

  Flink作业提交到Yarn上之后,后续的AM的生成、Job的处理过程和Flink基本没什么关系了,但是为大致了解Flink on yarn的Per-Job模式的整体过程,这里还是将这系列博客归到Flink源码阅读系列了,本系列博客计划三篇。

  本文着重分析submitApplication之后,Yarn的ResourceManager为任务的ApplicationMater分配container的过程。

  说明:文中源码是从Flink 1.9中跳转过去,主要涉及hadoop-yarn-server-resourcemanager-2.4.1.jar、flink-shaded-hadoop-2-2.4.1-7.0.jar。

  博主水平有限,欢迎大伙留言交流。

涉及的重要概念【1】:

  1)RMApp:每个application是一个RMApp对象,其包含了application的各种信息,实现类为RMAppImpl;

  2)RMAppAttempt:RMApp可以有多个app attempt,即对应着多个RMAppAttempt对象,也就是任务状态的变化的过程。具体对应着那个,取决于前面的RMAppAttempt是否执行成功,如果不成功,会启动另外一个,直到运行成功;

  3)Dispatcher:中央事件调度器,各个状态机的事件调度器会在中央事件调度器中注册。该调度器维护了一个事件队列,其会不断扫描整个队列,取出事件并检查事件类型,然后交给相应的事件调度器处理。其实现类为AsyncDispatcher和MultiThreadedDispatcher,后者是创建一个list用于放AsyncDispatcher

2、事件的提交到调度  

  1、在Flink on yarn的Per-job模式源码解析一文中提到,client提交的报文被封装成request后被ClientRMService.submitApplication()方法处理。其过程如下:

  1)在该方法中会先检查与Yarn RM相互独立的配置,比如applicationId、提交到的资源对列名、任务名等;

  2)调用RMAppManager.submitApplication()提交任务。

  代码如下:

 1 public SubmitApplicationResponse submitApplication(
 2       SubmitApplicationRequest request) throws YarnException {
 3     ApplicationSubmissionContext submissionContext = request
 4         .getApplicationSubmissionContext();
 5     ApplicationId applicationId = submissionContext.getApplicationId();
 6 
 7     // ApplicationSubmissionContext needs to be validated for safety - only
 8     // those fields that are independent of the RM's configuration will be
 9     // checked here, those that are dependent on RM configuration are validated
10     // in RMAppManager.
11 
12     String user = null;
13     try {
14       // Safety
15       user = UserGroupInformation.getCurrentUser().getShortUserName();
16     } catch (IOException ie) {
17       LOG.warn("Unable to get the current user.", ie);
18       RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
19           ie.getMessage(), "ClientRMService",
20           "Exception in submitting application", applicationId);
21       throw RPCUtil.getRemoteException(ie);
22     }
23     //开始检查applicationId是否已存在,检查对列是否设置等
24     // Check whether app has already been put into rmContext,
25     // If it is, simply return the response
26     if (rmContext.getRMApps().get(applicationId) != null) {
27       LOG.info("This is an earlier submitted application: " + applicationId);
28       return SubmitApplicationResponse.newInstance();
29     }
30 
31     if (submissionContext.getQueue() == null) {
32       submissionContext.setQueue(YarnConfiguration.DEFAULT_QUEUE_NAME);
33     }
34     if (submissionContext.getApplicationName() == null) {
35       submissionContext.setApplicationName(
36           YarnConfiguration.DEFAULT_APPLICATION_NAME);
37     }
38     if (submissionContext.getApplicationType() == null) {
39       submissionContext
40         .setApplicationType(YarnConfiguration.DEFAULT_APPLICATION_TYPE);
41     } else {
42       if (submissionContext.getApplicationType().length() > YarnConfiguration.APPLICATION_TYPE_LENGTH) {
43         submissionContext.setApplicationType(submissionContext
44           .getApplicationType().substring(0,
45             YarnConfiguration.APPLICATION_TYPE_LENGTH));
46       }
47     }
48 
49     try {
50     //提交application
51       // call RMAppManager to submit application directly
52       rmAppManager.submitApplication(submissionContext,
53           System.currentTimeMillis(), user);
54 
55       LOG.info("Application with id " + applicationId.getId() + 
56           " submitted by user " + user);
57       RMAuditLogger.logSuccess(user, AuditConstants.SUBMIT_APP_REQUEST,
58           "ClientRMService", applicationId);
59     } catch (YarnException e) {
60       LOG.info("Exception in submitting application with id " +
61           applicationId.getId(), e);
62       RMAuditLogger.logFailure(user, AuditConstants.SUBMIT_APP_REQUEST,
63           e.getMessage(), "ClientRMService",
64           "Exception in submitting application", applicationId);
65       throw e;
66     }
67 
68     SubmitApplicationResponse response = recordFactory
69         .newRecordInstance(SubmitApplicationResponse.class);
70     return response;
71   }
View Code

   2、在RMAppManager.submitApplication()中主要干两件事:一是启动状态机;二是根据是否开启安全认证(Yarn的配置)走不同的分支去调度,代码如下:

 1 protected void submitApplication(
 2       ApplicationSubmissionContext submissionContext, long submitTime,
 3       String user) throws YarnException {
 4     ApplicationId applicationId = submissionContext.getApplicationId();
 5 
 6     //创建application,在构造函数中会启动状态机,此外,会根据设置决定是否保存记录
 7     RMAppImpl application =
 8         createAndPopulateNewRMApp(submissionContext, submitTime, user);
 9     ApplicationId appId = submissionContext.getApplicationId();
10     //开启安全认证
11     if (UserGroupInformation.isSecurityEnabled()) {
12       Credentials credentials = null;
13       try {
14         credentials = parseCredentials(submissionContext);
15         this.rmContext.getDelegationTokenRenewer().addApplicationAsync(appId,
16           credentials, submissionContext.getCancelTokensWhenComplete());
17       } catch (Exception e) {
18         LOG.warn("Unable to parse credentials.", e);
19         // Sending APP_REJECTED is fine, since we assume that the
20         // RMApp is in NEW state and thus we haven't yet informed the
21         // scheduler about the existence of the application
22         assert application.getState() == RMAppState.NEW;
23         this.rmContext.getDispatcher().getEventHandler()
24           .handle(new RMAppRejectedEvent(applicationId, e.getMessage()));
25         throw RPCUtil.getRemoteException(e);
26       }
27     } else {
28       //Dispatcher不会立即启动,事件会被塞进对列
29       //启动RMApp的状态机,等待dispathcer调用对应的事件调度器
30       // Dispatcher is not yet started at this time, so these START events
31       // enqueued should be guaranteed to be first processed when dispatcher
32       // gets started.
33       this.rmContext.getDispatcher().getEventHandler()
34         .handle(new RMAppEvent(applicationId, RMAppEventType.START));
35     }
36   }
View Code

  3、Resourcemanager中有AsyncDispatcher来调度事件,各种事件对应的调度器可以看看ResourceManager类。

  这里仅给出其部分结构图,在这里我们主要关注ApplicationEventDispatcher。

 1 public static final class ApplicationEventDispatcher implements
 2       EventHandler<RMAppEvent> {
 3 
 4     private final RMContext rmContext;
 5 
 6     public ApplicationEventDispatcher(RMContext rmContext) {
 7       this.rmContext = rmContext;
 8     }
 9 
10     @Override
11     public void handle(RMAppEvent event) {
12       ApplicationId appID = event.getApplicationId();
13       RMApp rmApp = this.rmContext.getRMApps().get(appID);
14       if (rmApp != null) {
15         try {
16         //调用RMApp的状态机
17           rmApp.handle(event);
18         } catch (Throwable t) {
19           LOG.error("Error in handling event type " + event.getType()
20               + " for application " + appID, t);
21         }
22       }
23     }
24   }

  这里的handle(...)方法比较简单,就不贴源码分析了。下面我们着重分析application状态的变化过程。

3、Application状态的变化

  1、从NEW->NEW_SAVING

   1)在RMAppImpl.java类中

 1 private static final StateMachineFactory<RMAppImpl,
 2                                            RMAppState,
 3                                            RMAppEventType,
 4                                            RMAppEvent> stateMachineFactory
 5                                = new StateMachineFactory<RMAppImpl,
 6                                            RMAppState,
 7                                            RMAppEventType,
 8                                            RMAppEvent>(RMAppState.NEW)
 9 
10      //这里仅给出部分
11      // Transitions from NEW state
12     .addTransition(RMAppState.NEW, RMAppState.NEW,
13         RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
14     //把RMApp的状态从NEW变成NEW_SAVING
15     .addTransition(RMAppState.NEW, RMAppState.NEW_SAVING,
16         RMAppEventType.START, new RMAppNewlySavingTransition())
17     .addTransition(RMAppState.NEW, EnumSet.of(RMAppState.SUBMITTED,
18             RMAppState.ACCEPTED, RMAppState.FINISHED, RMAppState.FAILED,
19             RMAppState.KILLED, RMAppState.FINAL_SAVING),
20         RMAppEventType.RECOVER, new RMAppRecoveredTransition())
21         .........

   2)我们沿着new RMAppNewlySavingTransition()分析,其源码如下:

 1  private static final class RMAppNewlySavingTransition extends RMAppTransition {
 2     @Override
 3     public void transition(RMAppImpl app, RMAppEvent event) {
 4 
 5       // If recovery is enabled then store the application information in a
 6       // non-blocking call so make sure that RM has stored the information
 7       // needed to restart the AM after RM restart without further client
 8       // communication
 9       LOG.info("Storing application with id " + app.applicationId);
10       app.rmContext.getStateStore().storeNewApplication(app);
11     }
12   }

  3)从其注释,我们可以清晰的得到的,这里主要是保存application的信息,其主要调用的是RMStateStore.storeNewApplication(),代码如下:

 1   /**
 2    * Non-Blocking API
 3    * ResourceManager services use this to store the application's state
 4    * This does not block the dispatcher threads
 5    * RMAppStoredEvent will be sent on completion to notify the RMApp
 6    */
 7   @SuppressWarnings("unchecked")
 8   public synchronized void storeNewApplication(RMApp app) {
 9     ApplicationSubmissionContext context = app
10                                             .getApplicationSubmissionContext();
11     assert context instanceof ApplicationSubmissionContextPBImpl;
12     ApplicationState appState =
13         new ApplicationState(app.getSubmitTime(), app.getStartTime(), context,
14           app.getUser());
15     dispatcher.getEventHandler().handle(new RMStateStoreAppEvent(appState));     //下面重点说明
16   }

  4)从上面代码的15行(标有重点说明),我们可以看到storeAppEvent的动作又交给Dispatcher去调度了,在类RMStateStore中的下面部分(266~277)

 1      //定义
 2    AsyncDispatcher dispatcher;
 3 
 4   @Override
 5   protected void serviceInit(Configuration conf) throws Exception{
 6     // create async handler
 7     dispatcher = new AsyncDispatcher();
 8     dispatcher.init(conf);
 9     //ForwardingEventHandler是具体实施动作的类
10     dispatcher.register(RMStateStoreEventType.class, 
11                         new ForwardingEventHandler());
12     dispatcher.setDrainEventsOnStop();
13     initInternal(conf);
14   }

  2、从NEW_SAVING ->SUBMIT

  1)进去内部类ForwardingEventHandler,我们发现其主要是重写了handle()方法,调用RMStateStore.handleStoreEvent(),跟进这个方法,代码如下:

 1 //代码太多没贴全,详细可以看看RMStateStore类的598~699行
 2 // Dispatcher related code
 3   protected void handleStoreEvent(RMStateStoreEvent event) {
 4     if (event.getType().equals(RMStateStoreEventType.STORE_APP)
 5         || event.getType().equals(RMStateStoreEventType.UPDATE_APP)) {
 6       ApplicationState appState = null;
 7       if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
 8         appState = ((RMStateStoreAppEvent) event).getAppState();
 9       } else {
10         assert event.getType().equals(RMStateStoreEventType.UPDATE_APP);
11         appState = ((RMStateUpdateAppEvent) event).getAppState();
12       }
13 
14       Exception storedException = null;
15       ApplicationStateDataPBImpl appStateData =
16           (ApplicationStateDataPBImpl) ApplicationStateDataPBImpl
17             .newApplicationStateData(appState.getSubmitTime(),
18               appState.getStartTime(), appState.getUser(),
19               appState.getApplicationSubmissionContext(), appState.getState(),
20               appState.getDiagnostics(), appState.getFinishTime());
21 
22       ApplicationId appId =
23           appState.getApplicationSubmissionContext().getApplicationId();
24 
25       LOG.info("Storing info for app: " + appId);
26       try {
27         if (event.getType().equals(RMStateStoreEventType.STORE_APP)) {
28         //状态的保存可以保存在ZooKeeper、内存和文件系统中(整个过程未细看)
29           storeApplicationStateInternal(appId, appStateData);
30          //将event的状态修改为APP_NEW_SAVED,然后进入RMAppImpl类的状态机进行下一步转换
31           notifyDoneStoringApplication(appId, storedException);
32         } else {
33           assert event.getType().equals(RMStateStoreEventType.UPDATE_APP);
34           updateApplicationStateInternal(appId, appStateData);
35           notifyDoneUpdatingApplication(appId, storedException);
36         }
37       } catch (Exception e) {
38         LOG.error("Error storing/updating app: " + appId, e);
39         notifyStoreOperationFailed(e);
40       }
41     } else if{......}
42     }

 2)这里给出RMAppImpl类的关键代码(152~156),如下:

1 // Transitions from NEW_SAVING state
2     .addTransition(RMAppState.NEW_SAVING, RMAppState.NEW_SAVING,
3         RMAppEventType.NODE_UPDATE, new RMAppNodeUpdateTransition())
4     .addTransition(RMAppState.NEW_SAVING, RMAppState.SUBMITTED,
5         RMAppEventType.APP_NEW_SAVED, new AddApplicationToSchedulerTransition())  //关键

3、SUBMIT - >ACCEPTED

1)沿着代码,我们看一下AddApplicationToSchedulerTransition类

 1 private static final class AddApplicationToSchedulerTransition extends
 2       RMAppTransition {
 3     @Override
 4     public void transition(RMAppImpl app, RMAppEvent event) {
 5       if (event instanceof RMAppNewSavedEvent) {
 6         RMAppNewSavedEvent storeEvent = (RMAppNewSavedEvent) event;
 7         // For HA this exception needs to be handled by giving up
 8         // master status if we got fenced
 9         if (((RMAppNewSavedEvent) event).getStoredException() != null) {
10           LOG.error(
11             "Failed to store application: " + storeEvent.getApplicationId(),
12             storeEvent.getStoredException());
13           ExitUtil.terminate(1, storeEvent.getStoredException());
14         }
15       }
16       //这里会调用yarn的scheduler去处理,这里我们分析目前社区推荐的CapacityScheduler
17       app.handler.handle(new AppAddedSchedulerEvent(app.applicationId,
18         app.submissionContext.getQueue(), app.user));
19     }
20   }

2)在CapacityScheduler中的882~887行,

1 case APP_ADDED:
2     {
3       AppAddedSchedulerEvent appAddedEvent = (AppAddedSchedulerEvent) event;
4       //这里主要是对queue做一些检查,然后将Event提交到queue,
5       //事件的状态又为APP_ACCEPTED,从而重新触发RMAppImpl中的状态机
6       //感兴趣的可以去看一下511~546行
7       addApplication(appAddedEvent.getApplicationId(),
8         appAddedEvent.getQueue(), appAddedEvent.getUser());
9     }

3)重新回到RMAppImpl类中

1     .addTransition(RMAppState.SUBMITTED, RMAppState.ACCEPTED,
2         RMAppEventType.APP_ACCEPTED, new StartAppAttemptTransition())

 4、RMAppAttempt状态变化

1、NEW -> SUBMIT

1)我们这里跟进去StartAppAttemptTransition,看看其干了什么,经过跟踪代码,代码如下(657~679):

1 private void
2       createAndStartNewAttempt(boolean transferStateFromPreviousAttempt) {
3       //这里会记录同一个application的重试次数
4     createNewAttempt();
5     //在新建RMAppStartAttemptEvent对象时会触发RMAppAttemptEventType.START事件,
6     handler.handle(new RMAppStartAttemptEvent(currentAttempt.getAppAttemptId(),
7       transferStateFromPreviousAttempt));
8   }

2)在类RMAppAttemptImpl的状态机中,我们可以看到RMAppAttemptState的状态从NEW变为SUBMITTED

1        // Transitions from NEW State
2       .addTransition(RMAppAttemptState.NEW, RMAppAttemptState.SUBMITTED,
3           RMAppAttemptEventType.START, new AttemptStartedTransition())

2、SUBMITTED -> SCHEDULED

 1)和上面的流程一样,我们跟进AttemptStartedTransition类,发现其主要是重写了transtition方法,具体代码如下:

 1 private static final class AttemptStartedTransition extends BaseTransition {
 2     @Override
 3     public void transition(RMAppAttemptImpl appAttempt,
 4         RMAppAttemptEvent event) {
 5 
 6         boolean transferStateFromPreviousAttempt = false;
 7       if (event instanceof RMAppStartAttemptEvent) {
 8         transferStateFromPreviousAttempt =
 9             ((RMAppStartAttemptEvent) event)
10               .getTransferStateFromPreviousAttempt();
11       }
12       appAttempt.startTime = System.currentTimeMillis();
13 
14       // Register with the ApplicationMasterService
15       appAttempt.masterService
16           .registerAppAttempt(appAttempt.applicationAttemptId);
17 
18       if (UserGroupInformation.isSecurityEnabled()) {
19         appAttempt.clientTokenMasterKey =
20             appAttempt.rmContext.getClientToAMTokenSecretManager()
21               .createMasterKey(appAttempt.applicationAttemptId);
22       }
23 
24       // create AMRMToken
25       AMRMTokenIdentifier id =
26           new AMRMTokenIdentifier(appAttempt.applicationAttemptId);
27       appAttempt.amrmToken =
28           new Token<AMRMTokenIdentifier>(id,
29             appAttempt.rmContext.getAMRMTokenSecretManager());
30 
31       // Add the applicationAttempt to the scheduler and inform the scheduler
32       // whether to transfer the state from previous attempt.
33       //触发scheduler的APP_ATTEMPT_ADDED事件,这里分析CapacityScheduler
34       appAttempt.eventHandler.handle(new AppAttemptAddedSchedulerEvent(
35         appAttempt.applicationAttemptId, transferStateFromPreviousAttempt));
36     }
37   }

 2)在CapacityScheduler中,我们分析其分支代码如下:

 1    case APP_ATTEMPT_ADDED:
 2     {
 3       AppAttemptAddedSchedulerEvent appAttemptAddedEvent =
 4           (AppAttemptAddedSchedulerEvent) event;
 5       addApplicationAttempt(appAttemptAddedEvent.getApplicationAttemptId(),
 6         appAttemptAddedEvent.getTransferStateFromPreviousAttempt());
 7     }
 8     break;
 9 
10 //--------------------------------------------------
11     private synchronized void addApplicationAttempt(
12       ApplicationAttemptId applicationAttemptId,
13       boolean transferStateFromPreviousAttempt) {
14     SchedulerApplication application =
15         applications.get(applicationAttemptId.getApplicationId());
16     CSQueue queue = (CSQueue) application.getQueue();
17 
18     FiCaSchedulerApp attempt =
19         new FiCaSchedulerApp(applicationAttemptId, application.getUser(),
20           queue, queue.getActiveUsersManager(), rmContext);
21     if (transferStateFromPreviousAttempt) {
22       attempt.transferStateFromPreviousAttempt(application
23         .getCurrentAppAttempt());
24     }
25     application.setCurrentAppAttempt(attempt);
26 
27     queue.submitApplicationAttempt(attempt, application.getUser());
28     LOG.info("Added Application Attempt " + applicationAttemptId
29         + " to scheduler from user " + application.getUser() + " in queue "
30         + queue.getQueueName());
31     //这里重新触发RMAppAttemptImpl的状态机,RMAppAttemptState从SUBMITTED变为SCHEDULED
32     rmContext.getDispatcher().getEventHandler() .handle(
33         new RMAppAttemptEvent(applicationAttemptId,
34           RMAppAttemptEventType.ATTEMPT_ADDED));
35   }

 3)重新回到RMAppAttemptImpl类中,分析状态机是如何处理事件类型为ATTEMPT_ADDED的,代码如下:

 1 // Transitions from SUBMITTED state
 2       .addTransition(RMAppAttemptState.SUBMITTED, 
 3           EnumSet.of(RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING,
 4                      RMAppAttemptState.SCHEDULED),
 5           RMAppAttemptEventType.ATTEMPT_ADDED,
 6           new ScheduleTransition())
 7     //---------------------------------------------------
 8 
 9     private static final class ScheduleTransition
10       implements
11       MultipleArcTransition<RMAppAttemptImpl, RMAppAttemptEvent, RMAppAttemptState> {
12     @Override
13     public RMAppAttemptState transition(RMAppAttemptImpl appAttempt,
14         RMAppAttemptEvent event) {
15         //没有对应的ApplicationMater
16       if (!appAttempt.submissionContext.getUnmanagedAM()) {
17         // Request a container for the AM.
18         //申请一个container,容器大小由我们设定
19         ResourceRequest request =
20             BuilderUtils.newResourceRequest(
21                 AM_CONTAINER_PRIORITY, ResourceRequest.ANY, appAttempt
22                     .getSubmissionContext().getResource(), 1);
23 
24         // SchedulerUtils.validateResourceRequests is not necessary because
25         // AM resource has been checked when submission
26         //根据心跳信息找到有资源的节点,分配一个container作为AM,后面代码的逻辑比较清楚了
27         Allocation amContainerAllocation = appAttempt.scheduler.allocate(
28             appAttempt.applicationAttemptId,
29             Collections.singletonList(request), EMPTY_CONTAINER_RELEASE_LIST, null, null);
30         if (amContainerAllocation != null
31             && amContainerAllocation.getContainers() != null) {
32           assert (amContainerAllocation.getContainers().size() == 0);
33         }
34         return RMAppAttemptState.SCHEDULED;
35       } else {
36         // save state and then go to LAUNCHED state
37         appAttempt.storeAttempt();
38         return RMAppAttemptState.LAUNCHED_UNMANAGED_SAVING;
39       }
40     }
41   }

Ref:

【1】https://www.iteye.com/blog/humingminghz-2326608 

 

posted @ 2020-05-11 23:58  王大咩的图书馆  阅读(1314)  评论(0编辑  收藏  举报