Flink JAR Upload and Run Logic

https://blog.csdn.net/xianzhen376/article/details/86774348

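Table of contents
Overview
Starting the RestServer
Registering handlers
Upload JAR
Run Jar
How the JobGraph is generated
Invoking the user program's main method
Running the user program's main method
Running execute (much like a concept I have met before: stub testing)
Submitting the JobGraph
How the ExecutionGraph is deployed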
Overview
Goal: walk through the code behind Upload JAR and Run JAR in the Flink client
Source version: 1.6.1
Deployment mode: Standalone
Background knowledge: Netty, CompletableFuture
Starting the RestServer
RestServerEndpoint.start

Registering handlers
Code from DispatcherRestEndpoint.java

protected List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> initializeHandlers(CompletableFuture<String> restAddressFuture) {
    List<Tuple2<RestHandlerSpecification, ChannelInboundHandler>> handlers = super.initializeHandlers(restAddressFuture);
    ...

    JobSubmitHandler jobSubmitHandler = new JobSubmitHandler(
        restAddressFuture,
        leaderRetriever,
        timeout,
        responseHeaders,
        executor,
        clusterConfiguration);

    if (clusterConfiguration.getBoolean(WebOptions.SUBMIT_ENABLE)) {
        try {
            // This is where the JAR upload and run handlers get registered
            webSubmissionExtension = WebMonitorUtils.loadWebSubmissionExtension(
                leaderRetriever,
                restAddressFuture,
                timeout,
                responseHeaders,
                uploadDir,
                executor,
                clusterConfiguration);

            // register extension handlers
            handlers.addAll(webSubmissionExtension.getHandlers());
        } catch (FlinkException e) {
            ...
        }
    } else {
        log.info("Web-based job submission is not enabled.");
    }

    ...

    return handlers;
}
WebSubmissionExtension defines the handlers for Upload, Run, List, Delete, and Plan.
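
To make the shape of that registration easier to see, here is a minimal sketch in plain Java. It is not Flink code: `HandlerRegistrationSketch`, its methods, and the string labels are all hypothetical stand-ins, and plain strings replace the `Tuple2<RestHandlerSpecification, ChannelInboundHandler>` entries. It only illustrates that the JAR handlers are appended when submission is enabled and skipped otherwise.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for the registration logic; strings replace Flink's handler tuples. */
final class HandlerRegistrationSketch {

    /** Stand-in for WebSubmissionExtension#getHandlers(): one label per JAR operation. */
    static List<String> webSubmissionHandlers() {
        return Arrays.asList("upload", "run", "list", "delete", "plan");
    }

    /**
     * Mirrors the shape of initializeHandlers: start from the handlers supplied by the
     * parent class and append the JAR handlers only when submission is enabled.
     */
    static List<String> initializeHandlers(List<String> baseHandlers, boolean submitEnabled) {
        List<String> handlers = new ArrayList<>(baseHandlers);
        if (submitEnabled) {
            handlers.addAll(webSubmissionHandlers());
        } else {
            System.out.println("Web-based job submission is not enabled.");
        }
        return handlers;
    }

    public static void main(String[] args) {
        List<String> base = Arrays.asList("base");
        System.out.println(initializeHandlers(base, true));
        System.out.println(initializeHandlers(base, false));
    }
}
```

In other words, when WebOptions.SUBMIT_ENABLE is false the JAR endpoints are never registered at all, rather than being registered and then rejecting requests.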

Upload JAR
The request is handled in JarUploadHandler's handleRequest method.

Where the JAR is stored:

jarDir.resolve(UUID.randomUUID() + "_" + fileUpload.getFileName());

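The method itself is simple; the less obvious part is where jarDir comes from. Tracing the assignment backwards:

JarUploadHandler assigns its jarDir field in the constructor;
JarUploadHandler is constructed by WebSubmissionExtension via WebMonitorUtils.loadWebSubmissionExtension, and jarDir comes from the uploadDir field of the parent class RestServerEndpoint;
RestServerEndpoint initializes uploadDir from configuration.getUploadDir();
The origin is found in RestServerEndpointConfiguration: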
final Path uploadDir = Paths.get(
config.getString(WebOptions.UPLOAD_DIR, config.getString(WebOptions.TMP_DIR)),
"flink-web-upload");
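In most deployments nobody overrides WebOptions.UPLOAD_DIR (configuration key "web.upload.dir"), so JARs end up in "$WebOptions.TMP_DIR/flink-web-upload".

The value of WebOptions.TMP_DIR is also easy to miss. Judging from the configuration file alone it is /tmp, but generateClusterConfiguration in ClusterEntrypoint actually rewrites it: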

final String webTmpDir = configuration.getString(WebOptions.TMP_DIR);
final File uniqueWebTmpDir = new File(webTmpDir, "flink-web-" + UUID.randomUUID());

resultConfiguration.setString(WebOptions.TMP_DIR, uniqueWebTmpDir.getAbsolutePath());

The net effect is that JARs are stored under "/tmp/flink-web-UUID/flink-web-upload".

Keeping them under /tmp is risky: files there can be deleted once they expire.
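
The resolution chain above can be sketched with plain JDK types. This is only an illustration under stated assumptions: `UploadDirSketch` and its methods are hypothetical, a `Map` stands in for Flink's `Configuration`, the key "web.tmpdir" is assumed (only "web.upload.dir" is named in the text), and "/tmp" is used as the temp directory as in the example above.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical re-creation of the upload directory resolution; a Map replaces Configuration. */
final class UploadDirSketch {

    // "web.upload.dir" is the key named in the text; "web.tmpdir" is an assumption.
    static final String UPLOAD_DIR_KEY = "web.upload.dir";
    static final String TMP_DIR_KEY = "web.tmpdir";

    /** Like generateClusterConfiguration: rewrite the tmp dir to a unique subdirectory. */
    static Map<String, String> withUniqueTmpDir(Map<String, String> config, UUID clusterId) {
        Map<String, String> result = new HashMap<>(config);
        Path unique = Paths.get(config.get(TMP_DIR_KEY), "flink-web-" + clusterId);
        result.put(TMP_DIR_KEY, unique.toString());
        return result;
    }

    /** Like RestServerEndpointConfiguration: the upload dir falls back to the tmp dir. */
    static Path uploadDir(Map<String, String> config) {
        String base = config.getOrDefault(UPLOAD_DIR_KEY, config.get(TMP_DIR_KEY));
        return Paths.get(base, "flink-web-upload");
    }

    /** Like JarUploadHandler: prefix the uploaded file name with a random UUID. */
    static Path jarPath(Path uploadDir, UUID jarId, String fileName) {
        return uploadDir.resolve(jarId + "_" + fileName);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put(TMP_DIR_KEY, "/tmp");

        // Default case: the JAR lands under the rewritten tmp dir.
        Path dir = uploadDir(withUniqueTmpDir(config, UUID.randomUUID()));
        System.out.println(jarPath(dir, UUID.randomUUID(), "job.jar"));

        // With the upload dir set explicitly, the tmp dir no longer matters.
        config.put(UPLOAD_DIR_KEY, "/data/flink");
        System.out.println(uploadDir(withUniqueTmpDir(config, UUID.randomUUID())));
    }
}
```

The second case in `main` shows the way around the /tmp risk: once "web.upload.dir" points somewhere else (the path "/data/flink" is just a placeholder), uploaded JARs are kept in "flink-web-upload" under that directory instead.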

Run Jar
As with upload, the part to focus on is JarRunHandler's handleRequest.

@Override
protected CompletableFuture<JarRunResponseBody> handleRequest(
        @Nonnull final HandlerRequest<JarRunRequestBody, JarRunMessageParameters> request,
        @Nonnull final DispatcherGateway gateway) throws RestHandlerException {
    ...

    // Build the JobGraph
    final CompletableFuture<JobGraph> jobGraphFuture = getJobGraphAsync(
        jarFile,
        entryClass,
        programArgs,
        savepointRestoreSettings,
        parallelism);

    CompletableFuture<Integer> blobServerPortFuture = gateway.getBlobServerPort(timeout);

    // Upload the files attached to the JobGraph (user JARs and user artifacts) through a BlobClient
    CompletableFuture<JobGraph> jarUploadFuture = jobGraphFuture.thenCombine(blobServerPortFuture, (jobGraph, blobServerPort) -> {
        final InetSocketAddress address = new InetSocketAddress(gateway.getHostname(), blobServerPort);
        try {
            ClientUtils.extractAndUploadJobGraphFiles(jobGraph, () -> new BlobClient(address, configuration));
        } catch (FlinkException e) {
            throw new CompletionException(e);
        }

        return jobGraph;
    });

    CompletableFuture<Acknowledge> jobSubmissionFuture = jarUploadFuture.thenCompose(jobGraph -> {
        // we have to enable queued scheduling because slots will be allocated lazily
        jobGraph.setAllowQueuedScheduling(true);
        // Submit the job
        return gateway.submitJob(jobGraph, timeout);
    });

    return jobSubmissionFuture
        .thenCombine(jarUploadFuture, (ack, jobGraph) -> new JarRunResponseBody(jobGraph.getJobID()))
        .exceptionally(throwable -> {
            throw new CompletionException(new RestHandlerException(
                throwable.getMessage(),
                HttpResponseStatus.INTERNAL_SERVER_ERROR,
                throwable));
        });
}
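
For readers less familiar with CompletableFuture, the chaining used in handleRequest can be reproduced with JDK types alone. The following is a rough sketch, not the Flink implementation: `RunJarFutureChainSketch` is hypothetical, strings stand in for JobGraph and Acknowledge, and the "port must be positive" check is an invented failure condition used only to exercise the error path.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

/** Hypothetical miniature of the handleRequest pipeline; strings replace Flink's types. */
final class RunJarFutureChainSketch {

    static CompletableFuture<String> run(
            CompletableFuture<String> jobGraphFuture,
            CompletableFuture<Integer> blobServerPortFuture) {

        // thenCombine: runs once BOTH inputs are done and merges their results.
        CompletableFuture<String> jarUploadFuture = jobGraphFuture.thenCombine(
            blobServerPortFuture,
            (jobGraph, port) -> {
                if (port <= 0) {
                    // Invented failure condition, wrapped the same way the handler wraps FlinkException.
                    throw new CompletionException(new IllegalStateException("no blob server port"));
                }
                return jobGraph;
            });

        // thenCompose: chains a second asynchronous step that itself returns a future.
        CompletableFuture<String> submissionFuture = jarUploadFuture.thenCompose(
            jobGraph -> CompletableFuture.supplyAsync(() -> "ACK"));

        // Combine the ack with the earlier result, and translate any failure in one place.
        return submissionFuture
            .thenCombine(jarUploadFuture, (ack, jobGraph) -> "submitted:" + jobGraph)
            .exceptionally(throwable -> {
                throw new CompletionException("run failed", throwable);
            });
    }

    public static void main(String[] args) {
        System.out.println(run(
            CompletableFuture.completedFuture("job-1"),
            CompletableFuture.completedFuture(6124)).join());
    }
}
```

A failure in any stage skips the remaining stages and surfaces in the final exceptionally callback, which is why the handler can convert every error into a RestHandlerException at the very end of the chain.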

How the JobGraph is generated
/* getJobGraphAsync in JarRunHandler constructs a PackagedProgram */
final PackagedProgram packagedProgram = new PackagedProgram(
    jarFile.toFile(),
    entryClass,
    programArgs.toArray(new String[programArgs.size()]));
jobGraph = PackagedProgramUtils.createJobGraph(packagedProgram, configuration, parallelism);

/* From PackagedProgramUtils.java */
public static JobGraph createJobGraph(
        PackagedProgram packagedProgram,
        Configuration configuration,
        int defaultParallelism) throws ProgramInvocationException {
    ....

    if (packagedProgram.isUsingProgramEntryPoint()) {
        ...
    } else if (packagedProgram.isUsingInteractiveMode()) {
        /* Submitted streaming programs normally take this branch. PackagedProgram decides the mode by
           checking whether the user's main class implements the Program interface
           (Program.class.isAssignableFrom); a class that only has a main method runs in interactive mode */
        final OptimizerPlanEnvironment optimizerPlanEnvironment = new OptimizerPlanEnvironment(optimizer);

        optimizerPlanEnvironment.setParallelism(defaultParallelism);

        // This triggers the call to the user's main method
        flinkPlan = optimizerPlanEnvironment.getOptimizedPlan(packagedProgram);
    } else {
        throw new ProgramInvocationException("PackagedProgram does not have a valid invocation mode.");
    }

    if (flinkPlan instanceof StreamingPlan) {
        // Obtain the JobGraph
        jobGraph = ((StreamingPlan) flinkPlan).getJobGraph();
        jobGraph.setSavepointRestoreSettings(packagedProgram.getSavepointSettings());
    } else {
        ...
    }

    ...

    return jobGraph;
}

Invoking the user program's main method
/* From OptimizerPlanEnvironment.java */
public FlinkPlan getOptimizedPlan(PackagedProgram prog) throws ProgramInvocationException {
    ...

    /* Make the ContextEnvironmentFactory hand out this OptimizerPlanEnvironment as the env */
    setAsContext();
    try {
        /* Invoke the user program's main method */
        prog.invokeInteractiveModeForExecution();
    }
    ...
}
Running the user program's main method
// A typical main method
public static void main(String[] args) throws Exception {
    /* What is returned here is the OptimizerPlanEnvironment installed by setAsContext in the previous step */
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

    ...
    /* This ends up calling execute on the OptimizerPlanEnvironment */
    env.execute();
}
Running execute (much like a concept I have met before: stub testing)
public JobExecutionResult execute(String jobName) throws Exception {
    /* Hand back the compiled FlinkPlan */
    Plan plan = createProgramPlan(jobName);
    this.optimizerPlan = compiler.compile(plan);

    // Do not put any further user code after execute: it will never run
    // do not go on with anything now!
    throw new ProgramAbortException();
}
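
The stub idea can be shown in isolation. The sketch below is a hypothetical miniature rather than Flink's code: `PlanCaptureSketch`, `CapturingEnvironment`, and `UserProgram` are invented names, a string stands in for the plan, and the error handling is much simpler than in OptimizerPlanEnvironment. What it keeps is the mechanism: install a stub environment, call main reflectively, let execute record the plan and abort with an Error, then read the recorded plan.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical miniature of the "capture the plan, then abort main" trick. */
final class PlanCaptureSketch {

    /** Thrown by the stub environment to stop the user program once the plan is captured. */
    static final class ProgramAbortException extends Error { }

    /** Stub environment: execute() records a plan instead of running anything. */
    static final class CapturingEnvironment {
        String capturedPlan;

        void execute(String jobName) {
            capturedPlan = "plan(" + jobName + ")";
            throw new ProgramAbortException(); // nothing after execute() in main will run
        }
    }

    /** Stands in for the context that setAsContext() installs. */
    static CapturingEnvironment context;

    /** A stand-in "user program". */
    static final class UserProgram {
        static boolean ranPastExecute = false;

        public static void main(String[] args) {
            context.execute("word-count");
            ranPastExecute = true; // never reached because execute() aborts
        }
    }

    static String getOptimizedPlan(Class<?> mainClass) {
        CapturingEnvironment env = new CapturingEnvironment();
        context = env;
        try {
            Method main = mainClass.getMethod("main", String[].class);
            main.invoke(null, (Object) new String[0]);
        } catch (InvocationTargetException e) {
            // The abort is expected; anything else is a real failure of the user program.
            if (!(e.getCause() instanceof ProgramAbortException)) {
                throw new IllegalStateException(e.getCause());
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        } finally {
            context = null;
        }
        if (env.capturedPlan == null) {
            throw new IllegalStateException("The program never called execute()");
        }
        return env.capturedPlan;
    }

    public static void main(String[] args) {
        System.out.println(getOptimizedPlan(UserProgram.class));
        System.out.println("ran past execute: " + UserProgram.ranPastExecute);
    }
}
```

This also makes the warning in the comment concrete: any statement a user places after env.execute() is skipped during plan generation.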
Submitting the JobGraph
Now that the JobGraph is available, let's look more closely at how it is submitted.

/* From Dispatcher.java */
public CompletableFuture<Acknowledge> submitJob(JobGraph jobGraph, Time timeout) {

    ...

    if (jobSchedulingStatus == RunningJobsRegistry.JobSchedulingStatus.DONE || jobManagerRunnerFutures.containsKey(jobId)) {
        return FutureUtils.completedExceptionally(
            new JobSubmissionException(jobId, String.format("Job has already been submitted and is in state %s.", jobSchedulingStatus)));
    } else {
        // The part to focus on is persistAndRunJob
        final CompletableFuture<Acknowledge> persistAndRunFuture = waitForTerminatingJobManager(jobId, jobGraph, this::persistAndRunJob)
            .thenApply(ignored -> Acknowledge.get());

        return persistAndRunFuture.exceptionally(
            (Throwable throwable) -> {
                final Throwable strippedThrowable = ExceptionUtils.stripCompletionException(throwable);
                log.error("Failed to submit job {}.", jobId, strippedThrowable);
                throw new CompletionException(
                    new JobSubmissionException(jobId, "Failed to submit job.", strippedThrowable));
            });
    }
}
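
The duplicate check at the top of submitJob is worth isolating. Below is a small sketch under loud assumptions: `SubmitJobSketch` is hypothetical, string IDs and a string "ACK" replace JobID and Acknowledge, the enum only mirrors the names of RunningJobsRegistry.JobSchedulingStatus, and starting the job is reduced to a map update.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical miniature of the duplicate-submission check in submitJob. */
final class SubmitJobSketch {

    enum JobSchedulingStatus { PENDING, RUNNING, DONE }

    private final Map<String, JobSchedulingStatus> registry = new ConcurrentHashMap<>();
    private final Map<String, CompletableFuture<Void>> jobManagerRunnerFutures = new ConcurrentHashMap<>();

    CompletableFuture<String> submitJob(String jobId) {
        JobSchedulingStatus status = registry.getOrDefault(jobId, JobSchedulingStatus.PENDING);

        // Reject jobs that already finished or that still have a runner registered.
        if (status == JobSchedulingStatus.DONE || jobManagerRunnerFutures.containsKey(jobId)) {
            CompletableFuture<String> rejected = new CompletableFuture<>();
            rejected.completeExceptionally(new IllegalStateException(
                String.format("Job has already been submitted and is in state %s.", status)));
            return rejected;
        }

        // Stand-in for persistAndRunJob: record the job and acknowledge.
        registry.put(jobId, JobSchedulingStatus.RUNNING);
        CompletableFuture<Void> runnerFuture = CompletableFuture.completedFuture(null);
        jobManagerRunnerFutures.put(jobId, runnerFuture);
        return runnerFuture.thenApply(ignored -> "ACK");
    }

    /** Stand-in for job completion: the runner goes away, the registry remembers DONE. */
    void markDone(String jobId) {
        jobManagerRunnerFutures.remove(jobId);
        registry.put(jobId, JobSchedulingStatus.DONE);
    }

    public static void main(String[] args) {
        SubmitJobSketch dispatcher = new SubmitJobSketch();
        System.out.println(dispatcher.submitJob("job-1").join());
        System.out.println(dispatcher.submitJob("job-1").isCompletedExceptionally());
    }
}
```

Note that the rejection is reported through an exceptionally completed future rather than a thrown exception, so callers see it on the same asynchronous path as every other submission failure.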

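Skipping some of the intermediate calls, the call order is:

Dispatcher.persistAndRunJob
Dispatcher.runJob
Dispatcher.createJobManagerRunner, which creates the JobMaster
JobMaster.createAndRestoreExecutionGraph
At last, the ExecutionGraph appears.
How the ExecutionGraph is deployed
Call chain between methods:

Continuing from Dispatcher.createJobManagerRunner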
Dispatcher.startJobManagerRunner
JobManagerRunner.start
StandaloneLeaderElectionService.start
JobManagerRunner.grantLeadership
JobManagerRunner.verifyJobSchedulingStatusAndStartJobManager
JobMaster.start
JobMaster.startJobExecution
JobMaster.resetAndScheduleExecutionGraph
JobMaster.scheduleExecutionGraph
ExecutionGraph.scheduleForExecution
ExecutionGraph.scheduleEager
Execution.deploy
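
One step in this chain that may look surprising is StandaloneLeaderElectionService.start leading straight into JobManagerRunner.grantLeadership. In Standalone mode there is no real election, so the service grants leadership to its contender as soon as it is started. The sketch below illustrates that callback pattern only; every type in it is a hypothetical stand-in, and the all-zero leader ID is an assumption about the default used in Standalone mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;

/** Hypothetical miniature of the start -> grantLeadership callback in Standalone mode. */
final class StandaloneElectionSketch {

    interface LeaderContender {
        void grantLeadership(UUID leaderSessionId);
    }

    /** No election takes place: whoever starts the service becomes leader immediately. */
    static final class StandaloneLeaderElection {
        static final UUID DEFAULT_LEADER_ID = new UUID(0L, 0L); // assumed default ID

        void start(LeaderContender contender) {
            contender.grantLeadership(DEFAULT_LEADER_ID);
        }
    }

    /** Stand-in for JobManagerRunner: records the order in which things happen. */
    static final class RunnerSketch implements LeaderContender {
        final List<String> calls = new ArrayList<>();

        void start(StandaloneLeaderElection election) {
            calls.add("JobManagerRunner.start");
            election.start(this);
        }

        @Override
        public void grantLeadership(UUID leaderSessionId) {
            calls.add("JobManagerRunner.grantLeadership");
            // In the real chain, this is where the JobMaster gets started.
            calls.add("JobMaster.start");
        }
    }

    static List<String> trace() {
        RunnerSketch runner = new RunnerSketch();
        runner.start(new StandaloneLeaderElection());
        return runner.calls;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

With a high-availability setup the same grantLeadership callback would instead fire only after an actual election, which is why the job start logic hangs off the callback rather than off start itself.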

posted on 2019-04-03 10:20 by 一天不进步，就是退步