|NO.Z.00088|——————————|BigDataEnd|——|Hadoop&Spark.V04|——|Spark.v04|Spark Internals & Source Code|Job Execution Internals & Stage Division & dagScheduler.submitJob Posts the Message|
1. Stage division
### --- dagScheduler.submitJob posts the JobSubmitted message
~~~ # Source excerpt: DAGScheduler.scala
~~~ # Lines 676~703
def submitJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): JobWaiter[U] = {
  // Check to make sure we are not launching a task on a partition that does not exist.
  // Get the number of partitions of this job's target RDD
  val maxPartitions = rdd.partitions.length
  // Check for non-existent partitions; throw if any requested partition is out of range
  partitions.find(p => p >= maxPartitions || p < 0).foreach { p =>
    throw new IllegalArgumentException(
      "Attempting to access a non-existent partition: " + p + ". " +
      "Total number of partitions: " + maxPartitions)
  }
  // Generate the jobId for the next job
  val jobId = nextJobId.getAndIncrement()
  /**
   * If the job has zero partitions, create and return a JobWaiter whose totalTasks is 0.
   * By JobWaiter's implementation, a JobWaiter with totalTasks == 0 has its jobPromise
   * completed with Success immediately.
   */
  if (partitions.size == 0) {
    // Return immediately if the job is running 0 tasks
    return new JobWaiter[U](this, jobId, 0, resultHandler)
  }
  assert(partitions.size > 0)
  // At this point the number of partitions is greater than 0
  val func2 = func.asInstanceOf[(TaskContext, Iterator[_]) => _]
  // Create a JobWaiter to track the completion of this job
  val waiter = new JobWaiter(this, jobId, partitions.size, resultHandler)
  /**
   * Wrap the JobWaiter into a JobSubmitted message and post it to the
   * DAGSchedulerEventProcessLoop; the message is eventually handled by
   * DAGScheduler's handleJobSubmitted() method.
   */
  eventProcessLoop.post(JobSubmitted(
    jobId, rdd, func2, partitions.toArray, callSite, waiter,
    SerializationUtils.clone(properties)))
  // Return the JobWaiter to the caller
  waiter
}
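The zero-partition shortcut above works because of how JobWaiter initializes its promise: when totalTasks == 0, the jobPromise is completed with Success right in the constructor, so anyone waiting on completionFuture returns immediately. A condensed sketch of JobWaiter.scala (member order and some logging trimmed; details vary across Spark versions):

~~~ # Sketch: JobWaiter.scala (condensed)
private[spark] class JobWaiter[T](
    dagScheduler: DAGScheduler,
    val jobId: Int,
    totalTasks: Int,
    resultHandler: (Int, T) => Unit)
  extends JobListener with Logging {

  private val finishedTasks = new AtomicInteger(0)
  // A job with zero tasks is finished before it starts
  private val jobPromise: Promise[Unit] =
    if (totalTasks == 0) Promise.successful(()) else Promise()

  def completionFuture: Future[Unit] = jobPromise.future

  // Called by the DAGScheduler once per successfully completed task
  override def taskSucceeded(index: Int, result: Any): Unit = {
    // resultHandler may not be thread safe, so serialize calls to it
    synchronized {
      resultHandler(index, result.asInstanceOf[T])
    }
    if (finishedTasks.incrementAndGet() == totalTasks) {
      jobPromise.success(())
    }
  }

  override def jobFailed(exception: Exception): Unit = {
    if (!jobPromise.tryFailure(exception)) {
      logWarning("Ignore failure", exception)
    }
  }
}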
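For context, submitJob() is the asynchronous half of the submission API: DAGScheduler.runJob() calls it and then blocks on the returned JobWaiter until the job finishes or fails. A condensed sketch of that caller side, paraphrased from the same file (exact logging and error handling vary across Spark versions):

~~~ # Sketch: DAGScheduler.runJob (condensed, same file)
def runJob[T, U](
    rdd: RDD[T],
    func: (TaskContext, Iterator[T]) => U,
    partitions: Seq[Int],
    callSite: CallSite,
    resultHandler: (Int, U) => Unit,
    properties: Properties): Unit = {
  val start = System.nanoTime
  // Submit asynchronously, then block until the JobWaiter's promise is completed
  val waiter = submitJob(rdd, func, partitions, callSite, resultHandler, properties)
  ThreadUtils.awaitReady(waiter.completionFuture, Duration.Inf)
  waiter.completionFuture.value.get match {
    case scala.util.Success(_) =>
      logInfo("Job %d finished: %s, took %f s".format(
        waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
    case scala.util.Failure(exception) =>
      logInfo("Job %d failed: %s, took %f s".format(
        waiter.jobId, callSite.shortForm, (System.nanoTime - start) / 1e9))
      throw exception
  }
}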
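On the receiving side, DAGSchedulerEventProcessLoop dequeues events on its single dag-scheduler-event-loop thread and pattern-matches them to handlers; the JobSubmitted message posted above is routed to handleJobSubmitted(), which is where stage division actually starts (building the final ResultStage and, recursively, its parent ShuffleMapStages). A condensed sketch of the dispatch, from the same file:

~~~ # Sketch: DAGSchedulerEventProcessLoop.doOnReceive (condensed, same file)
private def doOnReceive(event: DAGSchedulerEvent): Unit = event match {
  // The JobSubmitted message posted by submitJob() lands here
  case JobSubmitted(jobId, rdd, func, partitions, callSite, listener, properties) =>
    dagScheduler.handleJobSubmitted(jobId, rdd, func, partitions, callSite,
      listener, properties)

  case MapStageSubmitted(jobId, dependency, callSite, listener, properties) =>
    dagScheduler.handleMapStageSubmitted(jobId, dependency, callSite,
      listener, properties)

  // ... other DAGSchedulerEvents (stage/job cancellation, task completion,
  // executor lost, ...) are matched and dispatched the same way
}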
I strove with none, for none was worth my strife. Nature I loved and, next to Nature, Art: I warm'd both hands before the fire of life. It sinks, and I am ready to depart.
——Walter Savage Landor