Spark Source Code Series - How the DAGScheduler Is Triggered

Conclusion

Every action operator ultimately ends in a call to SparkContext.runJob, which in turn triggers DAGScheduler.runJob.
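The delegation chain can be illustrated with a toy model in plain Scala (no Spark dependency; the class and method names mirror Spark's, but the bodies are stubs that only record the call order):

```scala
// Toy model of: action -> SparkContext.runJob -> DAGScheduler.runJob.
// Not Spark code -- every body here is a stub that logs the call.
object TriggerChainSketch {
  val calls = scala.collection.mutable.ArrayBuffer.empty[String]

  class DAGScheduler {
    def runJob(): Unit = calls += "DAGScheduler.runJob"
  }

  class SparkContext {
    private val dagScheduler = new DAGScheduler
    def runJob(): Unit = { calls += "SparkContext.runJob"; dagScheduler.runJob() }
  }

  class RDD(sc: SparkContext) {
    // An action such as foreach or saveAsTextFile ends in sc.runJob
    def foreach(): Unit = { calls += "RDD.foreach"; sc.runJob() }
  }
}
```

Calling `foreach` on the toy RDD records the three-step chain in order.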

Trigger path: the saveAsTextFile action

/*
 Under the hood this calls SparkHadoopWriter.write
 */
rdd.saveAsTextFile("./data/output")

SparkHadoopWriter -> write

object SparkHadoopWriter extends Logging {
  ...
  def write[K, V: ClassTag](...): Unit = {
    ...
    // the per-partition write is submitted as a job on the SparkContext
    val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
      ...
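Why does `write` submit a function over `(TaskContext, Iterator)`? Each partition is written out as its own output file. A toy sketch of that idea (an assumed simplification, not Spark's actual implementation; `SaveSketch` is hypothetical, though the `part-00000` naming convention is real):

```scala
import java.nio.file.{Files, Paths}

// Hypothetical sketch: saveAsTextFile-style output writes one file per
// partition, which is why the job function receives a per-partition iterator.
object SaveSketch {
  def saveAsTextFile[T](partitions: Seq[Seq[T]], dir: String): Unit = {
    Files.createDirectories(Paths.get(dir))
    partitions.zipWithIndex.foreach { case (part, i) =>
      // mirrors Hadoop's part-00000, part-00001, ... naming convention
      Files.write(
        Paths.get(dir, f"part-$i%05d"),
        part.map(_.toString).mkString("\n").getBytes)
    }
  }
}
```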

Trigger path: the foreach action

rdd.foreach(println)

RDD -> foreach

abstract class RDD[T: ClassTag](...) extends Serializable with Logging {
  def foreach(f: T => Unit): Unit = withScope {
    val cleanF = sc.clean(f)
    // sc is the SparkContext
    sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
  }
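Before submitting the job, `foreach` passes the user closure through `sc.clean`, whose main purpose (via Spark's ClosureCleaner) is to ensure the function can be serialized and shipped to executors. A minimal sketch of that serializability check (`ensureSerializable` here is a hypothetical helper, not Spark's public API):

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}

// Hypothetical helper: returns true iff the closure survives Java
// serialization, which is what Spark requires before sending it to executors.
object CleanSketch {
  def ensureSerializable(f: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(f)
      true
    } catch {
      case _: Exception => false // e.g. java.io.NotSerializableException
    }
}
```

A plain Scala lambda passes the check, while a closure capturing a non-serializable object fails it.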

SparkContext -> runJob

  def runJob[T, U: ClassTag](...)
    ...
    // hand the job to the DAGScheduler; this call blocks until the job completes
    dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
    progressBar.foreach(_.finishAll())
    // checkpointing is triggered only after the job has finished
    rdd.doCheckpoint()
  }
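Note the `resultHandler` parameter: the scheduler invokes it once per finished partition with that partition's result, which is how actions assemble their output. A toy sketch of that callback pattern (`ResultHandlerSketch` is hypothetical, not Spark code):

```scala
// Toy sketch of runJob's resultHandler callback: the "scheduler" runs the
// job function on each partition and reports (partitionIndex, result) pairs.
object ResultHandlerSketch {
  def runJob[T, U](partitions: Seq[Seq[T]],
                   func: Iterator[T] => U,
                   resultHandler: (Int, U) => Unit): Unit =
    partitions.zipWithIndex.foreach { case (part, i) =>
      resultHandler(i, func(part.iterator))
    }
}
```

For example, summing two partitions delivers one per-partition sum to the handler for each partition index.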
posted @ 2022-05-29 08:01  608088