Spark Source Code Series - How the DAGScheduler Is Triggered
Conclusion
Every action operator is ultimately driven by SparkContext.runJob,
which in turn triggers DAGScheduler.runJob.
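As a rough illustration of that delegation chain, here is a simplified sketch. The class and method names mirror Spark's, but these are illustrative stand-ins, not the real implementations:

```scala
// Simplified stand-ins showing how an action funnels into DAGScheduler.runJob.
// The real methods take many more parameters; only the delegation is shown.
class DAGSchedulerSketch {
  def runJob(desc: String): Unit = println(s"DAGScheduler.runJob <- $desc")
}

class SparkContextSketch {
  private val dagScheduler = new DAGSchedulerSketch
  // Every action (foreach, count, collect, saveAsTextFile, ...) ends up here
  def runJob(desc: String): Unit = dagScheduler.runJob(desc)
}

class RDDSketch(sc: SparkContextSketch) {
  def foreach(): Unit = sc.runJob("RDD.foreach")
  def count(): Unit   = sc.runJob("RDD.count")
}
```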
Action operators ultimately trigger SparkContext.runJob
Triggering via the saveAsTextFile action
/*
 * Under the hood this calls SparkHadoopWriter.write
 */
rdd.saveAsTextFile("./data/output")
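For context, a minimal local-mode program that reaches this code path might look as follows (the master, app name, and output path are illustrative, not taken from the source above):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode sketch: saveAsTextFile is an action, so a job is
// submitted as soon as it is called; no other trigger is needed.
val conf = new SparkConf().setMaster("local[*]").setAppName("save-demo")
val sc   = new SparkContext(conf)
val rdd  = sc.parallelize(Seq("a", "b", "c"))
rdd.saveAsTextFile("./data/output") // -> SparkHadoopWriter.write -> sc.runJob
sc.stop()
```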
SparkHadoopWriter -> write
object SparkHadoopWriter extends Logging {
  ...
  def write[K, V: ClassTag](...) = {
    ...
    // Submits the job: each task writes out the records of one partition
    val ret = sparkContext.runJob(rdd, (context: TaskContext, iter: Iterator[(K, V)]) => {
      ...
    })
    ...
  }
}
Triggering via the foreach action
rdd.foreach(println)
RDD -> foreach
abstract class RDD[T: ClassTag](...) extends Serializable with Logging {
  def foreach(f: T => Unit): Unit = withScope {
    // Clean the closure so it can be serialized and shipped to executors
    val cleanF = sc.clean(f)
    // sc is the SparkContext
    sc.runJob(this, (iter: Iterator[T]) => iter.foreach(cleanF))
  }
}
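A side note on `sc.clean(f)`: the closure passed to foreach is serialized and shipped to executors, and the ClosureCleaner checks (and, where possible, fixes) its serializability first. A common pattern this guards against, sketched here (illustrative code, not Spark source):

```scala
import org.apache.spark.SparkContext

class Holder { // NOT Serializable
  val prefix = ">> "

  def bad(sc: SparkContext): Unit =
    // The closure reads `prefix`, so it captures `this` (a Holder);
    // this typically fails with a NotSerializableException at submission
    sc.parallelize(1 to 3).foreach(x => println(prefix + x))

  def good(sc: SparkContext): Unit = {
    val p = prefix // copy to a local val: the closure captures only a String
    sc.parallelize(1 to 3).foreach(x => println(p + x))
  }
}
```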
SparkContext -> runJob
def runJob[T, U: ClassTag](...): Unit = {
  ...
  // Hand the job to the DAGScheduler; this call blocks until the job finishes
  dagScheduler.runJob(rdd, cleanedFunc, partitions, callSite, resultHandler, localProperties.get)
  progressBar.foreach(_.finishAll())
  // Checkpoint the RDD lineage (if requested) only after the job completes
  rdd.doCheckpoint()
}
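Two details in this tail are worth noting: dagScheduler.runJob blocks until the job finishes, and rdd.doCheckpoint() only runs afterwards (checkpointing may launch a follow-up job to write the data, which is why persisting an RDD before checkpointing it is commonly recommended). The sequencing can be sketched like this (the names are stand-ins, not the real implementation):

```scala
// Illustrative sequencing of the tail of SparkContext.runJob
def runJobSketch(
    submitToDagScheduler: () => Unit, // stands in for dagScheduler.runJob(...)
    finishProgressBar: () => Unit,    // stands in for progressBar.foreach(_.finishAll())
    doCheckpoint: () => Unit          // stands in for rdd.doCheckpoint()
): Unit = {
  submitToDagScheduler() // blocks until the job finishes (or throws on failure)
  finishProgressBar()    // close out the console progress bar
  doCheckpoint()         // runs only after the job has completed
}
```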