Spark SQL (7) - physicalPlan (Physical Plan)
Spark's physical planning takes the optimized logical operator tree and turns it into a set of operations over RDDs; the resulting job is then submitted to the Spark cluster.
Generating the physical plan mainly goes through the following steps:
- Apply the strategies defined in SparkPlanner. SparkPlanner is a subclass of SparkStrategies, and all of its strategies are defined there. There is also a PlanLater leaf SparkPlan whose sole purpose is to defer planning; it comes into play later when the strategies are applied.
- Select one physical plan; for now the planner simply takes the first element of the returned iterator.
- Prepare the selected physical plan; in this step partitioning, sorting and similar concerns are handled, again by applying a set of predefined rules. A small example of inspecting the results of these stages from user code follows this list.
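As a quick illustration of these stages (this snippet is not from the original post; the DataFrame contents and the appName are made up, but queryExecution.sparkPlan and queryExecution.executedPlan are real accessors on QueryExecution), the selected plan and the prepared plan can both be printed from user code:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("physical-plan-demo").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b"), (1, "c")).toDF("id", "name").groupBy("id").count()
println(df.queryExecution.sparkPlan)    // physical plan right after strategy selection
println(df.queryExecution.executedPlan) // the same plan after prepareForExecution (Exchange nodes etc.)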
A brief look at SparkPlan
A SparkPlan can be understood as the physical plan's description of the operations performed on RDDs. Its nodes generally fall into a few categories:
- LeafExecNode: leaf nodes, mostly tied to data sources; they are used to create the RDDs
- UnaryExecNode: unary nodes, mostly transformations over a single child RDD
- BinaryExecNode: binary nodes; joins fall into this category
- other node types
SparkPlan has a few important members and methods:
- metrics: metric information;
- outputPartitioning: Partitioning and outputOrdering: Seq[SortOrder]: describe the partitioning and ordering of the node's output; each subclass provides its own implementation;
- execute and doExecute: SparkPlan exposes execute as the single entry point that triggers the RDD computation, while what subclasses actually implement is doExecute. A minimal sketch of this pattern follows the list.
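To make the execute/doExecute split concrete, here is a deliberately simplified sketch. MiniPlan, MiniScan and MiniFilter are made-up toy types, not Spark classes; only the template-method shape mirrors SparkPlan, whose execute additionally runs preparation and wires up metrics before delegating to doExecute:

abstract class MiniPlan {
  // Single public entry point; the real SparkPlan.execute also runs prepare()
  // and metric bookkeeping before delegating to doExecute().
  final def execute(): Seq[String] = doExecute()

  // What concrete nodes actually implement.
  protected def doExecute(): Seq[String]
}

class MiniScan(rows: Seq[String]) extends MiniPlan {
  override protected def doExecute(): Seq[String] = rows
}

class MiniFilter(child: MiniPlan, pred: String => Boolean) extends MiniPlan {
  override protected def doExecute(): Seq[String] = child.execute().filter(pred)
}

// new MiniFilter(new MiniScan(Seq("a", "bb")), _.length > 1).execute() returns Seq("bb")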
Generating the SparkPlan (physical plan)
The SparkPlan is produced in QueryExecution.scala:
lazy val sparkPlan: SparkPlan = {
  SparkSession.setActiveSession(sparkSession)
  // TODO: We use next(), i.e. take the first plan returned by the planner, here for now,
  //       but we will implement to choose the best plan.
  planner.plan(ReturnAnswer(optimizedPlan)).next()
}
This calls QueryPlanner.plan:
def strategies: Seq[GenericStrategy[PhysicalPlan]]

def plan(plan: LogicalPlan): Iterator[PhysicalPlan] = {
  // Obviously a lot to do here still...

  // Collect physical plan candidates.
  val candidates = strategies.iterator.flatMap(_(plan))

  // The candidates may contain placeholders marked as [[planLater]],
  // so try to replace them by their child plans.
  val plans = candidates.flatMap { candidate =>
    val placeholders = collectPlaceholders(candidate)

    if (placeholders.isEmpty) {
      // Take the candidate as is because it does not contain placeholders.
      Iterator(candidate)
    } else {
      // Plan the logical plan marked as [[planLater]] and replace the placeholders.
      placeholders.iterator.foldLeft(Iterator(candidate)) {
        case (candidatesWithPlaceholders, (placeholder, logicalPlan)) =>
          // Plan the logical plan for the placeholder.
          val childPlans = this.plan(logicalPlan)

          candidatesWithPlaceholders.flatMap { candidateWithPlaceholders =>
            childPlans.map { childPlan =>
              // Replace the placeholder by the child plan
              candidateWithPlaceholders.transformUp {
                case p if p == placeholder => childPlan
              }
            }
          }
      }
    }
  }

  val pruned = prunePlans(plans)
  assert(pruned.hasNext, s"No plan for $plan")
  pruned
}
QueryPlanner plays a role here similar to the one RuleExecutor plays in the logical planning phase: it defines the framework for converting a logical plan into physical plans via rules (strategies), while the concrete strategies are supplied by its subclass, SparkPlanner. So, just as in the logical planning phase, what we mainly need to look at are the strategies SparkPlanner overrides. Before that, though, the logic of QueryPlanner.plan above is worth walking through, since it also helps in understanding the strategies later. plan applies the predefined strategies, but a single application does not necessarily plan the whole tree, so the PlanLater node mentioned earlier is used as a placeholder for the parts that are not planned yet. After one pass, the PlanLater placeholders are collected and each one is replaced by recursively calling plan on its child logical plan, and this continues until every node has been planned, at which point the conversion to a physical plan is complete. A toy sketch of this placeholder mechanism is shown below, followed by the strategies defined in SparkPlanner.
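To make the mechanism concrete, here is a toy sketch in plain Scala. The Logical/Physical node types are made up and only stand in for Spark's LogicalPlan and SparkPlan; what mirrors QueryPlanner.plan is the recursion that replaces placeholders with fully planned subtrees:

object PlanLaterDemo {
  sealed trait Logical
  case class Filter(child: Logical) extends Logical
  case object Scan extends Logical

  sealed trait Physical
  case class FilterExec(child: Physical) extends Physical
  case object ScanExec extends Physical
  // Stands in for Spark's PlanLater: a leaf wrapping a not-yet-planned logical subtree.
  case class PlanLaterExec(logical: Logical) extends Physical

  // A "strategy" plans only the node it matches and wraps unplanned children in placeholders.
  def strategy(plan: Logical): Seq[Physical] = plan match {
    case Filter(child) => Seq(FilterExec(PlanLaterExec(child)))
    case Scan          => Seq(ScanExec)
  }

  // Mirrors the recursion in QueryPlanner.plan: find placeholders and plan them recursively.
  def planAll(logical: Logical): Physical = {
    def resolve(p: Physical): Physical = p match {
      case PlanLaterExec(unplanned) => planAll(unplanned)
      case FilterExec(child)        => FilterExec(resolve(child))
      case other                    => other
    }
    resolve(strategy(logical).head)
  }

  def main(args: Array[String]): Unit =
    println(planAll(Filter(Scan))) // FilterExec(ScanExec)
}

With that pattern in mind, the strategies SparkPlanner overrides are: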
override def strategies: Seq[Strategy] =
  experimentalMethods.extraStrategies ++
    extraPlanningStrategies ++ (
    DataSourceV2Strategy ::
    FileSourceStrategy ::
    DataSourceStrategy(conf) ::
    SpecialLimits ::
    Aggregation ::
    JoinSelection ::
    InMemoryScans ::
    BasicOperators :: Nil)
The names largely give away what each strategy does: DataSourceV2Strategy and DataSourceStrategy(conf) handle data sources (readers interested in the difference between the two can dig into it on their own), FileSourceStrategy handles file-based sources, Aggregation handles aggregations, JoinSelection picks join implementations, and so on.
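Because experimentalMethods.extraStrategies sits at the head of this list, user code can slot a custom strategy in before the built-in ones. A minimal sketch (NoopStrategy is a made-up name; returning Nil means "this strategy does not apply", so planning falls through to the strategies listed above):

import org.apache.spark.sql.{SparkSession, Strategy}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.SparkPlan

// A strategy that never matches: Nil tells the planner to keep trying the other strategies.
object NoopStrategy extends Strategy {
  override def apply(plan: LogicalPlan): Seq[SparkPlan] = Nil
}

val spark = SparkSession.builder().master("local[1]").appName("extra-strategy-demo").getOrCreate()
// Registered strategies are consulted before the built-ins listed above.
spark.experimental.extraStrategies = Seq(NoopStrategy)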
Once the strategies above have run, the physical plans have effectively been generated, and taking the first one from the iterator yields the plan used for the rest of the computation. Before it is actually submitted, however, there is one more step, prepareForExecution, whose purpose is to optimize the physical plan so that it satisfies shuffle requirements and the internal row format.
/**
 * Prepares a planned [[SparkPlan]] for execution by inserting shuffle operations and internal
 * row format conversions as needed.
 */
protected def prepareForExecution(plan: SparkPlan): SparkPlan = {
  preparations.foldLeft(plan) { case (sp, rule) => rule.apply(sp) }
}

/** A sequence of rules that will be applied in order to the physical plan before execution. */
protected def preparations: Seq[Rule[SparkPlan]] = Seq(
  python.ExtractPythonUDFs,
  PlanSubqueries(sparkSession),
  EnsureRequirements(sparkSession.sessionState.conf),
  CollapseCodegenStages(sparkSession.sessionState.conf),
  ReuseExchange(sparkSession.sessionState.conf),
  ReuseSubquery(sparkSession.sessionState.conf))
preparations defines the rules applied during this preparation phase: extracting Python UDFs, planning subqueries, ensuring correct partitioning and ordering, whole-stage code generation, reusing exchanges and subqueries, and so on. The one to focus on here is EnsureRequirements, which guarantees correct partitioning and ordering by inserting Exchange nodes.
private def ensureDistributionAndOrdering(operator: SparkPlan): SparkPlan = {
  val requiredChildDistributions: Seq[Distribution] = operator.requiredChildDistribution
  val requiredChildOrderings: Seq[Seq[SortOrder]] = operator.requiredChildOrdering
  var children: Seq[SparkPlan] = operator.children
  assert(requiredChildDistributions.length == children.length)
  assert(requiredChildOrderings.length == children.length)

  // Ensure that the operator's children satisfy their output distribution requirements.
  children = children.zip(requiredChildDistributions).map {
    // 1. Satisfied when, e.g., the required distribution is UnspecifiedDistribution,
    //    or AllTuples with a single partition, or the partitioning already matches.
    case (child, distribution) if child.outputPartitioning.satisfies(distribution) =>
      child
    case (child, BroadcastDistribution(mode)) =>
      BroadcastExchangeExec(mode, child)
    case (child, distribution) =>
      val numPartitions = distribution.requiredNumPartitions
        .getOrElse(defaultNumPreShufflePartitions)
      ShuffleExchangeExec(distribution.createPartitioning(numPartitions), child)
  }
  // Get the indexes of children which have specified distribution requirements and need to have
  // same number of partitions.
  val childrenIndexes = requiredChildDistributions.zipWithIndex.filter {
    case (UnspecifiedDistribution, _) => false
    case (_: BroadcastDistribution, _) => false
    case _ => true
  }.map(_._2)

  val childrenNumPartitions =
    childrenIndexes.map(children(_).outputPartitioning.numPartitions).toSet

  if (childrenNumPartitions.size > 1) {
    // Get the number of partitions which is explicitly required by the distributions.
    val requiredNumPartitions = {
      val numPartitionsSet = childrenIndexes.flatMap {
        index => requiredChildDistributions(index).requiredNumPartitions
      }.toSet
      assert(numPartitionsSet.size <= 1,
        s"$operator have incompatible requirements of the number of partitions for its children")
      numPartitionsSet.headOption
    }

    val targetNumPartitions = requiredNumPartitions.getOrElse(childrenNumPartitions.max)

    children = children.zip(requiredChildDistributions).zipWithIndex.map {
      case ((child, distribution), index) if childrenIndexes.contains(index) =>
        if (child.outputPartitioning.numPartitions == targetNumPartitions) {
          child
        } else {
          val defaultPartitioning = distribution.createPartitioning(targetNumPartitions)
          child match {
            // If child is an exchange, we replace it with a new one having defaultPartitioning.
            case ShuffleExchangeExec(_, c, _) => ShuffleExchangeExec(defaultPartitioning, c)
            case _ => ShuffleExchangeExec(defaultPartitioning, child)
          }
        }

      case ((child, _), _) => child
    }
  }

  // Now, we need to add ExchangeCoordinator if necessary.
  // Actually, it is not a good idea to add ExchangeCoordinators while we are adding Exchanges.
  // However, with the way that we plan the query, we do not have a place where we have a
  // global picture of all shuffle dependencies of a post-shuffle stage. So, we add coordinator
  // at here for now.
  // Once we finish https://issues.apache.org/jira/browse/SPARK-10665,
  // we can first add Exchanges and then add coordinator once we have a DAG of query fragments.
  children = withExchangeCoordinator(children, requiredChildDistributions)

  // Now that we've performed any necessary shuffles, add sorts to guarantee output orderings:
  children = children.zip(requiredChildOrderings).map { case (child, requiredOrdering) =>
    // If child.outputOrdering already satisfies the requiredOrdering, we do not need to sort.
    if (SortOrder.orderingSatisfies(child.outputOrdering, requiredOrdering)) {
      child
    } else {
      SortExec(requiredOrdering, global = false, child = child)
    }
  }

  operator.withNewChildren(children)
}
The code above can be broken down into the following steps:
- Add Exchange nodes. Each child is checked to see whether its output partitioning satisfies the required distribution; if not, a BroadcastExchangeExec is used when the requirement can be met by broadcasting, otherwise a ShuffleExchangeExec node is added. Then, for the children whose required distributions are specified and which must share the same number of partitions, the output partition counts are compared: children that already match the target number are left alone, and the rest get a ShuffleExchangeExec node (replacing an existing exchange where there is one). The example after this list shows such nodes in an executed plan.
- Add an ExchangeCoordinator node if necessary, which performs adaptive optimization while the SQL query runs;
- For children whose requiredChildOrderings impose an ordering that is not already satisfied, add a SortExec node.
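As a small illustration of the first and third steps (the data and names here are made up, and broadcast joins are disabled explicitly via spark.sql.autoBroadcastJoinThreshold so that a sort-merge join is chosen), a join over two small DataFrames should show ShuffleExchangeExec and SortExec nodes in the executed plan:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[2]")
  .appName("ensure-requirements-demo")
  .config("spark.sql.autoBroadcastJoinThreshold", "-1") // force a sort-merge join instead of a broadcast join
  .getOrCreate()
import spark.implicits._

val left  = Seq((1, "a"), (2, "b")).toDF("id", "l")
val right = Seq((1, "x"), (3, "y")).toDF("id", "r")

// A sort-merge join requires both children to be hash-partitioned and sorted on the join key,
// so EnsureRequirements should have inserted ShuffleExchangeExec and SortExec nodes here.
println(left.join(right, "id").queryExecution.executedPlan)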
A quick note on ExchangeCoordinator: step 2 above has been removed in the latest Spark source, so interested readers can download the newest code to study it. Taking ShuffleExchangeExec as an example: once an ExchangeCoordinator is attached, doPrepare registers the exchange with the coordinator, and later, when doExecute runs, it calls exchangeCoordinator.postShuffleRDD(this) to obtain the corresponding shuffle RDD. If that RDD is not available yet, the coordinator asks the registered exchanges to report their pre-shuffle stage statistics and uses them to decide how the post-shuffle ShuffledRDD is partitioned.
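For completeness, on Spark 2.x builds that still have ExchangeCoordinator, the coordinator only comes into play when adaptive execution is switched on; a minimal configuration sketch (the values are only illustrative):

import org.apache.spark.sql.SparkSession

// Adaptive execution must be enabled for withExchangeCoordinator to attach a coordinator;
// the target size controls how pre-shuffle partitions are merged into post-shuffle partitions.
val spark = SparkSession.builder()
  .master("local[2]")
  .appName("exchange-coordinator-demo")
  .config("spark.sql.adaptive.enabled", "true")
  .config("spark.sql.adaptive.shuffle.targetPostShuffleInputSize", "67108864") // 64 MB
  .getOrCreate()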
Once the physical plan has gone through the preparation phase, it can be submitted to the cluster and run.
Summary: with this, the overall flow from parsing -> logical plan -> physical plan has been laid out; the finer details are worth studying more carefully at work or whenever the opportunity comes up. Readers interested in Spark SQL may want to read 《Spark SQL 内核剖析》, which is quite good, but it should be read alongside the source code, otherwise some parts are hard to follow. Later posts will draw on that book, plus my own understanding, to walk through aggregation and joins.
