
Apache Spark Source Code Analysis: The Collaboration Flow Between Master and Worker

Abstract

Spark is an efficient distributed computing framework, but studying it in depth requires reading its source code. Doing so not only gives a better understanding of how Spark works, it also improves your ability to troubleshoot a cluster. This article focuses on the startup process of the Spark Master and of the Worker.

Master Startup

We start the Master with the start-master.sh shell script. The script chain begins as follows:

start-master.sh  -> spark-daemon.sh start org.apache.spark.deploy.master.Master

We can see that the script ends up launching the org.apache.spark.deploy.master.Master class, passing in startup arguments such as the host, port, and web UI port.

Let's look at the contents of the Master class's main method:

private[spark] object Master extends Logging {
  val systemName = "sparkMaster"
  private val actorName = "Master"

  //master startup entry
  def main(argStrings: Array[String]) {
    SignalLogger.register(log)
    //Create SparkConf
    val conf = new SparkConf
    //Save parameters to SparkConf
    val args = new MasterArguments(argStrings, conf)
    //Create Actor System and Actor
    val (actorSystem, _, _, _) = startSystemAndActor(args.host, args.port, args.webUiPort, conf)
    //Wait for the actor system to terminate
    actorSystem.awaitTermination()
  }

Here we mainly look at startSystemAndActor:

  /**
   * Start the Master and return a four tuple of:
   *   (1) The Master actor system
   *   (2) The bound port
   *   (3) The web UI bound port
   *   (4) The REST server bound port, if any
   */
  def startSystemAndActor(
      host: String,
      port: Int,
      webUiPort: Int,
      conf: SparkConf): (ActorSystem, Int, Int, Option[Int]) = {
    val securityMgr = new SecurityManager(conf)

    //Creating ActorSystem with AkkaUtils
    val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port, conf = conf,
      securityManager = securityMgr)

    val actor = actorSystem.actorOf(
      Props(classOf[Master], host, boundPort, webUiPort, securityMgr, conf), actorName)
   ....
  }
}

Spark's underlying communication layer is implemented with Akka.

The ActorSystem is created first, and then the Master actor: when the actor is created through the ActorSystem, the Master's primary constructor runs first, followed by the actor lifecycle methods.
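To make that lifecycle order concrete, here is a minimal, hypothetical sketch using the classic (untyped) Akka API that Spark 1.x relies on; LifecycleDemo and demoSystem are made-up names, not Spark classes:

import akka.actor.{Actor, ActorSystem, Props}

// Hypothetical actor showing the order: primary constructor -> preStart -> receive
class LifecycleDemo extends Actor {
  println("1. primary constructor: fields are initialized here")

  override def preStart(): Unit =
    println("2. preStart: runs once, before any message is handled")

  def receive: Receive = {
    case msg => println(s"3. receive: got message '$msg'")
  }
}

object LifecycleDemoApp {
  def main(args: Array[String]): Unit = {
    val system = ActorSystem("demoSystem")
    val demo = system.actorOf(Props[LifecycleDemo], "demo")
    demo ! "hello"
    system.awaitTermination()   // mirrors the Master's awaitTermination; stop with Ctrl-C
  }
}

The Master and Worker actors follow exactly this order, which is why we look at their constructors and preStart methods next.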

The Master's constructor initializes a number of fields:

 private[spark] class Master(
    host: String,
    port: Int,
    webUiPort: Int,
    val securityMgr: SecurityManager,
    val conf: SparkConf)
  extends Actor with ActorLogReceive with Logging with LeaderElectable {
  //primary constructor

  //Enable timer function
  import context.dispatcher   // to use Akka's scheduler.schedule()

  val hadoopConf = SparkHadoopUtil.get.newConfiguration(conf)

  def createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss")  // For application IDs
  //worker timeout
  val WORKER_TIMEOUT = conf.getLong("spark.worker.timeout", 60) * 1000
  val RETAINED_APPLICATIONS = conf.getInt("spark.deploy.retainedApplications", 200)
  val RETAINED_DRIVERS = conf.getInt("spark.deploy.retainedDrivers", 200)
  val REAPER_ITERATIONS = conf.getInt("spark.dead.worker.persistence", 15)
  val RECOVERY_MODE = conf.get("spark.deploy.recoveryMode", "NONE")

  //A HashSet is used to save WorkerInfo
  val workers = new HashSet[WorkerInfo]
  //A HashMap from workerId -> WorkerInfo
  val idToWorker = new HashMap[String, WorkerInfo]
  val addressToWorker = new HashMap[Address, WorkerInfo]

  //A HashSet is used to save applications submitted by the client (SparkSubmit)
  val apps = new HashSet[ApplicationInfo]
  //A HashMap from appId -> ApplicationInfo
  val idToApp = new HashMap[String, ApplicationInfo]
  val actorToApp = new HashMap[ActorRef, ApplicationInfo]
  val addressToApp = new HashMap[Address, ApplicationInfo]
  //Apps waiting to be scheduled
  val waitingApps = new ArrayBuffer[ApplicationInfo]
  val completedApps = new ArrayBuffer[ApplicationInfo]
  var nextAppNumber = 0
  val appIdToUI = new HashMap[String, SparkUI]

  //Save DriverInfo
  val drivers = new HashSet[DriverInfo]
  val completedDrivers = new ArrayBuffer[DriverInfo]
  val waitingDrivers = new ArrayBuffer[DriverInfo] // Drivers currently spooled for scheduling

When the primary constructor finishes executing, the actor's preStart method runs, followed by receive.

  //Start timer and check timeout worker
  //Focus on CheckForWorkerTimeOut
  context.system.scheduler.schedule(0 millis, WORKER_TIMEOUT millis, self, CheckForWorkerTimeOut)

In the preStart method, a timer is created to check for Worker timeouts. The timeout value WORKER_TIMEOUT = conf.getLong("spark.worker.timeout", 60) * 1000 defaults to 60 seconds.
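As a side note, the same configuration key controls the timeout. Below is a minimal, hypothetical example of reading it through SparkConf (WorkerTimeoutExample is a made-up name for illustration):

import org.apache.spark.SparkConf

object WorkerTimeoutExample {
  def main(args: Array[String]): Unit = {
    // spark.worker.timeout is given in seconds; the Master multiplies it by 1000
    val conf = new SparkConf().set("spark.worker.timeout", "120")
    val timeoutMs = conf.getLong("spark.worker.timeout", 60) * 1000
    println(s"WORKER_TIMEOUT = $timeoutMs ms")   // prints WORKER_TIMEOUT = 120000 ms
  }
}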

As we can see, the main work of Master initialization is to construct a Master actor that waits for messages, initialize collections to hold Worker information, and start a timer to check for Worker timeouts.

Master startup sequence diagram

Worker Startup

The Worker is started by the slaves.sh shell script, which reads the slaves file and launches the remote workers over ssh:

slaves.sh -> spark-daemon.sh start org.apache.spark.deploy.worker.Worker

This script starts the org.apache.spark.deploy.worker.Worker class.

Let's look at the Worker source code:

private[spark] object Worker extends Logging {
  //Worker Start Entry
  def main(argStrings: Array[String]) {
    SignalLogger.register(log)
    val conf = new SparkConf
    val args = new WorkerArguments(argStrings, conf)
    //Create the ActorSystem and Actor
    val (actorSystem, _) = startSystemAndActor(args.host, args.port, args.webUiPort, args.cores,
      args.memory, args.masters, args.workDir)
    actorSystem.awaitTermination()
  }

The most important thing here is the Worker's startSystemAndActor:

  def startSystemAndActor(
      host: String,
      port: Int,
      webUiPort: Int,
      cores: Int,
      memory: Int,
      masterUrls: Array[String],
      workDir: String,
      workerNumber: Option[Int] = None,
      conf: SparkConf = new SparkConf): (ActorSystem, Int) = {

    // The LocalSparkCluster runs multiple local sparkWorkerX actor systems
    val systemName = "sparkWorker" + workerNumber.map(_.toString).getOrElse("")
    val actorName = "Worker"
    val securityMgr = new SecurityManager(conf)
    //Create the ActorSystem through AkkaUtils
    val (actorSystem, boundPort) = AkkaUtils.createActorSystem(systemName, host, port,
      conf = conf, securityManager = securityMgr)
    val masterAkkaUrls = masterUrls.map(Master.toAkkaUrl(_, AkkaUtils.protocol(actorSystem)))
    //Create the Worker actor via actorSystem.actorOf: constructor -> preStart -> receive
    actorSystem.actorOf(Props(classOf[Worker], host, boundPort, webUiPort, cores, memory,
      masterAkkaUrls, systemName, actorName,  workDir, conf, securityMgr), name = actorName)
    (actorSystem, boundPort)
  }

Here the Worker likewise constructs its own Actor, and with that the Worker's startup initialization is complete.

Worker and Master Communication

Following the actor lifecycle, the Worker's preStart method is invoked:

  override def preStart() {
    assert(!registered)
    logInfo("Starting Spark worker %s:%d with %d cores, %s RAM".format(
      host, port, cores, Utils.megabytesToString(memory)))
    logInfo(s"Running Spark version ${org.apache.spark.SPARK_VERSION}")
    logInfo("Spark home: " + sparkHome)
    createWorkDir()
    context.system.eventStream.subscribe(self, classOf[RemotingLifecycleEvent])
    shuffleService.startIfEnabled()
    webUi = new WorkerWebUI(this, workDir, webUiPort)
    webUi.bind()

    //Worker registers with Master
    registerWithMaster()
    ....
  }

Here registerWithMaster is called to start registering with the Master.

 def registerWithMaster() {
    // DisassociatedEvent may be triggered multiple times, so don't attempt registration
    // if there are outstanding registration attempts scheduled.
    registrationRetryTimer match {
      case None =>
        registered = false
        //Start registration
        tryRegisterAllMasters()
        ....
    }
  }

 

The tryRegisterAllMasters method is invoked from the match inside registerWithMaster:

  private def tryRegisterAllMasters() {
    //Iterate over the master addresses
    for (masterAkkaUrl <- masterAkkaUrls) {
      logInfo("Connecting to master " + masterAkkaUrl + "...")
      //Connect the Worker to the Master
      val actor = context.actorSelection(masterAkkaUrl)
      //Send registration information to Master
      actor ! RegisterWorker(workerId, host, port, cores, memory, webUi.boundPort, publicAddress)
    }
  }
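For reference, the RegisterWorker message itself is a simple case class. The sketch below is reconstructed from the send above and the Master-side pattern match that follows; the actual definition lives in Spark's deploy messages, so the field names may differ slightly:

// Hedged reconstruction of the message shape, not copied from Spark's source
case class RegisterWorker(
    id: String,
    host: String,
    port: Int,
    cores: Int,
    memory: Int,          // memory in MB
    webUiPort: Int,
    publicAddress: String)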

After connecting to the Master via masterAkkaUrl, the Worker sends it this RegisterWorker message, carrying its workerId, host, port, CPU cores, memory, web UI port, and public address. The Master handles it in receiveWithLogging:

override def receiveWithLogging = {
    ......

    //Accept registration information from Worker
    case RegisterWorker(id, workerHost, workerPort, cores, memory, workerUiPort, publicAddress) =>
    {
      logInfo("Registering worker %s:%d with %d cores, %s RAM".format(
        workerHost, workerPort, cores, Utils.megabytesToString(memory)))
      if (state == RecoveryState.STANDBY) {
        // ignore, don't send response
        //Determine if the worker has been registered
      } else if (idToWorker.contains(id)) {
        //If registered, tell worker that registration failed
        sender ! RegisterWorkerFailed("Duplicate worker ID")
      } else {
        //Not registered yet: encapsulate the Worker's registration info into a WorkerInfo
        val worker = new WorkerInfo(id, workerHost, workerPort, cores, memory,
          sender, workerUiPort, publicAddress)
        if (registerWorker(worker)) {
          //Recording Worker's Information with a Persistence Engine
          persistenceEngine.addWorker(worker)
          //Reply to the Worker that registration succeeded
          sender ! RegisteredWorker(masterUrl, masterWebUiUrl)

          schedule()
        } else {
          val workerAddress = worker.actor.path.address
          logWarning("Worker registration failed. Attempted to re-register worker at same " +
            "address: " + workerAddress)
          sender ! RegisterWorkerFailed("Attempted to re-register worker at same address: "
            + workerAddress)
        }
      }
    }

This is the core of it: receiveWithLogging handles incoming messages. When the Master receives the registration message, it wraps the parameters into a WorkerInfo object, adds it to its collections, and records it in the persistence engine, then sends confirmation back to the Worker with sender ! RegisteredWorker(masterUrl, masterWebUiUrl). A simplified sketch of that bookkeeping is given below, before we turn to the Worker's receiveWithLogging.
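This is only a rough, hypothetical sketch (WorkerInfoSketch, RegistrationSketch, and the sample IDs are made up; the real registerWorker in Master.scala also handles dead workers and address collisions), but it shows the essence of the bookkeeping described above:

import scala.collection.mutable.{HashMap, HashSet}

// Stand-in for Spark's WorkerInfo, holding just the fields needed for the sketch
case class WorkerInfoSketch(id: String, host: String, port: Int, cores: Int, memory: Int)

object RegistrationSketch {
  // Mirrors of the Master's collections shown in its constructor
  val workers = new HashSet[WorkerInfoSketch]
  val idToWorker = new HashMap[String, WorkerInfoSketch]

  def registerWorker(worker: WorkerInfoSketch): Boolean = {
    if (idToWorker.contains(worker.id)) {
      false                             // duplicate worker ID -> reject registration
    } else {
      workers += worker                 // add to the HashSet of workers
      idToWorker(worker.id) = worker    // record the id -> WorkerInfo mapping
      true
    }
  }

  def main(args: Array[String]): Unit = {
    val w = WorkerInfoSketch("worker-20140918-0001", "node1", 7078, 4, 4096)
    println(registerWorker(w))   // true: first registration succeeds
    println(registerWorker(w))   // false: duplicate ID is rejected
  }
}

Back on the Worker side, its receiveWithLogging handles the RegisteredWorker reply: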

override def receiveWithLogging = {

    case RegisteredWorker(masterUrl, masterWebUiUrl) =>
      logInfo("Successfully registered with master " + masterUrl)
      registered = true
      changeMaster(masterUrl, masterWebUiUrl)
      //Start the timer and send Heartbeat at regular intervals
      context.system.scheduler.schedule(0 millis, HEARTBEAT_MILLIS millis, self, SendHeartbeat)
      if (CLEANUP_ENABLED) {
        logInfo(s"Worker cleanup enabled; old application directories will be deleted in: $workDir")
        context.system.scheduler.schedule(CLEANUP_INTERVAL_MILLIS millis,
          CLEANUP_INTERVAL_MILLIS millis, self, WorkDirCleanup)
      }

The Worker receives the Master's confirmation that registration succeeded, starts a timer, and sends heartbeats at regular intervals.

    case SendHeartbeat =>
      //The Worker sends heartbeats to report that it is still alive
      if (connected) { master ! Heartbeat(workerId) }

The Master's receiveWithLogging receives the Heartbeat message:

  override def receiveWithLogging = {
        ....
    case Heartbeat(workerId) => {
      idToWorker.get(workerId) match {
        case Some(workerInfo) =>
          //Update the last heartbeat time
          workerInfo.lastHeartbeat = System.currentTimeMillis()
          .....
      }
    }
 }

The Master records and updates the last heartbeat time: workerInfo.lastHeartbeat = System.currentTimeMillis().

Meanwhile, a scheduled task on the Master keeps sending itself the internal CheckForWorkerTimeOut message, polling the worker set and removing any Worker whose heartbeat is older than 60 seconds.

    //Check for timed-out Workers
    case CheckForWorkerTimeOut => {
      timeOutDeadWorkers()
    }

The timeOutDeadWorkers method:

  def timeOutDeadWorkers() {
    // Copy the workers into an array so we don't modify the hashset while iterating through it
    val currentTime = System.currentTimeMillis()
    val toRemove = workers.filter(_.lastHeartbeat < currentTime - WORKER_TIMEOUT).toArray
    for (worker <- toRemove) {
      if (worker.state != WorkerState.DEAD) {
        logWarning("Removing %s because we got no heartbeat in %d seconds".format(
          worker.id, WORKER_TIMEOUT/1000))
        removeWorker(worker)
      } else {
        if (worker.lastHeartbeat < currentTime - ((REAPER_ITERATIONS + 1) * WORKER_TIMEOUT)) {
          workers -= worker // we've seen this DEAD worker in the UI, etc. for long enough; cull it
        }
      }
    }
  }

 

If a worker's last heartbeat time is earlier than the current time minus the timeout, the Worker is judged to have timed out and its information is removed from the set; a small self-contained illustration of that predicate is sketched below.
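The sketch is invented for the example (SimpleWorker and the sample timestamps are not taken from Spark) and only demonstrates the filter condition used above:

object TimeoutFilterExample {
  case class SimpleWorker(id: String, lastHeartbeat: Long)

  def main(args: Array[String]): Unit = {
    val WORKER_TIMEOUT = 60 * 1000L                       // 60 seconds, as in the Master
    val now = System.currentTimeMillis()
    val workers = Seq(
      SimpleWorker("worker-alive", now - 10 * 1000),      // heartbeat 10 s ago -> kept
      SimpleWorker("worker-dead",  now - 90 * 1000))      // heartbeat 90 s ago -> removed
    val toRemove = workers.filter(_.lastHeartbeat < now - WORKER_TIMEOUT)
    toRemove.foreach(w => println(s"would remove ${w.id}"))  // prints: would remove worker-dead
  }
}

Finally, when the Master receives a heartbeat from a workerId that is no longer in idToWorker, it either asks the Worker to re-register or ignores the heartbeat: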

 case None =>
          if (workers.map(_.id).contains(workerId)) {
            logWarning(s"Got heartbeat from unregistered worker $workerId." +
              " Asking it to re-register.")
            //Ask the worker to re-register
            sender ! ReconnectWorker(masterUrl)
          } else {
            logWarning(s"Got heartbeat from unregistered worker $workerId." +
              " This worker was never registered, so ignoring the heartbeat.")
          }

Worker and Master sequence diagram

That covers the general communication flow after the Master and Worker start up; the next topic is how executor processes are launched on the cluster to run computation tasks.
