A look at how the spark-cassandra-connector throttles writes

Having every Spark task report its real-time throughput back to the driver in order to throttle the application as a whole would be very difficult, and the cost would be very high.

Instead, DataStax throttles inside each individual task. The concrete logic lives in RateLimiter, a leaky-bucket limiter:

package com.datastax.spark.connector.writer

import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec

class RateLimiter(
    rate: Long,
    bucketSize: Long,
    time: () => Long = System.currentTimeMillis,
    sleep: Long => Any = Thread.sleep) {

  require(rate > 0, "A positive rate is required")
  require(bucketSize > 0, "A positive bucket size is required")

  private[writer] val bucketFill = new AtomicLong(0L)
  private[writer] val lastTime = new AtomicLong(time()) //Avoid a large initial step

  @tailrec
  private def leak(toLeak: Long): Unit = {
    val fill = bucketFill.get()
    val reallyToLeak = math.min(fill, toLeak)  // we can't leak more than there is now
    if (!bucketFill.compareAndSet(fill, fill - reallyToLeak))
      leak(toLeak)
  }

  private[writer] def leak(): Unit = {
    val currentTime = time()
    val prevTime = lastTime.getAndSet(currentTime)
    val elapsedTime = math.max(currentTime - prevTime, 0L) // Protect against negative time
    leak(elapsedTime * rate / 1000L)
  }

  /** Processes a single packet.
    * If the packet is bigger than the current amount of
    * space available in the bucket, this method will
    * sleep for appropriate amount of time, in order
    * to not exceed the target rate. */
  def maybeSleep(packetSize: Long): Unit = {
    leak()
    val currentFill = bucketFill.addAndGet(packetSize)
    val overflow = currentFill - bucketSize
    val delay = 1000L * overflow / rate
    if (delay > 0L)
      sleep(delay)
  }

}
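To see the leaky bucket in action, here is a small self-contained sketch. The FakeClock, the sleep-recording callback, and the concrete rate/bucket numbers are all hypothetical, and the limiter body below condenses the class above into a single method; it is an illustration, not the connector's actual packaging:

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec

// Hypothetical deterministic clock so the example never really blocks:
// we control time() ourselves and record requested sleeps in a list.
class FakeClock(var now: Long) {
  def time(): Long = now
}

// Minimal condensed copy of the leaky-bucket logic shown above.
class DemoRateLimiter(rate: Long, bucketSize: Long,
                      time: () => Long, sleep: Long => Any) {
  require(rate > 0 && bucketSize > 0)
  private val bucketFill = new AtomicLong(0L)
  private val lastTime   = new AtomicLong(time())

  @tailrec
  private def leak(toLeak: Long): Unit = {
    val fill = bucketFill.get()
    val reallyToLeak = math.min(fill, toLeak) // can't leak more than is there
    if (!bucketFill.compareAndSet(fill, fill - reallyToLeak)) leak(toLeak)
  }

  def maybeSleep(packetSize: Long): Unit = {
    val currentTime = time()
    val prevTime = lastTime.getAndSet(currentTime)
    leak(math.max(currentTime - prevTime, 0L) * rate / 1000L)
    val overflow = bucketFill.addAndGet(packetSize) - bucketSize
    val delay = 1000L * overflow / rate
    if (delay > 0L) sleep(delay)
  }
}

val clock = new FakeClock(0L)
var slept = List.empty[Long]
// 5 rows/s, bucket of 10 rows: bursts up to 10 rows pass without waiting.
val limiter = new DemoRateLimiter(5, 10, clock.time, d => slept ::= d)

limiter.maybeSleep(10) // fills the bucket exactly; overflow = 0, no sleep
limiter.maybeSleep(5)  // overflows by 5 rows -> sleep 5 * 1000 / 5 = 1000 ms
println(slept)         // prints List(1000)
```

Note how the bucket size acts as a burst allowance: the first batch goes through immediately, and only the overflow beyond the bucket translates into sleep time.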

In steady state the bucket stays full, so each batch overflows by roughly batchSize rows; batchSize/rate then gives the time t that a single batch needs at the target rate, and the task sleeps for t before the next batch executes.
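As a concrete worked example of that arithmetic (the numbers are hypothetical), starting from an empty bucket:

```scala
// With rate = 1000 rows/s and a 1000-row bucket, a 3000-row batch
// overflows the empty bucket by 2000 rows, so the task must sleep
// 1000 * 2000 / 1000 = 2000 ms before the next batch.
val rate       = 1000L // target rows per second
val bucketSize = 1000L // burst allowance in rows
val batchSize  = 3000L // rows written in this batch

val overflow = batchSize - bucketSize // bucket was empty before the write
val delayMs  = 1000L * overflow / rate
println(delayMs) // prints 2000
```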

posted on 2016-08-17 21:00 qiaoshi.wang