Dissecting how the spark cassandra connector does rate limiting
Having every Spark task report its real-time throughput back to the driver in order to throttle the application as a whole would be very difficult, and the cost would be very high.
Instead, DataStax throttles each task independently; the logic lives in the RateLimiter class, which is a leaky-bucket implementation:
// From the connector's com.datastax.spark.connector.writer package
// (the private[writer] members require compiling inside that package).
import java.util.concurrent.atomic.AtomicLong
import scala.annotation.tailrec

class RateLimiter(
    rate: Long,                                  // allowed units per second
    bucketSize: Long,                            // burst capacity of the bucket
    time: () => Long = System.currentTimeMillis, // injectable clock (ms)
    sleep: Long => Any = Thread.sleep) {         // injectable sleep (ms)

  require(rate > 0, "A positive rate is required")
  require(bucketSize > 0, "A positive bucket size is required")

  private[writer] val bucketFill = new AtomicLong(0L)
  private[writer] val lastTime = new AtomicLong(time()) // avoid a large initial step

  // Drain up to `toLeak` units from the bucket; retry on CAS contention.
  @tailrec
  private def leak(toLeak: Long): Unit = {
    val fill = bucketFill.get()
    val reallyToLeak = math.min(fill, toLeak) // we can't leak more than there is now
    if (!bucketFill.compareAndSet(fill, fill - reallyToLeak))
      leak(toLeak)
  }

  // Drain whatever should have leaked since the last call, at `rate` units/s.
  private[writer] def leak(): Unit = {
    val currentTime = time()
    val prevTime = lastTime.getAndSet(currentTime)
    val elapsedTime = math.max(currentTime - prevTime, 0L) // protect against negative time
    leak(elapsedTime * rate / 1000L)
  }

  /** Processes a single packet.
    * If the packet is bigger than the current amount of
    * space available in the bucket, this method will
    * sleep for an appropriate amount of time, in order
    * not to exceed the target rate. */
  def maybeSleep(packetSize: Long): Unit = {
    leak()
    val currentFill = bucketFill.addAndGet(packetSize)
    val overflow = currentFill - bucketSize
    val delay = 1000L * overflow / rate
    if (delay > 0L)
      sleep(delay)
  }
}
In other words, batchSize / rate gives the time t that a single batch is allowed to take at the target rate; once the bucket overflows by a full batch, maybeSleep sleeps roughly t seconds before the next batch is executed.
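To see the overflow arithmetic in action, here is a minimal, self-contained sketch. SimpleRateLimiter is a pared-down copy of the limiter above (the class name, the injected clock, and the recorded sleep calls are illustrative assumptions, not connector API), and it drops the lock-free atomics since the demo is single-threaded. Driving it with a fake clock lets us observe the computed delays directly:

```scala
object RateLimiterDemo {
  // Single-threaded sketch of the leaky bucket: same arithmetic as the
  // connector's RateLimiter, with the clock and sleep injected for testing.
  class SimpleRateLimiter(rate: Long, bucketSize: Long,
                          time: () => Long, sleep: Long => Any) {
    private var bucketFill = 0L
    private var lastTime   = time()

    def maybeSleep(packetSize: Long): Unit = {
      val now = time()
      val elapsed = math.max(now - lastTime, 0L)
      lastTime = now
      // Leak: drain elapsed * rate / 1000 units, never below zero.
      bucketFill = math.max(bucketFill - elapsed * rate / 1000L, 0L)
      bucketFill += packetSize
      val overflow = bucketFill - bucketSize
      val delay = 1000L * overflow / rate
      if (delay > 0L) sleep(delay)
    }
  }

  def run(): Seq[Long] = {
    var clock = 0L
    val slept = scala.collection.mutable.ArrayBuffer.empty[Long]
    // 1000 units/s, bucket holds 2000 units.
    val limiter = new SimpleRateLimiter(1000L, 2000L, () => clock, ms => slept += ms)

    limiter.maybeSleep(2000L) // fills the bucket exactly: no sleep
    limiter.maybeSleep(1000L) // overflows by 1000 -> delay = 1000 ms
    clock += 3000L            // 3 s pass: the 3000-unit fill drains completely
    limiter.maybeSleep(500L)  // fits in the empty bucket: no sleep
    slept.toSeq
  }

  def main(args: Array[String]): Unit = println(run())
}
```

With rate = 1000 units/s and bucketSize = 2000, the first write of 2000 units exactly fills the bucket, the second write of 1000 units overflows by 1000 and requests a 1000 ms sleep (exactly 1000 / rate seconds), and after the clock advances the bucket has drained, so the third write proceeds without sleeping.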
posted on 2016-08-17 21:00 by qiaoshi.wang