Flow Control and RateLimiter
I. Background
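How can we improve system stability? Put simply, apart from adding machines, the main tools are service degradation and rate limiting. Adding machines is what is usually meant by going distributed. From the point of view of overall architectural stability, each interface in an SOA generally has an upper bound on the service capacity it can provide per unit of time. Once that capacity is exceeded, the interface typically stalls or the application crashes, and the resulting latency propagates to the callers until the whole system loses its ability to serve. For a publicly exposed API, rate limiting should be a required feature; otherwise the service can easily be brought down by malicious calls. This article summarizes the common rate-limiting principles and approaches.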
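A more technical name for flow control is traffic shaping. Its typical role is to limit the traffic and bursts flowing out of a given connection on a network, so that packets are sent out at a fairly even rate.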
II. Common Methods
Token bucket and leaky bucket are the two most commonly used rate-limiting algorithms.
1. Leaky bucket algorithm
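The idea behind the leaky bucket algorithm is simple. Water (requests) first flows into the bucket, and the bucket lets water out at a fixed rate (the rate at which the interface responds). When water flows in too fast it overflows (the request frequency exceeds the interface's response rate) and the request is rejected. The leaky bucket algorithm can therefore enforce a hard limit on the data transfer rate. The diagram is as follows: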

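There are two variables here: the size of the bucket, which determines how much water can be held when traffic bursts (burst), and the size of the hole in the bucket (rate). In some situations the leaky bucket cannot use network resources efficiently. Because the leak rate is a fixed parameter, even when there is no resource contention in the network (no congestion), the leaky bucket cannot let a single flow burst up to the port rate. It is therefore inefficient for bursty traffic, which the token bucket algorithm can accommodate. The two algorithms are often combined to give finer control over network traffic.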
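The two variables above (burst and rate) can be sketched in a few lines of Java. This is only an illustrative sketch, not a production limiter: the class name `LeakyBucket`, the method `tryEnter`, and the choice to pass the current time in explicitly (so the behaviour is deterministic) are all assumptions made for this example.

```java
/** Minimal leaky-bucket sketch; the clock is passed in explicitly (an assumption of this example). */
final class LeakyBucket {
    private final double capacity;      // bucket size (burst): how much water can be held
    private final double leakPerSecond; // size of the hole (rate): how fast water drains
    private double water = 0.0;         // current water level
    private long lastNanos;             // time of the last drain calculation

    LeakyBucket(double capacity, double leakPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.lastNanos = nowNanos;
    }

    /** Returns true if the request fits into the bucket, false if it overflows and is rejected. */
    synchronized boolean tryEnter(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1_000_000_000.0;
        // Drain the water that leaked out since the previous call.
        water = Math.max(0.0, water - elapsedSeconds * leakPerSecond);
        lastNanos = nowNanos;
        if (water + 1.0 > capacity) {
            return false; // overflow: reject the request
        }
        water += 1.0;     // one request adds one unit of water
        return true;
    }

    public static void main(String[] args) {
        LeakyBucket bucket = new LeakyBucket(2, 1.0, 0L);    // holds 2 requests, drains 1 per second
        System.out.println(bucket.tryEnter(0L));             // true
        System.out.println(bucket.tryEnter(0L));             // true
        System.out.println(bucket.tryEnter(0L));             // false (bucket is full)
        System.out.println(bucket.tryEnter(1_000_000_000L)); // true (one unit drained after 1 s)
    }
}
```

Note that however long the bucket sits idle, the outflow rate never exceeds `leakPerSecond`, which is the inefficiency for bursty traffic described above.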
2. Token bucket algorithm

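The token bucket algorithm achieves a rate-limiting effect similar to the leaky bucket but works in the opposite direction, and it is easier to understand. As time passes, the system adds a token to the bucket at a constant interval of 1/QPS (for QPS=100 the interval is 10 ms). Picture the opposite of a leak: a tap that keeps adding water. If the bucket is already full, no more tokens are added. Each new request takes one token; if no token is left, the request blocks or is rejected.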
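Another advantage of the token bucket is that the rate is easy to change. To raise the rate, simply increase the rate at which tokens are put into the bucket. Typically a fixed number of tokens is added at a regular interval (say every 100 ms); some variants instead compute the number of tokens to add in real time.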
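As a rough illustration of the "compute in real time" variant mentioned above, the sketch below refills lazily on each call instead of using a timer. The names `TokenBucket`, `tryAcquire`, and `setRate`, the explicit clock parameter, and the decision to start with a full bucket are assumptions of this example, not part of any library.

```java
/** Minimal token-bucket sketch with lazy refill; the clock is passed in explicitly. */
final class TokenBucket {
    private final double capacity;  // maximum number of tokens the bucket can hold
    private double refillPerSecond; // tokens added per second (the QPS)
    private double tokens;          // tokens currently available
    private long lastNanos;         // time of the last refill calculation

    TokenBucket(double capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.tokens = capacity;     // start full (an assumption of this sketch)
        this.lastNanos = nowNanos;
    }

    /** Changing the speed only means changing the refill rate. */
    synchronized void setRate(double newRefillPerSecond, long nowNanos) {
        refill(nowNanos);           // settle the elapsed time at the old rate first
        this.refillPerSecond = newRefillPerSecond;
    }

    /** Takes one token if available; returns false (reject) otherwise. */
    synchronized boolean tryAcquire(long nowNanos) {
        refill(nowNanos);
        if (tokens < 1.0) {
            return false;
        }
        tokens -= 1.0;
        return true;
    }

    // Adds the tokens produced since the last call, capped at the bucket capacity.
    private void refill(long nowNanos) {
        double elapsedSeconds = (nowNanos - lastNanos) / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + elapsedSeconds * refillPerSecond);
        lastNanos = nowNanos;
    }

    public static void main(String[] args) {
        TokenBucket bucket = new TokenBucket(2, 100.0, 0L); // QPS=100, one token every 10 ms
        System.out.println(bucket.tryAcquire(0L));          // true
        System.out.println(bucket.tryAcquire(0L));          // true (burst of 2 is allowed)
        System.out.println(bucket.tryAcquire(0L));          // false (bucket is empty)
        System.out.println(bucket.tryAcquire(15_000_000L)); // true (1.5 tokens refilled in 15 ms)
        System.out.println(bucket.tryAcquire(15_000_000L)); // false (only 0.5 token left)
    }
}
```

Unlike the leaky bucket, an idle token bucket accumulates tokens up to `capacity`, so a burst of that size is served immediately.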
III. Guava RateLimiter
1. Introduction to RateLimiter
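Guava, Google's open-source utility library, provides the rate-limiting class RateLimiter. It is based on the token bucket algorithm and is very easy to use. RateLimiter is often used to limit the rate of access to some physical or logical resource. It supports two ways of obtaining permits: one returns false immediately if no permit is available, and the other blocks and waits for a permit.
Compare this with Semaphore, which restricts the number of concurrent accesses rather than the rate of use. (Note that concurrency and rate are nevertheless closely related; see Little's Law.)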
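A RateLimiter is defined by the rate at which permits are issued. In the default configuration, permits are distributed at a fixed rate, expressed in permits per second. To maintain the configured rate, permits are distributed smoothly, with the delay between individual permits adjusted accordingly.
A RateLimiter can also be configured with a warm-up period, during which the number of permits issued per second increases steadily until it reaches the stable rate.
One point is important: the number of permits requested never affects the throttling of the request itself (a call to acquire(1) and a call to acquire(1000) are throttled in exactly the same way, if at all), but it does affect the throttling of the next request. In other words, if an expensive task arrives at an idle RateLimiter, it is granted immediately, and it is the next request that is throttled further, paying for the cost of the expensive task. Note: RateLimiter does not guarantee fairness.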
2. Code
When building a rate-based flow-control framework for a single machine, RateLimiter is a solid core component. A demo follows.
package com.bijian.test;

import java.util.concurrent.ConcurrentMap;

import com.google.common.collect.Maps;
import com.google.common.util.concurrent.RateLimiter;

public class TrafficShaper {

    // key-value (service, qps)
    private static final ConcurrentMap<String, Double> resourceMap = Maps.newConcurrentMap();
    // (service + userkey) -> limiter
    private static final ConcurrentMap<String, RateLimiter> userresourceLimiterMap = Maps.newConcurrentMap();

    static {
        // init
        resourceMap.put("aaa", 50.0);
    }

    public static void updateResourceQps(String resource, double qps) {
        resourceMap.put(resource, qps);
    }

    public static void removeResource(String resource) {
        resourceMap.remove(resource);
    }

    public static int enter(String resource, String userkey) {
        long t1 = System.currentTimeMillis();
        // Use the boxed type: unboxing a missing entry would throw a NullPointerException
        Double qps = resourceMap.get(resource);
        // the service is not rate limited
        if (qps == null || qps == 0.0) {
            return 0;
        }
        String keyser = resource + userkey;
        RateLimiter keyserlimiter = userresourceLimiterMap.get(keyser);
        // if null, create a new limiter
        if (keyserlimiter == null) {
            keyserlimiter = RateLimiter.create(qps);
            RateLimiter putByOtherThread = userresourceLimiterMap.putIfAbsent(keyser, keyserlimiter);
            if (putByOtherThread != null) {
                // another thread registered a limiter first; use that one
                keyserlimiter = putByOtherThread;
            }
            keyserlimiter.setRate(qps);
        }
        // tryAcquire: non-blocking, returns false when no permit is available
        if (!keyserlimiter.tryAcquire()) {
            System.out.println("use:" + (System.currentTimeMillis() - t1) + "ms;" + resource
                    + " visited too frequently by key:" + userkey);
            return 99;
        } else {
            System.out.println("use:" + (System.currentTimeMillis() - t1) + "ms;");
            return 0;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int i = 0;
        while (true) {
            i++;
            long t2 = System.currentTimeMillis();
            System.out.println(t2 + ":qq:" + i);
            int res = TrafficShaper.enter("aaa", "qq");
            System.out.println((System.currentTimeMillis() - t2) + ":qq:" + i);
            if (res == 99) {
                // rejected: reset the counter and back off for one second
                i = 0;
                Thread.sleep(1000);
            }
        }
    }
}
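A brief explanation: the core method is enter, which takes two arguments, the service name and the user key. The intended effect is that, for an open API, each user can call a given service only up to a maximum number of times per second.
Output: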
1498910834048:qq:48
use:0ms;
0:qq:48
1498910834048:qq:49
use:0ms;
0:qq:49
1498910834048:qq:50
use:0ms;
0:qq:50
1498910834048:qq:51
use:0ms;aaa visited too frequently by key:qq
0:qq:51
1498910835049:qq:1
use:0ms;
0:qq:1
1498910835049:qq:2
use:0ms;
0:qq:2
3. API

4. Source code analysis
An analysis of the main RateLimiter source code
The two create functions build different kinds of RateLimiter.
public static RateLimiter create(double permitsPerSecond) creates a RateLimiter of type SmoothBursty.
public static RateLimiter create(double permitsPerSecond, long warmupPeriod, TimeUnit unit) creates a RateLimiter of type SmoothWarmingUp. Its API comment is fairly long and reads as follows:
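Creates a RateLimiter with the specified stable throughput and warm-up period, where throughput means permits per second (commonly referred to as QPS, queries per second). During the warm-up period, the number of permits the RateLimiter issues per second ramps up smoothly until it reaches its maximum rate at the end of the period (as long as there are enough requests to saturate it). Similarly, if the RateLimiter is left unused for a duration of warmupPeriod, it gradually returns to its cold state; that is, it goes through the same warm-up process as when it was first created. The returned RateLimiter is intended for cases where the resource that actually fulfills the requests (e.g., a remote service) needs warm-up time, rather than being immediately accessible at the stable (maximum) rate. The returned RateLimiter starts in the cold state (i.e., the warm-up period follows), and if it is left unused for long enough, it returns to that state.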
Taking acquire as an example, let's look at how the source implements it.
/**
 * Acquires a single permit from this {@code RateLimiter}, blocking until the
 * request can be granted. Tells the amount of time slept, if any.
 *
 * <p>This method is equivalent to {@code acquire(1)}.
 *
 * @return time spent sleeping to enforce rate, in seconds; 0.0 if not rate-limited
 * @since 16.0 (present in 13.0 with {@code void} return type})
 */
public double acquire() {
  return acquire(1);
}

/**
 * Acquires the given number of permits from this {@code RateLimiter}, blocking until the
 * request can be granted. Tells the amount of time slept, if any.
 *
 * @param permits the number of permits to acquire
 * @return time spent sleeping to enforce rate, in seconds; 0.0 if not rate-limited
 * @throws IllegalArgumentException if the requested number of permits is negative or zero
 * @since 16.0 (present in 13.0 with {@code void} return type})
 */
public double acquire(int permits) {
  long microsToWait = reserve(permits);
  stopwatch.sleepMicrosUninterruptibly(microsToWait);
  return 1.0 * microsToWait / SECONDS.toMicros(1L);
}

/**
 * Reserves the given number of permits from this {@code RateLimiter} for future use, returning
 * the number of microseconds until the reservation can be consumed.
 *
 * @return time in microseconds to wait until the resource can be acquired, never negative
 */
final long reserve(int permits) {
  checkPermits(permits);
  synchronized (mutex()) {
    return reserveAndGetWaitLength(permits, stopwatch.readMicros());
  }
}

/**
 * Reserves next ticket and returns the wait time that the caller must wait for.
 *
 * @return the required wait time, never negative
 */
final long reserveAndGetWaitLength(int permits, long nowMicros) {
  long momentAvailable = reserveEarliestAvailable(permits, nowMicros);
  return max(momentAvailable - nowMicros, 0);
}

/**
 * Reserves the requested number of permits and returns the time that those permits can be used
 * (with one caveat).
 *
 * @return the time that the permits may be used, or, if the permits may be used immediately, an
 *     arbitrary past or present time
 */
abstract long reserveEarliestAvailable(int permits, long nowMicros);
This is an abstract method; let's look at the concrete implementation in SmoothRateLimiter:
@Override
final long reserveEarliestAvailable(int requiredPermits, long nowMicros) {
  resync(nowMicros); // refill permits
  long returnValue = nextFreeTicketMicros;
  double storedPermitsToSpend = min(requiredPermits, this.storedPermits); // stored permits consumed by this request
  double freshPermits = requiredPermits - storedPermitsToSpend; // permits that still have to be produced

  long waitMicros = storedPermitsToWaitTime(this.storedPermits, storedPermitsToSpend)
      + (long) (freshPermits * stableIntervalMicros);

  this.nextFreeTicketMicros = nextFreeTicketMicros + waitMicros; // compute the next available time
  this.storedPermits -= storedPermitsToSpend; // consume stored permits
  return returnValue;
}

/**
 * Translates a specified portion of our currently stored permits which we want to
 * spend/acquire, into a throttling time. Conceptually, this evaluates the integral
 * of the underlying function we use, for the range of
 * [(storedPermits - permitsToTake), storedPermits].
 *
 * <p>This always holds: {@code 0 <= permitsToTake <= storedPermits}
 */
abstract long storedPermitsToWaitTime(double storedPermits, double permitsToTake);

// Refills the stored permits and updates the time (in microseconds) at which the next permit is free
private void resync(long nowMicros) {
  // if nextFreeTicket is in the past, resync to now
  if (nowMicros > nextFreeTicketMicros) {
    storedPermits = min(maxPermits,
        storedPermits + (nowMicros - nextFreeTicketMicros) / stableIntervalMicros);
    nextFreeTicketMicros = nowMicros;
  }
}
storedPermitsToWaitTime is an abstract method.
RateLimiter actually has two implementation strategies, found in SmoothBursty and SmoothWarmingUp respectively.

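a. SmoothBursty
In SmoothBursty, spending storedPermits incurs no extra waiting time. The default maxBurstSeconds is 1, so maxPermits equals permitsPerSecond; that is, at most one second's worth of unused permits can be stored. For example, with QPS=4, maxPermits=4.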
/**
 * This implements a "bursty" RateLimiter, where storedPermits are translated to
 * zero throttling. The maximum number of permits that can be saved (when the RateLimiter is
 * unused) is defined in terms of time, in this sense: if a RateLimiter is 2qps, and this
 * time is specified as 10 seconds, we can save up to 2 * 10 = 20 permits.
 */
static final class SmoothBursty extends SmoothRateLimiter {
  /** The work (permits) of how many seconds can be saved up if this RateLimiter is unused? */
  final double maxBurstSeconds;

  SmoothBursty(SleepingStopwatch stopwatch, double maxBurstSeconds) {
    super(stopwatch);
    this.maxBurstSeconds = maxBurstSeconds;
  }

  @Override
  void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = this.maxPermits;
    maxPermits = maxBurstSeconds * permitsPerSecond;
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
      // if we don't special-case this, we would get storedPermits == NaN, below
      storedPermits = maxPermits;
    } else {
      storedPermits = (oldMaxPermits == 0.0)
          ? 0.0 // initial state
          : storedPermits * maxPermits / oldMaxPermits;
    }
  }

  @Override
  long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
    return 0L;
  }
}
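RateLimiter allows a request to take more permits than are currently left, but the next request pays the price: it has to wait until the deficit has been repaid. There is a trade-off here: should the earlier request wait until enough permits are available, or should it proceed at once and let later requests wait? Guava's designers chose the latter: do the work at hand first and deal with the rest later. An example I found online makes this easier to follow. Take QPS=4, i.e. one permit every 0.25 seconds and maxPermits=4. The first two steps consume 4 permits in total. At step three, 4 permits are stored but 10 are requested, so 6 permits are borrowed in advance; repaying them takes 6 × 0.25 = 1.5 seconds, i.e. until t=3.5. At step four (t=3), 4 of the 6 borrowed permits have been repaid and 2 are still outstanding, so the request waits 0.5 seconds.
(1). t=0: storedPermits=0, 1 permit requested, wait time = 0;
(2). t=1: storedPermits=3, 3 permits requested, wait time = 0;
(3). t=2: storedPermits=4, 10 permits requested, wait time = 0; 6 permits are borrowed in advance;
(4). t=3: storedPermits=0, 1 permit requested, wait time = 0.5 (2 borrowed permits are still outstanding);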
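The four steps can be checked with a small simulation. The sketch below is not Guava code: it copies only the arithmetic of resync and reserveEarliestAvailable shown earlier (with storedPermitsToWaitTime fixed at 0, as in SmoothBursty), and it takes the current time as a parameter instead of reading a stopwatch. The class name `BurstySimulation` is made up for this example, and the limiter is assumed to be created at t=0.

```java
/** Re-implements only the SmoothBursty bookkeeping, with an explicit clock, for illustration. */
final class BurstySimulation {
    private final double stableIntervalMicros; // time needed to produce one permit
    private final double maxPermits;           // cap on stored permits
    private double storedPermits = 0.0;
    private long nextFreeTicketMicros = 0L;    // assumes the limiter is created at t=0

    BurstySimulation(double permitsPerSecond, double maxBurstSeconds) {
        this.stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
        this.maxPermits = maxBurstSeconds * permitsPerSecond;
    }

    /** Reserves permits at time nowMicros and returns the wait time in microseconds. */
    long reserve(int permits, long nowMicros) {
        // resync: credit the permits produced while the limiter was idle
        if (nowMicros > nextFreeTicketMicros) {
            storedPermits = Math.min(maxPermits,
                    storedPermits + (nowMicros - nextFreeTicketMicros) / stableIntervalMicros);
            nextFreeTicketMicros = nowMicros;
        }
        long momentAvailable = nextFreeTicketMicros;
        double storedPermitsToSpend = Math.min(permits, storedPermits);
        double freshPermits = permits - storedPermitsToSpend;
        // Stored permits are free in SmoothBursty; only fresh permits push the next ticket out.
        nextFreeTicketMicros += (long) (freshPermits * stableIntervalMicros);
        storedPermits -= storedPermitsToSpend;
        return Math.max(momentAvailable - nowMicros, 0L);
    }

    public static void main(String[] args) {
        BurstySimulation limiter = new BurstySimulation(4.0, 1.0);  // QPS=4, maxPermits=4
        System.out.println(limiter.reserve(1, 0L));          // 0
        System.out.println(limiter.reserve(3, 1_000_000L));  // 0
        System.out.println(limiter.reserve(10, 2_000_000L)); // 0 (6 permits borrowed)
        System.out.println(limiter.reserve(1, 3_000_000L));  // 500000 (0.5 s)
    }
}
```

The request for 10 permits returns immediately, and the debt only shows up in the wait time of the following request.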
b. SmoothWarmingUp
static final class SmoothWarmingUp extends SmoothRateLimiter {
  private final long warmupPeriodMicros;
  /**
   * The slope of the line from the stable interval (when permits == 0), to the cold interval
   * (when permits == maxPermits)
   */
  private double slope;
  private double halfPermits;

  SmoothWarmingUp(SleepingStopwatch stopwatch, long warmupPeriod, TimeUnit timeUnit) {
    super(stopwatch);
    this.warmupPeriodMicros = timeUnit.toMicros(warmupPeriod);
  }

  @Override
  void doSetRate(double permitsPerSecond, double stableIntervalMicros) {
    double oldMaxPermits = maxPermits;
    maxPermits = warmupPeriodMicros / stableIntervalMicros;
    halfPermits = maxPermits / 2.0;
    // Stable interval is x, cold is 3x, so on average it's 2x. Double the time -> halve the rate
    double coldIntervalMicros = stableIntervalMicros * 3.0;
    slope = (coldIntervalMicros - stableIntervalMicros) / halfPermits;
    if (oldMaxPermits == Double.POSITIVE_INFINITY) {
      // if we don't special-case this, we would get storedPermits == NaN, below
      storedPermits = 0.0;
    } else {
      storedPermits = (oldMaxPermits == 0.0)
          ? maxPermits // initial state is cold
          : storedPermits * maxPermits / oldMaxPermits;
    }
  }

  @Override
  long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
    double availablePermitsAboveHalf = storedPermits - halfPermits;
    long micros = 0;
    // measuring the integral on the right part of the function (the climbing line)
    if (availablePermitsAboveHalf > 0.0) {
      double permitsAboveHalfToTake = min(availablePermitsAboveHalf, permitsToTake);
      micros = (long) (permitsAboveHalfToTake * (permitsToTime(availablePermitsAboveHalf)
          + permitsToTime(availablePermitsAboveHalf - permitsAboveHalfToTake)) / 2.0);
      permitsToTake -= permitsAboveHalfToTake;
    }
    // measuring the integral on the left part of the function (the horizontal line)
    micros += (stableIntervalMicros * permitsToTake);
    return micros;
  }

  private double permitsToTime(double permits) {
    return stableIntervalMicros + permits * slope;
  }
}
maxPermits equals the number of permits that can be produced during the warm-up period. For example, with QPS=4 and a warm-up of 2 seconds, maxPermits=8. halfPermits is half of maxPermits.

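I do not fully understand this figure yet. Judging from the code, the difference from the previous implementation lies in storedPermitsToWaitTime: stored permits are no longer free, and spending them costs waiting time, more so the colder the limiter is. My impression that it cannot overdraw and must wait for the resources it needs still has to be verified by testing.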
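To get a feel for the numbers, the arithmetic of doSetRate and storedPermitsToWaitTime from the listing above can be run in isolation. The following is only a sketch that copies those formulas into a standalone class (the name `WarmingUpMath` is made up for this example); it is not the Guava class itself and leaves out the stopwatch and the reservation logic.

```java
/** Standalone copy of the SmoothWarmingUp formulas shown above, for experimenting with numbers. */
final class WarmingUpMath {
    final double stableIntervalMicros;
    final double maxPermits;
    final double halfPermits;
    final double slope;

    WarmingUpMath(double permitsPerSecond, long warmupPeriodMicros) {
        stableIntervalMicros = 1_000_000.0 / permitsPerSecond;
        maxPermits = warmupPeriodMicros / stableIntervalMicros;
        halfPermits = maxPermits / 2.0;
        double coldIntervalMicros = stableIntervalMicros * 3.0; // cold interval is 3x the stable one
        slope = (coldIntervalMicros - stableIntervalMicros) / halfPermits;
    }

    /** Time (in microseconds) charged for taking permitsToTake out of storedPermits. */
    long storedPermitsToWaitTime(double storedPermits, double permitsToTake) {
        double availablePermitsAboveHalf = storedPermits - halfPermits;
        long micros = 0;
        if (availablePermitsAboveHalf > 0.0) {
            // area under the climbing line (a trapezoid)
            double permitsAboveHalfToTake = Math.min(availablePermitsAboveHalf, permitsToTake);
            micros = (long) (permitsAboveHalfToTake * (permitsToTime(availablePermitsAboveHalf)
                    + permitsToTime(availablePermitsAboveHalf - permitsAboveHalfToTake)) / 2.0);
            permitsToTake -= permitsAboveHalfToTake;
        }
        // area under the horizontal line (a rectangle)
        micros += (stableIntervalMicros * permitsToTake);
        return micros;
    }

    private double permitsToTime(double permits) {
        return stableIntervalMicros + permits * slope;
    }

    public static void main(String[] args) {
        WarmingUpMath m = new WarmingUpMath(4.0, 2_000_000L);   // QPS=4, warm-up 2 s
        System.out.println(m.maxPermits);                       // 8.0
        System.out.println(m.storedPermitsToWaitTime(8.0, 1.0)); // 687500 (first permit when fully cold)
        System.out.println(m.storedPermitsToWaitTime(4.0, 4.0)); // 1000000 (4 permits at the stable interval)
        System.out.println(m.storedPermitsToWaitTime(8.0, 8.0)); // 3000000 (2 s on the slope + 1 s flat)
    }
}
```

With these formulas, draining the upper half of a fully cold limiter costs 2 seconds, which equals the configured warm-up period, and the first permit is charged about 0.69 seconds instead of the stable 0.25 seconds.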
IV. Other Common Implementations
1. Implementation at the proxy layer, limiting the access frequency of selected URLs or API endpoints
Nginx module
limit_req_zone $binary_remote_addr zone=one:10m rate=1r/s;

server {
    location /search/ {
        limit_req zone=one burst=5;
    }
}
For details, see: ngx_http_limit_req_module
Functionality provided by HAProxy
For details, see: the HAProxy rate limit module
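2. Implementation based on Redis
The official Redis documentation describes an implementation in great detail. It is suitable for applications of all kinds, such as PHP, Python, and so on. A Redis-based implementation supports centralized control of access frequency across distributed services. It is also suitable for scenarios in which the application itself cannot keep state.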
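The idea behind a counter-based limiter of this kind can be sketched without a Redis client. In the example below a local HashMap stands in for the Redis keyspace, and `merge` plays the role that the Redis INCR command would play on a per-user, per-window key. The class name `FixedWindowLimiter` and the key layout are assumptions of this sketch; a real deployment would issue INCR and EXPIRE against Redis so that all application servers share the same counters.

```java
import java.util.HashMap;
import java.util.Map;

/** Fixed-window counter sketch; the map is a local stand-in for Redis keys (assumption). */
final class FixedWindowLimiter {
    private final int limitPerWindow;
    private final long windowMillis;
    // In Redis these would be keys with an expiry; here old entries are simply never removed.
    private final Map<String, Integer> counters = new HashMap<>();

    FixedWindowLimiter(int limitPerWindow, long windowMillis) {
        this.limitPerWindow = limitPerWindow;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the user is still within the limit for the current window. */
    synchronized boolean allow(String userKey, long nowMillis) {
        // One counter per user per time window, e.g. "qq:0", "qq:1", ...
        String key = userKey + ":" + (nowMillis / windowMillis);
        int count = counters.merge(key, 1, Integer::sum); // stands in for INCR
        return count <= limitPerWindow;
    }

    public static void main(String[] args) {
        FixedWindowLimiter limiter = new FixedWindowLimiter(2, 1000L); // 2 calls per second
        System.out.println(limiter.allow("qq", 0L));    // true
        System.out.println(limiter.allow("qq", 10L));   // true
        System.out.println(limiter.allow("qq", 20L));   // false (third call in the same window)
        System.out.println(limiter.allow("qq", 1000L)); // true (new window)
    }
}
```

Because the counter lives outside the application process, the application servers themselves can remain stateless, which is the scenario described above.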
Posted on 2017-07-01 20:43 by bijian1013