(Translated from https://gafferongames.com/) Reliability and Congestion Avoidance over UDP

The Problem with TCP

Those of you familiar with TCP know that it already has its own concept of connection, reliability-ordering and congestion avoidance, so why are we rewriting our own mini version of TCP on top of UDP?

The issue is that multiplayer action games rely on a steady stream of packets sent at rates of 10 to 30 packets per second, and for the most part, the data contained in these packets is so time sensitive that only the most recent data is useful. This includes data such as player inputs, the position, orientation and velocity of each player character, and the state of physics objects in the world.

The problem with TCP is that it abstracts data delivery as a reliable ordered stream. Because of this, if a packet is lost, TCP has to stop and wait for that packet to be resent. This interrupts the steady stream of packets because more recent packets must wait in a queue until the resent packet arrives, so packets are received in the same order they were sent.

What we need is a different type of reliability. Instead of having all data treated as a reliable ordered stream, we want to send packets at a steady rate and get notified when packets are received by the other computer. This allows time sensitive data to get through without waiting for resent packets, while letting us make our own decision about how to handle packet loss at the application level.

It is not possible to implement a reliability system with these properties using TCP, so we have no choice but to roll our own reliability on top of UDP.


Sequence Numbers

The goal of our reliability system is simple: we want to know which packets arrive at the other side of the connection.

First we need a way to identify packets.

What if we added the concept of a “packet id”? Let’s make it an integer value. We could start it at zero, then increase it by one with each packet we send. The first packet we send would be packet 0, and the 100th packet sent would be packet 99.

This is actually quite a common technique. It’s even used in TCP! These packet ids are called sequence numbers. While we’re not going to implement reliability exactly as TCP does, it makes sense to use the same terminology, so we’ll call them sequence numbers from now on.

Since UDP does not guarantee the order of packets, the 100th packet received is not necessarily the 100th packet sent. It follows that we need to insert the sequence number somewhere in the packet, so that the computer at the other side of the connection knows which packet it is.

We already have a simple packet header for the virtual connection from the previous article, so we’ll just add the sequence number in the header like this:

   [uint protocol id]
   [uint sequence]
   (packet data...)

Now when the other computer receives a packet it knows its sequence number according to the computer that sent it.
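As a concrete sketch, the header above might be written and read like this in C++. The struct and function names are my own, not from the article, and the byte order choice is just one reasonable option:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical layout matching the diagram above:
// [uint protocol id][uint sequence](packet data...)
struct PacketHeader
{
    uint32_t protocolId;
    uint32_t sequence;
};

// Serialize the header fields in big endian order, ahead of the payload.
inline void WriteHeader( std::vector<uint8_t> & out, const PacketHeader & h )
{
    const uint32_t fields[2] = { h.protocolId, h.sequence };
    for ( uint32_t f : fields )
        for ( int shift = 24; shift >= 0; shift -= 8 )
            out.push_back( uint8_t( ( f >> shift ) & 0xFF ) );
}

// Read the header back from the first 8 bytes of a received packet.
inline PacketHeader ReadHeader( const uint8_t * data )
{
    PacketHeader h;
    h.protocolId = ( uint32_t( data[0] ) << 24 ) | ( uint32_t( data[1] ) << 16 ) |
                   ( uint32_t( data[2] ) << 8 )  |   uint32_t( data[3] );
    h.sequence   = ( uint32_t( data[4] ) << 24 ) | ( uint32_t( data[5] ) << 16 ) |
                   ( uint32_t( data[6] ) << 8 )  |   uint32_t( data[7] );
    return h;
}
```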


Acks

Now that we can identify packets using sequence numbers, the next step is to let the other side of the connection know which packets we receive.

Logically this is quite simple, we just need to take note of the sequence number of each packet we receive, and send those sequence numbers back to the computer that sent them.

Because we are sending packets continuously between both machines, we can just add the ack to the packet header, just like we did with the sequence number:

    [uint protocol id]
    [uint sequence]
    [uint ack]
    (packet data...)

Our general approach is as follows:

  • Each time we send a packet we increase the local sequence number

  • When we receive a packet, we check the sequence number of the packet against the sequence number of the most recently received packet, called the remote sequence number. If the packet is more recent, we update the remote sequence to be equal to the sequence number of the packet.

  • When we compose packet headers, the local sequence becomes the sequence number of the packet, and the remote sequence becomes the ack.

This simple ack system works provided that one packet comes in for each packet we send out.

But what if packets clump up such that two packets arrive before we send a packet? We only have space for one ack per-packet, so what do we do?

Now consider the case where one side of the connection is sending packets at a faster rate. If the client sends 30 packets per-second, and the server only sends 10 packets per-second, we need at least 3 acks included in each packet sent from the server.

Let’s make it even more complex! What if the packet containing the ack is lost? The computer that sent the packet would think the packet got lost but it was actually received!

It seems like we need to make our reliability system… more reliable!


Reliable Acks

Here is where we diverge significantly from TCP.

What TCP does is maintain a sliding window where the ack sent is the sequence number of the next packet it expects to receive, in order. If TCP does not receive an ack for a given packet, it stops and resends a packet with that sequence number again. This is exactly the behavior we want to avoid!

In our reliability system, we never resend a packet with a given sequence number. We send packet n exactly once, then we send n+1, n+2 and so on. We never stop and resend packet n if it was lost; we leave it up to the application to compose a new packet containing the data that was lost, if necessary, and this packet gets sent with a new sequence number.

Because we’re doing things differently to TCP, it’s now possible to have holes in the set of packets we ack, so it is no longer sufficient to just state the sequence number of the most recent packet we have received.

We need to include multiple acks per-packet.

How many acks do we need?

As mentioned previously we have the case where one side of the connection sends packets faster than the other. Let’s assume that the worst case is one side sending no less than 10 packets per-second, while the other sends no more than 30. In this case, the average number of acks we’ll need per-packet is 3, but if packets clump up a bit, we would need more. Let’s say 6-10 worst case.

What about acks that don’t get through because the packet containing the ack is lost?

To solve this, we’re going to use a classic networking strategy of using redundancy to defeat packet loss!

Let’s include 33 acks per-packet, and this isn’t just going to be up to 33, but always 33. So for any given ack we redundantly send it up to 32 additional times, just in case one packet with the ack doesn’t get through!

But how can we possibly fit 33 acks in a packet? At 4 bytes per-ack, that’s 132 bytes!

The trick is to represent the 32 previous acks before “ack” using a bitfield:

    [uint protocol id]
    [uint sequence]
    [uint ack]
    [uint ack bitfield]
    (packet data...)

We define “ack bitfield” such that each bit corresponds to acks of the 32 sequence numbers before “ack”. So let’s say “ack” is 100. If the first bit of “ack bitfield” is set, then the packet also includes an ack for packet 99. If the second bit is set, then packet 98 is acked. This goes all the way down to the 32nd bit for packet 68.

Our adjusted algorithm looks like this:

  • Each time we send a packet we increase the local sequence number

  • When we receive a packet, we check the sequence number of the packet against the remote sequence number. If the packet sequence is more recent, we update the remote sequence number.

  • When we compose packet headers, the local sequence becomes the sequence number of the packet, and the remote sequence becomes the ack. The ack bitfield is calculated by looking into a queue of up to 33 packets, containing sequence numbers in the range [remote sequence - 32, remote sequence]. We set bit n (in [1,32]) in ack bits to 1 if the sequence number remote sequence - n is in the received queue.

  • Additionally, when a packet is received, ack bitfield is scanned and if bit n is set, then we acknowledge sequence number packet sequence - n, if it has not been acked already.

With this improved algorithm, you would have to lose 100% of packets for more than a second to stop an ack getting through. And of course, it easily handles different send rates and clumped up packet receives.
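The bitfield steps above can be sketched in C++ as follows. This is a minimal illustration with names of my own choosing; it uses 32 bit sequence numbers and ignores wrap-around for clarity (wrap-around is covered later in the article):

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Build the ack bitfield for an outgoing header: bit (n-1) is set if we
// received sequence number (remoteSequence - n), for n in [1,32].
inline uint32_t BuildAckBits( uint32_t remoteSequence,
                              const std::set<uint32_t> & receivedQueue )
{
    uint32_t ackBits = 0;
    for ( uint32_t n = 1; n <= 32; ++n )
        if ( remoteSequence >= n && receivedQueue.count( remoteSequence - n ) )
            ackBits |= ( 1u << ( n - 1 ) );
    return ackBits;
}

// Expand an incoming ack + bitfield back into the full set of acked
// sequence numbers: ack itself, plus (ack - n) for every set bit n.
inline std::vector<uint32_t> DecodeAcks( uint32_t ack, uint32_t ackBits )
{
    std::vector<uint32_t> acked;
    acked.push_back( ack );
    for ( uint32_t n = 1; n <= 32; ++n )
        if ( ( ackBits & ( 1u << ( n - 1 ) ) ) && ack >= n )
            acked.push_back( ack - n );
    return acked;
}
```

On send, `BuildAckBits` fills the bitfield from the queue of received sequence numbers; on receive, `DecodeAcks` turns the ack and bitfield back into up to 33 individual acks, which you would then check against your list of not-yet-acked sent packets.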


Detecting Lost Packets

Now that we know what packets are received by the other side of the connection, how do we detect packet loss?

The trick here is to flip it around and say that if you don’t get an ack for a packet within a certain amount of time, then we consider that packet lost.

Given that we are sending at no more than 30 packets per second, and we are redundantly sending acks roughly 30 times, if you don’t get an ack for a packet within one second, it is very likely that packet was lost.

So we are playing a bit of a trick here: while we can know 100% for sure which packets get through, we can only be reasonably certain of the set of packets that didn’t arrive.

The implication of this is that any data which you resend using this reliability technique needs to have its own message id so that if you receive it multiple times, you can discard it. This can be done at the application level.
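A minimal sketch of this timeout-based loss detection might look like the following. The class and method names are hypothetical, not from the article; the one second default matches the maximum expected RTT discussed above:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Tracks the send time of each un-acked sequence number. Anything not
// acked within rttMax seconds is reported as lost (and forgotten).
class SentPacketTracker
{
public:
    void PacketSent( uint32_t sequence, double time ) { sent[sequence] = time; }
    void PacketAcked( uint32_t sequence )             { sent.erase( sequence ); }

    // Call each update: returns sequence numbers now considered lost.
    std::vector<uint32_t> Update( double time, double rttMax = 1.0 )
    {
        std::vector<uint32_t> lost;
        for ( auto it = sent.begin(); it != sent.end(); )
        {
            if ( time - it->second > rttMax )
            {
                lost.push_back( it->first );
                it = sent.erase( it );
            }
            else
                ++it;
        }
        return lost;
    }

private:
    std::map<uint32_t, double> sent;   // sequence -> send time
};
```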


Handling Sequence Number Wrap-Around

No discussion of sequence numbers and acks would be complete without coverage of sequence number wrap around!

Sequence numbers and acks are 32 bit unsigned integers, so they can represent numbers in the range [0,4294967295]. That’s a very high number! So high that if you sent 30 packets per-second, it would take over four and a half years for the sequence number to wrap back around to zero.

But perhaps you want to save some bandwidth so you shorten your sequence numbers and acks to 16 bit integers. You save 4 bytes per-packet, but now they wrap around in only half an hour.

So how do we handle this wrap around case?

The trick is to realize that if the current sequence number is already very high, and the next sequence number that comes in is very low, then you must have wrapped around. So even though the new sequence number is numerically lower than the current sequence value, it actually represents a more recent packet.

For example, let’s say we encoded sequence numbers in one byte (not recommended btw. :)), then they would wrap around after 255 like this:

    ... 252, 253, 254, 255, 0, 1, 2, 3, ...

To handle this case we need a new function that is aware of the fact that sequence numbers wrap around to zero after 255, so that 0, 1, 2, 3 are considered more recent than 255. Otherwise, our reliability system stops working after you receive packet 255.

Here’s a function for 16 bit sequence numbers:

    inline bool sequence_greater_than( uint16_t s1, uint16_t s2 )
    {
        return ( ( s1 > s2 ) && ( s1 - s2 <= 32768 ) ) || 
               ( ( s1 < s2 ) && ( s2 - s1  > 32768 ) );
    }

This function works by comparing the two numbers and their difference. If their difference is less than 1/2 the maximum sequence number value, they must be close together, so we just check whether one is greater than the other, as usual. However, if they are far apart, their difference will be greater than 1/2 the max sequence value, and in that case we paradoxically consider the sequence number more recent if it is less than the current sequence number.

This last bit is what handles the wrap around of sequence numbers transparently, so 0,1,2 are considered more recent than 255.
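To see the wrap-around handling in action, here is the function again (repeated so the example is self-contained) together with a small helper for updating the remote sequence number. The helper name is my own addition, not the article’s:

```cpp
#include <cassert>
#include <cstdint>

// The article's wrap-around-aware comparison for 16 bit sequence numbers.
inline bool sequence_greater_than( uint16_t s1, uint16_t s2 )
{
    return ( ( s1 > s2 ) && ( s1 - s2 <= 32768 ) ) ||
           ( ( s1 < s2 ) && ( s2 - s1  > 32768 ) );
}

// Update the remote sequence number only if the incoming packet is more
// recent, wrap-around included. Returns true if the update happened.
inline bool update_remote_sequence( uint16_t & remoteSequence, uint16_t packetSequence )
{
    if ( sequence_greater_than( packetSequence, remoteSequence ) )
    {
        remoteSequence = packetSequence;
        return true;
    }
    return false;
}
```

With this in place, a packet with sequence 1 arriving after sequence 65535 is correctly treated as more recent, and the remote sequence wraps around to 1.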

Make sure you include this in any sequence number processing you do.


Congestion Avoidance

While we have solved reliability, there is still the question of congestion avoidance. TCP provides congestion avoidance as part of the package when you get TCP reliability, but UDP has no congestion avoidance whatsoever!

If we just send packets without some sort of flow control, we risk flooding the connection and inducing severe latency (2 seconds plus!) as routers between us and the other computer become congested and buffer up packets. This happens because routers try very hard to deliver all the packets we send, and therefore tend to buffer up packets in a queue before they consider dropping them.

While it would be nice if we could tell the routers that our packets are time sensitive and should be dropped instead of buffered if the router is overloaded, we can’t really do this without rewriting the software for all routers in the world.

Instead, we need to focus on what we can actually do, which is to avoid flooding the connection in the first place. We try not to send too much bandwidth, and then if we detect congestion, we attempt to back off and send even less.

The way to do this is to implement our own basic congestion avoidance algorithm. And I stress basic! Just like reliability, we have no hope of coming up with something as general and robust as TCP’s implementation on the first try, so let’s keep it as simple as possible.


Measuring Round Trip Time

Since the whole point of congestion avoidance is to avoid flooding the connection and increasing round trip time (RTT), it makes sense that the most important metric as to whether or not we are flooding our connection is the RTT itself.

We need a way to measure the RTT of our connection.

Here is the basic technique:

  • For each packet we send, we add an entry to a queue containing the sequence number of the packet and the time it was sent.

  • Each time we receive an ack, we look up this entry and note the difference in local time between the time we receive the ack, and the time we sent the packet. This is the RTT time for that packet.

  • Because the arrival of packets varies with network jitter, we need to smooth this value to provide something meaningful, so each time we obtain a new RTT we move a percentage of the distance between our current RTT and the packet RTT. 10% seems to work well for me in practice. This is called an exponentially smoothed moving average, and it has the effect of smoothing out noise in the RTT with a low pass filter.

  • To ensure that the sent queue doesn’t grow forever, we discard packets once they have exceeded some maximum expected RTT. As discussed in the previous section on reliability, it is exceptionally likely that any packet not acked within a second was lost, so one second is a good value for this maximum RTT.

Now that we have RTT, we can use it as a metric to drive our congestion avoidance. If RTT gets too large, we send data less frequently, if its within acceptable ranges, we can try sending data more frequently.
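A minimal sketch of the exponentially smoothed moving average described above. The class name is my own, and it assumes the estimate simply starts at zero and gets pulled toward the samples:

```cpp
#include <cassert>
#include <cmath>

// Smooths raw per-packet RTT samples with an exponentially smoothed
// moving average: each sample moves the estimate 10% of the way toward
// it, acting as a simple low pass filter over network jitter.
class RttEstimator
{
public:
    void AddSample( double packetRtt )
    {
        smoothedRtt += ( packetRtt - smoothedRtt ) * 0.1;
    }

    double Rtt() const { return smoothedRtt; }

private:
    double smoothedRtt = 0.0;
};
```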


Simple Binary Congestion Avoidance

As discussed before, let’s not get greedy, we’ll implement a very basic congestion avoidance. This congestion avoidance has two modes. Good and bad. I call it simple binary congestion avoidance.

Let’s assume you send packets of a certain size, say 256 bytes. You would like to send these packets 30 times a second, but if conditions are bad, you can drop down to 10 times a second.

So 256 byte packets 30 times a second is around 64kbits/sec, and 10 times a second is roughly 20kbit/sec. There isn’t a broadband network connection in the world that can’t handle at least 20kbit/sec, so we’ll move forward with this assumption. Unlike TCP which is entirely general for any device with any amount of send/recv bandwidth, we’re going to assume a minimum supported bandwidth for devices involved in our connections.

So the basic idea is this. When network conditions are “good” we send 30 packets per-second, and when network conditions are “bad” we drop to 10 packets per-second.

Of course, you can define “good” and “bad” however you like, but I’ve gotten good results considering only RTT. For example if RTT exceeds some threshold (say 250ms) then you know you are probably flooding the connection. Of course, this assumes that nobody would normally exceed 250ms under non-flooding conditions, which is reasonable given our broadband requirement.

How do you switch between good and bad? The algorithm I like to use operates as follows:

  • If you are currently in good mode, and conditions become bad, immediately drop to bad mode

  • If you are in bad mode, and conditions have been good for a specific length of time ’t’, then return to good mode

  • To avoid rapid toggling between good and bad mode, if you drop from good mode to bad in under 10 seconds, double the amount of time ’t’ before bad mode goes back to good. Clamp this at some maximum, say 60 seconds.

  • To avoid punishing good connections when they have short periods of bad behavior, for each 10 seconds the connection is in good mode, halve the time ’t’ before bad mode goes back to good. Clamp this at some minimum like 1 second.

With this algorithm you will rapidly respond to bad conditions and drop your send rate to 10 packets per-second, avoiding flooding of the connection. You’ll also conservatively try out good mode, and persist in sending packets at the higher rate of 30 packets per-second while network conditions are good.
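Here is one way the mode-switching rules above might be sketched in C++. The class name, the 4 second starting value for ’t’, and the overall structure are my own assumptions; the thresholds mirror the text (250ms RTT, ’t’ clamped to [1, 60] seconds, 10 second windows):

```cpp
#include <algorithm>
#include <cassert>

// Simple binary congestion avoidance: "good" mode sends 30 packets
// per-second, "bad" mode drops to 10, with the return-to-good time 't'
// doubled on rapid drops and halved after sustained good behavior.
class SimpleBinaryCongestionAvoidance
{
public:
    bool InGoodMode() const { return goodMode; }
    float SendRate() const  { return goodMode ? 30.0f : 10.0f; }  // packets per-second

    // Call once per update with elapsed time and current smoothed RTT.
    void Update( float dt, float rtt )
    {
        const bool conditionsBad = ( rtt > 0.25f );
        if ( goodMode )
        {
            if ( conditionsBad )
            {
                // Dropped to bad within 10s of entering good mode: double 't'.
                if ( timeInMode < 10.0f )
                    returnToGoodTime = std::min( returnToGoodTime * 2.0f, 60.0f );
                goodMode = false;
                timeInMode = 0.0f;
                return;
            }
            timeInMode += dt;
            // Reward sustained good conditions: halve 't' every 10 seconds.
            if ( timeInMode >= 10.0f )
            {
                timeInMode -= 10.0f;
                returnToGoodTime = std::max( returnToGoodTime * 0.5f, 1.0f );
            }
        }
        else
        {
            if ( conditionsBad )
            {
                timeInMode = 0.0f;   // must be good *continuously* for 't'
                return;
            }
            timeInMode += dt;
            if ( timeInMode >= returnToGoodTime )
            {
                goodMode = true;
                timeInMode = 0.0f;
            }
        }
    }

private:
    bool goodMode = true;
    float timeInMode = 0.0f;
    float returnToGoodTime = 4.0f;   // 't': assumed starting value
};
```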

Of course, you can implement much more sophisticated algorithms. Packet loss % can be taken into account as a metric, even the amount of network jitter (time variance in packet acks), not just RTT.

You can also get much more greedy with congestion avoidance, and attempt to discover when you can send data at a much higher bandwidth (eg. LAN), but you have to be very careful! With increased greediness comes more risk that you’ll flood the connection.

 


Conclusion

Our new reliability system lets us send a steady stream of packets and notifies us which packets are received. From this we can infer lost packets, and resend data that didn’t get through if necessary. We also have a simple congestion avoidance system that drops from 30 packets per-second to 10 packets per-second so we don’t flood the connection.


 

posted @ 2025-03-21 18:47  sun_dust_shadow