(翻译 gafferongames) Sending Large Blocks of Data 发送大块数据

https://gafferongames.com/post/sending_large_blocks_of_data/

In the previous article we implemented packet fragmentation and reassembly so we can send packets larger than MTU.

This approach works great when the data block you’re sending is time critical and can be dropped, but in other cases you need to send large blocks of data quickly and reliably in the presence of packet loss, and you need the data to get through.

In this situation, a different technique gives much better results.

在上一篇文章中,我们实现了数据包的分片与重组,从而可以发送大于 MTU 的数据包。

这种方法在你发送的数据块对时间非常敏感、允许丢失的情况下效果很好。但在其他情况下,如果你需要在存在数据包丢失的网络中快速且可靠地发送大块数据,并且必须确保数据传输成功,那就需要另一种技术。

在这种情况下,使用不同的技术会带来更好的效果。

Background

It’s common for servers to send a large block of data to the client on connect, for example, the initial state of the game world for late join.

Let’s assume this data is 256k in size and the client needs to receive it before they can join the game. The client is stuck behind a load screen waiting for the data, so obviously we want it to be transmitted as quickly as possible.

If we send the data with the technique from the previous article, we get packet loss amplification because a single dropped fragment results in the whole packet being lost. The effect of this is actually quite severe. Our example block split into 256 fragments and sent over 1% packet loss now has a whopping 92.4% chance of being dropped!

Since we just need the data to get across, we have no choice but to keep sending it until it gets through. On average, we have to send the block 10 times before it’s received. You may laugh but this actually happened on an AAA game I worked on!

To fix this, I implemented a new system for sending large blocks, one that handles packet loss by resending fragments until they are acked. Then I took the problematic large blocks and piped them through this system, fixing a bunch of players stalling out on connect, while continuing to send time critical data (snapshots) via packet fragmentation and reassembly.

服务器在客户端连接时发送大块数据是很常见的事情,比如为了补入玩家(late join)发送游戏世界的初始状态。

假设这块数据有 256KB,客户端必须在接收完成后才能加入游戏。客户端在加载画面中等待数据传输完成,因此我们显然希望它尽可能快地传输。

如果我们使用上一篇文章中的技术发送这些数据,就会遇到“丢包放大”问题——只要一个片段丢失,整个数据包就会作废。这个问题其实非常严重。以我们这个例子来说,256KB 被拆分成 256 个片段,假设网络丢包率为 1%,整个数据块被丢弃的概率居然高达 92.4%!

由于我们只关心数据能否成功传输过去,那我们就只能不断重发,直到数据最终完整地到达为止。平均下来,我们得重复发送整块数据 10 次 才能成功。听起来可能有点夸张,但这真的在我参与开发的一款 AAA 游戏中发生过!
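
顺便用一小段代码验证上面的数字(这不是原文的代码,只是在 1% 丢包率、256 个片段的假设下做的粗略估算,忽略了丢包之间的相关性):

#include <math.h>
#include <stdio.h>

int main()
{
    const double packetLossRate = 0.01;   // 假设每个片段独立地以 1% 的概率丢失
    const int numFragments = 256;         // 256KB 的块被拆成 256 个 1KB 片段

    // 只有所有片段都到达,整个数据包才能重组成功
    const double probAllArrive = pow( 1.0 - packetLossRate, numFragments );   // 约 0.076
    const double probDropped = 1.0 - probAllArrive;                           // 约 0.924,即文中的 92.4%

    // 按几何分布估算,平均需要发送约 1 / 0.076 ≈ 13 次;原文粗略地说成“大约 10 次”
    const double averageSends = 1.0 / probAllArrive;

    printf( "dropped: %.1f%%  average sends: %.1f\n", probDropped * 100.0, averageSends );
    return 0;
}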

为了解决这个问题,我实现了一个新的系统,用于发送大块数据。这个系统会在遇到丢包时,持续重发那些未被确认(acked)的片段。然后我将这些有问题的大数据块通过新系统发送,同时继续使用分片与重组的方法来发送那些对时效性要求较高的数据(例如快照)。最终解决了大量玩家在连接时卡住的问题。

 

Chunks and Slices

In this new system blocks of data are called chunks. Chunks are split up into slices. This name change keeps the chunk system terminology (chunks/slices) distinct from packet fragmentation and reassembly (packets/fragments).

The basic idea is that slices are sent over the network repeatedly until they all get through. Since we are implementing this over UDP, what is simple in concept becomes a little more complicated in implementation, because we have to build in our own basic reliability system so the sender knows which slices have been received.

This reliability gets quite tricky if we have a bunch of different chunks in flight, so we’re going to make a simplifying assumption up front: we’re only going to send one chunk over the network at a time. This doesn’t mean the sender can’t have a local send queue for chunks, just that in terms of network traffic there’s only ever one chunk in flight at any time.

This makes intuitive sense because the whole point of the chunk system is to send chunks reliably and in-order. If you are for some reason sending chunk 0 and chunk 1 at the same time, what’s the point? You can’t process chunk 1 until chunk 0 comes through, because otherwise it wouldn’t be reliable-ordered.

That said, if you dig a bit deeper you’ll see that sending one chunk at a time does introduce a small trade-off, and that is that it adds a delay of RTT between chunk n being received and the send starting for chunk n+1 from the receiver’s point of view.

This trade-off is totally acceptable for the occasional sending of large chunks like data sent once on client connect, but it’s definitely not acceptable for data sent 10 or 20 times per-second like snapshots. So remember, this system is useful for large, infrequently sent blocks of data, not for time critical data.

在这个新系统中,大块数据被称为 chunks(块),而这些块又被拆分成更小的 slices(切片)。这个命名上的改变是为了将块系统的术语(chunks/slices)与数据包分片与重组的术语(packets/fragments)区分开来。

基本思路是将切片通过网络反复发送,直到全部成功送达。由于我们是在 UDP 上实现这一功能,虽然概念上很简单,但实现起来会稍微复杂一些,因为我们必须构建一个基本的可靠性系统,让发送方知道哪些切片已经被接收。

如果有多个不同的块同时在传输中,这种可靠性机制会变得相当复杂。因此我们先做一个简化假设:我们一次只在网络上传输一个块。这并不意味着发送方不能在本地维护一个发送队列,只是说在网络层面上,同一时间只会有一个块在传输中。

这在直觉上也说得通,因为块系统的目的就是要实现可靠且有序的传输。如果你同时发送块 0 和块 1,那意义何在?在块 0 到达之前你无法处理块 1,否则这就不是可靠有序的传输了。

不过深入一点你会发现,一次只发送一个块会带来一个小的权衡,那就是:从接收方的角度看,从块 n 接收完成到块 n+1 开始发送之间,会产生一个 RTT(往返时延)的延迟。

这个权衡对于偶尔发送的大块数据(比如客户端连接时只发送一次的数据)是完全可以接受的,但对于每秒发送 10 到 20 次的数据(比如快照)来说就绝对不能接受了。所以要记住,这套系统适用于那些体积大但发送频率低的数据块,而不适用于对时间敏感的数据。

 

Packet Structure

There are two sides to the chunk system, the sender and the receiver.

The sender is the side that queues up the chunk and sends slices over the network. The receiver is what reads those slice packets and reassembles the chunk on the other side. The receiver is also responsible for communicating back to the sender which slices have been received via acks.

The netcode I work on is usually client/server, and in this case I usually want to be able to send blocks of data from the server to the client and from the client to the server. In that case, there are two senders and two receivers, a sender on the client corresponding to a receiver on the server and vice-versa.

Think of the sender and receiver as end points for this chunk transmission protocol that define the direction of flow. If you want to send chunks in a different direction, or even extend the chunk sender to support peer-to-peer, just add sender and receiver end points for each direction you need to send chunks.

块系统由两个部分组成:发送端(sender)和接收端(receiver)。

发送端负责将数据块排队,并将切片(slices)通过网络发送出去。

接收端负责读取这些切片数据包,并在另一端将块重新组装起来。接收端还需要向发送端反馈哪些切片已经收到(通过 ack 确认机制)。

我通常所处理的网络代码是客户端/服务器结构(client/server),在这种情况下,我通常需要能够在服务器到客户端、以及客户端到服务器之间双向发送数据块。因此,会存在两个发送端和两个接收端:一个客户端的发送端对应服务器的接收端,反之亦然。

你可以把发送端和接收端视为这套块传输协议的“端点”,它们定义了数据流的方向。如果你希望以不同的方向发送块,甚至想要扩展这个块发送器以支持点对点(P2P)通信,只需为你想要发送块的每个方向增加对应的发送端和接收端即可。

Traffic over the network for this system is sent via two packet types:

  • Slice packet - contains a slice of a chunk up to 1k in size.
  • Ack packet - a bitfield indicating which slices have been received so far.

The slice packet is sent from the sender to the receiver. It is the payload packet that gets the chunk data across the network and is designed so each packet fits neatly under a conservative MTU of 1200 bytes. Each slice is a maximum of 1k and there is a maximum of 256 slices per-chunk, therefore the largest data you can send over the network with this system is 256k.

通过这套系统在网络上传输的数据分为两种数据包类型:

  • Slice 数据包:包含一个最多 1KB 大小的块切片。

  • Ack 数据包:一个位字段(bitfield),用于指示当前为止哪些切片已被成功接收。

Slice 数据包 是由发送端发往接收端的。这是用于实际传输块数据的有效载荷数据包。它的设计确保每个数据包都能稳定地控制在一个保守估计的 1200 字节 MTU 以下。每个切片最大为 1KB,一个块最多可以包含 256 个切片,因此通过这套系统,在网络上传输的最大数据块大小为 256KB

const int SliceSize = 1024;
const int MaxSlicesPerChunk = 256;
const int MaxChunkSize = SliceSize * MaxSlicesPerChunk;

struct SlicePacket : public protocol2::Packet
{
    uint16_t chunkId;
    int sliceId;
    int numSlices;
    int sliceBytes;
    uint8_t data[SliceSize];
 
    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_bits( stream, chunkId, 16 );
        serialize_int( stream, sliceId, 0, MaxSlicesPerChunk - 1 );
        serialize_int( stream, numSlices, 1, MaxSlicesPerChunk );
        if ( sliceId == numSlices - 1 )
        {
            serialize_int( stream, sliceBytes, 1, SliceSize );
        }
        else if ( Stream::IsReading )
        {
            sliceBytes = SliceSize;
        }
        serialize_bytes( stream, data, sliceBytes );
        return true;
    }
};

There are two points I’d like to make about the slice packet. The first is that even though there is only ever one chunk in flight over the network, it’s still necessary to include the chunk id (0,1,2,3, etc…) because packets sent over UDP can be received out of order.

Second point. Due to the way chunks are sliced up we know that all slices except the last one must be SliceSize (1024 bytes). We take advantage of this to save a small bit of bandwidth by sending the slice size only in the last slice, but there is a trade-off: the receiver doesn’t know the exact size of a chunk until it receives the last slice.

The other packet sent by this system is the ack packet. This packet is sent in the opposite direction, from the receiver back to the sender. This is the reliability part of the chunk network protocol. Its purpose is to let the sender know which slices have been received.

关于 Slice 数据包,我想强调两点:

第一点是,即使网络中一次只传输一个块,也仍然必须在数据包中包含块 ID(例如 0,1,2,3 等),因为 UDP 协议下的数据包可能会乱序到达。

第二点是,由于我们对块的切片方式,我们可以确定,除了最后一个切片外,其他所有切片的大小都固定为 SliceSize(1024 字节)。我们利用这一点来节省一点带宽 —— 只有在最后一个切片中才发送切片的实际大小。不过这样做也有一个权衡:接收方在收到最后一个切片之前,并不知道整个块的实际大小。
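
作为补充,发送端根据块大小推导切片数量和最后一个切片字节数的方式大致如下(示意写法,并非原文源码,只依赖前面定义的 SliceSize):

// 示意:由块大小推导切片数量与最后一个切片的字节数(非原文源码)
void ComputeSliceLayout( int chunkSize, int & numSlices, int & finalSliceBytes )
{
    numSlices = ( chunkSize + SliceSize - 1 ) / SliceSize;        // 向上取整
    finalSliceBytes = chunkSize - ( numSlices - 1 ) * SliceSize;  // 取值范围是 1..SliceSize
}

// 例如 chunkSize = 200000 时:numSlices = 196,finalSliceBytes = 320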

系统中另一种数据包是 ack 数据包。这个数据包方向相反,是从接收方发回发送方的。它是整个块传输协议中实现可靠性的关键部分,用于让发送方知道哪些切片已经成功被接收。

struct AckPacket : public protocol2::Packet 
{ 
    uint16_t chunkId; 
    int numSlices; 
    bool acked[MaxSlicesPerChunk]; 

    template <typename Stream> bool Serialize( Stream & stream )
    {
        serialize_bits( stream, chunkId, 16 );
        serialize_int( stream, numSlices, 1, MaxSlicesPerChunk );
        for ( int i = 0; i < numSlices; ++i )
        {
            serialize_bool( stream, acked[i] );
        }
        return true;
    }
};

Ack 是 “acknowledgments”(确认)的缩写。所以,当接收方发送一个针对切片 100 的 ack 时,意思是“我已经成功接收到切片 100”。这对发送方来说是非常关键的信息,因为:

  1. 它让发送方知道哪些切片已经被接收,从而判断何时可以停止发送。

  2. 它还允许发送方更高效地利用带宽——只重发那些尚未被确认(未 ack)的切片。

进一步来看 ack 数据包,一开始可能会觉得它有些多余:为什么每个 ack 包里都要包含对所有切片的确认信息?

这是因为 ack 数据包也是通过 UDP 发送的,也可能会丢失。如果只发某一部分确认信息,而这个 ack 包丢了,就可能导致发送方和接收方对“哪些切片已被接收”的状态产生不同步(desync),从而影响整个数据块的正确传输。

我们确实需要对 ack 提供某种程度的可靠性,但我们不想再为 ack 实现一个 ack 系统 —— 那会让事情变得非常复杂且麻烦。

幸运的是,最坏情况下的 ack 位字段也不过是 256 位(32 字节),所以我们就干脆在每个 ack 包中都完整发送全部的 ack 状态。当 ack 包被接收时,只要该包中某个切片被标记为已接收,且本地还未记录为 ack,那我们就立刻标记该切片为 ack。

这个“偏向未确认到已确认”的处理方式(就像保险丝一旦烧断就不会再连上),让我们可以很好地应对 ack 包的乱序到达问题。
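
按照这个思路,发送端处理 ack 数据包的逻辑大致如下(示意代码,并非原文源码;其中 sending、chunkId、acked、numAckedSlices 等字段对应下一节 ChunkSender 结构中的成员):

// 示意:发送端收到 ack 包时的处理(非原文源码)
void ProcessAckPacket( const AckPacket & packet )
{
    if ( !sending || packet.chunkId != chunkId || packet.numSlices != numSlices )
        return;                                 // 不是当前正在发送的块,忽略

    for ( int i = 0; i < numSlices; ++i )
    {
        if ( packet.acked[i] && !acked[i] )
        {
            acked[i] = true;                    // “保险丝”式的单向转换:只从未确认变为已确认
            numAckedSlices++;
        }
    }

    if ( numAckedSlices == numSlices )
        sending = false;                        // 所有切片都已确认,整个块发送完成
}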

Sender Implementation

Let’s get started with the implementation of the sender.

The strategy for the sender is:

  • Keep sending slices until all slices are acked
  • Don’t resend slices that have already been acked

We use the following data structure for the sender:

我们现在开始实现发送端。

发送端的策略如下:

  • 持续发送切片,直到所有切片都被确认(acked)

  • 不重发已经被确认的切片

我们为发送端使用如下的数据结构:

class ChunkSender
{
    bool sending;
    uint16_t chunkId;
    int chunkSize;
    int numSlices;
    int numAckedSlices;
    int currentSliceId;
    bool acked[MaxSlicesPerChunk];
    uint8_t chunkData[MaxChunkSize];
    double timeLastSent[MaxSlicesPerChunk];
};

As mentioned before, only one chunk is sent at a time, so there is a ‘sending’ state which is true if we are currently sending a chunk, false if we are in an idle state ready for the user to send a chunk. In this implementation, you can’t send another chunk while the current chunk is still being sent over the network. If you don’t like this, stick a queue in front of the sender.

Next, we have the id of the chunk we are currently sending, or, if we are not sending a chunk, the id of the next chunk to be sent, followed by the size of the chunk and the number of slices it has been split into. We also track, per-slice, whether that slice has been acked, which lets us count the number of slices that have been acked so far while ignoring redundant acks. A chunk is considered fully received from the sender’s point of view when numAckedSlices == numSlices.

We also keep track of the current slice id for the algorithm that determines which slices to send, which works like this. At the start of a chunk send, start at slice id 0 and work from left to right, wrapping back around to 0 again when you go past the last slice. Eventually, you stop iterating because you’ve run out of bandwidth to send slices. At this point, remember the current slice index via the current slice id so you can pick up from where you left off next time. This last part is important because it distributes sends across all slices, not just the first few.

如前所述,系统一次只发送一个块,因此我们需要一个“发送中(sending)”状态:当我们正在发送一个块时,该状态为 true;当处于空闲状态、准备好接收用户发送新块时,该状态为 false。在这个实现中,当当前块仍在通过网络发送时,你不能发送另一个块。

如果你不喜欢这种限制,可以在发送端前面加一个队列来管理多个待发送的块。

接下来,我们记录当前正在发送的块的 ID,或者在没有正在发送的块时,记录即将发送的下一个块的 ID。之后是该块的总大小,以及它被拆分成的切片数量。

我们还会为每个切片记录其是否已经被确认(acked),这样可以让我们在忽略重复确认的同时,统计目前已被确认的切片数量。

从发送端的角度来看,当 numAckedSlices == numSlices 时,该块就被视为已经完整接收。

我们还需要记录一个当前的切片 ID,用于发送算法中决定要发送哪些切片。这个算法的工作方式如下:

在开始发送一个块时,从切片 ID 0 开始,从左到右依次检查,当超过最后一个切片时回绕回到 0。这个过程会持续进行,直到耗尽可用带宽,无法再继续发送切片为止。

此时,我们会通过当前切片 ID 记录下我们遍历到的位置,这样下次可以从上次中断的地方继续发送。这一步非常关键,因为它能将发送操作平均分布在所有切片上,而不仅仅是集中在前几个切片。
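
这段描述对应的遍历方式大致如下(示意代码,并非原文源码;带宽预算和最小重发延迟的判断见后文的完整更新示例):

// 示意:从 currentSliceId 开始、跳过已确认切片的回绕遍历(非原文源码)
int FindNextSliceToSend()
{
    for ( int i = 0; i < numSlices; ++i )
    {
        const int sliceId = ( currentSliceId + i ) % numSlices;
        if ( !acked[sliceId] )
        {
            currentSliceId = ( sliceId + 1 ) % numSlices;   // 记住位置,下次从这里继续
            return sliceId;
        }
    }
    return -1;                                              // 所有切片都已确认
}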

Now let’s discuss bandwidth limiting. Obviously you don’t just blast slices out continuously as you’d flood the connection in no time, so how do we limit the sender bandwidth? My implementation works something like this: as you walk across slices and consider each slice you want to send, estimate roughly how many bytes the slice packet will take eg: roughly slice bytes + some overhead for your protocol and UDP/IP header. Then compare the amount of bytes required vs. the available bytes you have to send in your bandwidth budget. If you don’t have enough bytes accumulated, stop. Otherwise, subtract the bytes required to send the slice and repeat the process for the next slice.

Where does the available bytes in the send budget come from? Each frame before you update the chunk sender, take your target bandwidth (eg. 256kbps), convert it to bytes per-second, and add it multiplied by delta time (dt) to an accumulator.

A conservative send rate of 256kbps means you can send 32000 bytes per-second, so add 32000 * dt to the accumulator. A middle ground of 512kbit/sec is 64000 bytes per-second. A more aggressive 1mbit is 125000 bytes per-second. This way each update you accumulate a number of bytes you are allowed to send, and when you’ve sent all the slices you can given that budget, any bytes left over stick around for the next time you try to send a slice.

One subtle point with the chunk sender is that it’s a good idea to implement some minimum resend delay per-slice, otherwise you get situations where, for small chunks or the last few slices of a chunk, the same few slices get spammed over the network.

For this reason we maintain an array of last send time per-slice. One option for this resend delay is to maintain an estimate of RTT and to only resend a slice if it hasn’t been acked within RTT * 1.25 of its last send time. Or, you could just resend the slice if it hasn’t been sent in the last 100ms. Works for me!

现在让我们讨论带宽限制。显然,发送端不能一直不停地发送切片,否则很快就会淹没连接。那么,我们如何限制发送带宽呢?

我的实现大致是这样的:在遍历每个切片并决定要发送哪些切片时,首先估算一下每个切片包大概需要多少字节,比如大致的切片字节数加上一些协议和 UDP/IP 头部的开销。然后,将所需的字节数与你可用的带宽字节数进行比较。如果没有足够的字节来发送切片,就停止。如果有足够的字节,则从可用字节中减去所需的字节数,然后继续处理下一个切片。

那么,可用字节数从哪里来呢?每次更新发送器之前,我们会把目标带宽(例如 256kbps)换算成每秒字节数,再乘以 delta time(dt),然后加到一个累加器中。

一个保守的发送速率是 256kbps,这意味着每秒可以发送 32000 字节。因此,每帧将 32000 * dt 加到累加器中。一个中等的带宽 512kbps 是每秒 64000 字节。而更激进的 1mbit 则是每秒 125000 字节。这样,每次更新时,累加器中就会存储你允许发送的字节数。当你根据这个预算发送完所有切片后,任何剩余的字节会被保留到下次尝试发送切片时继续使用。

关于块发送器还有一个微妙之处:最好为每个切片实现一个最小重发延迟,否则对于小块、或者一个块的最后几个切片,同样的几个切片会被不停地刷到网络上。

为此,我们会维护一个每个切片上次发送时间的数组。对于这个重发延迟的一个选项,可以维护一个 RTT 的估算值,并且只有当切片在 RTT * 1.25 的时间内没有被 ack 时,才会重发它。或者,你也可以设定一个简单的规则:如果切片在过去的 100 毫秒内没有被发送,就重发。对我来说,这样做就可以了!
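
把回绕遍历、带宽预算和最小重发延迟放在一起,发送端每帧的更新大致如下(示意代码,并非原文源码;availableBytes 是假设添加的预算累加器字段,EstimateSlicePacketBytes 和 SendSlicePacket 是假设存在的辅助函数):

// 示意:发送端每帧的更新,t 为当前时间,dt 为帧间隔(非原文源码)
void ChunkSenderUpdate( double t, double dt )
{
    if ( !sending )
        return;

    // 1. 按目标带宽向预算累加器补充字节,例如 256kbps ≈ 32000 字节/秒
    const double SendBandwidthBytesPerSecond = 32000.0;
    availableBytes += SendBandwidthBytesPerSecond * dt;

    // 2. 从 currentSliceId 开始回绕遍历所有切片
    for ( int i = 0; i < numSlices; ++i )
    {
        const int sliceId = ( currentSliceId + i ) % numSlices;

        if ( acked[sliceId] )
            continue;                               // 已确认的切片不再发送

        if ( t - timeLastSent[sliceId] < 0.1 )
            continue;                               // 最小重发延迟:100ms 内发送过就先跳过

        const int packetBytes = EstimateSlicePacketBytes( sliceId );   // 切片字节数 + 协议与 UDP/IP 头开销
        if ( availableBytes < packetBytes )
        {
            currentSliceId = sliceId;               // 预算不够了,记住位置,下次从这里继续
            return;
        }

        availableBytes -= packetBytes;
        timeLastSent[sliceId] = t;
        SendSlicePacket( sliceId );                 // 构造并发送对应的 SlicePacket
    }
}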

 

Kicking it up a notch

提升一个档次

Do the math and you’ll notice it still takes a long time for a 256k chunk to get across:

做一下计算,你会发现即使如此,一个 256k 的块仍然需要很长时间才能传输完:

  • 1mbps = 2 seconds
  • 512kbps = 4 seconds
  • 256kbps = 8 seconds :(
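
这些数字可以粗略地这样算出来(忽略包头开销):256KB ≈ 262144 字节 ≈ 2097152 位,除以 1000000 bps 约等于 2.1 秒,除以 512000 bps 约等于 4.1 秒,除以 256000 bps 约等于 8.2 秒。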

Which kinda sucks. The whole point here is quickly and reliably. Emphasis on quickly. Wouldn’t it be nice to be able to get the chunk across faster? The typical use case of the chunk system supports this. For example, a large block of data sent down to the client immediately on connect or a block of data that has to get through before the client exits a load screen and starts to play. You want this to be over as quickly as possible and in both cases the user really doesn’t have anything better to do with their bandwidth, so why not use as much of it as possible?

这有点糟糕。这里的核心目标是快速且可靠,重点是快速难道不是希望能更快地传输块数据吗?块系统的典型使用场景就支持这一点。例如,一大块数据在客户端连接时立即发送,或者在客户端退出加载画面并开始游戏前必须传输的一块数据。你希望这个过程尽可能快速地完成,而且在这两种情况下,用户实际上没有更好的方式使用他们的带宽,所以为什么不尽可能多地利用它呢?

One thing I’ve tried in the past with excellent results is an initial burst. Assuming your chunk size isn’t so large, and your chunk sends are infrequent, I can see no reason why you can’t just fire across the entire chunk, all slices of it, in separate packets in one glorious burst of bandwidth, wait 100ms, and then resume the regular bandwidth limited slice sending strategy.

Why does this work? In the case where the user has a good internet connection (some multiple of 10mbps or greater…), the slices get through very quickly indeed. In the situation where the connection is not so great, the burst gets buffered up and most slices will be delivered as quickly as possible, limited only by the amount of bandwidth available. After this point, switching to the regular strategy at a lower rate picks up any slices that didn’t get through the first time.

This seems a bit risky so let me explain. In the case where the user can’t quite support this bandwidth what you’re relying on here is that routers on the Internet strongly prefer to buffer packets rather than discard them at almost any cost. It’s a TCP thing. Normally, I hate this because it induces latency in packet delivery and messes up your game packets which you want delivered as quickly as possible, but in this case it’s good behavior because the player really has nothing else to do but wait for your chunk to get through.

Just don’t go too overboard with the spam or the congestion will persist after your chunk send completes and it will affect your game for the first few seconds. Also, make sure you increase the size of your OS socket buffers on both ends so they are larger than your maximum chunk size (I recommend at least double), otherwise you’ll be dropping slice packets before they even hit the wire.

Finally, I want to be a responsible network citizen here, so although I recommend sending all slices once in an initial burst, it’s important for me to mention that I think this is really only appropriate, and even then only borderline appropriate, for small chunks in the few hundreds of kilobytes range in 2016, and only when your game isn’t sending anything else that is time-critical.

Please don’t use this burst strategy if your chunk is really large, eg: megabytes of data, because that’s way too big to be relying on the kindness of strangers, AKA. the buffers in the routers between you and your packet’s destination. For this it’s necessary to implement something much smarter. Something adaptive that tries to send data as quickly as it can, but backs off when it detects too much latency and/or packet loss as a result of flooding the connection. Such a system is outside of the scope of this article.

我过去尝试过的一种方法,效果非常好,就是初始的突发发送。假设你的块大小不是特别大,并且块发送的频率不高,我看不出有什么理由不可以直接将整个块——所有切片——通过单独的包一次性以一个壮丽的带宽突发发送出去,等待 100 毫秒,然后再恢复常规的带宽限制切片发送策略。

为什么这样有效?在用户拥有良好互联网连接(例如 10mbps 或更高)的情况下,切片会非常迅速地通过。而在连接不太好的情况下,突发数据会被缓冲,大部分切片会尽可能快地传送,限制仅在于可用带宽。此后,再切换回速率较低的常规策略,补发那些第一次没能送达的切片。

这看起来有点冒险,所以让我解释一下。在用户无法完全支持这种带宽的情况下,你依赖的是:互联网上的路由器几乎会不惜一切代价优先缓冲数据包,而不是丢弃它们。这是 TCP 协议带来的行为。通常,我不喜欢这种做法,因为它会增加数据包的延迟,干扰游戏数据包的快速传输,但在这种情况下,这种行为是好的,因为玩家实际上没有别的事情可做,只能等待你的块数据传输完成。

不过,千万不要过度使用这种突发发送,否则即使块数据发送完成,连接的拥塞也会持续,并且影响游戏的前几秒。还有,确保在两端都增加操作系统套接字的缓冲区,使其比最大块大小要大(我建议至少加倍),否则你可能在数据包到达网络之前就丢失切片数据包。

最后,作为一个负责任的网络公民,我要提醒一句:虽然我推荐在初始阶段把所有切片一次性突发发送出去,但我认为这种做法只适用于较小的块(在 2016 年,大概是几百 KB 的量级),而且只有在你的游戏没有同时发送其他时间敏感数据时,才勉强算是合适的。

如果你的块数据非常大,比如几兆字节的数据,请不要使用这种突发发送策略,因为依赖路由器之间的缓冲区的行为太冒险。对于这种情况,必须实现更智能的策略:一种自适应系统,它尽可能快速地发送数据,但当检测到由于过度发送造成的延迟和/或数据包丢失时会自动回退。这样的系统超出了本文的范围。
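
初始突发大致可以这样实现(示意代码,并非原文源码;SendSlicePacket 仍是前文假设的辅助函数):

// 示意:开始发送一个块时,先把所有切片各发送一次(非原文源码)
void StartChunkSendWithBurst( double t )
{
    // ...先完成 sending、chunkId、numSlices、acked[] 等字段的常规初始化,然后:

    for ( int sliceId = 0; sliceId < numSlices; ++sliceId )
    {
        SendSlicePacket( sliceId );     // 初始突发:每个切片各发一次
        timeLastSent[sliceId] = t;      // 记录发送时间;配合 100ms 的最小重发延迟,
                                        // 突发之后会自然地等一小段时间再回到常规的限速重发策略
    }
}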

Receiver Implementation

Now that we have the sender all sorted out, let’s move on to the receiver.

As mentioned previously, unlike the packet fragmentation and reassembly system from the previous article, the chunk system only ever has one chunk in flight.

This makes the receiver side of the chunk system much simpler:

接收端实现

现在我们已经解决了发送端的问题,接下来让我们来看接收端。

如前所述,与上一篇文章中的数据包分片和重组系统不同,块系统在任何时候只有一个块在传输中。

这使得块系统的接收端变得更简单:

class ChunkReceiver
{
    bool receiving;
    bool readyToRead;
    uint16_t chunkId;
    int chunkSize;
    int numSlices;
    int numReceivedSlices;
    bool received[MaxSlicesPerChunk];
    uint8_t chunkData[MaxChunkSize];
};

We have a state whether we are currently ‘receiving’ a chunk over the network, plus a ‘readyToRead’ state which indicates that a chunk has received all slices and is ready to be popped off by the user. This is effectively a minimal receive queue of length 1. If you don’t like this, of course you are free to add a queue.

In this data structure we also keep track of chunk size (although it is not known with complete accuracy until the last slice arrives), num slices and num received slices, as well as a received flag per-slice. This per-slice received flag lets us discard packets containing slices we have already received, and count the number of slices received so far (since we may receive the slice multiple times, we only increase this count the first time we receive a particular slice). It’s also used when generating ack packets. The chunk receive is completed from the receiver’s point of view when numReceivedSlices == numSlices.

我们有一个状态来表示我们当前是否正在通过网络“接收”一个块,以及一个“readyToRead”状态,表示该块已经接收到所有切片并且准备好被用户读取。这实际上是一个长度为 1 的最小接收队列。如果你不喜欢这样,当然可以添加一个队列。

在这个数据结构中,我们还跟踪块的大小(尽管直到最后一个切片到达之前,其大小并不完全准确)、切片的数量、已接收的切片数量,以及每个切片的接收标志。这个每个切片的接收标志让我们可以丢弃包含已经接收过的切片的数据包,并计算到目前为止接收到的切片数量(因为我们可能会多次接收到同一个切片,所以只有第一次接收到某个切片时才会增加计数)。它还在生成 ACK 数据包时被使用。从接收端的角度来看,当 numReceivedSlices == numSlices 时,块就算接收完成。

So what does it look like end-to-end receiving a chunk?

那么,接收一个块从头到尾的过程是怎样的呢?

First, the receiver is set up to start at chunk 0. When a slice packet comes in over the network matching chunk id 0, ‘receiving’ flips from false to true, data for that first slice is inserted into ‘chunkData’ at the correct position, numSlices is set to the value in that packet, numReceivedSlices is incremented from 0 -> 1, and the received flag in the array entry corresponding to that slice is set to true.

As the remaining slice packets for the chunk come in, each of them are checked that they match the current chunk id and numSlices that are being received and are ignored if they don’t match. Packets are also ignored if they contain a slice that has already been received. Otherwise, the slice data is copied into the correct place in the chunkData array, numReceivedSlices is incremented and received flag for that slice is set to true.

This process continues until all slices of the chunk are received, at which point the receiver sets ‘receiving’ to false and ‘readyToRead’ to true. While ‘readyToRead’ is true, incoming slice packets are discarded. At this point, the chunk is typically processed right away, on the same frame: the caller checks ‘do I have a chunk to read?’ and processes the chunk data. All chunk receive data is cleared back to defaults, except the chunk id, which is incremented from 0 -> 1, and we are ready to receive the next chunk.

首先,接收端设置接收状态,从块 0 开始。当网络上收到一个匹配块 ID 0 的切片数据包时,receiving 状态从 false 转变为 true,该切片的数据被插入到 chunkData 中的正确位置,numSlices 被设置为该数据包中的值,numReceivedSlices 从 0 增加到 1,且对应切片的接收标志被设置为 true

随着剩余切片数据包的到来,每个数据包都会检查它们是否匹配当前接收的块 ID 和 numSlices,如果不匹配则会被忽略。如果数据包包含已经接收过的切片,也会被忽略。否则,切片数据会被复制到 chunkData 数组中的正确位置,numReceivedSlices 增加,并且该切片的接收标志被设置为 true

这个过程会持续进行,直到接收到所有切片为止。此时,接收端将 receiving 状态设置为 false,并将 readyToRead 设置为 true。当 readyToReadtrue 时,后续的切片数据包会被丢弃。此时,块接收的数据包处理通常会在同一帧内完成。调用者检查“我是否有块可以读取?”并处理块数据。所有的块接收数据都会被清空回默认值,除了块 ID,它会从 0 增加到 1,准备接收下一个块。
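
接收端处理切片数据包、以及生成 ack 数据包的逻辑大致如下(示意代码,并非原文源码;字段对应前文的 ChunkReceiver 结构,memcpy/memset 来自 <string.h>):

// 示意:接收端处理到达的切片数据包(非原文源码)
void ProcessSlicePacket( const SlicePacket & packet )
{
    if ( readyToRead )
        return;                                         // 上一个块还没被用户取走,丢弃

    if ( packet.chunkId != chunkId )
        return;                                         // 不是当前期望的块,丢弃

    if ( !receiving )
    {
        receiving = true;                               // 收到该块的第一个切片:进入接收状态
        numSlices = packet.numSlices;
        numReceivedSlices = 0;
        memset( received, 0, sizeof( received ) );
    }
    else if ( packet.numSlices != numSlices )
    {
        return;                                         // numSlices 与当前块不一致,丢弃
    }

    if ( received[packet.sliceId] )
        return;                                         // 重复收到的切片,丢弃

    received[packet.sliceId] = true;
    numReceivedSlices++;
    memcpy( chunkData + packet.sliceId * SliceSize, packet.data, packet.sliceBytes );

    if ( packet.sliceId == numSlices - 1 )
        chunkSize = ( numSlices - 1 ) * SliceSize + packet.sliceBytes;  // 最后一个切片确定块的准确大小

    if ( numReceivedSlices == numSlices )
    {
        receiving = false;
        readyToRead = true;                             // 整个块接收完成,等待用户读取
    }
}

// 示意:接收端生成发回发送端的 ack 数据包(非原文源码)
void GenerateAckPacket( AckPacket & ackPacket )
{
    // 只有在正在接收(或刚接收完)一个块时才需要发送 ack
    ackPacket.chunkId = chunkId;
    ackPacket.numSlices = numSlices;
    for ( int i = 0; i < numSlices; ++i )
        ackPacket.acked[i] = received[i];
}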

Conclusion

The chunk system is simple in concept, but the implementation is certainly not. I encourage you to take a close look at the source code for this article for further details.

块系统在概念上很简单,但实现起来确实不容易。我鼓励你仔细查看本文的源代码,了解更多细节。

 
