(翻译 gafferongames) State Synchronization 状态同步

https://gafferongames.com/post/state_synchronization/

In this article we round out our discussion of networked physics strategies with state synchronization, the third and final strategy in this article series.

在本文中,我们将完成对联网物理策略的讨论,介绍本系列文章中的第三种也是最后一种策略——状态同步(State Synchronization)。

State Synchronization

What is state synchronization? The basic idea is that, somewhat like deterministic lockstep, we run the simulation on both sides but, unlike deterministic lockstep, we don’t just send input, we send both input and state.

状态同步(State Synchronization)

什么是状态同步?其基本思想是:与确定性锁步(Deterministic Lockstep)类似,我们在双方运行相同的模拟。但与确定性锁步不同的是,我们不仅发送输入数据,还同时发送输入和状态数据。

This gives state synchronization interesting properties. Because we send state, we don’t need perfect determinism to stay in sync, and because the simulation runs on both sides, objects continue moving forward between updates.

This lets us approach state synchronization differently to snapshot interpolation. Instead of sending state updates for every object in each packet, we can now send updates for only a few, and if we’re smart about how we select the objects for each packet, we can save bandwidth by concentrating updates on the most important objects.

So what’s the catch? State synchronization is an approximate and lossy synchronization strategy. In practice, this means you’ll spend a lot of time tracking down sources of extrapolation divergence and pops. But other than that, it’s a quick and easy strategy to get started with.

这使得状态同步具有一些有趣的特性。由于我们发送了状态数据,因此不需要完美的确定性来保持同步。此外,由于模拟在双方都在运行,物体可以在更新之间持续移动。

这使我们可以用不同于快照插值的方式来处理状态同步。我们不需要在每个数据包中发送所有对象的状态更新,而是可以仅发送一部分对象的更新。如果我们聪明地选择每个数据包要更新的对象,就能通过将更新集中在最重要的对象上来节省带宽。

那么问题是什么?状态同步是一种近似且有损的同步策略。在实际应用中,这意味着你会花大量时间来查找导致外推偏差和跳变的问题来源。但除此之外,它是一种快速且易于上手的同步策略。

Implementation

Here’s the state sent over the network per-object:

struct StateUpdate
{
    int index;
    vec3f position;
    quat4f orientation;
    vec3f linear_velocity;
    vec3f angular_velocity;
};

Unlike snapshot interpolation, we’re not just sending visual quantities like position and orientation, we’re also sending non-visual state such as linear and angular velocity. Why is this?

The reason is that state synchronization runs the simulation on both sides, so it’s always extrapolating from the last state update applied to each object. If linear and angular velocity aren’t synchronized, this extrapolation is done with incorrect velocities, leading to pops when objects are updated.

While we must send the velocities, there’s no point wasting bandwidth sending (0,0,0) over and over while an object is at rest. We can fix this with a trivial optimization, like so:

与快照插值不同,我们不仅发送位置和朝向等视觉量,还会发送线性速度和角速度等非视觉状态。为什么要这样做?

原因在于,状态同步在双方都运行模拟,因此它始终基于上次应用的状态更新进行外推。如果线性速度和角速度不同步,那么外推时会使用错误的速度,从而导致对象更新时出现突变(pops)。

尽管我们必须发送速度数据,但当对象处于静止状态时,反复发送 (0,0,0) 这样的数据会浪费带宽。我们可以通过一个简单的优化来解决这个问题,例如:

void serialize_state_update( Stream & stream, 
                             int & index, 
                             StateUpdate & state_update )
{
    serialize_int( stream, index, 0, NumCubes - 1 );
    serialize_vector( stream, state_update.position );
    serialize_quaternion( stream, state_update.orientation );
    bool at_rest = stream.IsWriting() ? state_update.AtRest() : false;    
    serialize_bool( stream, at_rest );
    if ( !at_rest )
    {
        serialize_vector( stream, state_update.linear_velocity );
        serialize_vector( stream, state_update.angular_velocity );
    }
    else if ( stream.IsReading() )
    {
        state_update.linear_velocity = vec3f(0,0,0);
        state_update.angular_velocity = vec3f(0,0,0);
    }
}

What you see above is a serialize function. It’s a trick I like to use to unify packet read and write. I like it because it’s expressive while at the same time it’s difficult to desync read and write. You can read more about them here.

上面展示的是一个 serialize 函数。这是我喜欢使用的一种技巧,可以将数据包的读写统一起来。

我喜欢这种方式的原因是,它既表达清晰,又能有效避免读写操作不同步的问题。如果你想了解更多相关内容,可以在这里阅读详细介绍。

Packet Structure

Now let’s look at the overall structure of packets being sent:

const int MaxInputsPerPacket = 32;
const int MaxStateUpdatesPerPacket = 64;

struct Packet
{
    uint32_t sequence;
    Input inputs[MaxInputsPerPacket];
    int num_object_updates;
    StateUpdate state_updates[MaxStateUpdatesPerPacket];
};

First we include a sequence number in each packet so we can determine out of order, lost or duplicate packets. I recommend you run the simulation at the same framerate on both sides (for example 60HZ) and in this case the sequence number can work double duty as the frame number.

Input is included in each packet because it’s needed for extrapolation. Like deterministic lockstep we send multiple redundant inputs so in the case of packet loss it’s very unlikely that an input gets dropped. Unlike deterministic lockstep, if we don’t have the next input we don’t stop the simulation and wait for it, we continue extrapolating forward with the last input received.

首先,我们在每个数据包中包含一个序列号,以便检测乱序、丢失或重复的数据包。我建议你在两端以相同的帧率运行模拟(例如 60Hz)。在这种情况下,序列号还可以兼作帧编号。

每个数据包中都会包含输入数据,因为这些数据在外推(extrapolation)时是必需的。与**确定性锁步(deterministic lockstep)**类似,我们会发送多个冗余输入,以减少由于数据包丢失导致输入丢失的可能性。

但与确定性锁步不同的是,如果下一个输入数据丢失,我们不会停止模拟并等待它,而是使用最近接收到的输入继续进行外推。
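As a minimal sketch of that receive-side policy (the struct fields and function name here are illustrative, not from the article's code): a missing input never stalls the simulation; we simply keep extrapolating with the last input received.

```cpp
// Hypothetical input type standing in for the article's Input struct.
struct Input { float steering; bool jump; };

// If we have the input for this frame, use it; otherwise continue
// forward with the last input received (unlike deterministic lockstep,
// which would stop and wait).
Input select_input( bool have_input_for_frame,
                    const Input & input_for_frame,
                    const Input & last_received_input )
{
    if ( have_input_for_frame )
        return input_for_frame;
    return last_received_input;
}
```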

Next you can see that we only send a maximum of 64 state updates per-packet. Since we have a total of 901 cubes in the simulation, we need some way to select the n most important state updates to include in each packet. We need some sort of prioritization scheme.

To get started, each frame walk over all objects in your simulation and calculate their current priority. For example, in the cube simulation I calculate priority for the player cube as 1000000 because I always want it to be included in every packet, and for interacting objects (red cubes) I give a higher priority of 100, while at rest objects have a priority of 1.

Unfortunately if you just picked objects according to their current priority each frame you’d only ever send red objects while in a katamari ball and white objects on the ground would never get updated. We need to take a slightly different approach, one that prioritizes sending important objects while also distributing updates across all objects in the simulation.

接下来,你可以看到我们在每个数据包中最多只发送 64 个状态更新。由于模拟中总共有 901 个立方体,因此我们需要一种方法来选择每个数据包中包含的 n 个最重要的状态更新。这需要某种优先级排序机制。

为了开始处理这个问题,每一帧遍历模拟中的所有对象,并计算它们的当前优先级。例如,在立方体模拟中,我将玩家立方体的优先级设为 1,000,000,因为我希望它始终包含在每个数据包中;而对于正在交互的红色立方体,我给它们分配较高的优先级 100,静止状态的对象优先级仅为 1。

然而,如果每一帧仅根据当前优先级选择对象,你会发现,在“黏黏球”玩法中,你只会不断发送红色立方体的状态更新,而地面上的白色立方体永远不会得到更新。

我们需要采取一种稍微不同的方法,这种方法既能优先更新重要对象,同时也能在整个模拟中分布式地发送更新,以确保所有对象都有机会获得同步。

Priority Accumulator

You can do this with a priority accumulator. This is an array of float values, one value per-object, that is remembered from frame to frame. Instead of taking the immediate priority value for the object and sorting on that, each frame we add the current priority for each object to its priority accumulator value then sort objects in order from largest to smallest priority accumulator value. The first n objects in this sorted list are the objects you should send that frame.

你可以通过使用优先级累加器来实现这一点。优先级累加器是一个浮点值数组,每个对象对应一个值,并且这个值会在每一帧之间保存。

而不是直接使用对象的即时优先级进行排序,每一帧我们将当前帧的优先级值加到每个对象的优先级累加器值上,然后按优先级累加器值从大到小对对象进行排序。排序后的前 n 个对象就是这一帧需要发送的对象。
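A minimal sketch of the priority accumulator described above, assuming priorities and accumulators are stored in parallel arrays indexed by object (the names and data layout here are my own, not from the article's code):

```cpp
#include <algorithm>
#include <vector>

struct PriorityEntry { int object_index; float accumulator; };

// Add each object's current priority to its accumulator, then return
// up to max_objects object indices sorted from largest to smallest
// accumulator value: these are the objects to send this frame.
std::vector<int> select_objects_to_send( std::vector<PriorityEntry> & entries,
                                         const std::vector<float> & current_priority,
                                         int max_objects )
{
    for ( size_t i = 0; i < entries.size(); ++i )
        entries[i].accumulator += current_priority[i];
    std::vector<PriorityEntry> sorted = entries;
    std::sort( sorted.begin(), sorted.end(),
               []( const PriorityEntry & a, const PriorityEntry & b )
               { return a.accumulator > b.accumulator; } );
    std::vector<int> result;
    for ( int i = 0; i < max_objects && i < (int) sorted.size(); ++i )
        result.push_back( sorted[i].object_index );
    return result;
}
```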

You could just send state updates for all n objects, but typically you have some maximum bandwidth you want to support, like 256kbit/sec. Respecting this bandwidth limit is easy. Just calculate how large your packet header is and how many bytes of preamble are in the packet (sequence, # of objects in packet and so on), then work out conservatively the number of bytes remaining in your packet while staying under your bandwidth target.

你可以为所有 n 个对象发送状态更新,但通常你有一个最大带宽限制,例如 256kbit/sec。遵守这个带宽限制很简单。只需计算你的数据包头部有多大,以及数据包中有多少字节的前导部分(如序列号、数据包中对象的数量等),然后保守地计算出在保持带宽目标下,数据包中剩余的字节数。

Then take the n most important objects according to their priority accumulator values and as you construct the packet, walk these objects in order and measure if their state updates will fit in the packet. If you encounter a state update that doesn’t fit, skip over it and try the next one. After you serialize the packet, reset the priority accumulator to zero for objects that fit but leave the priority accumulator value alone for objects that didn’t. This way objects that don’t fit are first in line to be included in the next packet.

然后,根据它们的优先级累加值,选择 n 个最重要的对象,在构建数据包时,按顺序遍历这些对象,检查它们的状态更新是否能够适应当前的数据包大小。如果遇到一个状态更新无法适应数据包大小,就跳过它,尝试下一个对象。在序列化数据包后,重置那些能够适应的对象的优先级累加器为零,而对于无法适应的对象,保持它们的优先级累加器值不变。这样,那些无法适应的对象就会排在下一次数据包中优先发送的位置。
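The fit-and-skip loop with the accumulator reset can be sketched like this (a simplified model: per-object update sizes are precomputed, and the byte budget is whatever remains after the header and preamble; names are illustrative):

```cpp
#include <vector>

// Walk the candidate objects in priority order and pack as many state
// updates as fit in the remaining byte budget. Objects that fit get
// their priority accumulator reset to zero; objects that don't keep
// their accumulated priority, so they are first in line next packet.
std::vector<int> fill_packet( const std::vector<int> & candidates,
                              const std::vector<int> & update_size_bytes,
                              std::vector<float> & accumulator,
                              int budget_bytes )
{
    std::vector<int> included;
    for ( int index : candidates )
    {
        if ( update_size_bytes[index] > budget_bytes )
            continue;                      // doesn't fit: skip, try the next one
        budget_bytes -= update_size_bytes[index];
        accumulator[index] = 0.0f;         // sent: reset priority accumulator
        included.push_back( index );
    }
    return included;
}
```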

The desired bandwidth can even be adjusted on the fly. This makes it really easy to adapt state synchronization to changing network conditions, for example if you detect the connection is having difficulty you can reduce the amount of bandwidth sent (congestion avoidance) and the quality of state synchronization scales back automatically. If the network connection seems like it should be able to handle more bandwidth later on then you can raise the bandwidth limit.

所需的带宽甚至可以动态调整。这使得状态同步能够轻松适应不断变化的网络条件。例如,如果检测到连接出现困难,你可以减少发送的带宽(避免拥塞),这样状态同步的质量会自动回落。如果网络连接看起来能够在之后处理更多的带宽,那么你可以提高带宽限制。

Jitter Buffer

The priority accumulator covers the sending side, but on the receiver side there is much you need to do when applying these state updates to ensure that you don’t see divergence and pops in the extrapolation between object updates.

The very first thing you need to consider is that network jitter exists. You don’t have any guarantee that packets you sent nicely spaced out 60 times per-second arrive that way on the other side. What happens in the real world is you’ll typically receive two packets one frame, 0 packets the next, 1, 2, 0 and so on because packets tend to clump up across frames. To handle this situation you need to implement a jitter buffer for your state update packets. If you fail to do this you’ll have a poor quality extrapolation and pops in stacks of objects because objects in different state update packets are slightly out of phase with each other with respect to time.

All you do in a jitter buffer is hold packets before delivering them to the application at the correct time as indicated by the sequence number (frame number) in the packet. The delay you need to hold packets for in this buffer is a much smaller amount of time relative to interpolation delay for snapshot interpolation but it’s the same basic idea. You just need to delay packets just enough (say 4-5 frames @ 60HZ) so that they come out of the buffer properly spaced apart.

优先级累加器处理发送端,但在接收端,你需要做很多工作来应用这些状态更新,以确保在物体更新之间的外推不会出现发散和“弹跳”。

首先需要考虑的是网络抖动的存在。你无法保证以每秒 60 次、均匀间隔发送的数据包在另一端也以同样的间隔到达。实际上,你通常会遇到这样的情况:某一帧接收到两个包,下一帧没有接收到包,然后是 1 个、2 个、0 个,依此类推,因为数据包往往会在帧之间聚集。为了处理这种情况,你需要为状态更新包实现一个抖动缓冲区。如果你没有做到这一点,你的外推质量会很差,物体堆叠中会出现“弹跳”,因为不同状态更新包中的物体在时间上稍微不同步。

 在抖动缓冲区(Jitter Buffer)中,你的唯一任务就是在将数据包传递给应用程序之前暂存它们,并根据数据包中的序列号(帧号)在正确的时间进行传输。相较于用于快照插值(Snapshot Interpolation)的插值延迟,这种缓冲所需的延迟时间要短得多,但基本原理是相同的。你只需要稍微延迟数据包(比如在 60Hz 下延迟 4-5 帧),确保它们从缓冲区输出时能保持适当的间隔。
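A minimal jitter buffer along the lines described above might look like this (a sketch under the assumption that the sequence number doubles as the frame number, as recommended earlier; the type and member names are my own):

```cpp
#include <cstdint>
#include <map>

// Hold packets keyed by sequence (frame) number and release each one a
// fixed number of frames after its frame, so packets come out of the
// buffer properly spaced apart even if they arrived in clumps.
template <typename Packet>
struct JitterBuffer
{
    int delay_frames = 5;                  // e.g. 4-5 frames @ 60HZ
    std::map<uint32_t, Packet> buffered;

    void add( uint32_t sequence, const Packet & packet )
    {
        buffered[sequence] = packet;
    }

    // Returns true and fills 'out' if the packet for the frame that is
    // 'delay_frames' behind the current frame is available.
    bool pop( uint32_t current_frame, Packet & out )
    {
        if ( current_frame < (uint32_t) delay_frames )
            return false;
        auto it = buffered.find( current_frame - delay_frames );
        if ( it == buffered.end() )
            return false;
        out = it->second;
        buffered.erase( it );
        return true;
    }
};
```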

Applying State Updates

Once the packet comes out of the jitter buffer, how do you apply state updates? My recommendation is that you should snap the physics state hard. This means you apply the values in the state update directly to the simulation.

I recommend against trying to apply some smoothing between the state update and the current state at the simulation level. This may sound counterintuitive but the reason for this is that the simulation extrapolates from the state update so you want to make sure it extrapolates from a valid physics state for that object rather than some smoothed, total bullshit made-up one. This is especially important when you are networking large stacks of objects.

当数据包从抖动缓冲区(Jitter Buffer)出来后,如何应用状态更新?我的建议是直接对物理状态进行硬同步(snap the physics state hard)。也就是说,你应该直接将状态更新中的值应用到模拟中。

我不建议在状态更新和当前模拟状态之间尝试进行平滑处理。这可能听起来有些反直觉,但原因是模拟会从状态更新中进行外推(extrapolate),因此你需要确保它是基于一个有效的物理状态进行外推,而不是某种经过平滑处理的、完全虚假的状态。这一点在处理网络同步的大型物体堆叠时尤为重要。

Surprisingly, without any smoothing the result is already pretty good:

https://gafferongames.com/videos/state_synchronization_uncompressed.mp4

As you can see it’s already looking quite good and barely any bandwidth optimization has been performed. Contrast this with the first video for snapshot interpolation which was at 18mbit/sec and you can see that using the simulation to extrapolate between state updates is a great way to use less bandwidth.

正如你所看到的,目前的效果已经相当不错,而且几乎没有进行任何带宽优化。对比快照插值(Snapshot Interpolation)第一段视频中 18Mbit/sec 的带宽占用,你会发现,利用模拟在状态更新之间进行外推(Extrapolation)是一种有效的方式,可以大幅减少带宽使用。

Of course we can do a lot better than this, and each optimization we do lets us squeeze more state updates into the same amount of bandwidth. The next obvious thing we can do is to apply all the standard quantization compression techniques, such as bounding and quantizing the position, linear velocity and angular velocity values, and using the smallest three compression as described in snapshot compression.

当然,我们可以做得更好,每进行一次优化,我们就能在相同的带宽下挤出更多的状态更新。接下来显而易见的优化是应用所有标准的量化压缩技术,比如对位置进行边界限制和量化,线性和角速度值的量化,以及使用如快照压缩中所描述的最小三值压缩技术。

But here it gets a bit more complex. We are extrapolating from those state updates so if we quantize these values over the network then the state that arrives on the right side is slightly different from the left side, leading to a slightly different extrapolation and a pop when the next state update arrives for that object.

但这里变得有些复杂了。我们是从这些状态更新进行外推的,所以如果我们在网络上传输时对这些值进行量化,那么右侧接收到的状态将与左侧稍微不同,这会导致外推结果有所不同,并且当下一个状态更新到达该物体时,可能会出现“跳变”现象。

Quantize Both Sides

The solution is to quantize the state on both sides. This means that on both sides before each simulation step you quantize the entire simulation state as if it had been transmitted over the network. Once this is done the left and right side are both extrapolating from quantized state and their extrapolations are very similar.

解决方案是对两边的状态进行量化。这意味着在每个模拟步骤之前,你需要对整个模拟状态进行量化,就像它是通过网络传输过来的一样。一旦完成这个步骤,左右两边都将在量化后的状态基础上进行外推(extrapolation),这样它们的外推结果就会非常相似。

Because these quantized values are being fed back into the simulation, you’ll find that much more precision is required than snapshot interpolation where they were just visual quantities used for interpolation.

In the cube simulation I found it necessary to have 4096 position values per-meter, up from 512 with snapshot interpolation, and a whopping 15 bits per-quaternion component in smallest three (up from 9). Without this extra precision significant popping occurs because the quantization forces physics objects into penetration with each other, fighting against the simulation which tries to keep the objects out of penetration.

I also found that softening the constraints and reducing the maximum velocity which the simulation used to push apart penetrating objects also helped reduce the amount of popping.

由于这些量化后的值被反馈到模拟中,你会发现比起快照插值(Snapshot Interpolation)时仅用于插值的视觉量,模拟所需的精度要高得多。

在立方体模拟中,我发现需要每米 4096 个位置值,而不是快照插值时的 512 个位置值,并且最小三值中的四元数组件需要 15 位(相比之前的 9 位)。没有这些额外的精度,就会发生显著的跳变现象,因为量化会导致物理对象相互穿透,与模拟试图避免物体穿透的目标相冲突。

我还发现,软化约束并减少模拟中推动穿透物体分开的最大速度,也有助于减少跳变现象的发生。
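Quantizing a value "as if it had been transmitted" can be sketched per component like this. The 4096-values-per-meter resolution is the figure quoted above for the cube simulation; the function name is illustrative, and a real implementation would apply the same round-trip to every synchronized quantity (orientation via smallest three, velocities, and so on) before each simulation step on both sides.

```cpp
#include <cmath>

const float PositionResolution = 4096.0f;  // quantized steps per meter

// Round a position component to the nearest representable network
// value, exactly as if it had been encoded, sent over the wire and
// decoded on the other side.
float quantize_position_component( float value )
{
    return std::floor( value * PositionResolution + 0.5f ) / PositionResolution;
}
```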

With quantization applied to both sides you can see the result is perfect once again. It may look visually about the same as the uncompressed version but in fact we’re able to fit many more state updates per-packet into the 256kbit/sec bandwidth limit. This means we are better able to handle packet loss because state updates for each object are sent more rapidly. If a packet is lost, it’s less of a problem because state updates for those objects are being continually included in future packets.

在两侧都应用了量化之后,你可以看到结果再次变得完美。它在视觉上可能与未压缩版本差不多,但实际上我们能够在 256kbit/sec 的带宽限制内,在每个数据包中容纳更多的状态更新。这意味着我们更能应对数据包丢失,因为每个物体的状态更新发送得更频繁。如果一个数据包丢失,也不是什么大问题,因为这些物体的状态更新会不断地包含在未来的数据包中。

Be aware that when a burst of packet loss occurs, like a quarter of a second with no packets getting through (and it’s inevitable that eventually something like this will happen), you will probably get a different result on the left and the right sides. We have to plan for this. In spite of all the effort we have made to ensure that the extrapolation is as close as possible (quantizing both sides and so on), pops can and will occur if the network stops delivering packets.

请注意,当发生像是 1/4 秒的突发数据包丢失(没有数据包通过)时,这种情况是不可避免的,最终会发生类似的情况,你可能会在左右两边得到不同的结果。我们必须为此做好准备。尽管我们已经尽力确保外推尽可能接近(比如量化两边等),但如果网络停止传输数据包,跳变现象(pops)仍然是可能发生的。

Visual Smoothing

We can cover up these pops with smoothing.

我们可以通过平滑来掩盖这些跳变现象(pops)。

Remember how I said earlier that you should not apply smoothing at the simulation level because it ruins the extrapolation? What we’re going to do for smoothing instead is to calculate and maintain position and orientation error offsets that we reduce over time. Then when we render the cubes on the right side we don’t render them at the simulation position and orientation, we render them at the simulation position + error offset, and orientation * orientation error.

Over time we work to reduce these error offsets back to zero for position error and identity for orientation error. For error reduction I use an exponentially smoothed moving average tending towards zero. So in effect, I multiply the position error offset by some factor each frame (eg. 0.9) until it gets close enough to zero for it to be cleared (thus avoiding denormals). For orientation, I slerp a certain amount (0.1) towards identity each frame, which has the same effect for the orientation error.

The trick to making this all work is that when a state update comes in you take the current simulation position, add the position error offset to get the current (smoothed) visual position, then subtract the new position from that, giving the new position error offset which reproduces exactly the current visual position.

记得我之前说过不应该在模拟层面应用平滑处理,因为那样会破坏外推吗?那么我们要做的平滑处理是:计算并保持位置和方向的误差偏移,并随着时间的推移逐渐减少这些偏移。然后,当我们渲染右侧的立方体时,我们不是直接渲染它们在模拟中的位置和方向,而是渲染它们的位置 + 误差偏移,以及方向 * 方向误差。

随着时间的推移,我们努力将这些误差偏移减少到零(对于位置误差)和单位(对于方向误差)。为了减少误差,我使用了一个指数平滑的移动平均,目标是趋近于零。因此,实际上,我每一帧都会将位置误差偏移乘以一个因子(例如 0.9),直到它接近零并被清除(从而避免了非正规数)。对于方向,我每一帧都会进行一定量的球面插值(例如 0.1)朝单位方向插值,这对方向误差有相同的效果。

使这一切工作的诀窍是,当状态更新到来时,你将当前的模拟位置加上位置误差,得到当前(平滑后的)视觉位置,再从中减去新的位置,得到新的位置误差偏移,这样渲染结果与当前的视觉位置完全一致。

The same process is then applied to the error quaternion (using multiplication by the conjugate instead of subtraction) and this way you effectively calculate on each state update the new position error and orientation error relative to the new state such that the object appears to have not moved at all. Thus state updates are smooth and have no immediate visual effect, and the error reduction smoothes out any error in the extrapolation over time without the player noticing in the common case.

相同的过程也应用于误差四元数(使用共轭乘法代替减法),通过这种方式,你实际上在每次状态更新时计算新的位移误差和方向误差,这些误差是相对于新状态的,使得物体看起来好像没有移动。因此,状态更新是平滑的,并且没有立即的视觉效果,误差减少会随着时间的推移平滑掉外推中的任何误差,在常见情况下,玩家不会注意到这些变化。
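The position half of that bookkeeping can be sketched as follows (a stand-in `Vec3` replaces the article's `vec3f`; the orientation half is analogous with quaternion multiply-by-conjugate in place of subtraction, and the factors shown are the example values from the text):

```cpp
struct Vec3 { float x, y, z; };

Vec3 add( Vec3 a, Vec3 b ) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }
Vec3 sub( Vec3 a, Vec3 b ) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }

// When a state update arrives: choose the new error offset so that
// (new simulation position + error) equals the current smoothed visual
// position, so the hard snap has no immediate visual effect.
Vec3 on_state_update( Vec3 current_sim_position, Vec3 current_error,
                      Vec3 new_position )
{
    Vec3 visual = add( current_sim_position, current_error );
    return sub( visual, new_position );
}

// Each frame: decay the error towards zero (eg. factor 0.9), clearing
// it once it is small enough, which also avoids denormals.
Vec3 reduce_error( Vec3 error, float factor )
{
    Vec3 r = { error.x * factor, error.y * factor, error.z * factor };
    float length2 = r.x * r.x + r.y * r.y + r.z * r.z;
    if ( length2 < 1e-8f )
        r = { 0.0f, 0.0f, 0.0f };
    return r;
}
```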

I find that using a single smoothing factor gives unacceptable results. A factor of 0.95 is perfect for small jitters because it smooths out high frequency jitter really well, but at the same time it is too slow for large position errors, like those that happen after multiple seconds of packet loss:

我发现使用单一的平滑因子会导致不可接受的结果。对于小的抖动,0.95 的因子非常合适,因为它能够很好地平滑高频抖动,但与此同时,它对于较大的位置误差(例如在多秒的数据包丢失后发生的误差)来说,平滑速度太慢了。

https://gafferongames.com/videos/state_synchronization_basic_smoothing.mp4

The solution I use is two different scale factors at different error distances, and to make sure the transition is smooth I blend between those two factors linearly according to the amount of positional error that needs to be reduced. In this simulation, having 0.95 for small position errors (25cms or less) while having a tighter blend factor of 0.85 for larger distances (1m error or above) gives a good result. The same strategy works well for orientation using the dot product between the orientation error quaternion and the identity quaternion. I found that in this case a blend of the same factors between dot 0.1 and 0.5 works well.

我使用的解决方案是在不同的误差距离下采用两个不同的平滑因子,并且为了确保过渡平滑,我根据需要减少的位置误差量,在这两个因子之间线性插值。在这个模拟中,对于较小的位置误差(25厘米或更小)使用 0.95 的因子,而对于较大的误差(1米或以上)使用更紧凑的 0.85 因子,这样可以得到较好的结果。同样的策略也适用于方向误差,通过计算方向误差四元数与单位四元数的点积。我发现,在这种情况下,使用点积在 0.1 到 0.5 之间的同样的平滑因子组合效果很好。
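The position-error blend can be sketched as a single function (the thresholds and factors are the ones quoted above; the function name is illustrative, and the orientation version would substitute the quaternion dot product for the distance):

```cpp
#include <algorithm>

// Blend linearly between a slow factor (0.95, small errors of 0.25m or
// less) and a fast factor (0.85, large errors of 1m or more) by the
// amount of positional error that needs to be reduced.
float adaptive_smoothing_factor( float error_distance )
{
    const float SlowFactor = 0.95f, FastFactor = 0.85f;
    const float SmallError = 0.25f, LargeError = 1.0f;
    float t = ( error_distance - SmallError ) / ( LargeError - SmallError );
    t = std::min( 1.0f, std::max( 0.0f, t ) );
    return SlowFactor + t * ( FastFactor - SlowFactor );
}
```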

The end result is smooth error reduction for small position and orientation errors combined with a tight error reduction for large pops. As you can see above you don’t want to drag out correction of these large pops, they need to be fast and so they’re over quickly otherwise they’re really disorienting for players, but at the same time you want to have really smooth error reduction when the error is small hence the adaptive error reduction approach works really well.

最终的结果是,对于小的位移和方向误差,平滑地减少误差,同时对较大的“跳变”进行快速且紧凑的误差减少。正如上面所看到的,你不希望将这些大幅跳变的修正拖延太久,它们需要快速修正,否则会让玩家感到非常迷失方向;但同时,对于较小的误差,你希望有非常平滑的误差减少,因此自适应误差减少方法效果非常好。

https://gafferongames.com/videos/state_synchronization_adaptive_smoothing.mp4

Delta Compression

Even though I would argue the result above is probably good enough already, it is possible to improve the synchronization considerably from this point, for example to support a world with larger objects or more objects being interacted with. So let’s work through some of those techniques and push this technique as far as it can go.

尽管我认为上面的结果已经足够好了,但从这个基础上仍然可以显著提升同步效果。例如,以支持更大的物体世界,或同时处理更多交互中的物体。因此,我们来探讨一些优化技术,并将这一方法的潜力发挥到极致。

There is an easy compression that can be performed. Instead of encoding absolute position, if it is within a range of the player cube center, encode position as a relative offset to the player center position. In the common cases where bandwidth is high and state updates need to be more frequent (katamari ball) this provides a large win.

有一种简单的压缩方法可以使用。与其编码绝对位置,不如在物体位于玩家立方体中心附近时,将位置编码为相对于玩家中心位置的偏移量。在带宽较高且需要更频繁进行状态更新的常见情况下(例如“块魂”球的情况),这种方法可以带来显著的优化效果。

Next, what if we do want to perform some sort of delta encoding for state synchronization? We can but it’s quite different in this case than it is with snapshots because we’re not including every cube in every packet, so we can’t just track the most recent packet received and say, OK all these state updates in this packet are relative to packet X.

接下来,如果我们确实想对状态同步执行某种增量编码,该怎么办?我们可以这样做,但在这种情况下,它与快照(snapshot)方式有很大不同,因为我们并不是在每个数据包中都包含所有的立方体。因此,我们不能简单地跟踪最近接收到的数据包,然后说:“好,这个数据包中的所有状态更新都是相对于数据包 X 的。”

What you actually have to do is per-object update keep track of the packet that includes the base for that update. You also need to keep track of exactly the set of packets received so that the sender knows which packets are valid bases to encode relative to. This is reasonably complicated and requires a bidirectional ack system over UDP. Such a system is designed for exactly this sort of situation where you need to know exactly which packets definitely got through. You can find a tutorial on how to implement this in this article.

实际上,你需要在每个对象的更新中跟踪包含该更新基准的数据包。同时,你还需要记录接收到的数据包集合,以便发送方知道哪些数据包可以作为有效的参考基准进行相对编码。这一过程相对复杂,并且需要在 UDP 上实现一个双向确认(ACK)系统。这种系统正是为这种需要精确确认哪些数据包成功到达的情况而设计的。你可以在这篇文章中找到关于如何实现该系统的教程。

So, assuming you have an ack system, you know which packet sequence numbers got through. What you do then is write one bit per-state update indicating whether the update is relative or absolute. If absolute, encode with no base as before; if relative, send the 16 bit sequence number of the base per-state update and encode relative to the state update data sent in that packet. This adds 1 bit of overhead per-update as well as 16 bits to identify the sequence number of the base per-object update. Can we do better?

Yes. It turns out that of course you’re going to have to buffer on the send and receive side to implement this relative encoding, and you can’t buffer forever. In fact, if you think about it you can only buffer up a couple of seconds before it becomes impractical, and in the common case of moving objects you’re going to be sending updates for the same object frequently (katamari ball), so practically speaking the base sequence will only be from a short time ago.

假设你已经有了一个 ACK 机制,并且能够知道哪些数据包的序列号成功传输。那么,你可以在每个状态更新中写入一个比特位来指示该更新是相对的还是绝对的

  • 如果是绝对的,那么就像之前一样进行编码,不依赖任何基础状态。

  • 如果是相对的,那么就需要发送一个 16 位的序列号,表示该状态更新所依赖的基础状态的序列号,然后基于该基础状态的数据进行相对编码。

这样,每个状态更新都会增加 1 比特的开销用于标记类型,同时每个对象的状态更新还需要额外增加 16 比特来标识基础状态的序列号。

我们能做得更好吗?

是的。实际上,由于这种相对编码需要在发送端和接收端进行缓冲,但我们不可能无限期地缓冲数据。如果仔细思考,你会发现最多只能缓冲几秒钟,否则就变得不切实际。而且,在通常情况下,尤其是对于移动中的物体(比如“Katamari 球”),你会频繁地为同一对象发送状态更新。因此,在实际应用中,所依赖的基础状态的序列号往往只会来自不久前的某个时间点

So instead of sending the 16 bit base sequence per-object, send in the header of the packet the most recent acked packet (from the reliability ack system) and per-object encode the offset of the base sequence relative to that value using 5 bits. This way at 60 packets per-second you can identify a state update with a base up to roughly half a second old. Any base older than this is unlikely to provide a good delta encoding anyway because it’s old, so in that case just drop back to absolute encoding for that update.

因此,与其为每个对象发送 16 位的基础序列号,不如在数据包的头部发送最近被确认(ACK)的数据包序列号(通过可靠性 ACK 系统获取)。然后,在每个对象的状态更新中,使用 5 位 来编码相对于该基础序列号的偏移量。

这样,在 每秒 60 个数据包的情况下,可以识别出最长半秒前的状态更新作为基础。如果某个基础状态比这个时间窗口更久远,通常它已经太旧,无法提供良好的增量编码(delta encoding),那么对于该状态更新,直接回退到绝对编码即可。
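A sketch of that base-identification step, assuming 16 bit sequence numbers with natural wraparound (the function name and the decision to return a relative/absolute flag are my own framing):

```cpp
#include <cstdint>

const int MaxBaseOffset = 31;              // largest value in 5 bits

// Try to express the base sequence as a 5 bit offset behind the most
// recent acked sequence carried in the packet header. Returns true and
// writes the offset on success; false means the base is too old and the
// update should fall back to absolute encoding.
bool encode_base_offset( uint16_t most_recent_acked, uint16_t base_sequence,
                         int & offset )
{
    // Unsigned 16 bit subtraction handles sequence number wraparound.
    offset = (uint16_t)( most_recent_acked - base_sequence );
    return offset <= MaxBaseOffset;
}
```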

Now lets look at the type of objects that are going to have these absolute encodings rather than relative. They’re the objects at rest. What can we do to make them as efficient as possible? In the case of the cube simulation one bad result that can occur is that a cube comes to rest (turns grey) and then has its priority lowered significantly. If that very last update with the position of that object is missed due to packet loss, it can take a long time for that object to have its at rest position updated.

We can fix this by tracking objects which have recently come to rest and bumping their priority until an ack comes back for a packet they were sent in. Thus they are sent at an elevated priority compared with normal grey cubes (which are at rest and have not moved) and keep resending at that elevated rate until we know that update has been received, thus “committing” that grey cube to be at rest at the correct position.

现在让我们看看哪些类型的对象会使用绝对编码而不是相对编码。这些对象通常是静止的对象。那么,我们如何让它们的同步尽可能高效呢?

在立方体模拟中,一个可能出现的不良情况是:某个立方体停止运动(变为灰色),其同步优先级被大幅降低如果由于数据包丢失,导致这个对象的最后一次位置更新未能成功传输,那么可能会需要很长时间才能让它的静止位置被正确更新。

我们可以通过跟踪最近刚刚静止的对象来解决这个问题,并提高它们的同步优先级,直到我们收到ACK 确认,证明这个对象的状态已成功传输。因此:

  • 刚刚静止的立方体(变为灰色的立方体)会被赋予比普通静止立方体更高的同步优先级。

  • 持续以较高的速率重新发送它们的位置更新,直到收到 ACK 确认。

  • 一旦收到 ACK,我们就可以确认该立方体已在正确的位置静止,并降低其同步优先级。

这样做可以确保静止的立方体不会因为丢包而停留在错误的位置,从而提升同步的准确性和稳定性。
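The priority bump for recently-rested objects can be folded into the per-frame priority calculation, roughly like this (the struct fields and the exact priority values other than the 100/1 pair quoted earlier are illustrative):

```cpp
// Hypothetical per-object flags: whether the object is at rest, and
// whether an update containing its at-rest state has been acked yet.
struct ObjectPriorityState
{
    bool at_rest;
    bool rest_state_acked;
};

// Keep resending a cube that just came to rest at interacting-cube
// priority until its at-rest position is known to have been received,
// "committing" it; only then drop it to the normal at-rest priority.
float calculate_priority( const ObjectPriorityState & object )
{
    if ( !object.at_rest )
        return 100.0f;                     // interacting (red) cubes
    if ( !object.rest_state_acked )
        return 100.0f;                     // recently at rest: elevated until acked
    return 1.0f;                           // committed at-rest (grey) cubes
}
```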

Conclusion

And that’s really about it for this technique. Without anything fancy it’s already pretty good, and on top of that another order of magnitude improvement is available with delta compression, at the cost of significant complexity!

这就是这种技术的全部内容。即使不使用任何复杂的优化,它已经表现得相当不错了。而在此基础上,如果使用增量压缩(delta compression),还可以再提升一个数量级的效率——但代价是显著增加实现的复杂性!

 

个人总结

这篇文章主要说了状态同步方式和增量压缩策略。

1.同步位置和运动数据,Remote端进行外插值

2.对于大量需要同步的entity,进行同步优先级的归类,并进行优先级累积

3.为了解决Remote端接收数据的抖动问题,我们需要给Remote添加4-5帧的Buffer区域

4.读取Jitter Buffer里的同步数据,并在固定同步时间,直接设置为RemoteEntity的新的Movement信息

5.如果进行Quantize映射去压缩数据,我们可以在Owner端和Remote端进行同样的策略,这样两端尽可能保持一致。但是这个Quantize精度的要求要大于上一篇快照插值(因为我们是外插值方式)

6.视觉上的平滑。 我们第4点设置逻辑帧下的数据position信息,但是在渲染帧还是进行了显示的平滑过渡的,并且可以动态的调整,如果数据差别不大,可以lerp慢一点,如果delta数据差别比较大,lerp的快一点

7.增量压缩。这里我其实有点疑惑,博主给的链接 https://gafferongames.com/post/reliability_and_flow_control/ 也打不开了。

因为我们不是全Entity一个package去同步的,我们是Entity分级同步的。所以每个Entity的增量压缩前,对比的BaseAckId是不一样的。

 

如果Owner针对每个Entity都要发送BaseAckId给Remote的话,数据开销太大

我们可以用增量 AckId 方式:首先在 Owner 准备发送的 Pack 的头部添加最近的 AckId,然后每个 Entity 用偏移量 offset 去代表自己的 BaseAckId。作者给的是 5 bit 的偏移,可表示 32 个值,如果是 60HZ 的话,可以查询到约 0.5s 之前,已经足够了。

如果超出的话,说明数据太久了,直接用全量数据了。

需要1bit标记是增量数据还是全量数据。

8.对于状态切换,需要提高同步优先级,确保及时同步。作者的例子是从动态变静态的Box,由于静态Box的优先级比较低,我们需要在状态切换的时刻,进行一次确定性的数据同步。这样避免最后静止时刻的位置没有正确的同步,导致闪烁发生。

posted @ 2025-04-03 00:03  sun_dust_shadow