(Translation, Unity 2020.1) Understanding the managed heap

Understanding the managed heap

Another common problem faced by many Unity developers is the unexpected expansion of the managed heap. In Unity, the managed heap expands much more readily than it shrinks. Furthermore, Unity’s garbage collection strategy tends to fragment memory, which can prevent a large heap from shrinking.


How the managed heap operates and why it expands

The “managed heap” is a section of memory that is automatically managed by the memory manager of a Project’s scripting runtime (Mono or IL2CPP). All objects created in managed code must be allocated on the managed heap(2) (Note: Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap).


In the above diagram, the white box represents a quantity of memory apportioned to the managed heap, and the colored boxes within it represent data values stored within the managed heap’s memory space. When additional values are needed, more space is allocated from within the managed heap.

The garbage collector runs periodically(3) (Note: The exact timing is platform-dependent). This sweeps through all objects on the heap, marking for deletion any objects that are no longer referenced. Unreferenced objects are then deleted, freeing up memory.

Crucially, Unity’s garbage collection – which uses the Boehm GC algorithm – is non-generational and non-compacting. “Non-generational” means that the GC must sweep through the entire heap when performing a collection pass, and its performance therefore degrades as the heap expands. “Non-compacting” means that objects in memory are not relocated in order to close gaps between objects.


 

The above diagram shows an example of memory fragmentation. When an object is released, its memory is freed. However, the freed space does not become part of a single large pool of “free memory”. The objects on either side of the freed object may still be in use. Because of this, the freed space is a “gap” between other segments of memory (this gap is indicated by the red circle in the diagram). The newly-freed space can therefore only be used to store data of identical or lesser size than the freed object.

When allocating an object, remember that the object must always occupy a contiguous block of space in memory.

This leads to the core problem of memory fragmentation: while the overall amount of space available in the heap may be substantial, it is possible that some or all of that space is in small “gaps” between allocated objects. In this case, even though there may be enough total space to accommodate a certain allocation, the managed heap cannot find a large enough block of contiguous memory in which to fit the allocation.
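The fragmentation problem can be illustrated with a toy model. The sketch below is not how Boehm GC or Unity tracks memory; it assumes a hypothetical heap described only by a list of free gap sizes, and the names FragmentationDemo, TotalFree and CanAllocate are invented for illustration. It shows that the total free space can exceed a request while no single contiguous gap can hold it.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy model: a heap described only by its free gaps (sizes in bytes).
public static class FragmentationDemo
{
    // Sum of all free gaps, contiguous or not.
    public static int TotalFree(IReadOnlyList<int> gaps)
    {
        int total = 0;
        for (int i = 0; i < gaps.Count; i++) total += gaps[i];
        return total;
    }

    // An allocation only succeeds if ONE gap is at least as large as the request.
    public static bool CanAllocate(IReadOnlyList<int> gaps, int requestBytes)
    {
        for (int i = 0; i < gaps.Count; i++)
            if (gaps[i] >= requestBytes) return true;
        return false;
    }
}
```

With gaps of 16, 32, 24 and 8 bytes, 80 bytes are free in total, yet a 64-byte request cannot be placed while a 32-byte request can.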


However, if a large object is allocated and there is insufficient contiguous free space to accommodate the object, as illustrated above, the Unity memory manager performs two operations.

First, if it has not already done so, the garbage collector runs. This attempts to free up enough space to fulfill the allocation request.

If, after the GC runs, there is still not enough contiguous space to fit the requested amount of memory, the heap must expand. The specific amount that the heap expands is platform-dependent; however, most Unity platforms double the size of the managed heap.
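The "collect first, then grow" sequence can be sketched as a small calculation. This is a simplified model under stated assumptions, not Unity's allocator: it assumes every expansion doubles the heap (the text says most, not all, platforms do this) and that newly added pages form one contiguous block. HeapGrowthDemo and its parameters are hypothetical names.

```csharp
using System;

// Hypothetical model of the "collect first, then grow" policy described above.
public static class HeapGrowthDemo
{
    // Returns the heap size after satisfying a request for `requestBytes` of
    // contiguous space, given the largest free block left after a GC pass.
    public static long SizeAfterAllocation(long heapBytes, long largestFreeBlockAfterGc, long requestBytes)
    {
        if (heapBytes <= 0) throw new ArgumentOutOfRangeException(nameof(heapBytes));

        if (largestFreeBlockAfterGc >= requestBytes)
            return heapBytes;               // fits after GC: no growth needed

        long newSize = heapBytes;
        long added = 0;
        // Assumption: each expansion doubles the heap and the added pages are contiguous.
        while (added < requestBytes)
        {
            added += newSize;
            newSize *= 2;
        }
        return newSize;
    }
}
```

In this model a 1 MB heap with only a 64 KB free block grows to 2 MB for a 512 KB request, and to 4 MB for a 3 MB request, even if most of the original megabyte is free but fragmented.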


 

Key problems with the heap

The core issues with managed heap expansion are twofold:

  • Unity does not often release the memory pages allocated to the managed heap when it expands; it optimistically retains the expanded heap, even if a large portion of it is empty. This is to prevent the need to re-expand the heap should further large allocations occur.

      • On most platforms, Unity eventually releases the pages used by empty portions of the managed heap back to the operating system. The interval at which this occurs is not guaranteed and should not be relied upon.

  • The address space used by the managed heap is never returned to the operating system.

      • For 32-bit programs, this can lead to address space exhaustion if the managed heap expands and contracts many times. If a program’s available memory address space is exhausted, the operating system will terminate the program.

      • For 64-bit programs, the address space is sufficiently large that this is extremely unlikely to occur for programs whose running time does not exceed the average human lifespan.


 

Temporary allocations

Many Unity projects are found to operate with several tens or hundreds of kilobytes of temporary data being allocated to the managed heap each frame. This is often extremely detrimental to a project’s performance. Consider the following math:

If a program allocates one kilobyte (1 KB) of temporary memory each frame and runs at 60 frames per second, then it must allocate 60 kilobytes of temporary memory per second. Over the course of a minute, this adds up to 3.6 megabytes of garbage in memory. Invoking the garbage collector once per second is likely to be detrimental to performance, but letting 3.6 megabytes of garbage accumulate every minute is problematic when attempting to run on low-memory devices.
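The arithmetic above is easy to reproduce for other allocation rates. The helper below is only an illustrative sketch (GarbageRate and BytesAllocated are hypothetical names), and it uses decimal units (1 KB = 1000 bytes) so that the result matches the 3.6 MB figure in the text.

```csharp
using System;

public static class GarbageRate
{
    // Bytes of temporary data produced over `seconds` of play time.
    public static long BytesAllocated(long bytesPerFrame, int framesPerSecond, int seconds)
    {
        return bytesPerFrame * framesPerSecond * seconds;
    }
}
```

GarbageRate.BytesAllocated(1000, 60, 60) gives 3,600,000 bytes per minute. At 100 KB per frame, which the text describes as common, the same formula gives 360 MB per minute.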

Further, consider loading operations. If a large number of temporary objects are generated during a heavy Asset-loading operation, and those objects are referenced until the operation completes, then the garbage collector is unable to release those temporary objects and the managed heap needs to expand – even though many of the objects it contains will be released a short time later.


 

Keeping track of managed memory allocations is relatively simple. In Unity’s CPU Profiler, the Overview has a “GC Alloc” column. This column displays the number of bytes allocated on the managed heap in a specific frame(4) (Note: This is not identical to the number of bytes temporarily allocated during a given frame. The Profiler displays the number of bytes allocated in a specific frame, even if some or all of the allocated memory is reused in subsequent frames). With the “Deep Profiling” option enabled, it’s possible to track down the method in which these allocations occur.

The Unity Profiler does not track these allocations when they occur off the main thread. Therefore, the “GC Alloc” column cannot be used to measure managed allocations that occur in user-created threads. Switch the execution of code from separate threads to the main thread for debugging purposes or use the BeginThreadProfiling API to display the samples in the Timeline Profiler.

Always profile managed allocations with a development build on the target device.

Note that some script methods cause allocations when running in the Editor, but do not produce allocations after the project has been built. GetComponent is the most common example; this method always allocates when executed in the Editor, but not in a built project.

In general, it is strongly recommended that all developers minimize managed heap allocations whenever the project is in an interactive state. Allocations during non-interactive operations, such as Scene loading, are less problematic.

The JetBrains ReSharper plugin for Visual Studio can help locate allocations in code.

Use Unity’s Deep Profile mode to locate the specific causes of managed allocations. In Deep Profile mode, all method calls are recorded individually, providing a clearer view of where managed allocations occur within the method call tree. Note that Deep Profile mode works not only in the Editor but also on Android and Desktop using the command line argument -deepprofiling. The Deep Profiler button stays grayed out during profiling.


 

(Translator’s note on GetComponent: the original wording is that it “always” allocates in the Editor; in a built project the first call can still allocate.)

 


 

 

Basic memory conservation

There are a handful of relatively simple techniques that can be employed to reduce managed heap allocations.


Collection and array reuse

When using C#’s Collection classes or Arrays, consider reusing or pooling the allocated Collection or Array whenever possible. The Collection classes expose a Clear method which eliminates the Collection’s values but does not release the memory allocated to the Collection.

This is particularly useful when allocating temporary “helper” Collections for complex computations. A very simple example might be the following code:

void Update() {

    List<float> nearestNeighbors = new List<float>();

    findDistancesToNearestNeighbors(nearestNeighbors);

    nearestNeighbors.Sort();

    // … use the sorted list somehow …

}

In this example, the nearestNeighbors List is allocated once per frame in order to collect a set of data points. It’s very simple to hoist this List out of the method and into the containing class, which avoids allocating a new List each frame:


List<float> m_NearestNeighbors = new List<float>();

void Update() {

    m_NearestNeighbors.Clear();

    findDistancesToNearestNeighbors(m_NearestNeighbors);

    m_NearestNeighbors.Sort();

    // … use the sorted list somehow …

}

In this version, the List’s memory is retained and reused across multiple frames. New memory is only allocated when the List needs to expand.
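The claim that Clear keeps the backing storage can be checked directly, because List<T> exposes a Capacity property. The following is a minimal standalone sketch; ListReuseDemo and CapacityAroundClear are hypothetical names used only for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ListReuseDemo
{
    // Fills a list, clears it, and returns the capacity observed
    // after the fill and after Clear().
    public static (int afterFill, int afterClear) CapacityAroundClear(int count)
    {
        var buffer = new List<float>();
        for (int i = 0; i < count; i++) buffer.Add(i);
        int afterFill = buffer.Capacity;

        buffer.Clear();                    // Count becomes 0, backing array is kept
        return (afterFill, buffer.Capacity);
    }
}
```

Both values are equal, so refilling the list with the same number of elements on the next frame needs no new backing array.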


Closures and anonymous methods

There are two points to consider when using closures and anonymous methods.

First, all method references in C# are reference types, and are therefore allocated on the heap. Temporary allocations can be easily created by passing a method reference as an argument. This allocation occurs regardless of whether the method being passed is an anonymous method or a predefined one.

Second, converting an anonymous method to a closure significantly increases the amount of memory required to pass the closure to the method receiving it.

Consider the following code:

List<float> listOfNumbers = createListOfRandomNumbers();

listOfNumbers.Sort( (x, y) =>

(int)x.CompareTo((int)(y/2)) 

);

 

This snippet uses a simple anonymous method to control the sorting order of the list of numbers created on the first line. However, if a programmer wished to make this snippet reusable, it is tempting to replace the constant 2 with a variable in local scope, like so:

List<float> listOfNumbers = createListOfRandomNumbers();

int desiredDivisor = getDesiredDivisor();

listOfNumbers.Sort( (x, y) =>

(int)x.CompareTo((int)(y/desiredDivisor))

);

 

The anonymous method now requires the method to be able to access the state of a variable outside of the method’s scope, and so has become a closure. The desiredDivisor variable must be passed into the closure somehow so that it can be used by the actual code of the closure.


To do this, C# generates an anonymous class that can retain the externally-scoped variables needed by the closure. A copy of this class is instantiated when the closure is passed to the Sort method, and the copy is initialized with the value of the desiredDivisor integer.

Because executing the closure requires instantiating a copy of its generated class, and all classes are reference types in C#, executing the closure requires allocating an object on the managed heap.

In general, it is best to avoid closures in C# whenever possible. Anonymous methods and method references should be minimized in performance-sensitive code, and especially in code that executes on a per-frame basis.
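One possible way to keep a sort parameterized without capturing a local variable is to hold the parameter in a reusable comparer object. The sketch below is an assumption-laden illustration rather than a recommendation from the original text: DividedComparer and SortHelper are hypothetical names, and the ordering rule is a simplified variant (it compares the integer quotients of both values) rather than the exact expression used in the snippet above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical reusable comparer: the divisor lives in a field instead of
// being captured by a lambda, so no closure object is created per call.
public sealed class DividedComparer : IComparer<float>
{
    public int Divisor = 1;   // assumed to be non-zero

    public int Compare(float x, float y)
    {
        // Simplified ordering rule: compare the integer quotients of both values.
        return ((int)(x / Divisor)).CompareTo((int)(y / Divisor));
    }
}

public static class SortHelper
{
    // One comparer instance, allocated once and reused by every call.
    static readonly DividedComparer s_Comparer = new DividedComparer();

    public static void SortWithDivisor(List<float> numbers, int divisor)
    {
        s_Comparer.Divisor = divisor;
        numbers.Sort(s_Comparer);
    }
}
```

The trade-off is shared mutable state: the static comparer is not safe to use from several threads at once, which is usually acceptable for code that only runs on the main thread.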


 

Boxing

Boxing is one of the most common sources of unintended temporary memory allocations found in Unity projects. It occurs whenever a value-typed value is utilized as a reference type; this most often occurs when passing primitive value-typed variables (such as int and float) to object-typed methods.

In the following extremely simple example, the integer in x is boxed in order to be passed to the object.Equals method, because the Equals method on object requires that an object be passed to it:

int x = 1;

object y = new object();

y.Equals(x);

 

C# IDEs and compilers generally do not issue warnings about boxing, even though it leads to unintended memory allocations. This is because the C# language was developed with the assumption that small temporary allocations would be efficiently handled by generational garbage collectors and allocation-size-sensitive memory pools.

While Unity’s allocator does use different memory pools for small and large allocations, Unity’s garbage collector is not generational and therefore cannot efficiently sweep out the small, frequent temporary allocations generated by boxing.

Boxing should be avoided wherever possible when writing C# code for Unity runtimes.
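A small standalone sketch can make the allocation visible and show one common way around it. It assumes nothing Unity-specific; BoxingDemo and its methods are hypothetical names. The first method shows that every conversion of an int to object produces a separate heap object, and the second uses a generic IEquatable<T> constraint so the value is compared without being converted to object.

```csharp
using System;

public static class BoxingDemo
{
    // Converting an int to object boxes it: each conversion creates a new heap object.
    public static bool BoxesAreDistinctObjects(int value)
    {
        object first = value;   // box #1
        object second = value;  // box #2
        return !ReferenceEquals(first, second);
    }

    // Generic alternative: IEquatable<T>.Equals(T) takes the value directly,
    // so no conversion to object is required.
    public static bool AreEqual<T>(T a, T b) where T : IEquatable<T>
    {
        return a.Equals(b);
    }
}
```

BoxingDemo.AreEqual(1, 1) resolves to int's Equals(int) overload, whereas y.Equals(x) on an object-typed y has no choice but to box x.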


 

Identifying boxing

Boxing shows up in CPU traces as calls to one of a few methods, depending on the scripting backend in use. These generally take one of the following forms, where <some class> is the name of some other class or struct, and … is some number of arguments:

  • <some class>::Box(…)

  • Box(…)

  • <some class>_Box(…)

It can also be located by searching the output of a decompiler or IL viewer, such as the IL viewer tool built into ReSharper or the dotPeek decompiler. The IL instruction is “box”.


 

Dictionaries and enums

One common cause of boxing is the use of enum types as keys for Dictionaries. Declaring an enum creates a new value type that is treated like an integer behind the scenes, but enforces type-safety rules at compile time.

By default, a call to Dictionary.Add(key, value) results in a call to Object.GetHashCode(). This method is used to obtain the appropriate hash code for the Dictionary’s key, and is used in all methods that accept a key: Dictionary.TryGetValue, Dictionary.Remove, Dictionary.ContainsKey, etc.

The Object.GetHashCode method is defined on the reference type Object, but enum values are always value types. Therefore, for enum-keyed Dictionaries, every method call results in the key being boxed at least once.

The following code snippet illustrates a simple example that demonstrates this boxing problem:

enum MyEnum { a, b, c };

var myDictionary = new Dictionary<MyEnum, object>();

myDictionary.Add(MyEnum.a, new object());

To solve this problem, it is necessary to write a custom class that implements the IEqualityComparer interface and assign an instance of that class as the Dictionary’s comparer (Note: This object is usually stateless, and therefore can be reused with different Dictionary instances to save memory).

The following is a simple example of an IEqualityComparer for the above code snippet.


 

public class MyEnumComparer : IEqualityComparer<MyEnum> {

    public bool Equals(MyEnum x, MyEnum y) {
        return x == y;
    }

    public int GetHashCode(MyEnum x) {
        return (int)x;
    }

}

 

An instance of the above class could be passed to the Dictionary’s constructor.
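Passing the comparer to the constructor might look like the following sketch. The enum and comparer are repeated so that the block compiles on its own; the shared Instance field and the EnumDictionaryDemo class are additions made for this illustration and are not part of the original snippet.

```csharp
using System;
using System.Collections.Generic;

public enum MyEnum { a, b, c }

// Same comparer as above, repeated so the sketch compiles on its own.
public sealed class MyEnumComparer : IEqualityComparer<MyEnum>
{
    // Stateless, so a single shared instance can serve every dictionary.
    public static readonly MyEnumComparer Instance = new MyEnumComparer();

    public bool Equals(MyEnum x, MyEnum y) { return x == y; }
    public int GetHashCode(MyEnum x) { return (int)x; }
}

public static class EnumDictionaryDemo
{
    public static Dictionary<MyEnum, string> Create()
    {
        // The comparer is supplied through the Dictionary constructor.
        var names = new Dictionary<MyEnum, string>(MyEnumComparer.Instance);
        names.Add(MyEnum.a, "first");
        names.Add(MyEnum.c, "third");
        return names;
    }
}
```

Every key-based call on the returned dictionary (Add, TryGetValue, ContainsKey, Remove) then goes through the strongly typed comparer instead of the default one.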


 

Foreach loops

In Unity’s version of the Mono C# compiler, use of the foreach loop forces Unity to box a value each time the loop terminates (Note: The value is boxed once each time the loop as a whole finishes executing. It does not box once per iteration of the loop, so memory usage remains the same regardless of whether the loop runs two times or 200 times). This is because the IL generated by Unity’s C# compiler constructs a generic value-type Enumerator in order to iterate over the value collection.

This Enumerator implements the IDisposable interface, which must be called when the loop terminates. However, calling interface methods on value-typed objects (such as structs and Enumerators) requires boxing them.

Examine the following very simple example code:


int accum = 0;

foreach(int x in myList) {

    accum += x;

}

The above, when run through Unity’s C# compiler, produces the following Intermediate Language:

.method private hidebysig instance void
    ILForeach() cil managed
  {
    .maxstack 8
    .locals init (
      [0] int32 num,
      [1] int32 current,
      [2] valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32> V_2
    )
    // [67 5 - 67 16]
    IL_0000: ldc.i4.0
    IL_0001: stloc.0      // num
    // [68 5 - 68 74]
    IL_0002: ldarg.0      // this
    IL_0003: ldfld        class [mscorlib]System.Collections.Generic.List`1<int32> test::myList
    IL_0008: callvirt     instance valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<!0/*int32*/> class [mscorlib]System.Collections.Generic.List`1<int32>::GetEnumerator()
    IL_000d: stloc.2      // V_2
    .try
    {
      IL_000e: br           IL_001f
    // [72 9 - 72 41]
      IL_0013: ldloca.s     V_2
      IL_0015: call         instance !0/*int32*/ valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::get_Current()
      IL_001a: stloc.1      // current
    // [73 9 - 73 23]
      IL_001b: ldloc.0      // num
      IL_001c: ldloc.1      // current
      IL_001d: add
      IL_001e: stloc.0      // num
    // [70 7 - 70 36]
      IL_001f: ldloca.s     V_2
      IL_0021: call         instance bool valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>::MoveNext()
      IL_0026: brtrue       IL_0013
      IL_002b: leave        IL_003c
    } // end of .try
    finally
    {
      IL_0030: ldloc.2      // V_2
      IL_0031: box          valuetype [mscorlib]System.Collections.Generic.List`1/Enumerator<int32>
      IL_0036: callvirt     instance void [mscorlib]System.IDisposable::Dispose()
      IL_003b: endfinally
    } // end of finally
    IL_003c: ret
  } // end of method test::ILForeach
} // end of class test

The most relevant code is the finally { … } block near the bottom. The callvirt instruction discovers the location of the IDisposable.Dispose method in memory before invoking the method, and requires that the Enumerator be boxed.

In general, foreach loops should be avoided in Unity. Not only do they box, but the method-call cost of iterating over collections via Enumerators is generally much slower than manual iteration via a for or while loop.

Note that the C# compiler upgrade in Unity 5.5 significantly improves Unity’s ability to generate IL. In particular, boxing operations have been eliminated from foreach loops, which removes the memory overhead associated with them. However, the CPU performance difference compared to equivalent Array-based code remains, due to method-call overhead.
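For comparison, the two iteration styles can be written out side by side. This is a plain C# sketch with hypothetical names (IterationDemo, SumWithFor, SumWithEnumerator). The second method spells out by hand roughly what a foreach over a List<int> expands to, with Dispose called directly on the struct so that no box is needed.

```csharp
using System;
using System.Collections.Generic;

public static class IterationDemo
{
    // Index-based loop: no enumerator is created at all.
    public static int SumWithFor(List<int> values)
    {
        int accum = 0;
        for (int i = 0, count = values.Count; i < count; i++)
            accum += values[i];
        return accum;
    }

    // Roughly what foreach expands to: a struct enumerator plus Dispose().
    public static int SumWithEnumerator(List<int> values)
    {
        int accum = 0;
        List<int>.Enumerator e = values.GetEnumerator();
        try
        {
            while (e.MoveNext()) accum += e.Current;
        }
        finally
        {
            e.Dispose();   // called on the struct itself, not through a boxed interface
        }
        return accum;
    }
}
```

Both methods return the same sum; the difference is the MoveNext and Current calls per element in the enumerator version.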


 

Array-valued Unity APIs

A more pernicious and less-visible cause of spurious array allocation is the repeated accessing of Unity APIs that return arrays. All Unity APIs that return arrays create a new copy of the array each time they are accessed. It is extremely non-optimal to access an array-valued Unity API more often than necessary.

As an example, the following code spuriously creates four copies of the vertices array per loop iteration. The allocations occur each time the .vertices property is accessed.

for(int i = 0; i < mesh.vertices.Length; i++)
{
    float x, y, z;

    x = mesh.vertices[i].x;
    y = mesh.vertices[i].y;
    z = mesh.vertices[i].z;

    // ...

    DoSomething(x, y, z);
}

This can be trivially refactored into a single array allocation, regardless of the number of loop iterations, by capturing the vertices array before entering the loop:


var vertices = mesh.vertices;

for(int i = 0; i < vertices.Length; i++)
{
    float x, y, z;

    x = vertices[i].x;
    y = vertices[i].y;
    z = vertices[i].z;

    // ...

    DoSomething(x, y, z);
}

While the CPU cost of accessing a property once is not very high, repeated accesses within tight loops create CPU performance hotspots. Further, repeated accesses unnecessarily expand the managed heap.
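The cost difference can be counted with a stand-in class that behaves the way the text describes array-valued properties. FakeMesh below is purely hypothetical and is not a Unity type: it clones its internal array on every read and counts the copies, so the two access patterns can be compared outside the engine.

```csharp
using System;

// Hypothetical stand-in for an array-valued API such as Mesh.vertices:
// every read of the property returns a fresh copy and is counted.
public sealed class FakeMesh
{
    readonly float[] m_Data;
    public int CopiesMade { get; private set; }

    public FakeMesh(int vertexCount) { m_Data = new float[vertexCount]; }

    public float[] Vertices
    {
        get
        {
            CopiesMade++;
            return (float[])m_Data.Clone();   // new array on every access
        }
    }
}

public static class ArrayApiDemo
{
    // Reads the property in the loop condition and three times in the body.
    public static int CopiesWhenAccessedInLoop(int vertexCount)
    {
        var mesh = new FakeMesh(vertexCount);
        float sum = 0f;
        for (int i = 0; i < mesh.Vertices.Length; i++)
            sum += mesh.Vertices[i] + mesh.Vertices[i] + mesh.Vertices[i];
        return mesh.CopiesMade;
    }

    // Reads the property once and iterates over the cached copy.
    public static int CopiesWhenCached(int vertexCount)
    {
        var mesh = new FakeMesh(vertexCount);
        float[] vertices = mesh.Vertices;
        float sum = 0f;
        for (int i = 0; i < vertices.Length; i++)
            sum += vertices[i];
        return mesh.CopiesMade;
    }
}
```

For 10 vertices the first pattern makes 41 copies (11 evaluations of the loop condition plus 3 per iteration), while the cached pattern makes exactly one.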

This problem is extremely common on mobile, because the Input.touches API behaves similarly to the above. It is extremely common for projects to contain code similar to the following, where an allocation occurs each time the .touches property is accessed.


for ( int i = 0; i < Input.touches.Length; i++ )
{
   Touch touch = Input.touches[i];

   //
}

This can, of course, be trivially improved by hoisting the array allocation out of the loop condition:

Touch[] touches = Input.touches;

for ( int i = 0; i < touches.Length; i++ )

{

   Touch touch = touches[i];

   //

}

However, there are now versions of many Unity APIs that do not cause memory allocations. These should generally be favored when they’re available.

Converting the above example to the allocation-less Touch API is simple:

int touchCount = Input.touchCount;

for ( int i = 0; i < touchCount; i++ )

{

   Touch touch = Input.GetTouch(i);

   //

}

Note that the property access (Input.touchCount) is still kept outside the loop condition in order to save the CPU cost of invoking the property’s get method.

Empty array reuse

Some teams prefer to return empty arrays instead of null when an array-valued method needs to return an empty set. This coding pattern is common in many managed languages, particularly C# and Java.

In general, when returning a zero-length array from a method, it is considerably more efficient to return a pre-allocated singleton instance of the zero-length array than to repeatedly create empty arrays(5) (Note: Naturally, an exception should be made when the array is resized after being returned).
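In .NET, a ready-made shared instance is available through Array.Empty<T>(), which returns a cached zero-length array per element type. The sketch below assumes the project’s API compatibility level includes this method (it was introduced in .NET Framework 4.6); EmptyArrayDemo is a hypothetical name used only to show the difference in identity between the shared instance and freshly created arrays.

```csharp
using System;

public static class EmptyArrayDemo
{
    // Array.Empty<T>() hands out one cached zero-length array per element type.
    public static bool SharedInstanceIsReused()
    {
        int[] first = Array.Empty<int>();
        int[] second = Array.Empty<int>();
        return ReferenceEquals(first, second) && first.Length == 0;
    }

    // By contrast, each `new int[0]` expression allocates a separate object.
    public static bool NewArraysAreDistinct()
    {
        return !ReferenceEquals(new int[0], new int[0]);
    }
}
```

Because a zero-length array has no elements to modify, sharing one instance between callers is safe as long as nobody relies on its identity.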

The pattern can be as simple as the following (MyType, hasNoData and ComputeResults are placeholders):

private static readonly MyType[] EmptyArray = new MyType[0];

public MyType[] GetResults()
{
    if (hasNoData)
        return EmptyArray; // reuse the shared instance; no new allocation
    else
        return ComputeResults();
}

Footnotes

  • (1) This is because, on most platforms, readback from GPU memory is extremely slow. Reading a Texture from GPU memory into a temporary buffer for use by CPU code (e.g. Texture.GetPixel) would be very nonperformant.

  • (2) Strictly speaking, all non-null reference-typed objects and all boxed value-typed objects must be allocated on the managed heap.

  • (3) The exact timing is platform-dependent.

  • (4) Note that this is not identical to the number of bytes temporarily allocated during a given frame. The Profiler displays the number of bytes allocated in a specific frame, even if some or all of the allocated memory is reused in subsequent frames.

  • (5) Naturally, an exception should be made when the array is resized after being returned.

 

posted @ 2025-04-16 23:28  sun_dust_shadow