Zenith.NET Development Notes: Pushing the .NET Graphics API Toward a Modern RHI

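Zenith.NET recently went through a fairly large refactor of its RHI (rendering hardware interface). This is not a routine API rename, nor a mere code cleanup. It moves the entire graphics abstraction layer away from the early "easy to pick up" wrapper and toward a more modern, lower-level model that sits closer to DirectX 12, Vulkan, and Metal.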

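These notes cover what the refactor is about: which capabilities the new version introduces, why bindless access and explicit barriers are worth the effort, how usage differs between the old and new APIs, and what room these changes open up for performance and the ecosystem.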

The most important changes in the new version

Zenith.NET is turning from a "cross-platform graphics wrapper library" into a ".NET RHI for modern GPUs".

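The new version is no longer only about creating buffers, textures, and pipelines. It starts to align with the lower-level and more critical capabilities of modern graphics APIs: bindless resource access, explicit barriers, layout transitions, multiple queues, shader-visible descriptor heaps, Metal 4, Vulkan 1.4, and cross-backend compilation with Slang.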

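The core directions include:

The DirectX 12 backend requires Shader Model 6.6, Resource Binding Tier 3, and Enhanced Barriers.
DirectX 12 uses a large shader-visible descriptor heap so that shaders can index resources directly.
The Vulkan path is starting to align with Vulkan 1.4 and with descriptor heap / bindless resource models such as VK_EXT_descriptor_heap; the Slang compilation target already enables the spvDescriptorHeapEXT capability.
The Metal backend is moving to Metal 4, using the Metal 4 compiler, the MTL4 command queue, and residency sets, and is preparing for argument buffer / bindless-style resource access.
The core layer adds concepts that sit closer to a modern RHI, such as TextureLayout, BarrierStages, ColorAttachment, DepthStencilAttachment, and ResourceHandle.
Shaders are all compiled through Slang to DXIL, metallib, and SPIR-V, which makes it more natural to maintain a single set of shader sources going forward.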

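All of this sounds a lot heavier than "draw a triangle", but it determines the ceiling of the library. If Zenith.NET only wanted to be a simple rendering wrapper, the old API could keep serving that purpose. If it is to carry ray tracing, mesh shading, GPU-driven rendering, post-processing chains, and integrations with ImGui, Skia, or game engines, the underlying model has to be modern enough first.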

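The complexity of modern graphics APIs does not come out of nowhere. It hands back to the application the work that drivers and wrapper layers used to do quietly on its behalf. The cost is a more explicit API. The benefit is that the user finally controls when a resource becomes visible, when it is transitioned, when synchronization happens, and how exactly a shader obtains the data it needs.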

Resource binding: the old way

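The resource binding model in the old version is fairly traditional: on the C# side you declare a ResourceBinding[], create a ResourceTable, and write textures, samplers, and buffers into the table. At render time you push the table to the command buffer, and the shader accesses resources in declaration order.

The old C# code looks roughly like this: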

ResourceBinding[] bindings =
[
    new() { Type = ResourceType.ConstantBuffer, Count = 1 },
    new() { Type = ResourceType.Texture, Count = 1 },
    new() { Type = ResourceType.Sampler, Count = 1 }
];

ResourceTable table = context.CreateResourceTable(new() { Bindings = bindings });

table.Write(0, constantBuffer);
table.Write(1, albedoTexture);
table.Write(2, linearSampler);

pipeline = context.CreateGraphicsPipeline(new()
{
    Vertex = vertexShader,
    Pixel = pixelShader,
    ResourceBindings = bindings,
    Output = frameBuffer.Output
});

commandBuffer.BeginRenderPass(frameBuffer, clearValue, table);

commandBuffer.SetPipeline(pipeline);
commandBuffer.PushResourceTable(table);

commandBuffer.DrawIndexed(indexCount, 1, 0, 0, 0);

commandBuffer.EndRenderPass();

The matching old-style shader usually declares its resources in binding order:

ConstantBuffer<Constants> constants;
Texture2D albedo;
SamplerState linearSampler;

float4 PSMain(PSInput input) : SV_Target
{
    return albedo.Sample(linearSampler, input.TexCoord) * constants.Tint;
}

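The strength of this model is that it is intuitive, which makes it a good fit for tutorials: C# declares the binding layout, and the shader uses resources in order. The weaknesses are just as clear. The more resources there are, the harder the tables and layouts are to manage. The more draw calls there are, the more likely binding switches are to become a CPU-side burden. Once you move into ray tracing, GPU-driven rendering, material arrays, or texture arrays, the traditional resource table becomes increasingly awkward.

With a handful of resources the table model is easy to follow. Once resources pile up, the CPU is constantly busy "setting the table". The traditional hierarchy of descriptor sets, layouts, and pipeline layouts is, in essence, a way to group and type the resources a shader may access, and then bind those groups to fixed positions in the command stream.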

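Resource binding: the new way

The new version switches to an approach that is much closer to bindless: creating a resource yields a ResourceHandle, and whatever resources a shader needs are passed by placing their handles in a constant buffer or in structured data. Resource binding no longer revolves around "which table do I push this frame". It becomes part of the shader's data.

The new C# code looks more like this: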

MaterialConstants constants = new()
{
    Transform = transform,
    BaseColor = new(1.0f, 1.0f, 1.0f, 1.0f),
    Albedo = albedoTexture.SampledHandle,
    Sampler = linearSampler.Handle
};

constantBuffer.Upload(0, new()
{
    Pointer = (nint)(&constants),
    SizeInBytes = (uint)sizeof(MaterialConstants)
});

commandBuffer.Transition(color, default, TextureLayout.ColorAttachment);

commandBuffer.BeginRenderPass([ColorAttachment.Clear(color, clearColor)], null);

commandBuffer.SetPipeline(pipeline);
commandBuffer.SetConstantBuffer(constantBuffer, 0);

commandBuffer.DrawIndexed(indexCount, 1, 0, 0, 0);

commandBuffer.EndRenderPass();

commandBuffer.Transition(color, default, TextureLayout.Sampled);

The matching new-style shader no longer depends on a fixed ResourceTable order. It accesses resources through handles. Conceptually, it can be written like this:

struct MaterialConstants
{
    float4x4 Transform;
    float4 BaseColor;

    ResourceHandle Albedo;
    ResourceHandle Sampler;
};

ConstantBuffer<MaterialConstants> constants;

float4 PSMain(PSInput input) : SV_Target
{
    Texture2D albedo = ResourceDescriptorHeap[constants.Albedo];
    SamplerState samplerState = SamplerDescriptorHeap[constants.Sampler];

    return albedo.Sample(samplerState, input.TexCoord) * constants.BaseColor;
}

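The actual shader syntax is adapted to the backend and the Slang output target, but the idea is the same: the C# side passes handles, and the shader side looks resources up by handle. On DirectX 12 this maps to the shader-visible descriptor heap and direct indexing. On Vulkan it aligns with the descriptor heap / descriptor indexing approach. On Metal it moves toward Metal 4 argument buffers and bindless resource access.

The direct benefit is that resource binding becomes more data-driven. A material, a batch of instances, or a ray tracing scene can all pass resource handles to the GPU as ordinary data. For a material system, texture arrays, GPU culling, indirect drawing, or ray tracing, this model is more natural than repeatedly switching resource tables.

The model is especially friendly to material systems. Previously, one material could mean one binding table. Now a material is closer to a plain data record that lists which textures, which sampler, and which buffers it needs. Once the number of materials, instances, and lights grows, the difference becomes very noticeable.

If the old model emphasizes "lay resources out in batches by set and binding", bindless emphasizes "put resources into one large indexable space first, and let the shader fetch them using indices stored in data". DirectX 12's ResourceDescriptorHeap / SamplerDescriptorHeap, Vulkan's descriptor indexing / descriptor heap direction, and Metal's argument buffers are all, in essence, turning resource binding from command state into data that shaders can consume.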
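To make "one large indexable space" concrete, here is a minimal CPU-side sketch of a slot table that hands out indices and recycles freed ones. It is not Zenith.NET's implementation: `SlotHandle` and `SlotTable<T>` are hypothetical names invented for this illustration, and a real backend would write descriptors into a GPU-visible heap and defer slot reuse until the GPU has finished with the old resource.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a bindless handle: just an index into one big table.
public readonly record struct SlotHandle(uint Index);

// Minimal slot allocator: hands out indices and recycles released ones.
// Assumption: no GPU lifetime tracking; a real heap must delay reuse.
public sealed class SlotTable<T> where T : class
{
    private readonly T?[] slots;
    private readonly Stack<uint> free = new();
    private uint next;

    public SlotTable(int capacity) => slots = new T?[capacity];

    public SlotHandle Allocate(T resource)
    {
        uint index;
        if (free.Count > 0)
            index = free.Pop();          // reuse a released slot first
        else if (next < slots.Length)
            index = next++;              // otherwise grow into fresh slots
        else
            throw new InvalidOperationException("Slot table is full.");

        slots[index] = resource;
        return new SlotHandle(index);
    }

    public void Release(SlotHandle handle)
    {
        if (handle.Index >= next || slots[handle.Index] is null)
            throw new ArgumentException("Handle does not refer to a live slot.");

        slots[handle.Index] = null;
        free.Push(handle.Index);
    }

    public T Resolve(SlotHandle handle)
    {
        if (handle.Index >= next || slots[handle.Index] is not T resource)
            throw new ArgumentException("Handle does not refer to a live slot.");

        return resource;
    }
}
```

The index inside a handle is the only thing that needs to reach the shader, which is why it can sit in a constant buffer next to a transform or a color.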

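Making barriers and layout transitions explicit

The other important change concerns resource state.

In the old version, many state transitions were hidden inside higher-level calls such as BeginRenderPass(frameBuffer, clearValue, resourceTable). That is friendly to beginners, but once a project starts supporting more backends and more advanced features, hidden state becomes a liability: it is hard to know whether a given texture is currently a render target, a shader resource, a storage image, or a present image.

The new version puts these states back into the command stream: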

commandBuffer.Transition(colorTexture, default, TextureLayout.ColorAttachment);
commandBuffer.Transition(depthTexture, default, TextureLayout.DepthStencilAttachment);

ColorAttachment colorAttachment = ColorAttachment.Clear(colorTexture, clearColor);
DepthStencilAttachment depthAttachment = DepthStencilAttachment.Clear(depthTexture, 1.0f, 0);

commandBuffer.BeginRenderPass([colorAttachment], depthAttachment);

commandBuffer.SetPipeline(pipeline);

commandBuffer.DrawIndexed(indexCount, 1, 0, 0, 0);

commandBuffer.EndRenderPass();

commandBuffer.Transition(colorTexture, default, TextureLayout.Sampled);

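This looks like a few extra lines, but the benefits are very practical:

DirectX 12 can map it to Enhanced Barriers.
Vulkan can map it to image layouts and pipeline barriers.
Metal can express resource lifetimes more clearly through usage, hazard tracking, residency, and similar mechanisms.
Upper layers can more easily build render graphs, pass merging, async compute, and resource aliasing.

More importantly, performance optimization finally has something to hold on to. Having the library "transition state for you" saves effort, but it tends to be conservative and can even issue unnecessary barriers. Now that state transitions appear in the command stream, later work can merge barriers, eliminate redundant transitions, and schedule across passes.

This is also why barriers are unavoidable in a modern RHI. A GPU does not work on the principle that "once the previous line has executed, the next line is automatically safe". Rendering, compute, copy, and sampling stages differ in caches, queues, access types, and layouts. An explicit barrier tells the backend what the previous stage wrote, how the next stage will read it, and which data must become visible at this point.

If a barrier's scope is too wide, the GPU may be forced to wait across many unrelated stages, which creates visible bubbles. The new version hands TextureLayout and stage information to the command stream explicitly so that these waits can later be narrowed, merged, or removed.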
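Redundant transition elimination can be sketched as a small tracker that remembers the last known layout of each texture and only records a transition when the layout actually changes. This is a simplified illustration under stated assumptions, not Zenith.NET code: `Layout`, `TransitionRecord`, and `LayoutTracker` are hypothetical names, textures are identified by strings, and the sketch ignores cases where a barrier is still needed without a layout change (for example, back-to-back writes to a storage resource).

```csharp
#nullable enable
using System.Collections.Generic;

// Hypothetical layout values, loosely mirroring the names used in this post.
public enum Layout { Undefined, ColorAttachment, DepthStencilAttachment, Sampled }

// One transition that would actually be handed to the backend.
public readonly record struct TransitionRecord(string Texture, Layout Before, Layout After);

public sealed class LayoutTracker
{
    private readonly Dictionary<string, Layout> current = new();

    // Transitions that survived filtering, in recording order.
    public List<TransitionRecord> Emitted { get; } = new();

    // Returns true when a transition was recorded, false when it was redundant.
    public bool Transition(string texture, Layout target)
    {
        // Textures never seen before are treated as Layout.Undefined.
        if (!current.TryGetValue(texture, out Layout before))
            before = Layout.Undefined;

        if (before == target)
            return false; // already in the requested layout, nothing to emit

        current[texture] = target;
        Emitted.Add(new TransitionRecord(texture, before, target));
        return true;
    }
}
```

Because the tracker only sees what the command stream declares, this kind of filtering is possible precisely because transitions are explicit rather than buried inside a render pass call.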

Changes in API style

The old API feels like a high-level wrapper: create a frame buffer, create a resource table, and hand both to the command buffer when the render pass begins.

commandBuffer.BeginRenderPass(frameBuffer, clearValue, resourceTable);

commandBuffer.SetPipeline(pipeline);
commandBuffer.PushResourceTable(resourceTable);

commandBuffer.SetVertexBuffer(vertexBuffer, 0, 0);
commandBuffer.SetIndexBuffer(indexBuffer, 0, IndexFormat.UInt32);

commandBuffer.DrawIndexed(6, 1, 0, 0, 0);

commandBuffer.EndRenderPass();

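The new version is closer to real GPU commands: first state how resources will be used next, then describe the render pass attachments, then set the pipeline and input resources.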

commandBuffer.Transition(color, default, TextureLayout.ColorAttachment);
commandBuffer.Transition(depth, default, TextureLayout.DepthStencilAttachment);

ColorAttachment colorAttachment = ColorAttachment.Clear(color, clearColor);
DepthStencilAttachment depthAttachment = DepthStencilAttachment.Clear(depth, 1.0f, 0);

commandBuffer.BeginRenderPass([colorAttachment], depthAttachment);

commandBuffer.SetPipeline(pipeline);

commandBuffer.SetVertexBuffer(vertexBuffer, 0, 0);
commandBuffer.SetIndexBuffer(indexBuffer, 0, IndexFormat.UInt32);
commandBuffer.SetConstantBuffer(constants, 0);

commandBuffer.DrawIndexed(indexCount, 1, 0, 0, 0);

commandBuffer.EndRenderPass();

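The new version is not a "simpler" API. It is lower-level and does expect users to understand a few concepts of modern GPUs. But it is more explicit, and better suited to serious graphics, compute, and engine-level development.

For ordinary users, common paths can later be wrapped up again through helpers and higher-level extensions. The core RHI, however, can no longer be built on overly high-level assumptions.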
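As one illustration of what such a helper might look like, the sketch below pairs the "before pass" and "after pass" transitions in a disposable scope so that the second one cannot be forgotten. Everything here is hypothetical: `ITransitionRecorder`, `PassLayoutScope`, and `LoggingRecorder` are invented for this example, layouts are plain strings, and nothing in it is an existing Zenith.NET type.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal recording surface; not the Zenith.NET command buffer.
public interface ITransitionRecorder
{
    void Transition(string texture, string layout);
}

// Moves a texture into its in-pass layout on construction and into a
// follow-up layout on Dispose, so the two transitions always come as a pair.
public sealed class PassLayoutScope : IDisposable
{
    private readonly ITransitionRecorder recorder;
    private readonly string texture;
    private readonly string layoutAfter;
    private bool disposed;

    public PassLayoutScope(ITransitionRecorder recorder, string texture,
                           string layoutDuring, string layoutAfter)
    {
        this.recorder = recorder;
        this.texture = texture;
        this.layoutAfter = layoutAfter;
        recorder.Transition(texture, layoutDuring);
    }

    public void Dispose()
    {
        if (disposed) return; // guard against double disposal
        disposed = true;
        recorder.Transition(texture, layoutAfter);
    }
}

// Recorder used for illustration: it only logs what was requested.
public sealed class LoggingRecorder : ITransitionRecorder
{
    public List<string> Log { get; } = new();

    public void Transition(string texture, string layout) =>
        Log.Add($"{texture}->{layout}");
}
```

Wrapping draw recording in `using (new PassLayoutScope(recorder, "color", "ColorAttachment", "Sampled")) { ... }` keeps the explicit model underneath while restoring some of the convenience of the old API.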

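What performance gains the new version can bring

Architecturally, the performance gains of the new version come from several directions.

First, less CPU binding overhead. The bindless / descriptor heap model reduces the need to update resource tables frequently and to switch descriptor sets or tables. Once resources become handles, many scenarios only need to update a small piece of constant or per-instance data.

Second, less backend glue. In the old model, unifying resource tables, frame buffers, and resource layouts required every backend to maintain a matching set of objects. The new version removes many of these intermediate layers, so the core path is shorter and backends can use the native APIs more directly.

Third, more room for barrier optimization. Explicit TextureLayout and BarrierStages make resource state analyzable, which later enables redundant barrier elimination, barrier merging between passes, and even preparation for a render graph.

Fourth, a better fit for GPU-driven work. Workflows such as indirect draw, mesh shading, ray tracing, and compute culling all want the GPU to read resource indices, read parameters, and drive more of the work itself. Bindless access and explicit resource state are the foundation for these capabilities.

Fifth, a clearer memory model. The new version separates what a resource is used for from where it resides, for example BufferUsages versus MemoryResidency. This makes it easier for upload, download, GPU-only, and CPU-write paths to each land on a suitable memory strategy.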
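The separation of usage from residency can be sketched with two independent enums. The members below are assumptions made for illustration and do not claim to match the actual BufferUsages or MemoryResidency definitions in Zenith.NET; the heap names on the right are the standard D3D12 heap types that conventionally serve each case.

```csharp
using System;

// Hypothetical usage flags: what the buffer is used for.
[Flags]
public enum UsageSketch
{
    None = 0,
    Constant = 1 << 0,
    Storage = 1 << 1,
    CopySource = 1 << 2,
    CopyDestination = 1 << 3
}

// Hypothetical residency values: where the buffer lives and who touches it.
public enum ResidencySketch { GpuOnly, CpuWrite, CpuRead }

public static class MemoryPolicySketch
{
    // Maps a residency to the D3D12 heap type that conventionally serves it.
    public static string D3D12HeapFor(ResidencySketch residency) => residency switch
    {
        ResidencySketch.GpuOnly => "D3D12_HEAP_TYPE_DEFAULT",
        ResidencySketch.CpuWrite => "D3D12_HEAP_TYPE_UPLOAD",
        ResidencySketch.CpuRead => "D3D12_HEAP_TYPE_READBACK",
        _ => throw new ArgumentOutOfRangeException(nameof(residency))
    };

    // A readback buffer is only useful if the GPU is allowed to copy into it.
    public static bool IsSensible(UsageSketch usage, ResidencySketch residency) =>
        residency != ResidencySketch.CpuRead || usage.HasFlag(UsageSketch.CopyDestination);
}
```

Keeping the two axes apart means the same usage flags can be combined with different residencies, and simple validity checks like the one above can be made before any backend object is created.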

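So this refactor is not the kind where "some demo's frame rate doubles right away". It opens up the paths that truly affect performance later on: bind fewer resources, switch less state, issue fewer useless barriers, and leave more work to the GPU.

The new version does not erase complexity. It places complexity where it is easier to analyze and optimize. As long as resource states and synchronization boundaries are accurate enough, graphics, compute, and copy work have a better chance of overlapping instead of being serialized by conservative global waits.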

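What this means for third-party libraries and the ecosystem

Zenith.NET does not aim to be a complete game engine. It is better suited to being a foundation layer in the .NET graphics ecosystem, so compatibility with third-party libraries matters.

The directions that already exist in the repository or are being maintained include:

ImGui: for tool interfaces, debug panels, and editor UI.
ImageSharp: for image loading, pixel format conversion, and texture upload.
SkiaSharp: suited to 2D drawing, fonts, vector graphics, UI composition, and similar scenarios.
Slang: the cross-backend shader compilation chain, producing DXIL, metallib, and SPIR-V from one source.
Avalonia, MAUI, WinForms, WinUI, WPF, Uno: view integrations for .NET UI frameworks.

What the new RHI means for these integrations is that, once the underlying resource model is unified, upper-layer libraries do not need to care whether the current backend is D3D12, Metal, or Vulkan. For example, ImGui only needs the binding handle of a texture; ImageSharp is only responsible for uploading image data as a texture; and SkiaSharp can later act as a 2D content producer that hands its results to Zenith.NET for composition into a 3D or UI pipeline.

If native object exposure continues to improve, Zenith.NET can also connect with other ecosystems, such as windowing systems, video decoding, screenshot and recording tools, external texture sharing, and engine plugins.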

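Summary

What this refactor really sets out to solve is not "whether the API looks pretty", but the ceiling of Zenith.NET.

The old version is easier to explain and easier to write tutorials for. The new version is closer to a modern RHI and better suited to bindless access, ray tracing, mesh shading, GPU-driven rendering, and a cross-backend shader pipeline. In the short term it makes the API lower-level. In the long term it can reduce backend maintenance cost and leave more room for performance optimization and the third-party ecosystem.

The DX12 path is already usable. If you are interested, switch to the new-version branch and try running the Cornell Box sample to see what the new RHI looks like in practice.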

posted @ 2026-06-13 21:38 o王先生o