Complete Vertex Compression

When reposting, please credit the KlayGE game engine as the source. The permanent link to this article is http://www.klayge.org/?p=2116

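In the article on compressing the tangent frame, we saw how to pack a tangent frame into 4 bytes. Now let's look at how to compress the other attributes so that the vertex data shrinks further.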

Vertex attributes

First, let's see which attributes a complete vertex contains:

Attribute	Type	Size (bytes)	Notes
position	float3	12
texcoord	float2	8
tangent	float3	12
binormal	float3	12
normal	float3	12
blend_index	uint4	16	Skeletal-animation models only
blend_weight	float4	16	Skeletal-animation models only
Total		88

After tangent frame compression, and given that most engines already pack blend_index and blend_weight into one uint32 each, the vertex format becomes:

Attribute	Type	Size (bytes)	Notes
position	float3	12
texcoord	float2	8
tangent_quat	uint32	4
blend_index	uint32	4	Skeletal-animation models only
blend_weight	uint32	4	Skeletal-animation models only
Total		32

This is also the vertex layout commonly seen in engines.
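
The article does not spell out how blend_index and blend_weight end up in a uint32 each. A minimal sketch of one common scheme, assuming 8 bits per component (so at most 256 bones per lookup and weights quantized to 1/255 steps), might look like this. The function names are hypothetical and not taken from KlayGE.

```python
import struct

def pack_blend_index(indices):
    """Pack four bone indices (each 0..255) into one little-endian uint32."""
    assert len(indices) == 4 and all(0 <= i <= 255 for i in indices)
    return struct.unpack("<I", bytes(indices))[0]

def pack_blend_weight(weights):
    """Quantize four weights in [0, 1] to 8 bits each and pack into a uint32."""
    assert len(weights) == 4
    quantized = [max(0, min(255, int(round(w * 255.0)))) for w in weights]
    return struct.unpack("<I", bytes(quantized))[0]

def unpack_blend_weight(packed):
    """Recover approximate weights, as an 8-bit normalized attribute would."""
    return [b / 255.0 for b in struct.pack("<I", packed)]
```

One caveat of this assumed scheme: after rounding, the four quantized weights may no longer sum to exactly 1, so a shader may want to renormalize them.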

Goal

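As the table above shows, apart from position and texcoord there is little left to squeeze out of the other attributes. So our goal is to compress position and texcoord to 16 bits per component, that is: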

Attribute	Type	Size (bytes)	Notes
position	short4	8
texcoord	short2	4
tangent_quat	uint32	4
blend_index	uint32	4	Skeletal-animation models only
blend_weight	uint32	4	Skeletal-animation models only
Total		24

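Fortunately, current GPUs all support the short4 and short2 vertex attribute formats, which makes this goal achievable. Unfortunately short1 is not supported; otherwise position could be split into short2 + short1 and save one more short.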
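
To make the 24-byte target concrete, here is a small sketch that lays out one such vertex with Python's `struct` module. The format string and the `pack_vertex` helper are illustrative assumptions; the fourth position component is treated as unused padding because position only has three components.

```python
import struct

# Assumed layout: 4 x int16 position (w unused), 2 x int16 texcoord,
# then three uint32s: tangent_quat, blend_index, blend_weight.
# "<" selects little-endian with no implicit padding.
VERTEX_FORMAT = "<4h2hIII"

def pack_vertex(pos4, uv2, tangent_quat, blend_index, blend_weight):
    """Serialize one compressed vertex into raw bytes."""
    return struct.pack(VERTEX_FORMAT, *pos4, *uv2,
                       tangent_quat, blend_index, blend_weight)

vertex_size = struct.calcsize(VERTEX_FORMAT)
```

Here `vertex_size` evaluates to 24, matching the total in the table.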

Method

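A short stored in a vertex attribute is read on the GPU side as a value in the range [-1, 1], so the position and texcoord components clearly cannot be written into shorts as they are. The model has to store the bounding boxes of position and texcoord, and the vertex attributes hold components that have been normalized by those bounding boxes. In the VS, the two are combined to rebuild the float components, and the rest of the computation continues as before. Also fortunately, modern GPUs all support MAD, so position * bounding_box_extent + bounding_box_center takes a single instruction. The extra computation costs almost nothing, while the bandwidth spent on position and texcoord drops by close to half (20 bytes down to 12).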
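
The round trip described above can be sketched on the CPU as follows. This is an illustration under stated assumptions, not KlayGE's actual code: the function names are hypothetical, the extent is taken to be the half-size of the bounding box, and the decode step assumes the signed-normalized rule where the raw value is divided by 32767 and clamped at -1.

```python
def bounding_box(points):
    """Return (center, extent) per axis, where extent is the half-size."""
    dims = len(points[0])
    lo = [min(p[i] for p in points) for i in range(dims)]
    hi = [max(p[i] for p in points) for i in range(dims)]
    center = [(a + b) * 0.5 for a, b in zip(lo, hi)]
    # Guard against a zero extent on flat axes to avoid dividing by zero.
    extent = [max((b - a) * 0.5, 1e-8) for a, b in zip(lo, hi)]
    return center, extent

def quantize(p, center, extent):
    """Map each component to [-1, 1], then to a signed 16-bit integer."""
    out = []
    for v, c, e in zip(p, center, extent):
        n = max(-1.0, min(1.0, (v - c) / e))
        out.append(int(round(n * 32767.0)))
    return out

def dequantize(q, center, extent):
    """Mimic the vertex shader: short -> [-1, 1], then one multiply-add."""
    return [max(s / 32767.0, -1.0) * e + c
            for s, c, e in zip(q, center, extent)]
```

For example, with `center, extent = bounding_box([(0, 0, 0), (10, 20, 40)])`, the point `(10, 0, 20)` quantizes to `[32767, -32767, 0]` and dequantizes back to `[10.0, 0.0, 20.0]`. The same functions work for two-component texcoords.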

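This simple, direct compression cuts the vertex size by another 1/4 compared with compressing only the tangent frame (32 bytes down to 24). As for precision, a short provides 65536 distinct values, which is more than enough for typical cases.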
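
To put a number on "more than enough", the spacing between adjacent representable values can be estimated. The sketch below assumes the signed-normalized mapping in which -32768 and -32767 both decode to -1, leaving 65535 usable codes and therefore 65534 intervals across the bounding box; `snorm16_step` is a hypothetical helper name.

```python
def snorm16_step(full_range):
    """Spacing between adjacent representable values across full_range.

    Assumes codes -32767..32767 are usable, i.e. 65534 equal intervals.
    """
    return full_range / 65534.0

# A model whose bounding box is 10 units wide along one axis.
position_step = snorm16_step(10.0)

# Texcoords spanning [0, 1], expressed in texels of a 4096-wide texture.
texel_step = snorm16_step(1.0) * 4096
```

Under these assumptions, a 10-unit-wide model gets steps of roughly 0.00015 units, and a texcoord range of [0, 1] on a 4096-wide texture resolves to about 1/16 of a texel. Models with very large bounding boxes and fine detail are where this spacing would first become visible.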

Summary

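	Size (bytes)	Compression ratio	Notes
Original format	88	100%
KlayGE 4.1 and earlier	48	55%	tangent frame compressed to 8 bytes, blend_index compressed to 4 bytes
KlayGE 4.2	24	27%	tangent frame compressed to 4 bytes, blend_weight compressed to 4 bytes, position compressed to 8 bytes, texcoord compressed to 4 bytes

After this compression, the new format takes only 27% of the size of the original format (24 of 88 bytes). In samples such as CausticsMap and DeferredRendering, performance improves by 10%.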
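
The sizes and ratios in the summary can be re-derived from the per-attribute tables with a few lines of arithmetic. In this sketch the KlayGE 4.1 layout is inferred from the notes column (only the tangent frame and blend_index are packed, so blend_weight is assumed to remain a 16-byte float4); the dictionary names are illustrative.

```python
# Byte sizes per attribute, copied from the tables in this article.
ORIGINAL = {"position": 12, "texcoord": 8, "tangent": 12, "binormal": 12,
            "normal": 12, "blend_index": 16, "blend_weight": 16}
# Inferred from the notes column: blend_weight assumed to stay at 16 bytes.
KLAYGE_4_1 = {"position": 12, "texcoord": 8, "tangent_quat": 8,
              "blend_index": 4, "blend_weight": 16}
KLAYGE_4_2 = {"position": 8, "texcoord": 4, "tangent_quat": 4,
              "blend_index": 4, "blend_weight": 4}

def relative_size(layout, reference=ORIGINAL):
    """Size of layout as a percentage of the reference layout."""
    return 100.0 * sum(layout.values()) / sum(reference.values())

for name, layout in [("original", ORIGINAL), ("4.1", KLAYGE_4_1),
                     ("4.2", KLAYGE_4_2)]:
    print(f"{name}: {sum(layout.values())} bytes, "
          f"{relative_size(layout):.1f}%")
```

This gives 88, 48 and 24 bytes, with the two compressed formats at 54.5% and 27.3% of the original size.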

Posted on 2012-11-13 13:27 by 龚敏敏