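Unity3D performance optimization on mobile
Differences between mobile and desktop GPUs
A mobile GPU runs on roughly 100x less power than a desktop GPU
It is also roughly 100x smaller than a desktop part
So it is naturally slower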
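Optimization levers
1 Resolution
2 Post-processing
3 MSAA (multisample anti-aliasing is a specialized form of supersampling anti-aliasing, SSAA. MSAA originated in OpenGL. It applies supersampling only to the data in the depth buffer (Z-buffer) and the stencil buffer, which in practice means that only polygon edges are anti-aliased. Compared with SSAA, which processes every pixel of the image, MSAA needs far fewer resources, though image quality may be slightly lower than with SSAA.)
4 Per-device handling
5 Shaders
6 Effect/particle density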
https://docs.unity3d.com/Documentation/Manual/MobileOptimisation.html
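GPU architectures
Deferred Shading
In computer graphics, deferred shading is a three-dimensional shading technique in which the result of a shading algorithm is split into smaller parts that are written to intermediate buffers and combined later, instead of writing the shader result straight to the color framebuffer. Implementations on current hardware tend to use multiple render targets (MRT) to avoid transforming the same vertices more than once. Once all the required buffers are built, they are read into a shading algorithm (a lighting equation, for example) and combined to produce the final result. The computation and memory bandwidth needed to shade a scene are thereby limited to the visible parts, which reduces the shaded depth complexity.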
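Tile-based Deferred Shading
First, a recap of the tile-based deferred shading that Intel presented in the SIGGRAPH 2010 courses. The algorithm works as follows:
1 Generate the G-Buffer, exactly as in traditional deferred shading.
2 Split the G-Buffer into 16×16 tiles and compute a bounding box for each tile from its depth values.
3 For each tile, intersect its bounding box with every light to get the list of lights that contribute to that tile.
4 For each pixel of the G-Buffer, accumulate the shading using the light list of the tile it belongs to.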
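The four steps above can be sketched on the CPU as follows. This is a minimal illustration, not Intel's implementation: the G-Buffer is reduced to a depth array, lights are spheres expressed directly in pixel and depth units, and the linear falloff in `shade` is an arbitrary placeholder. The names `Light`, `tile_bounds`, `cull_lights`, and `shade` are hypothetical.

```python
from dataclasses import dataclass

TILE = 16  # tile edge in pixels, matching the 16x16 tiles described above


@dataclass
class Light:
    x: float          # centre in pixel coordinates (simplifying assumption)
    y: float
    z: float          # centre depth, same units as the depth buffer
    radius: float
    intensity: float


def tile_bounds(depth, tx, ty):
    """Step 2: bounding box (x0, y0, zmin, x1, y1, zmax) of one tile."""
    h, w = len(depth), len(depth[0])
    x0, y0 = tx * TILE, ty * TILE
    x1, y1 = min(x0 + TILE, w), min(y0 + TILE, h)
    zs = [depth[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return (x0, y0, min(zs), x1, y1, max(zs))


def sphere_hits_box(light, box):
    """Sphere vs. axis-aligned box test via the closest point on the box."""
    x0, y0, z0, x1, y1, z1 = box
    cx = min(max(light.x, x0), x1)
    cy = min(max(light.y, y0), y1)
    cz = min(max(light.z, z0), z1)
    d2 = (cx - light.x) ** 2 + (cy - light.y) ** 2 + (cz - light.z) ** 2
    return d2 <= light.radius ** 2


def cull_lights(depth, lights):
    """Step 3: map each tile to the indices of the lights that touch it."""
    h, w = len(depth), len(depth[0])
    tiles_x = (w + TILE - 1) // TILE
    tiles_y = (h + TILE - 1) // TILE
    return {
        (tx, ty): [i for i, light in enumerate(lights)
                   if sphere_hits_box(light, tile_bounds(depth, tx, ty))]
        for ty in range(tiles_y) for tx in range(tiles_x)
    }


def shade(depth, lights, tile_lights):
    """Step 4: per pixel, accumulate only the lights of its own tile."""
    h, w = len(depth), len(depth[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for i in tile_lights[(x // TILE, y // TILE)]:
                light = lights[i]
                d2 = ((x - light.x) ** 2 + (y - light.y) ** 2
                      + (depth[y][x] - light.z) ** 2)
                if d2 <= light.radius ** 2:
                    # Placeholder falloff, not a physically based model.
                    out[y][x] += light.intensity * (1.0 - d2 / light.radius ** 2)
    return out
```

The point of the tile list is that the inner loop in `shade` visits only the lights that survived culling for that tile, rather than every light in the scene.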
Immediate
Immediate mode rendering is a style of graphics library API in which client calls directly cause graphics objects to be rendered to the display. One example of "immediate mode" is using glBegin and glEnd with glVertex calls in between. Another example is using glDrawArrays with a client vertex array (i.e. not a vertex buffer object).
Types
•ImgTec PowerVR SGX (Tiled Deferred)
•NVIDIA Tegra (Classic, immediate)
•Qualcomm Adreno (Tiled)
•ARM Mali (Tiled)
Example: efficiency comparison
•Tegra2: 0.8-3.5ms to reject 1280x600 fragments
•can reject 2x2 pixels per cycle
•real scene: rejecting the skybox in SHADOWGUN takes 0.9ms
•iPad2: 0.05-0.13ms to reject 1024x768 fragments
•can reject 32 pixels per cycle
•0.05ms for coplanar geometry
•0.11-0.13ms for rejecting arbitrary geometry
Insight: it makes sense to spend lots of CPU cycles on Tegra on occlusion culling and better sorting (to take advantage of Early Z-cull)
•Simple geometry: iPad2 0.07ms, Tegra2 2.0ms
•Complex geometry: iPad2 0.13ms, Tegra2 3.8ms
•Scales linearly with overdraw
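A back-of-the-envelope estimate shows why rejection cost scales linearly with overdraw. The sketch below divides the number of fragments by the pixels-per-cycle reject rate quoted above. The GPU clock values in the usage lines are placeholder assumptions, not figures from the measurements, so the results give only an order of magnitude and will not reproduce the measured timings exactly. The function name `reject_ms` is hypothetical.

```python
def reject_ms(width, height, pixels_per_cycle, clock_hz, layers=1):
    """Estimated time in ms to reject `layers` full-screen layers of fragments.

    Assumes the GPU rejects at its peak rate every cycle, which real
    hardware rarely sustains, so treat the result as a lower bound.
    """
    fragments = width * height * layers
    cycles = fragments / pixels_per_cycle
    return cycles / clock_hz * 1000.0


# Clock frequencies below are assumptions chosen for illustration only.
tegra2_one_layer = reject_ms(1280, 600, pixels_per_cycle=4, clock_hz=300e6)
ipad2_one_layer = reject_ms(1024, 768, pixels_per_cycle=32, clock_hz=250e6)

# Doubling the overdraw doubles the estimate.
tegra2_two_layers = reject_ms(1280, 600, pixels_per_cycle=4, clock_hz=300e6, layers=2)
```

Because the fragment count is the only term that grows with overdraw, every extra occluded layer adds the same fixed cost, which is far larger at 4 pixels per cycle than at 32.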
血战: rendering performance optimization plan for Tegra2 devices
Tool setup: PerfHUD ES (2 hours)
NVIDIA PerfHUD ES
•Runs on Windows, OSX, Linux
•Very useful!
•GPU time per draw call
•Cycles per shader
•Force 2x2 texture
•Null view rectangle
Profile the efficiency of every shader in the plugin (by Friday)
Planned optimizations: Blinn-Phong with a LUT, baking tools
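One common reading of "Blinn-Phong with a LUT" is to bake the specular power term into a lookup table so that the shader replaces a `pow` call with a texture fetch. The sketch below shows that idea on the CPU, assuming a 1D table indexed by N·H with linear interpolation. The table size, the function names, and the choice to bake only the power term are assumptions, not details taken from the plan above.

```python
def bake_specular_lut(shininess, size=256):
    """Bake pow(n_dot_h, shininess) for n_dot_h evenly spaced in [0, 1]."""
    return [(i / (size - 1)) ** shininess for i in range(size)]


def sample_lut(lut, n_dot_h):
    """Look up the table with linear interpolation, like a bilinear texture fetch."""
    t = min(max(n_dot_h, 0.0), 1.0) * (len(lut) - 1)
    i = int(t)
    j = min(i + 1, len(lut) - 1)
    f = t - i
    return lut[i] * (1.0 - f) + lut[j] * f


# Usage: bake once offline, then sample per fragment instead of calling pow.
lut = bake_specular_lut(shininess=32.0)
specular = sample_lut(lut, 0.9)
```

On a GPU the table would live in a small texture, so the trade is one texture fetch against the arithmetic cost of `pow`. Whether that wins depends on the device and should be checked with the profiler.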
Texture Compression
Tegra supports DXT5, so use it
Sort opaque geometry (two days)
•Ideal front-to-back not possible
•Would need per-poly sort
•Different materials
•Big | Close: front to back
•Rest by material
•Character shader expensive
•Player big, occludes a lot: always first
•Enemies often occluded by cover: always last
•Skybox always last
•Turn skybox off on "no way it's visible" trigger zones
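The ordering rules above can be expressed as a single sort key. The sketch below is one possible interpretation: "Big | Close" is read as "big or close", the size and distance thresholds are invented for illustration, and the `Renderable` type with its `kind` labels is hypothetical rather than part of Unity's API.

```python
from dataclasses import dataclass


@dataclass
class Renderable:
    name: str
    kind: str        # "player", "enemy", "skybox" or "scenery" (hypothetical labels)
    material: str
    distance: float  # distance from the camera
    size: float      # rough bounding radius


BIG_SIZE = 5.0        # assumed threshold for "big"
NEAR_DISTANCE = 10.0  # assumed threshold for "close"


def draw_order_key(r):
    """Sort key: player, big/close front to back, rest by material, enemies, skybox."""
    if r.kind == "player":
        return (0, 0.0, "")
    if r.kind == "skybox":
        return (4, 0.0, "")
    if r.kind == "enemy":
        return (3, 0.0, r.material)
    if r.size >= BIG_SIZE or r.distance <= NEAR_DISTANCE:
        return (1, r.distance, "")   # likely occluders: nearest first
    return (2, 0.0, r.material)      # everything else: group by material


scene = [
    Renderable("sky", "skybox", "sky", 0.0, 1000.0),
    Renderable("crate", "scenery", "wood", 30.0, 1.0),
    Renderable("enemy", "enemy", "character", 8.0, 2.0),
    Renderable("wall", "scenery", "concrete", 20.0, 8.0),
    Renderable("player", "player", "character", 2.0, 2.0),
    Renderable("barrel", "scenery", "metal", 5.0, 1.0),
    Renderable("pipe", "scenery", "metal", 40.0, 1.0),
]
draw_order = [r.name for r in sorted(scene, key=draw_order_key)]
```

Drawing the likely occluders first lets Early Z-cull reject the fragments of the expensive character shader and of the skybox behind them, which is where the Tegra2 measurements suggest the time goes.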
Shader performance
iOS optimization example
•37FPS after optimizations
•Now runs 26 FPS with MORE stuff!
•90k vertices (lots skinned)
•A ton of particles
•3-6 textures per fragment
•Bloom, Heat Shimmer
•4xMSAA, Anisotropic
To be investigated