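Unity3D Performance Optimization on Mobile

Differences between mobile and desktop GPUs

A mobile GPU has to run on roughly 100x less power than a desktop GPU

It is also roughly 100x smaller than its desktop counterpart

So it is naturally slower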

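Tools for dealing with this

 1 Resolution

 2 Post-processing

 3 MSAA (Multisample anti-aliasing, MSAA for short, is a special form of supersampling anti-aliasing (SSAA). MSAA originated in OpenGL. Specifically, MSAA applies supersampling only to the data in the Z-buffer and the stencil buffer, so it can be loosely understood as anti-aliasing only the edges of polygons. Compared with SSAA, which processes all the data in the image, MSAA needs far fewer resources, although its image quality may be slightly below that of SSAA.)

 4 Different handling for different devices

 5 Shaders

 6 Density of effects/particles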

https://docs.unity3d.com/Documentation/Manual/MobileOptimisation.html

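GPU architectures

Deferred Shading

In computer graphics, deferred shading is a technique for shading 3D scenes. Instead of writing the shading result straight into the color framebuffer, the algorithm splits the work into smaller parts that are written to intermediate buffers and combined afterwards. On current hardware, multiple render targets are typically used to avoid transforming the vertices more than once. Once all the required buffers have been built, they are read into a shading algorithm and combined to produce the final result. This way, the computation and memory bandwidth needed to shade a scene are limited to the visible parts, which reduces the depth complexity of shading.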

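Tile-based Deferred Shading

Before getting to the main topic, recall the Tile-based Deferred Shading approach that Intel presented in the SIGGRAPH 2010 courses. Its algorithm works as follows:

 1 Generate the G-Buffer; this step is the same as in traditional deferred shading.

 2 Divide the G-Buffer into many 16×16 tiles and compute a bounding box for each tile from its depth values.

 3 For each tile, intersect its bounding box with the lights to get the list of lights that contribute to that tile.

 4 For each pixel of the G-Buffer, accumulate the shading using the light list of the tile it belongs to.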
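The four steps above can be sketched on the CPU as follows. This is only a minimal illustration, not Intel's implementation: the `Light` type, the use of one radius for both the screen-space and the depth extent, and the placeholder falloff in `shade_pixel` are all assumptions made for the example. The G-Buffer is reduced to a plain 2D list of depth values.

```python
from dataclasses import dataclass

TILE = 16  # tile size in pixels, as in step 2


@dataclass
class Light:
    # Hypothetical light, already projected to screen space: (x, y) in pixels,
    # depth z, and one radius used for every axis (a simplification).
    x: float
    y: float
    z: float
    radius: float


def tile_bounds(depth, tx, ty):
    """Step 2: min/max depth of the pixels inside tile (tx, ty)."""
    rows = depth[ty * TILE:(ty + 1) * TILE]
    vals = [d for row in rows for d in row[tx * TILE:(tx + 1) * TILE]]
    return min(vals), max(vals)


def light_hits_tile(light, tx, ty, zmin, zmax):
    """Step 3: conservative box-vs-box test between a light and a tile."""
    x0, x1 = tx * TILE, (tx + 1) * TILE
    y0, y1 = ty * TILE, (ty + 1) * TILE
    return (light.x + light.radius >= x0 and light.x - light.radius <= x1 and
            light.y + light.radius >= y0 and light.y - light.radius <= y1 and
            light.z + light.radius >= zmin and light.z - light.radius <= zmax)


def build_light_lists(depth, lights):
    """Steps 2-3: return {(tx, ty): [light indices]} for every tile."""
    height, width = len(depth), len(depth[0])
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    lists = {}
    for ty in range(tiles_y):
        for tx in range(tiles_x):
            zmin, zmax = tile_bounds(depth, tx, ty)
            lists[(tx, ty)] = [i for i, light in enumerate(lights)
                               if light_hits_tile(light, tx, ty, zmin, zmax)]
    return lists


def shade_pixel(px, py, lists, lights):
    """Step 4: accumulate only the lights in this pixel's tile list.
    The linear falloff is a placeholder, not a real lighting model."""
    total = 0.0
    for i in lists[(px // TILE, py // TILE)]:
        light = lights[i]
        dist = ((px - light.x) ** 2 + (py - light.y) ** 2) ** 0.5
        total += max(0.0, 1.0 - dist / light.radius)
    return total
```

The point of the per-tile list is that step 4 never loops over lights that cannot reach the tile, so the per-pixel cost depends on local light density rather than on the total number of lights in the scene.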

Immediate

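Immediate mode rendering is a style for application programming interfaces of graphics libraries, in which client calls directly cause rendering of graphics objects to the display. One example of "immediate mode" is using glBegin and glEnd with glVertex in between them. Another example of "immediate mode" is to use glDrawArrays with a client vertex array (i.e. not a vertex buffer object). Note that in the GPU architecture sense used in the list below, "immediate" means that the GPU rasterizes and shades each draw call as it is submitted, instead of first binning the geometry into tiles.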

Types

•ImgTec PowerVR SGX (Tiled Deferred)

•NVIDIA Tegra (Classic (immediate))

•Qualcomm Adreno (Tiled)

•ARM Mali (Tiled)

Example: performance comparison

Tegra2: 0.8-3.5ms to reject 1280x600 fragments

can reject 2x2 pixels per cycle

real scene: rejecting skybox in SHADOWGUN 0.9ms

iPad2: 0.05-0.13ms to reject 1024x768 fragments

can reject 32 pixels per cycle

0.05ms coplanar geometry

0.11-0.13ms rejecting arbitrary geometry

Insight: it makes sense to spend lots of CPU cycles on Tegra to do occlusion culling and better sorting (to take advantage of Early Z-cull)

Simple geometry

iPad2: 0.07ms, Tegra2: 2.0ms

Complex geometry

iPad2: 0.13ms, Tegra2: 3.8ms

Scales linearly with overdraw
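To get a feel for what "scales linearly with overdraw" means for a frame, here is a rough back-of-the-envelope sketch. It assumes that the "simple geometry" figures above are the cost of rejecting one full-screen layer of hidden fragments, and it assumes a 30 FPS target. Neither assumption is stated in the notes, and the function names are made up for the example.

```python
FRAME_BUDGET_MS = 1000.0 / 30.0  # assumed 30 FPS target

# Rejection times quoted above for "simple geometry", in ms.
# Assumption: each figure is the cost of one hidden full-screen layer.
REJECT_MS = {"iPad2": 0.07, "Tegra2": 2.0}


def rejection_cost_ms(device, hidden_layers):
    """Linear model: cost grows in proportion to the hidden layers."""
    return REJECT_MS[device] * hidden_layers


def budget_fraction(device, hidden_layers):
    """Share of the frame budget spent only on rejecting hidden fragments."""
    return rejection_cost_ms(device, hidden_layers) / FRAME_BUDGET_MS


for device in REJECT_MS:
    cost = rejection_cost_ms(device, 3)
    share = budget_fraction(device, 3)
    print(f"{device}: {cost:.2f} ms for 3 hidden layers ({share:.1%} of the frame)")
```

Under these assumptions, three hidden layers cost about 6 ms on Tegra2 but only about 0.2 ms on iPad2, which is why spending CPU time on occlusion culling and sorting pays off on Tegra.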

血战: rendering performance optimization plan for Tegra2 devices

Tool setup: PerfHUD ES (2 hours)

NVIDIA PerfHUD ES

 

Runs on Windows, OSX, Linux

Very useful!

GPU time per draw call

Cycles per shader

Force 2x2 texture

Null view rectangle

Study the efficiency of every shader in the plugin (before Friday)

Planned optimizations: Blinn-Phong with LUT, baking tools
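One common reading of "Blinn-Phong with LUT" is to precompute the specular power curve into a small lookup table, so that the shader does a texture fetch instead of calling `pow()`. The sketch below shows that idea on the CPU. The table size, the nearest-entry lookup and all function names are assumptions for illustration, and the notes do not say this is the exact scheme the plugin uses.

```python
import math


def bake_specular_lut(shininess, size=64):
    """Bake pow(n_dot_h, shininess) for n_dot_h in [0, 1] into `size` entries."""
    return [(i / (size - 1)) ** shininess for i in range(size)]


def lookup(lut, n_dot_h):
    """Nearest-entry lookup; a GPU texture fetch would usually filter linearly."""
    n_dot_h = min(1.0, max(0.0, n_dot_h))
    return lut[round(n_dot_h * (len(lut) - 1))]


def normalize(v):
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def blinn_phong_specular(normal, light_dir, view_dir, lut):
    """Blinn-Phong specular term with pow() replaced by the baked table.
    Assumes light_dir and view_dir are not exactly opposite each other."""
    half = normalize(tuple(l + v for l, v in zip(light_dir, view_dir)))
    n_dot_h = sum(n * h for n, h in zip(normalize(normal), half))
    return lookup(lut, n_dot_h)
```

The table is baked once per shininess value, so the trade-off is a little memory and some quantization error in exchange for removing a transcendental function from the per-fragment path.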

Texture Compression

Tegra supports DXT5: use it

Sort opaque geometry (two days)

Ideal front-to-back not possible

Would need per-poly sort

Different materials

Big | Close: front to back

Rest by material

Character shader expensive

Player big, occludes a lot: always first

Enemies often occluded by cover: always last

Skybox always last

Turn skybox off on "no way it's visible" trigger zones
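The ordering rules above can be expressed as a single sort key. This is a sketch under assumptions: the `Renderable` fields, the `kind` tags, the `CLOSE_DISTANCE` threshold and the choice to draw enemies before the skybox are all hypothetical, since the notes only say that both come "last".

```python
from dataclasses import dataclass


@dataclass
class Renderable:
    name: str
    kind: str         # hypothetical tags: "player", "enemy", "skybox", "other"
    material: str
    distance: float   # distance from the camera
    big: bool = False


CLOSE_DISTANCE = 10.0  # assumed threshold for what counts as "close"


def draw_order_key(r):
    if r.kind == "player":
        return (0, 0.0, "")                   # big occluder: always first
    if r.kind == "skybox":
        return (4, 0.0, "")                   # always last
    if r.kind == "enemy":
        return (3, r.distance, r.material)    # often hidden behind cover
    if r.big or r.distance <= CLOSE_DISTANCE:
        return (1, r.distance, r.material)    # big or close: front to back
    return (2, 0.0, r.material)               # the rest: grouped by material


def sort_opaque(renderables):
    return sorted(renderables, key=draw_order_key)
```

Drawing likely occluders first lets early Z reject the fragments behind them, while grouping the remaining objects by material keeps state changes down.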

Shader performance

iOS optimization example

•37 FPS after optimizations

•Now runs 26 FPS with MORE stuff!

•90k vertices (lots skinned)

•A ton of particles

•3-6 textures per fragment

•Bloom, Heat Shimmer

•4xMSAA, Anisotropic

To be investigated

posted @ 2015-06-19 21:53  蜡笔大新