
Optimizing Performance in iOS Part 1: Optimizing Graphics Performance

If you want to optimize your content for iOS, then it is beneficial for you to learn more about iOS hardware devices.

 

Alpha-Testing

Unlike on the desktop, alpha-testing (or use of the discard / clip operation in a pixel shader) is very expensive on iOS. If you can replace your alpha-test shader with alpha-blend, do so. If you absolutely need to use alpha-testing, then you should keep the areas of visible alpha-tested pixels to a minimum.

Vertex Performance

Generally you should aim for 40K or fewer vertices visible per frame when targeting iPhone 3GS or newer devices. You should aim for 10K or fewer vertices visible per frame when targeting older devices equipped with the MBX GPU, such as the iPhone, iPhone 3G, and iPod Touch 1st and 2nd Generation.

Lighting Performance

Per-pixel dynamic lighting adds significant cost to every affected pixel and can lead to objects being rendered in multiple passes. Avoid having more than one Pixel Light affecting any single object, and prefer that it be a directional light. Note that a Pixel Light is a light whose Render Mode setting is set to Important.

Per-vertex dynamic lighting can add significant cost to vertex transformations. Avoid multiple lights affecting single objects. Bake lighting for static objects.

Optimize Model Geometry

When optimizing the geometry of a model, there are two basic rules:

  • Don't use more faces than you need
  • Keep the number of UV mapping seams and hard edges as low as possible

Note that the actual number of vertices that graphics hardware has to process is usually not the same as what is displayed in a 3D application. Modeling applications usually display the geometric vertex count, i.e. number of points that make up a model.

For a graphics card, however, some vertices have to be split into separate ones. If a vertex has multiple normals (it's on a "hard edge"), multiple UV coordinates, or multiple vertex colors, it has to be split. So the vertex count you see in Unity is almost always different from the one displayed in the 3D application.
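
The splitting rule above can be illustrated with a small sketch. It assumes a hypothetical mesh description in which every face corner is a (position index, normal, UV) tuple; the names and data are made up for illustration and are not Unity's internal representation.

```python
# Hypothetical mesh description: each face corner references a position index
# together with the normal and UV used at that corner.

def modeling_vertex_count(corners):
    """Geometric vertex count, as a modeling tool shows it: unique positions."""
    return len({position for position, _normal, _uv in corners})


def gpu_vertex_count(corners):
    """Vertices the hardware processes: one per unique
    (position, normal, uv) combination."""
    return len(set(corners))


# A quad built from two triangles, smooth shaded, with continuous UVs.
smooth_quad = [
    (0, "n_up", (0.0, 0.0)), (1, "n_up", (1.0, 0.0)), (2, "n_up", (1.0, 1.0)),
    (0, "n_up", (0.0, 0.0)), (2, "n_up", (1.0, 1.0)), (3, "n_up", (0.0, 1.0)),
]

# Same positions, but the shared edge 0-2 is a hard edge:
# each triangle uses its own normal at positions 0 and 2.
hard_edge_quad = [
    (0, "n_a", (0.0, 0.0)), (1, "n_a", (1.0, 0.0)), (2, "n_a", (1.0, 1.0)),
    (0, "n_b", (0.0, 0.0)), (2, "n_b", (1.0, 1.0)), (3, "n_b", (0.0, 1.0)),
]

print(modeling_vertex_count(smooth_quad), gpu_vertex_count(smooth_quad))
print(modeling_vertex_count(hard_edge_quad), gpu_vertex_count(hard_edge_quad))
```

Both quads have four geometric vertices, but the hard-edged one needs six vertices on the hardware side because positions 0 and 2 each carry two different normals.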

Texture Compression

Use iOS native PVRTC compression formats. They not only decrease the size of your textures (resulting in faster load times and a smaller memory footprint), but can also dramatically increase your rendering performance. A compressed texture requires only a fraction of the memory bandwidth of an uncompressed 32-bit RGBA texture. For a performance comparison, check the iOS Hardware Guide.

Some images are prone to visual artifacts in the alpha channels of PVRTC compressed textures. In such cases you might want to tweak the PVRTC compression parameters directly in your imaging software. You can do that by installing the PVR export plugin or using PVRTexTool from Imagination Tech, the creators of the PVRTC format. The resulting compressed image with a .pvr extension will be imported by the Unity Editor as is, and the manually specified compression parameters will be preserved.

If PVRTC compression formats do not deliver enough visual quality and you need extra crisp imaging (for example, for UI textures), then you should consider using a 16-bit texture instead of a full 32-bit RGBA texture. At the very least, you will reduce memory bandwidth by half.
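
To put rough numbers on these trade-offs, here is a minimal sketch that computes the size of a single texture level at different bit rates. The format labels are illustrative names rather than Unity identifiers, the 4 and 2 bits per pixel figures are the standard PVRTC rates, and mipmaps are ignored.

```python
# Bits per pixel for each storage format (labels are illustrative only).
BITS_PER_PIXEL = {
    "RGBA32": 32,      # uncompressed 32-bit RGBA
    "RGBA16": 16,      # 16-bit texture
    "PVRTC_4BPP": 4,   # PVRTC at 4 bits per pixel
    "PVRTC_2BPP": 2,   # PVRTC at 2 bits per pixel
}


def texture_bytes(width, height, fmt):
    """Size in bytes of the top mip level only."""
    return width * height * BITS_PER_PIXEL[fmt] // 8


for fmt in BITS_PER_PIXEL:
    size_kib = texture_bytes(1024, 1024, fmt) / 1024
    print(f"{fmt:>10}: {size_kib:.0f} KiB")
```

For a 1024x1024 texture this gives 4096 KiB at 32 bits, 2048 KiB at 16 bits, and 512 KiB or 256 KiB with PVRTC, which is the memory and bandwidth gap the section describes.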

Tips for writing well performing shaders

Although GPUs have fully supported pixel and vertex shaders since the iPhone 3GS, do not expect to grab a desktop shader with complex per-pixel functionality and run it on an iOS device at 30 frames per second. Most often, shaders will have to be hand optimized, with calculations and texture reads kept to a minimum, in order to achieve good frame rates.

 

Complex arithmetic operations

Arithmetic operations such as pow, exp, log, cos, sin, tan, etc. heavily tax the GPU. The rule of thumb is to have no more than one such operation per fragment. Consider that lookup textures can sometimes be a better alternative.
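
The lookup-texture idea can be sketched on the CPU side as a precomputed table. This is only a model of the technique: the function names and the 256-entry size are assumptions, and in a real shader the table would be baked into a small texture and sampled instead of calling pow.

```python
import math


def build_pow_lut(exponent, size=256):
    """Precompute x**exponent at `size` evenly spaced samples over [0, 1]."""
    return [(i / (size - 1)) ** exponent for i in range(size)]


def sample_lut(lut, x):
    """Nearest-sample lookup with x clamped to [0, 1],
    similar to a clamped 1D texture with point filtering."""
    x = min(max(x, 0.0), 1.0)
    return lut[round(x * (len(lut) - 1))]


# Hypothetical specular falloff curve with exponent 16.
specular_lut = build_pow_lut(exponent=16.0)

x = 0.9
exact = math.pow(x, 16.0)
approx = sample_lut(specular_lut, x)
print(f"exact={exact:.4f} approx={approx:.4f} error={abs(exact - approx):.4f}")
```

The table trades one expensive operation per fragment for one texture read, at the cost of a small quantization error that depends on the table size.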

Do NOT try to roll your own normalize, dot, or inversesqrt operations, however. Always use the built-in ones; this way the driver will generate much better code for you.

Keep in mind that the discard operation will make your fragments slower.

 

Floating point operations

Always specify the precision of floating point variables when writing custom shaders. It is crucial to pick the smallest possible format in order to achieve the best performance.

If the shader is written in GLSL ES, then precision is specified as follows:

  • highp - full 32-bit floating point format, well suited for vertex transformations, slowest
  • mediump - reduced 16-bit floating point format, well suited for texture UV coordinates, roughly 2x faster than highp
  • lowp - 10-bit fixed point format, well suited for colors, lighting calculations and other high-performance operations, roughly 4x faster than highp

If the shader is written in Cg or it is a surface shader, then precision is specified as follows:

  • float - analogous to highp in GLSL ES, slowest
  • half - analogous to mediump in GLSL ES, roughly 2x faster than float
  • fixed - analogous to lowp in GLSL ES, roughly 4x faster than float
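
To get a feel for what the smaller formats can represent, the following sketch emulates them on the CPU. It assumes that half / mediump behaves like an IEEE 754 16-bit float and that fixed / lowp is a 10-bit grid covering [-2, 2) in steps of 1/256; actual hardware precision may differ, so treat the numbers as indicative only.

```python
import struct


def to_half(value):
    """Round-trip a value through a 16-bit float (stand-in for half / mediump)."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def to_fixed(value, step=1.0 / 256.0, low=-2.0, high=2.0):
    """Quantize to a small fixed point grid (stand-in for fixed / lowp).
    The range and step are assumptions made for this sketch."""
    clamped = min(max(value, low), high - step)
    return round(clamped / step) * step


# A color channel in [0, 1] survives the fixed point grid almost unchanged.
print(to_fixed(0.3))

# A UV coordinate keeps plenty of precision in 16-bit float.
print(to_half(0.1))

# A large world-space coordinate does not: neighbouring integers collapse.
print(to_half(2049.0))
```

This is why the lists above pair colors with the fixed point format and UVs with the 16-bit format, while vertex positions stay in full precision.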

For more details about general shader performance, please read the Shader Performance page.

 

Hardware documentation

Take your time to study Apple's documentation on hardware and best practices for writing shaders. Note, however, that we suggest being more aggressive with floating point precision hints.

Bake Lighting into Lightmaps

Bake your scene's static lighting into textures using Unity's built-in Lightmapper. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:

  • It is going to run a lot faster (2-3 times faster with, e.g., 2 pixel lights)
  • It will look a lot better, since you can bake global illumination and the lightmapper can smooth the results

Share Materials

If a number of objects being rendered by the same camera use the same material, then Unity iOS will be able to employ a large variety of internal optimizations such as:

  • Avoiding setting various render states in OpenGL ES
  • Avoiding calculation of different parameters required to set up vertex and pixel processing
  • Batching small moving objects to reduce draw calls
  • Batching both big and small objects with the "static" property enabled to reduce draw calls

All these optimizations will save you precious CPU cycles. Therefore, putting in the extra work to combine textures into a single atlas and making a number of objects use the same material will always pay off. Do it!

Simple Checklist to make Your Game Faster

  • Keep vertex count below:
    • 40K per frame when targeting iPhone 3GS and newer devices (with SGX GPU)
    • 10K per frame when targeting older devices (with MBX GPU)
  • If you're using built-in shaders, pick ones from the Mobile category. Keep in mind that Mobile/VertexLit is currently the fastest shader.
  • Keep the number of different materials per scene low - share as many materials between different objects as possible.
  • Set the Static property on non-moving objects to allow internal optimizations.
  • Use PVRTC formats for textures when possible, otherwise choose 16bit textures over 32bit.
  • Use combiners or pixel shaders to mix several textures per fragment instead of a multi-pass approach.
  • If writing custom shaders, always use the smallest possible floating point format:
    • fixed / lowp -- perfect for color, lighting information and normals,
    • half / mediump -- for texture UV coordinates,
    • float / highp -- avoid in pixel shaders, fine to use in vertex shader for vertex position calculations.
  • Minimize use of complex mathematical operations such as pow, sin, cos, etc. in pixel shaders.
  • Do not use Pixel Lights when it is not necessary -- choose to have only a single (preferably directional) pixel light affecting your geometry.
  • Do not use dynamic lights when it is not necessary -- choose to bake lighting instead.
  • Choose to use fewer textures per fragment.
  • Avoid alpha-testing, choose alpha-blending instead.
  • Do not use fog when it is not necessary.
  • Learn the benefits of occlusion culling and use it to reduce the amount of visible geometry and the number of draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from occlusion culling.
  • Use skyboxes to "fake" distant geometry.

 

Draw Call Batching

To draw an object on the screen, the engine has to issue a draw call to the graphics API (OpenGL ES in the case of iOS). Every single draw call requires a significant amount of work to be executed inside the graphics API. Therefore, each draw call causes significant performance overhead on the CPU side.

Unity is smart enough to combine a number of objects at run time and draw them together with a single draw call. This operation is called "batching". The more objects Unity can batch together, the better rendering performance you will get.

Built-in batching support in Unity has a significant benefit over simply combining geometry in the modeling tool (or using the CombineChildren script from the Standard Assets package). Batching in Unity happens after the visibility determination step. Therefore, the engine can cull each object individually, keeping the amount of rendered geometry at the same level as it would be without batching. Combining geometry in the modeling tool, on the other hand, prevents efficient culling and results in a much higher amount of geometry being rendered.
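
The difference between culling before batching and merging geometry up front can be shown with a toy model. It uses a one-dimensional interval overlap as a stand-in for a real frustum test, and the objects and numbers are hypothetical.

```python
def visible(bounds, view):
    """1D stand-in for a frustum test: do two (min, max) intervals overlap?"""
    return bounds[0] <= view[1] and bounds[1] >= view[0]


def triangles_with_per_object_culling(objects, view):
    """Each object is tested on its own; only visible ones are drawn."""
    return sum(tris for bounds, tris in objects if visible(bounds, view))


def triangles_with_premerged_mesh(objects, view):
    """All objects were merged in the modeling tool, so they share one
    bounding volume and are drawn together or not at all."""
    merged_bounds = (
        min(bounds[0] for bounds, _ in objects),
        max(bounds[1] for bounds, _ in objects),
    )
    total = sum(tris for _, tris in objects)
    return total if visible(merged_bounds, view) else 0


# Ten hypothetical props of 200 triangles each, spread along one axis.
props = [((i * 10.0, i * 10.0 + 5.0), 200) for i in range(10)]
camera_view = (0.0, 22.0)

print(triangles_with_per_object_culling(props, camera_view))
print(triangles_with_premerged_mesh(props, camera_view))
```

With this view only three props overlap the visible range, so per-object culling submits 600 triangles, while the pre-merged mesh submits all 2000.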

 

Materials

Only objects sharing the same material can be batched together. Therefore, if you want to achieve good batching, you need to share as many materials among different objects as possible.

If you have two identical materials which differ only in textures, you can combine those textures into a single big texture, a process often called texture atlasing. Once the textures are in the same atlas, you can use a single material instead.
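
When textures move into an atlas, each mesh's UVs have to be remapped into the sub-rectangle its texture now occupies. A minimal sketch of that remapping follows; the atlas layout and texture names are hypothetical, and it assumes the original UVs stay within [0, 1] (tiling textures do not atlas this simply).

```python
def remap_uv(u, v, rect):
    """Map a UV from the original texture into its atlas sub-rectangle.
    rect = (x, y, width, height) in normalized atlas coordinates."""
    x, y, width, height = rect
    return (x + u * width, y + v * height)


# Hypothetical 2x2 atlas: each source texture occupies one quarter.
ATLAS = {
    "crate":  (0.0, 0.0, 0.5, 0.5),
    "barrel": (0.5, 0.0, 0.5, 0.5),
    "fence":  (0.0, 0.5, 0.5, 0.5),
    "sign":   (0.5, 0.5, 0.5, 0.5),
}

print(remap_uv(0.5, 0.5, ATLAS["crate"]))
print(remap_uv(1.0, 1.0, ATLAS["barrel"]))
```

After remapping, all four objects can reference the same atlas texture and therefore the same material.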

If you need to access shared material properties from scripts, then it is important to note that modifying Renderer.material will create a copy of the material. Instead, you should use Renderer.sharedMaterial to keep the material shared.

 

Dynamic Batching

Unity can automatically batch moving objects into the same draw call if they share the same material. Batching dynamic objects has a certain overhead per vertex, so batching is applied only to meshes containing fewer than 300 vertices.
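
As a rough planning aid, the two conditions above (same material, fewer than 300 vertices) can be turned into a draw call estimate. This is a deliberately simplified model with hypothetical scene data; the engine applies further conditions that are not modelled here.

```python
# From the text: only meshes with fewer than 300 vertices are batched.
DYNAMIC_BATCH_VERTEX_LIMIT = 300


def estimate_draw_calls(objects):
    """objects: iterable of (material_name, vertex_count) pairs.

    Simplified model: all eligible objects sharing a material collapse
    into one draw call; every other object costs one draw call."""
    batched_materials = set()
    unbatched = 0
    for material, vertex_count in objects:
        if vertex_count < DYNAMIC_BATCH_VERTEX_LIMIT:
            batched_materials.add(material)
        else:
            unbatched += 1
    return len(batched_materials) + unbatched


# Hypothetical scene: small rocks and coins batch, the large rock does not.
scene = [("rock", 120), ("rock", 250), ("rock", 900), ("coin", 60), ("coin", 60)]
print(len(scene), "objects ->", estimate_draw_calls(scene), "estimated draw calls")
```

In this example five objects reduce to three estimated draw calls: one for the small rocks, one for the coins, and one for the 900-vertex rock.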

Dynamic batching is done automatically and does not require any additional effort on your side.

 

Static Batching

Static batching, on the other hand, allows the engine to reduce draw calls for geometry of any size (provided it does not move and shares the same material). Static batching is significantly more efficient than dynamic batching. You should choose static batching as it will require less CPU power.

In order to take advantage of static batching, you need to explicitly specify that certain objects are static and will not move, rotate or scale in the game. To do so, mark objects as static using the Static checkbox in the Inspector.

Using static batching will require additional memory to store the combined geometry. If several objects shared the same geometry before static batching, then a copy of the geometry will be created for each object, either in the Editor or at run time. This might not always be a good idea; sometimes you will have to sacrifice rendering performance by avoiding static batching for some objects in order to keep a smaller memory footprint. For example, marking trees as static in a dense forest level can have a serious memory impact.
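
The forest example can be made concrete with a little arithmetic. The instance count, vertex count and 32 bytes per vertex below are assumptions chosen for illustration, not measured values.

```python
def geometry_bytes(vertex_count, bytes_per_vertex, instances, static_batched):
    """Memory for vertex data: a shared mesh is stored once, while static
    batching stores one copy per instance in the combined geometry."""
    copies = instances if static_batched else 1
    return vertex_count * bytes_per_vertex * copies


# Hypothetical forest: 1000 instances of one 500-vertex tree mesh.
shared = geometry_bytes(500, 32, instances=1000, static_batched=False)
batched = geometry_bytes(500, 32, instances=1000, static_batched=True)

print(f"shared mesh:      {shared / 1024:.1f} KiB")
print(f"static batching:  {batched / (1024 * 1024):.1f} MiB")
```

Under these assumptions the vertex data grows from about 16 KB to about 16 MB, which is the kind of cost to weigh against the saved draw calls.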

Static batching is only available in Unity iOS Advanced.

 

Further Reading

  • Measuring performance with the Built-in Profiler
  • Rendering Statistics

 

 

Modeling Optimized Characters

 

Use one Skinned Mesh Renderer

Your character should use only a single skinned mesh renderer. There is usually no reason to use multiple meshes for a character. Unity also has optimizations related to visibility culling and bounding volume updating which only kick in if you use one animation component and one skinned mesh renderer in conjunction. If you care about performance, multiple skinned meshes per character is not an option. If you use two skinned mesh renderers for one character instead of one, the time spent on rendering the character will most likely double!

 

Don't Use Many Materials

You also want to keep the number of materials on that mesh as low as possible. There is only one reason why you might want to have more than one material on the character: when you need to use a different shader (e.g. if you want to use a special shader for the eyes). However, 2-3 materials per character should be sufficient in almost all cases. If your character is carrying a gun, it might be useful to have the gun as a separate object, simply because it might get detached.

 

Reduce Amount of Bones

Medium Desktop games use bone hierarchies with 15-60 bones. The fewer bones you use, the better the performance; with 30 bones you can achieve very good quality on Desktop platforms and fairly good quality on Mobile Platforms. Unless you really have to, we strongly recommend you use fewer than 30 bones if you are developing for Mobile Platforms and around 30 bones per character on Desktop platforms.

 

Polygon Count

How many polygons you should use depends on the quality you require and the platform you are targeting. Anything between 300 and 1500 triangles on Mobile Platforms and between 500 and 6000 triangles on Desktop Platforms is reasonable. If you want lots of characters on screen or want the game to run on old machines, you will have to reduce the polygon count for each character. As an example: Half Life 2 characters used 2500-5000 triangles per character. Next-gen AAA games running on PS3 or Xbox 360 usually have characters with 5000-7000 triangles.

 

Separate Out IK and FK

Separate out inverse kinematics (IK) and forward kinematics (FK). When animations are imported, the IK nodes are baked into FK, thus Unity doesn't need the IK nodes at all. You can either kill the GameObjects in Unity or the nodes in the modelling tool. By removing them, the IK nodes don't need to be animated every frame anymore. For this reason it is a very bad idea to intermix IK and FK hierarchies. Instead, you should create two hierarchies: one strictly for IK and one for FK. This way you can very easily select the whole IK hierarchy and delete it.

 

Use Reusable Rigs

Create a rig which you can reuse. This allows you to share animations between different characters.

 

Name Bones Correctly

Name the bones correctly (left hip, left ankle, left foot etc.). Especially with characters, naming your bones correctly is very important. For example if you want to turn a character into a Ragdoll you will have to find the right bones for each body part. If they are not named correctly, finding all the body parts will take much longer.

 

Rendering Statistics Window

The Game View has a Stats button in the top right. When this Stats button is pressed, an overlay window is displayed with realtime rendering statistics. This is very useful for helping to optimize your game. The statistics displayed vary depending on your build target.

 
Rendering Statistics Window.

The Statistics window contains the following information:

  • Time per frame and FPS: How much time it takes to process and render one game frame (and the resulting FPS). Note that this number only includes the frame update and rendering of the game view; it does not include time taken in the editor to draw the scene view, inspector and other editor-only processing.
  • Draw Calls: How many objects are drawn in total. This counts objects that are drawn multiple times as well; for example, an object that is affected by pixel lights will add several draw calls.
  • Batched (Draw Calls): Number of draw calls that were batched together. Batching means that the engine is able to combine the rendering of multiple objects into one draw call, which leads to lower CPU overhead. To ensure good batching you should share as many materials between different objects as possible.
  • Tris and Verts: Number of triangles and vertices drawn. This is mostly important when optimizing for low-end hardware.
  • Used Textures: Count and memory size of textures used when drawing this frame.
  • Render Textures: Count and memory size of Render Textures that are created. Also displays how many times the active Render Texture was switched during this frame.
  • Screen: Size, anti-aliasing level and memory taken by the screen itself.
  • VRAM usage: Approximate bounds of current video memory (VRAM) usage. Also shows how much video memory your graphics card has.
  • VBO total: Number of unique meshes (vertex buffers) that are uploaded to the graphics card. Each different model will cause a new VBO to be created. In some cases scaled objects will cause additional VBOs to be created. With static batching, however, a number of objects can share the same VBO.
  • Visible Skinned Meshes: How many skinned meshes are rendered.
  • Animations: How many animations are playing.

Posted on 2012-02-07 08:47 by pengyingh
