The cost of a texture "draw call"? (Quantity vs Resolution)

Geosmith polycounter lvl 6
Hello.
So as a 3D artist it's important to know what the performance impact of my choices is going to be in-game. I don't work for a massive triple-A company that has been doing game art for 20 years and knows the secrets of performance, and since Google doesn't turn up anything of value, I'm left with the question: What is the cost of a "draw call"?

That term is brought up a lot: more textures, more meshes, more materials/shaders etc. = more draw calls. What exactly is this draw call and why would I care as an artist?

Draw calls are always talked about as a running total, as if the higher the number of calls, the bigger the performance impact. But how much a single draw call costs is what I can't actually work out.

What I can understand is how much triangles cost, relatively speaking. I can look at how many triangles are rendered in a single frame of a triple-A title, which is often in the millions. I can compare how many triangles the average asset of this type would have, how many times this asset will be present in the environment, how important it is and how many triangles I would need for it to look reasonably detailed, and then I can conclude that my tri count is on the high/low end, or that it's at a reasonable amount.

I can also understand how much texture resolution would cost relatively speaking. The higher the resolution, the more memory it consumes. If there's too much texture data in a single pixel, then that pixel should render slower (maybe I'm misinformed?). Theoretically having a smaller resolution should mean it should load faster from the HDD, load faster into RAM, load faster into VRAM, load faster into the gpu's processor/cache. I can use simple maths to say that a 2k texture should be equivalent to four 1k textures. I know that textures nearing 4k are often unreasonable for almost all assets, and textures near 1k are often too low for modern, complex weapon assets. I can figure out what would be a reasonable resolution after I've resized my UV islands based on their priority, packed the weapon and compared the distance and angles the weapon would be viewed at in-game. 

Theoretically, a 1k texture is equivalent to four 512x512 textures. But that's where this issue of a draw call comes in. How much does it cost exactly?
They all add up to the same amount of texture, so where does the draw call come into play? Is the "data tunnel" that's going from the HDD to the GPU just being inefficiently used because it can carry far larger textures at once, and it would be better to send all the data at once rather than in small chunks one after the other? If so, is there even an impactful difference? Or do multiple textures actually have anything to do with draw calls? Do some engines handle things differently?

This has come up in conversation with other artists, and we've struggled to conclude whether two 512x512 maps are more beneficial than one 1k map. Personally I would find it unreasonable and highly inefficient if they did perform anywhere near the same. After all it's a total of half the pixels of the 1k map. If it has half the total resolution, it should perform close to twice as fast surely? 
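
The texel arithmetic in this post can be checked with a quick sketch. The function name is made up for illustration, and this only counts texels; it says nothing about draw calls or render time:

```python
def texel_count(size: int) -> int:
    """Number of texels in a square texture with the given edge length."""
    return size * size


one_1k = texel_count(1024)
four_512 = 4 * texel_count(512)
two_512 = 2 * texel_count(512)

print(four_512 == one_1k)   # True: four 512 maps hold as many texels as one 1k map
print(two_512 / one_1k)     # 0.5: two 512 maps hold half the texels of one 1k map
```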

I do get that there is some kind of impact from splitting up textures. That's one of the purposes of texture atlases/trims as far as I know. And it would seem completely reasonable to combine something like 8 or more textures into something larger if I believe that all or most of these would be used in the scene at the same time. But I can't comprehend the relative benefit. 

To those with actual technical knowledge on this stuff, I really appreciate any responses. 

Replies

  • EarthQuake
    Check this out: https://medium.com/@toncijukic/draw-calls-in-a-nutshell-597330a85381

    Someone smarter than me will probably chime in, but this is how I understand it:

    Draw calls are typically tied to how many materials an asset is using. A material with 1x1024 texture set takes one draw call to render. A material with 4x 512 texture sets takes 4 draw calls to render. Each new object takes a draw call as well.

    In a very simplified sense, what this means is that:
    An asset that uses 1x1024 is faster to render than an asset that uses 4x512
    4 assets that use 500 triangles are slower to render than a single asset that uses 2000 triangles

    Generally speaking, you should use as few materials and objects as you reasonably can.

    In your example of 1x1024 vs 2x512, the 2x512 example uses less VRAM but is slower to render. Texture memory doesn't really have anything to do with how fast the material is to draw. The issue with VRAM is generally, do your assets fit into VRAM or not? If not, your systems may crash, or they may need to page data from RAM to VRAM, which is (relatively speaking) very slow to do.

    In any case a 1x1024 vs 2x512 example doesn't make much sense. Why would half the texture resolution be comparable in the first place? Secondly, why wouldn't you simply use a 1024x512 if that's all the resolution you need?

    Now, there are some cases where it can be very beneficial to use multiple materials on a single asset. Generally speaking, if you have some areas that need complex shaders, like skin shading or transparency, it usually makes sense. This way, if you have specialized maps you only need them for the areas in which they are applied (saving you VRAM), and the expensive shaders are only applied to the areas that need them (saving rendering time).

    Let's say you have a character with metal armor, exposed skin, and hair. For the armor, you only need a standard shader. So that's one material. For the skin, you want a skin shader with some extra maps (thickness etc). For the hair, you need a special hair shader with opacity. This is 3 draw calls, which if we were using a standard shader on all, would be more expensive than 1. However, since two of the materials here have very complex shaders, the added cost of the draw calls is moot, and the texture memory savings and performance on the areas that only need a standard shader, not to mention the practical reasons for splitting the materials up, makes using multiple materials worth while.

    Another example where using more draw calls is okay is assets that have interchangeable parts. Let's say you have a modular weapon system where the user can swap the magazine, scope, and muzzle flash. In this case, it makes sense for each of these elements to use an additional material/draw call, as the user can swap them out. We only want to load the texture content for the assets currently in use, so it would be very inefficient to pack them all into one set.

    On the other hand, if you have a gun that will only ever be configured one way, with the magazine, scope, and muzzle always visible, there is little to no reason not to combine all of these with the body of the gun on a single texture set. Again, aside from any complex materials that may need fancy shaders, like the glass elements of a scope.

    Another thing that is worth mentioning is that a draw call typically tends to have a fixed cost. There is a performance cost to draw any mesh, and for very simple meshes this cost can be higher than the cost to actually render the triangles. This means that on modern hardware, it often doesn't make sense to optimize a mesh lower than 500 triangles or so. It just won't get any faster to draw at 400 or 300 triangles because of the fixed cost of issuing a draw call, or the fact that the GPU is waiting on the CPU to give it the next thing to draw.
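
    To make the fixed-cost idea concrete, here is a toy model. The two constants are invented for illustration and are not measurements from any hardware or engine; the only point is the shape of the result when every draw call pays a fixed overhead:

```python
# Hypothetical costs in arbitrary time units; real values vary by hardware and engine.
DRAW_CALL_OVERHEAD = 100.0   # assumed fixed cost paid once per draw call
COST_PER_TRIANGLE = 0.05     # assumed cost per triangle drawn


def frame_cost(meshes: list[int]) -> float:
    """Each entry is the triangle count of one mesh drawn with one draw call."""
    return sum(DRAW_CALL_OVERHEAD + tris * COST_PER_TRIANGLE for tris in meshes)


one_big = frame_cost([2000])         # one asset, 2000 triangles
four_small = frame_cost([500] * 4)   # four assets, 500 triangles each

print(four_small > one_big)                      # same geometry, more overhead
print(frame_cost([300]) / frame_cost([500]))     # trimming a small mesh barely helps
```

    With these made-up numbers, cutting a 500 triangle mesh to 300 triangles changes its cost by less than 10%, because the fixed overhead dominates.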
  • jStins
  • Geosmith
    Geosmith polycounter lvl 6
    @EarthQuake Some great info, appreciate it!
    The 2x512 comparison was purely just for discussion. Since the conversations I had already concluded that more draw calls is worse (1x1024 > 4x512), and we were trying to find out what's the impact of multiple textures vs a single texture, that's the example that worked. 

    So as I see it a draw call is handled by the CPU, so there shouldn't be a different cost to a single draw call between something extremely simple and ridiculously complex. 
    Still though, understanding how much is this draw call worth in terms of performance is still an issue. But if they should have a fixed cost per draw call, then surely there's an average amount of draw calls per frame for games that we could find? 
    Comparing how much a typical scene or asset would need would probably give a good idea. 

    "This means that one modern hardware, it often doesn't make sense to optimize a mesh lower than 500 triangles or so."
    So if triangles have this minimum count at which the draw call costs more than the actual rendering, can this translate to textures as well?
    If I have a ridiculously modular weapon, what's the smallest texture resolution that's worth detaching into its own textureset?
    What if I can replace all of the individual screws (completely hypothetically)? Is it worth splitting them into their own 64x64 maps, or would it only apply to assets that would need 256, or 512 etc? 

    @jStins Great link! I'll have to give these a read.

  • EarthQuake
    @Geosmith

    "Still though, understanding how much is this draw call worth in terms of performance is still an issue. But if they should have a fixed cost per draw call, then surely there's an average amount of draw calls per frame for games that we could find? 
    Comparing how much a typical scene or asset would need would probably give a good idea. "

    Yeah this is beyond my knowledge a bit, I'm not sure if there are any good resources about it anywhere. I expect it's highly variable depending on hardware and engine but I don't know.

    You can benchmark this stuff pretty easily on your own though. Create two scenes in the game engine of your choice:
    One with one object that has 1,000,000 triangles
    Another with 1,000 objects that have 1,000 triangles

    This will show you the performance difference of one vs many draw calls, but with the same amount of geometry. Try 100x10,000 and 10,000x100 as well to get a sense of how draw call performance scales.
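
    If you want to script the test scenes, the splits are easy to generate. A small sketch (the helper name is hypothetical; it only produces the numbers, and building the scenes is up to whatever engine you use):

```python
def equal_geometry_splits(total_triangles: int, object_counts: list[int]) -> list[tuple[int, int]]:
    """Return (object_count, triangles_per_object) pairs that all add up to the same total."""
    splits = []
    for count in object_counts:
        if total_triangles % count != 0:
            raise ValueError(f"{total_triangles} triangles do not divide evenly into {count} objects")
        splits.append((count, total_triangles // count))
    return splits


splits = equal_geometry_splits(1_000_000, [1, 100, 1_000, 10_000])
for count, tris in splits:
    print(f"{count:>6} objects x {tris:>9} triangles")
```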

    "So if triangles have this minimum count at which the draw call costs more than the actual rendering, can this translate to textures as well?"

    No.

    "If I have a ridiculously modular weapon, what's the smallest texture resolution that's worth detaching into its own textureset? What if I can replace all of the individual screws (completely hypothetically)? Is it worth splitting them into their own 64x64 maps, or would it only apply to assets that would need 256, or 512 etc? "

    Again, it's important to realize that using a 64 or 256 or 1024 texture doesn't affect rendering speed. Bigger textures = more VRAM usage, but performance doesn't gradually degrade when larger textures or more VRAM is used. Generally speaking, it either fits into VRAM or it doesn't. It's not analogous to geometry, where more triangles = slower to render.

    Rendering cost (outside of draw calls, which is more of a preparation cost than rendering), is typically broken down to something like this:
    1. How many triangles/vertices the asset has
    2. How large the asset is on screen
    3. How complex the shading is
    4. How many pixels you're drawing (screen resolution)
    The size of the texture is neither here nor there. It's the number of pixels you're drawing, rather than the number of pixels in the texture map.

    It's also worth noting that for something like an FPV weapon, draw calls matter very little. This is because it's a singular asset; there will only ever be one of it on screen. Draw calls are much more important for things that you're likely to see tens or hundreds of at a time. Let's say you have a foliage system that places rock and stick meshes. You might see 1,000 of these at a time, and if each of them had 2 draw calls for some reason, this would likely be a big performance hit.
  • Geosmith
    Geosmith polycounter lvl 6
    That clears things up a lot more. Hopefully someone can chime in with some kind of values, it would be interesting at least.

    I guess it's hard to wrap my head around that, but it makes sense. It also explains why, in my experience, switching textures in games to something ridiculously high-res through mods/texture packs has always been more or less unnoticeable in terms of frame rate.
  • Shrike
    Shrike interpolator
    Some of my puzzle pieces. I'd really like some corrections if any of these are wrong:

    - Vertex count matters more in reality than triangle count
    - Realtime lighting and shadows effectively multiply your vertex count in engine, but this is unavoidable
    - Smoothing splits double the vertex count along the splits, of course
    - As one vertex can only have one material, UV, RGB etc. coordinate for each, adding a second UV doubles the vertex count again
    - Layering materials on the same mesh should be avoided; split the mesh into the material parts, as layering again causes tons of unnecessary vertices: a vertex can only have one material, so they are doubled again for the new one
    - Material instruction count / complexity is definitely significant and should not be forgotten
    - Forward rendering multiplies lights x vertex count, so again draws more vertices; deferred cost depends on the pixels on screen. Know which type of renderer you are on
    - Vertices up to a certain degree are easily handled by decent GPUs; materials and drawcalls are usually more important to watch out for

    - You can usually batch meshes that use the same material in engine easily, so in the end material efficiency is key and the textures matter a little less, as you get 1 drawcall per material and one per mesh.

    As an example: if you have one car and the car has one mesh+mat for the interior, one mesh+mat for lamps, one mesh+mat for windows, and one frame mesh that uses multiple materials for the details on it, like a chrome trim and whatnot, this is easily 5-10x the vertex and drawcall / shadow caster count, and effectively the performance drain, of having it all in one mesh and material, even if the actual model is not any different.
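
    Counting the draw calls for a car like that can be sketched as below. The mesh and material names are invented, and the rule of one draw call per (mesh, material) pair assumes no batching or instancing by the engine:

```python
# Hypothetical car split into parts; each entry is one material slot on one mesh.
split_car = [
    ("interior", "interior_mat"),
    ("lamps", "lamp_mat"),
    ("windows", "glass_mat"),
    ("frame", "paint_mat"),
    ("frame", "chrome_mat"),
    ("frame", "rubber_mat"),
]
# The same car authored as a single mesh with a single material.
merged_car = [("car", "car_mat")]


def draw_calls(slots: list[tuple[str, str]]) -> int:
    """One draw call per unique (mesh, material) pair, assuming nothing is batched."""
    return len(set(slots))


print(draw_calls(split_car))    # 6
print(draw_calls(merged_car))   # 1
```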

    Now where I scratch my head the most:

    (?) Material transparency is very expensive (in Forward only?)
    (?) Drawcall performance is not linear: after a threshold it becomes the main performance bottleneck, as there is a physical limit to what the CPU can handle well
    (?) Texture operations effectively cost performance in a material depending on texture size, but the number of textures is not really important until you reach the GPU cap (16 at UDK times?) - it is just part of your shader performance budget and an operation like others
    (?) You can save texture operations in shaders by saving different maps in different RGBA channels, which is then 4x more performant for the operations than 4 individual maps?
    (??) The texture overhead depends on the throughput/bandwidth of your GPU; if that is too slow you will see bigger FPS losses. As long as your video memory is not full, overall texture size in your game does not really matter, but when it's full, it's going to be a huge FPS sink unless it automatically drops down to the lower mipmap - (I'm really confused about the entire texture thing still, size doesn't matter but also it does?)

  • sacboi
    sacboi high dynamic range
    "So as a 3D artist it's important to know what the performance impact of my choices are going to be in-game. I don't work for a massive triple-A company who's been doing game art for 20 years which knows the secret to performance, and since google doesn't turn anything up of value I'm left with the question: What is the cost of a "draw call"?"

    Yes indeed, some insight of what goes on "under the hood" can be beneficial for us as artists to understand when technicalities such as 'Draw Calls', 'LODs' and 'Mipmaps' are tossed around willy-nilly when in conversation alongside their respective effects performance wise, also I'd like to point out that Mr Google can be an obliging font of knowledge if one digs like a mole...in my case usually spurred on by a particular unscratchable itch. 

    Anyway I've had this gem, I think, bookmarked for a year now and still can't for the life of me pretend having a grip upon 80% of the content discussed let alone half...


    ...but through sheer doggedness I'd managed to comprehend (generally) the concepts involved therein, explained I have to say in terms that even a doofus like me can somewhat grasp whenever I happen to hit the export button.   
  • poopipe
    poopipe grand marshal polycounter
    Blended transparency is expensive regardless of your renderer.

    For everything else you mention it's a tradeoff between various resource pools (memory, bandwidth, processing capacity). Once a resource pool is full you're bottlenecked and you'll see frames take longer than you want them to.

    E.g.
    A packed 4 channel texture means fewer texture reads but can be double the memory footprint of a 3 channel texture due to compression quirks. Which is better depends on whether you're short on memory or time.


  • Dash-POWER
    Dash-POWER polycounter lvl 6
    Speaking of performance optimization, I have a few questions:

    1. Is it good to derive the missing normal Z in the shader rather than storing full RGB channels for that normal? I think I will save some memory because I get one free channel in the normal map, but the shader will have more instructions. Correct me if I'm wrong.

    2. Is it possible to have GPU instancing with decals?
  • Farfarer
    1. Deriving the missing Z is more expensive than not deriving it, but visually speaking the quality improvement of storing two channels at a higher precision is generally worth the cost (it's mostly the cost of the square root that's expensive). On a desktop, I wouldn't think twice about it. On mobile, it depends on the target hardware.

    2. It's possible, but it depends how your decals are generated and the rendering method used.
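
    For point 1, the derivation itself is just the unit-length constraint x² + y² + z² = 1 solved for z. A minimal sketch in Python rather than shader code (the function name is made up, and the clamp is my own addition to guard against compression error):

```python
import math


def reconstruct_normal(x: float, y: float) -> tuple[float, float, float]:
    """Rebuild a unit tangent-space normal from its X and Y components (each in [-1, 1])."""
    # Clamp to zero so that slightly off inputs cannot make the square root argument negative.
    z = math.sqrt(max(0.0, 1.0 - x * x - y * y))
    return (x, y, z)


print(reconstruct_normal(0.0, 0.0))   # (0.0, 0.0, 1.0): a flat normal points straight out
```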

    @Shrike: Transparencies in deferred rendering get rendered using forward rendering after the deferred rendering has taken place. You can't do transparencies in deferred rendering because you only store one depth value, which doesn't work for transparency. In forward rendering, all the opaque stuff is drawn near->far, then all the transparencies get drawn far->near. It's likely cheaper overall in deferred as the opaque surfaces can be deferred, but the cost of just drawing the transparent stuff is about the same.
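
    The draw ordering described above can be sketched like this. The scene data and field names are invented for illustration; a real renderer sorts its own draw lists, usually with more criteria than distance:

```python
def draw_order(objects: list[dict]) -> list[str]:
    """Opaque objects near to far first, then transparent objects far to near."""
    opaque = sorted((o for o in objects if not o["transparent"]),
                    key=lambda o: o["distance"])
    transparent = sorted((o for o in objects if o["transparent"]),
                         key=lambda o: o["distance"], reverse=True)
    return [o["name"] for o in opaque + transparent]


# Hypothetical scene; distance is measured from the camera.
scene = [
    {"name": "far_wall", "distance": 50.0, "transparent": False},
    {"name": "near_glass", "distance": 2.0, "transparent": True},
    {"name": "crate", "distance": 5.0, "transparent": False},
    {"name": "far_glass", "distance": 30.0, "transparent": True},
]
order = draw_order(scene)
print(order)   # ['crate', 'far_wall', 'far_glass', 'near_glass']
```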
  • Benjammin
    Benjammin greentooth
    Thank you everyone for sharing information. This might be interesting; when I saw the thread title I remembered reading this a couple years ago and being stunned by the numbers involved in rendering a single frame. Thousands of draw-calls...

    http://www.adriancourreges.com/blog/2015/11/02/gta-v-graphics-study/
  • JedTheKrampus
    JedTheKrampus polycounter lvl 8
    Shrike said:

    - Realtime lighting and shadows effectively multiply your vertex count in engine, but this is unavoidable
    - As one vertex can only have one material, UV, RGB etc. coordinate for each, adding a second UV doubles the vertex count again
    - Layering materials on the same mesh should be avoided; split the mesh into the material parts, as layering again causes tons of unnecessary vertices: a vertex can only have one material, so they are doubled again for the new one
    - Vertices up to a certain degree are easily handled by decent GPUs; materials and drawcalls are usually more important to watch out for

    As an example: if you have one car and the car has one mesh+mat for the interior, one mesh+mat for lamps, one mesh+mat for windows, and one frame mesh that uses multiple materials for the details on it, like a chrome trim and whatnot, this is easily 5-10x the vertex and drawcall / shadow caster count, and effectively the performance drain, of having it all in one mesh and material, even if the actual model is not any different.

    Now where I scratch my head the most:

    (?) Material transparency is very expensive (in Forward only?)
    (?) Drawcall performance is not linear: after a threshold it becomes the main performance bottleneck, as there is a physical limit to what the CPU can handle well
    (?) Texture operations effectively cost performance in a material depending on texture size, but the number of textures is not really important until you reach the GPU cap (16 at UDK times?) - it is just part of your shader performance budget and an operation like others
    (?) You can save texture operations in shaders by saving different maps in different RGBA channels, which is then 4x more performant for the operations than 4 individual maps?
    (??) The texture overhead depends on the throughput/bandwidth of your GPU; if that is too slow you will see bigger FPS losses. As long as your video memory is not full, overall texture size in your game does not really matter, but when it's full, it's going to be a huge FPS sink unless it automatically drops down to the lower mipmap - (I'm really confused about the entire texture thing still, size doesn't matter but also it does?)

    Hopefully I can deepen your understanding a bit. For the most part you are close to the truth. The items you're 100% correct on I've omitted from the quote.

    Regarding realtime lighting and shadows: Realtime shadows do require running vertex processing and draw calls for all of the meshes that the light affects. The way that shadows work is that the renderer draws a depth buffer from the lights' point of view, then when the lighting happens, if the fragment being shaded is further away from the light than it is in the depth buffer, the fragment is in shadow. So whenever you have a light with shadows enabled, it transforms and rasterizes all of the models affected by the light before the lighting happens, but doesn't perform shading as only the depth is needed.
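
    The depth comparison at the heart of that can be sketched in a couple of lines. The bias parameter is my own addition (a common way to avoid surfaces shadowing themselves) and its default is an arbitrary illustrative value:

```python
def in_shadow(fragment_depth: float, shadow_map_depth: float, bias: float = 0.001) -> bool:
    """True if the fragment is further from the light than the closest surface the light saw.

    Both depths are measured from the light's point of view.
    """
    return fragment_depth - bias > shadow_map_depth


print(in_shadow(0.8, 0.5))   # True: something closer to the light blocks this fragment
print(in_shadow(0.5, 0.5))   # False: this fragment is the closest surface itself
```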

    Lights are a bit of a different ballgame and whether or not they require more vertex processing depends on the rendering algorithm. If it's a deferred renderer, the meshes get drawn to a G-buffer, with all the information needed for lighting in a framebuffer that's rendered from the camera's point of view (except for shadow buffers.) So, adding more non-shadowed dynamic lights to a deferred renderer is pretty cheap because you don't have to run vertex processing more than once for all of your lights, and the amount of time it takes to do the lighting depends on the number of pixels that the light takes up on the screen. However, with a deferred renderer, you still must render shadow buffers for any lights that cast a dynamic shadow.

    In a forward renderer, dynamic lights are more expensive to render because you potentially have to do vertex processing more than once if the mesh is affected by multiple dynamic lights. You can see how this works in an old-style, multi-pass forward renderer like Doom 3 here: http://fabiensanglard.net/doom3/renderer.php (ctrl+f "Now the details of what happens in the GPU framebuffer:"). Note that the results of each light are accumulated in the final frame, and there is no G-buffer. It is possible to do fewer passes and less vertex processing if you compute the results of multiple lights per pass, and this matters more the more vertices there are. The combination of easy MSAA and higher potential performance makes single-pass forward rendering the natural choice for VR rendering.

    So: shadows definitely increase your effective vertex count, but lights don't necessarily. (It depends on the renderer.)
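
    As a very rough way to see the difference, here is a toy count of vertex transforms for one mesh. It assumes one geometry pass per light in a multi-pass forward renderer, one G-buffer pass in a deferred renderer, and one depth pass per shadow-casting light; real renderers vary a lot, so treat the numbers as illustrative only:

```python
def vertex_transforms(vertices: int, dynamic_lights: int,
                      shadow_casting_lights: int, renderer: str) -> int:
    """Rough number of vertex transforms per frame for one mesh under the stated assumptions."""
    if renderer == "deferred":
        main_passes = 1                        # geometry is drawn once into the G-buffer
    elif renderer == "forward_multipass":
        main_passes = max(1, dynamic_lights)   # assumed: one pass per light affecting the mesh
    else:
        raise ValueError(f"unknown renderer: {renderer}")
    shadow_passes = shadow_casting_lights      # assumed: one depth pass per shadow-casting light
    return vertices * (main_passes + shadow_passes)


print(vertex_transforms(1000, 4, 1, "deferred"))            # 2000
print(vertex_transforms(1000, 4, 1, "forward_multipass"))   # 5000
```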

    Regarding second UV sets doubling vertex count again: This simply isn't true. In the case of a second UV set, the renderer will have an additional attribute per vertex. So the mesh will take up a little more memory, but it won't take any longer to rasterize.

    That's true as long as the UV seams are the same for the first and second UV set. If the seams are the same, the duplication of the vertices along the seam has already happened due to the first UV set. If the second UV set has seams that aren't a split normal or a seam already in the other vertex attributes, it will cause additional vertex duplication along those seams. But it is pretty rare that such a scenario happens.
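
    You can see why with a tiny model of how vertices get split. In this sketch (all names made up), every triangle corner carries a tuple of attributes, and corners only share a vertex when the whole tuple matches. A UV value is modelled as (position, island), so a seam shows up as a different island on the far side of the shared edge:

```python
def quad_corners(uv0_seam: bool, uv1_seam: bool) -> list[tuple]:
    """Two triangles sharing the edge between positions 1 and 2.

    When a UV set has a seam on the shared edge, the second triangle sits in a
    different island, so its corners get different UV values at the same positions.
    """
    corners = []
    for triangle, positions in enumerate([(0, 1, 2), (2, 1, 3)]):
        for position in positions:
            uv0 = (position, triangle if uv0_seam else 0)
            uv1 = (position, triangle if uv1_seam else 0)
            corners.append((position, uv0, uv1))
    return corners


def gpu_vertex_count(corners: list[tuple]) -> int:
    """Corners with identical attribute tuples collapse into one vertex."""
    return len(set(corners))


print(gpu_vertex_count(quad_corners(False, False)))  # 4: no seams, nothing duplicated
print(gpu_vertex_count(quad_corners(True, False)))   # 6: first UV set splits the shared edge
print(gpu_vertex_count(quad_corners(True, True)))    # 6: second set on the same seam adds nothing
print(gpu_vertex_count(quad_corners(False, True)))   # 6: a seam only in the second set does split
```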

    Regarding material layering: Absent tricky techniques like UE4's hierarchical LOD or texture atlasing, each material slot on a mesh causes its own draw call. Material layering in a shader, the way Epic does it, saves draw calls but increases the complexity of the shader somewhat. Layering materials in the texture authoring process is free in terms of shaders/draw calls, but makes you a slave to texel density and can take up more texture memory to get the same amount of detail. Each approach has its place, but on mobile you probably want to use the Photoshop approach to layering materials.

    Regarding vertex count vs. draw call/materials: It depends on the scene and the hardware. A problem that you're more likely to run into is a rasterization bottleneck from really long, really skinny triangles and really small triangles. Importing your model into UE4 and using the quad overdraw view mode can help you diagnose areas of the mesh that are likely to cause a problem. This is the main performance problem that you can run into with Alien: Isolation style, extremely beveled models. For vertex count and draw call count, as long as you keep things reasonable you should be fine. On any renderer you will get reduced performance from having thousands of small draw calls towards the end of the frame with only a few hundred vertices per mesh. If you have a scene with lots of small accessories and it's causing performance problems, try to put them in a texture atlas and you should see things get better. UE4 HLOD can do this automatically for static meshes.

    Regarding the car example: this number of draw calls may actually be completely fine on a desktop game. Car rendering is a bit of a special case because the extremely glossy surface of the exterior makes normal map compression artifacts unacceptable. Modeling such an exterior requires a lot of polygons and probably transferring normals from another mesh. Using vertices for normal manipulation works better, because the normals get interpolated perfectly smoothly between the vertices.

    The windows must also be their own material unless it's acceptable for them to be extremely tinted, because if the windows have transparency enabled and they are in the same material as everything else, everything else will also have transparency enabled, which will decrease performance a lot and potentially cause triangle ordering jank. So, if you're rendering a car, you are probably looking at a few draw calls, but that's OK because they are getting put to good use.

    Now for the head scratchers!

    Material transparency: Yes, it is expensive, especially if you're trying for some sort of dynamic triangle ordering for hair, or some order-independent technique. Deferred renderers written by people who are trying to stay sane render transparency in a separate, forward pass. There are all sorts of ways to render transparency, and they're all expensive. However, if you need transparency for your scene, you should use it. Just don't go too crazy. If you're rendering a skyscraper from the outside, you might be able to get away with making the windows shiny.

    Draw call performance: It depends on the graphics API in use, and in some cases how well the API is being used. Even in an API where individual draw calls are cheaper, if you have tens of thousands of draw calls and 80% of them have only a few triangles, you'll end up with a "death by a thousand cuts" situation where the GPU doesn't perform as well as it could. Vulkan and DX12 can handle more draw calls than DX11 and OpenGL, if the renderer was written in the best way for each API. In general, if the number of draw calls is less than "way too many" you will probably be fine on the GPU side. On the CPU side, additional draw calls require the CPU to set up more commands for the command lists for the GPU, and that takes time, but you can use quite a lot of them with the new APIs. Again: This isn't a license to go crazy with them!

    Draw calls and texture slots have basically nothing to do with each other, but draw calls and material slots (and, for that matter, individual meshes using different materials from each other) are usually 1:1.

    On mobile, it's still important to use fewer draw calls, as it means less work for the CPU and less power consumption.

    Texture performance: You're more or less correct. The main performance reason to use fewer textures is simply that textures take up a lot of video memory. There are other costs (setting up the texture samplers, dealing with image layouts, and so on), but memory is the main one.

    Texture performance 2: The main reason to pack textures into channels is to save video memory. A DXT1 texture, 1024x1024 with red, green and blue channels takes about 0.5 MB of RAM, and a DXT5 texture, 1024x1024 with red, green, blue and alpha takes about 1 MB of RAM. So if you have four masks that you need, not only will you be taking up 3 more texture slots that you could use for something else, you will take up twice the video memory compared to putting each one in a grayscale DXT1 compressed texture. Of course, if you have three masks and you can pack them into a texture with no alpha channel, that will save even more video memory. However, the DXT5 packed mask compresses the alpha channel independently, so if one mask has a lot of high frequency detail it should go in the alpha channel to avoid cross-talk with the other channels. The mask with the second most detail should go in the green channel, as that gets one more bit in a DXT5 texture than the other two color channels.

    However, doing this won't appreciably increase shader performance.
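
    The memory figures can be worked out from the block sizes: DXT1 stores each 4x4 block of texels in 8 bytes and DXT5 in 16 bytes. A small sketch (the helper name is made up; sizes are for the top mip only unless mipmaps are requested), reading the comparison as four separate grayscale DXT1 masks versus one packed DXT5 texture:

```python
def bc_texture_bytes(width: int, height: int, bytes_per_block: int, mipmaps: bool = False) -> int:
    """Size in bytes of a block-compressed texture made of 4x4 texel blocks."""
    total = 0
    while True:
        blocks_x = max(1, (width + 3) // 4)
        blocks_y = max(1, (height + 3) // 4)
        total += blocks_x * blocks_y * bytes_per_block
        if not mipmaps or (width == 1 and height == 1):
            return total
        width, height = max(1, width // 2), max(1, height // 2)


DXT1_BLOCK_BYTES = 8
DXT5_BLOCK_BYTES = 16
MIB = 1024 * 1024

dxt1 = bc_texture_bytes(1024, 1024, DXT1_BLOCK_BYTES)
dxt5 = bc_texture_bytes(1024, 1024, DXT5_BLOCK_BYTES)
four_separate_masks = 4 * dxt1

print(dxt1 / MIB)                  # 0.5
print(dxt5 / MIB)                  # 1.0
print(four_separate_masks / MIB)   # 2.0, twice the packed DXT5 texture
```

    A full mip chain adds roughly another third on top of each of these figures.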

    Texture performance 3: What you're talking about happens when you've used more VRAM than you're supposed to. If the graphics card can't fit all of the needed textures into memory, it will have to go out, over the PCI-E bus, to main memory to be able to fill in that texture. This is more expensive than just about anything and of course you should avoid it. As long as you use texture compression, avoid super high resolution textures, and keep the number of textures in your scene reasonable, you're unlikely to run into problems with this on a desktop GPU. This is the problem that id tech's megatextures are meant to solve.

    I hope this was helpful to you. Many Bothan spies died to bring you this information.
  • Shrike
    Shrike interpolator

    about 1 MB of RAM. So if you have four masks that you need, not only will you be taking up 3 more texture slots that you could use for something else, you will take up twice the video memory compared to putting each one in a grayscale DXT1 compressed texture. Of course, if you have three masks and you can pack them into a texture with no alpha channel, that will save even more video memory.

    Big thanks, that clears up a lot of missing pieces!
    My car example was more with a RTS in mind or so, for a racing game you will definitely use isolated materials
    With vertex colors you could also get the perfect cuts tho if you planned the geo for that

    I'm a bit confused by the sentence above. It sounds like one texture with 4 channels takes twice the memory of 4 individual maps with 1 channel used each. Is the sentence in the wrong order?
    I assume it would be RGB > RGBA > 3 maps > 4 maps in terms of performance?
    (Factor 0.5, factor 1, factor 1.5, factor 2)? All grayscale.

    It's been way too long since I've done realtime; back in the UDK days I knew...
    (Does it matter if I export grayscale, or can I just take one RGB output? Should I change the compression even if I only take one output?)


  • RyanB
    A very valuable skill is learning how to use profiling tools in your game engine AND on the hardware platform of your game.  You will often be surprised at what is hurting memory and framerate. 

    Use a frame debugger/analyzer to look at how things are rendering too.  You might have multiple objects that you think should batch and be one draw call but might be rendering any which way.  Maybe an artist scaled them all slightly differently or someone duplicated a material.  A weird one I found recently was a mesh renderer with three material slots but only one material was being used so three draw calls but only one was useful.
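    One way a wasted material slot can be spotted before it reaches the engine is to compare the slots a mesh declares against the material IDs its faces actually use. The Python sketch below assumes a made-up data layout (a list of slot names plus one material index per face). Real DCC tools and engines store this differently, and whether an empty slot still issues a draw call depends on the engine, so the frame debugger remains the real check.

```python
def unused_material_slots(slot_names, face_material_ids):
    """Return the names of material slots that no face references."""
    used = set(face_material_ids)
    return [name for index, name in enumerate(slot_names) if index not in used]


# Hypothetical mesh: three slots declared, but every face uses slot 0.
slots = ["crate_wood", "Material #2", "Material #3"]
faces = [0, 0, 0, 0, 0, 0]

print(unused_material_slots(slots, faces))  # ['Material #2', 'Material #3']
```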

    Budget time for optimization!  If you plan on working up to the wire your game will probably run like crap.  Even small games made by small teams need two to three weeks for a technical artist to clean up shaders, vfx, improve draw calls, etc.  Budget even more time for a big game.

    A common problem I've seen in mobile game development is not building the game and looking at it on a phone.  Months will go by and hardly anyone is looking at it on the correct hardware.  Some studios don't even know what the lowest-end device they plan to ship on is, especially on Android.  If you plan to make a mobile game, you need to build and test on the LOWEST-end phone you want to target.  Counting draw calls is meaningless if you don't even know how many draw calls your platform can handle.


  • Geosmith
    Geosmith polycounter lvl 6
    RyanB said:
     A weird one I found recently was a mesh renderer with three material slots but only one material was being used so three draw calls but only one was useful. 
    That's pretty scary considering how often 3DS Max likes to create new geometry with a new material ID. And we're supposed to remember to set every poly to have just the 1 :neutral: 


    - Regarding second UV sets doubling vertex count again: This simply isn't true. In the case of a second UV set, the renderer will have an additional attribute per vertex. So the mesh will take up a little more memory, but it won't take any longer to rasterize.

    - Texture performance 2: The main reason to pack textures into channels is to save video memory. A DXT1 texture, 1024x1024 with red, green and blue channels takes about 0.5 MB of RAM, and a DXT5 texture, 1024x1024 with red, green, blue and alpha takes about 1 MB of RAM. So if you have four masks that you need, not only will you be taking up 3 more texture slots that you could use for something else, you will take up twice the video memory compared to putting each one in a grayscale DXT1 compressed texture. Of course, if you have three masks and you can pack them into a texture with no alpha channel, that will save even more video memory. However, the DXT5 packed mask compresses the alpha channel independently, so if one mask has a lot of high frequency detail it should go in the alpha channel to avoid cross-talk with the other channels. The mask with the second most detail should go in the green channel, as that gets one more bit in a DXT5 texture than the other two color channels.
    - That's actually interesting regarding additional UV channels. Do you think you could estimate what the memory impact of an extra channel would be for something with say 10,000 verts with the same splits? I imagine it would be a little over 100 kilobytes if it just needs to hold extra coordinates, no? 

    - So the best setup would be to have as many maps as possible with RGB fully utilised, and to keep alphas in an independent map unless the mask has high fidelity? I would've guessed there are automated systems at this point where the engine packs maps together to use the optimal amount of memory, but I guess it can't really know which maps can and can't be packed together (like you wouldn't want the roughness map for terrain to be packed with a gun's metalness).

    Thanks for sharing this great info peeps! 
  • JedTheKrampus
    JedTheKrampus polycounter lvl 8
    I can do better than estimate. I set up a quick Unity project with the AUG model from CS:GO, which has 16274 triangles and 8840 verts in Blender. By default the mesh has 1 UV channel. Importing it with 1 UV channel shows a vertex count of 12553 verts, 16274 triangles.



    Profiling the scene shows a mesh memory usage of 1.0 MB for this mesh. To get to the profiler, use Build and Run, check Development Build and Autoconnect Profiler.



    I added another UV channel and autopacked the UVs on it to make them different. The new mesh looks identical but has a second UV set. It has 12557 vertices. Presumably this is because some UVs were stacked or duplicated somewhere and not on a hard edge. Autopacking the UVs resulted in a few edges that weren't there before. Profiling the build with the new mesh shows a memory usage of 1.1 MB. So the additional cost is certainly there, but it's not outrageous if you need the second UV for something. Particularly for simple environment models using tilable textures and lightmaps the second UV set won't break the bank. I believe this method of measuring measures the amount of memory the mesh takes up in main memory, but the numbers should be similar or the same on the GPU.
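    The 0.1 MB difference lines up with a back-of-the-envelope estimate. The Python sketch below assumes each UV coordinate is stored as two 32-bit floats per vertex. That layout is an assumption, not something the profiler reports, and some engines use 16-bit half floats instead, which would halve the cost.

```python
def uv_set_bytes(vertex_count, components=2, bytes_per_component=4):
    """Estimated memory for one extra UV set, in bytes."""
    return vertex_count * components * bytes_per_component


print(uv_set_bytes(12557))                         # 100456 -> about 0.1 MB
print(uv_set_bytes(10000))                         # 80000  -> about 78 KB
print(uv_set_bytes(10000, bytes_per_component=2))  # 40000  -> half floats
```

    So for the 10,000 vert example in the question, roughly 80 KB is a reasonable ballpark under the 32-bit float assumption.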

    As for textures, you can measure the VRAM usage of those as well. I made a quick texture with independent masks (just some Krita patterns) in each channel and imported it into Unity. Here's the VRAM usage from DXT5 for a 1024x1024 texture.
    As you can see, it takes 1.3 MB. Without the alpha channel (DXT1) it takes 0.7 MB. Setting the texture to single channel, with the alpha channel as the source, brings it back up to 1.3 MB for a single channel because it's uncompressed. Setting the texture to single channel but using the red channel instead makes the texture use BC4 compression, and it takes 0.7 MB for a single channel. So, in a table of memory usage per channel, it's:

    1. DXT1 - 3 channels - 0.7 MB - 4.3 1024x1024 masks per MB
    2. DXT5 - 4 channels - 1.3 MB - 3.0 1024x1024 masks per MB
    3. BC4 - 1 channel - 0.7 MB - 1.4 1024x1024 masks per MB
    4. Uncompressed - 1 channel - 1.3 MB - 0.77 1024x1024 masks per MB
    So DXT1 is better for memory usage, but the artifacts may be unacceptable depending on the type of mask in that channel. For example a roughness map probably has a lot of detail to it, and if I were writing a shader I would put that in a DXT5 alpha channel so it doesn't interfere with the compression of the other channels. DXT5 memory usage per mask channel isn't much higher and I'd say it's almost always acceptable for desktop rendering.
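    The 0.7 MB and 1.3 MB figures are what you would expect if the reported sizes include a full mip chain, which adds roughly a third on top of the base level. That is an assumption about what the profiler counts, not something confirmed here. A Python sketch of the arithmetic, using the standard block sizes (8 bytes per 4x4 block for DXT1 and BC4, 16 for DXT5) and hypothetical helper names:

```python
import math

MIB = 1024 * 1024


def mip_sizes(size):
    """Edge lengths of a square texture's mip chain, down to 1x1."""
    sizes = []
    while size >= 1:
        sizes.append(size)
        size //= 2
    return sizes


def block_compressed_bytes(size, bytes_per_block):
    """Total bytes for all mips of a block-compressed square texture."""
    total = 0
    for s in mip_sizes(size):
        blocks = math.ceil(s / 4)  # mips smaller than 4x4 still use one block
        total += blocks * blocks * bytes_per_block
    return total


def uncompressed_bytes(size, bytes_per_texel):
    """Total bytes for all mips of an uncompressed square texture."""
    return sum(s * s * bytes_per_texel for s in mip_sizes(size))


# name -> (total bytes with mips, number of masks stored)
formats = {
    "DXT1": (block_compressed_bytes(1024, 8), 3),
    "DXT5": (block_compressed_bytes(1024, 16), 4),
    "BC4": (block_compressed_bytes(1024, 8), 1),
    "Uncompressed 8-bit": (uncompressed_bytes(1024, 1), 1),
}

for name, (nbytes, masks) in formats.items():
    mib = nbytes / MIB
    print(f"{name}: {mib:.1f} MB, {masks / mib:.2f} masks per MB")
```

    The masks-per-MB values printed here differ slightly from the table above because the table divides by the rounded sizes.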
  • poopipe
    poopipe grand marshal polycounter
    Geosmith said

    That's pretty scary considering how often 3DS Max likes to create new geometry with a new material ID. And we're supposed to remember to set every poly to have just the 1

    Yes, that would be part of your job.. 



    Regarding packing...
    Every proprietary engine I've worked on over the last 5 years packs textures at build time and does its best to minimise duplication. The only manual packing I do is when I'm working in UE4.
  • Dash-POWER
    Dash-POWER polycounter lvl 6
    @JedTheKrampus Just wanted to add that if you are using DXT compression, it's good to store data that needs more precision (roughness, custom data) in the green channel, since the color channels get 5:6:5 bits (red:green:blue).
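    To put a number on that extra bit, here is a small Python sketch comparing 5-bit and 6-bit quantization of a value in the 0 to 1 range. It only illustrates the precision of the stored endpoint colors. DXT interpolates per-texel values between two endpoints per block, which this sketch does not model, and the function names are made up.

```python
def levels(bits):
    """Number of distinct values an unsigned channel of this width can hold."""
    return 1 << bits


def max_rounding_error(bits):
    """Worst-case error when rounding a 0..1 value to the nearest level."""
    step = 1.0 / (levels(bits) - 1)
    return step / 2


for channel, bits in (("red", 5), ("green", 6), ("blue", 5)):
    print(f"{channel}: {levels(bits)} levels, "
          f"max error {max_rounding_error(bits):.4f}")
```

    Green gets 64 levels against 32 for red and blue, so its worst-case rounding error is about half as large.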
  • bitinn
    bitinn polycounter lvl 6
    Correct me if I am wrong but I think about draw call cost this way:

    - Imagine CPU as in charge of sending work for GPU to do.
    - Draw calls are essentially jobs CPU has to send GPU.
    - Creating and switching between a large number of jobs has a cost (and we kinda assume each job costs roughly the same to the CPU).
    - Hence we try to reduce the number of draw calls (by doing less work or smart batching).
    - And the acceptable number of jobs is defined by target hardware.

    Now there are tons of reasons for draw call batching to fail and potentially result in worse performance, and what a game artist can control is actually just a small part of it. The reasons vary by game engine, though some common limits apply. For example, if the mesh or material isn't the same between 2 characters, the batching is likely going to fail.
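    The mental model above can be sketched as a toy in Python. It assumes the simplest possible rule, that objects batch only when they share both mesh and material. Real engines have many more conditions (scale, lighting, shader variants and so on), and every name in the scene below is hypothetical.

```python
from typing import NamedTuple


class Renderable(NamedTuple):
    name: str
    mesh: str
    material: str


def draw_calls_unbatched(objects):
    """One job sent to the GPU per object."""
    return len(objects)


def draw_calls_batched(objects):
    """One job per unique (mesh, material) pair under the toy rule."""
    return len({(obj.mesh, obj.material) for obj in objects})


scene = [
    Renderable("crate_a", "crate", "wood"),
    Renderable("crate_b", "crate", "wood"),
    Renderable("crate_c", "crate", "wood"),
    Renderable("crate_d", "crate", "wood_copy"),  # duplicated material
    Renderable("hero", "hero_mesh", "hero_mat"),
    Renderable("enemy", "enemy_mesh", "enemy_mat"),
]

print(draw_calls_unbatched(scene))  # 6
print(draw_calls_batched(scene))    # 4
```

    The crate with the duplicated material is the kind of thing an artist can control: it looks identical but breaks out of the batch and costs its own draw call.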
  • CrazyButcher
    CrazyButcher polycounter lvl 20
    Some in depth details here http://on-demand.gputechconf.com/gtc/2016/presentation/s6138-christoph-kubisch-pierre-boudier-gpu-driven-rendering.pdf although I also recommend the "renderhell" article by Simon.

    In general the most important advice is to make use of the engine's profiling tools. For a few years now GPUs have been quite powerful: you can generate draw calls on the GPU without involving the CPU, and you can pull data as you need it (aka bindless). As was mentioned, we have APIs that were modernized for higher throughput, submission from multiple threads, etc.

    Therefore engines can make use of a lot of features depending on their target platforms. They may bake/batch data in different ways.

    "it depends" is more true than ever.