Hello.
So as a 3D artist it's important to know what the performance impact of my choices is going to be in-game. I don't work for a massive triple-A company that's been doing game art for 20 years and knows the secret to performance, and since Google doesn't turn up anything of value I'm left with the question: What is the cost of a "draw call"?
That term is brought up a lot, more textures, more meshes, more materials/shaders etc = more draw calls. What exactly is this draw call and why would I care as an artist?
The way draw calls are always referred to, it's as if each one adds to a total "amount of draw calls", and the higher that amount, the bigger the performance impact. But how much a single draw call costs I can't actually understand.
What I can understand is how much triangles cost, relatively speaking. I can compare against how many triangles are rendered in a single frame of a triple-A title, which is often in the millions. I can compare how many triangles the average asset of this type would have, how many times this asset will be present in the environment, how important it is and how many triangles I would need for it to look reasonably detailed, and then I can conclude that my tri count might be on the high/low end, or that it's at a reasonable amount.
I can also understand how much texture resolution would cost relatively speaking. The higher the resolution, the more memory it consumes. If there's too much texture data in a single pixel, then that pixel should render slower (maybe I'm misinformed?). Theoretically having a smaller resolution should mean it should load faster from the HDD, load faster into RAM, load faster into VRAM, load faster into the gpu's processor/cache. I can use simple maths to say that a 2k texture should be equivalent to four 1k textures. I know that textures nearing 4k are often unreasonable for almost all assets, and textures near 1k are often too low for modern, complex weapon assets. I can figure out what would be a reasonable resolution after I've resized my UV islands based on their priority, packed the weapon and compared the distance and angles the weapon would be viewed at in-game.
Theoretically, a 1k texture is equivalent to four 512x512 textures. But that's where this issue of a draw call comes in. How much does it cost exactly?
They all add up to the same amount of texture, so where does the draw call come into play? Is the "data tunnel" that's going from the HDD to the GPU just being inefficiently used because it can carry far larger size textures at once, and it would be better to send all the data at once rather than in small chunks one after the other? If so is there even an impactful difference? Or do multiple textures actually have anything to do with draw calls? Do some engines handle things differently?
This has come up in conversation with other artists, and we've struggled to conclude whether two 512x512 maps are more beneficial than one 1k map. Personally I would find it unreasonable and highly inefficient if they did perform anywhere near the same. After all it's a total of half the pixels of the 1k map. If it has half the total resolution, it should perform close to twice as fast surely?
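For reference, the raw texel arithmetic behind these comparisons can be checked in a few lines of Python. This is only a sketch of pixel counts for square textures; it says nothing about compression, mipmaps or actual render cost, and the function name is made up for illustration.

```python
def texel_count(side: int) -> int:
    """Number of texels in a square texture with the given side length."""
    return side * side

one_2k = texel_count(2048)
one_1k = texel_count(1024)
one_512 = texel_count(512)

# A 2k map holds as many texels as four 1k maps, and a 1k as many as four 512s.
ratio_2k_to_1k = one_2k // one_1k
ratio_1k_to_512 = one_1k // one_512

# Two 512 maps together hold half the texels of a single 1k map.
two_512_vs_1k = (2 * one_512) / one_1k

print(ratio_2k_to_1k, ratio_1k_to_512, two_512_vs_1k)
```

So the "same amount of texture" comparison is 1x1024 vs 4x512, while 2x512 is half the data.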
I do get that there is some kind of impact from splitting up textures. That's one of the purposes of texture atlases/trims as far as I know. And it would seem completely reasonable to combine something like 8 or more textures into something larger if I believe that all or most of these would be used in the scene at the same time. But I can't comprehend the relative benefit.
To those with actual technical knowledge on this stuff, I really appreciate any responses.
Replies
Someone smarter than me will probably chime in, but this is how I understand it:
Draw calls are typically tied to how many materials an asset is using. A material with 1x1024 texture set takes one draw call to render. A material with 4x 512 texture sets takes 4 draw calls to render. Each new object takes a draw call as well.
In a very simplified sense, what this means is that:
An asset that uses 1x1024 is faster to render than an asset that uses 4x512
4 assets that use 500 triangles are slower to render than a single asset that uses 2000 triangles
Generally speaking, you should use as few materials and objects as you reasonably can.
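As a rough way to picture the counting rule above, here is a small Python sketch. It assumes one draw call per material on each object, which is the simplification used in this reply; real engines may batch or split things differently, and the asset names are invented.

```python
def draw_calls(assets):
    """Count draw calls, assuming one per material on each object.

    `assets` is a list of (name, material_count) pairs. This is a
    simplification: real engines may batch or split draws differently.
    """
    return sum(material_count for _name, material_count in assets)

# Hypothetical assets for illustration only.
one_asset_1x1024 = [("crate", 1)]
one_asset_4x512 = [("crate", 4)]
four_assets_500_tris = [("rock_a", 1), ("rock_b", 1), ("rock_c", 1), ("rock_d", 1)]
one_asset_2000_tris = [("rock_cluster", 1)]

print(draw_calls(one_asset_1x1024), draw_calls(one_asset_4x512))
print(draw_calls(four_assets_500_tris), draw_calls(one_asset_2000_tris))
```

Under that assumption the 4x512 asset and the four separate 500-triangle assets both cost four draw calls, against one for the combined versions.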
In your example of 1x1024 vs 2x512, the 2x512 example uses less VRAM but is slower to render. Texture memory doesn't really have anything to do with how fast the material is to draw. The issue with VRAM is generally, do your assets fit into VRAM or not? If not, your systems may crash, or they may need to page data from RAM to VRAM, which is (relatively speaking) very slow to do.
In any case a 1x1024 vs 2x512 example doesn't make much sense. Why would half the texture resolution be comparable in the first place? Secondly, why wouldn't you simply use a 1024x512 if that's all the resolution you need?
Now, there are some cases where it can be very beneficial to use multiple materials on a single asset. Generally speaking, if you have some areas that need complex shaders, like skin shading or transparency, it usually makes sense. This way, if you have specialized maps you only need them for the areas in which they are applied (saving you VRAM), and the expensive shaders are only applied to the areas that need them (saving rendering time).
Let's say you have a character with metal armor, exposed skin, and hair. For the armor, you only need a standard shader. So that's one material. For the skin, you want a skin shader with some extra maps (thickness etc). For the hair, you need a special hair shader with opacity. This is 3 draw calls, which if we were using a standard shader on all, would be more expensive than 1. However, since two of the materials here have very complex shaders, the added cost of the draw calls is moot, and the texture memory savings and performance on the areas that only need a standard shader, not to mention the practical reasons for splitting the materials up, make using multiple materials worthwhile.
Another example where using more draw calls is okay is assets that have interchangeable parts. Let's say you have a modular weapon system where the user can swap the magazine, scope, and muzzle flash. In this case, it makes sense for each of these elements to use an additional material/draw call, as the user can swap them out. We only want to load the texture content for the assets currently in use, so it would be very inefficient to pack them all into one set.
On the other hand, if you have a gun that will only ever be configured one way, with the magazine, scope, and muzzle always visible, there is little to no reason not to combine all of these with the body of the gun on a single texture set. Again, aside from any complex materials that may need fancy shaders, like the glass elements of a scope.
Another thing that is worth mentioning is that a draw call typically tends to have a fixed cost. There is a performance cost to draw any mesh, and for very simple meshes this cost can be higher than the cost to actually render the triangles. This means that on modern hardware, it often doesn't make sense to optimize a mesh lower than 500 triangles or so. It just won't get any faster to draw at 400 or 300 triangles because of the fixed cost of rendering a draw call, or the fact that the GPU is waiting on the CPU to give it the next thing to draw.
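To make the "fixed cost" idea concrete, here is a toy model in Python. The two constants are made-up numbers, picked only so the break-even lands at the 500 triangles mentioned above. They are not measurements from any hardware or engine, and taking the larger of the two costs is just one simple way to model the GPU waiting on the CPU.

```python
# Hypothetical costs in nanoseconds, chosen only to illustrate the curve.
FIXED_COST_NS = 10_000       # assumed overhead of issuing one draw call
COST_PER_TRIANGLE_NS = 20    # assumed cost of processing one triangle

def mesh_cost_ns(triangles: int) -> int:
    """Toy model: a draw never takes less than the fixed overhead."""
    return max(FIXED_COST_NS, triangles * COST_PER_TRIANGLE_NS)

for tris in (300, 400, 500, 2000):
    print(tris, mesh_cost_ns(tris))
```

In this model 300, 400 and 500 triangles all cost the same, and only above the break-even does the triangle count start to matter. The real break-even depends on hardware and engine, so it needs profiling.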
The 2x512 comparison was purely just for discussion. Since the conversations I had already concluded that more draw calls is worse (1x1024 > 4x512), and we were trying to find out what's the impact of multiple textures vs a single texture, that's the example that worked.
So as I see it a draw call is handled by the CPU, so there shouldn't be a different cost to a single draw call between something extremely simple and ridiculously complex.
Still though, understanding how much is this draw call worth in terms of performance is still an issue. But if they should have a fixed cost per draw call, then surely there's an average amount of draw calls per frame for games that we could find?
Comparing how much a typical scene or asset would need would probably give a good idea.
"This means that on modern hardware, it often doesn't make sense to optimize a mesh lower than 500 triangles or so."
So if triangles have this minimum count at which the draw call costs more than the actual rendering, can this translate to textures as well?
If I have a ridiculously modular weapon, what's the smallest texture resolution that's worth detaching into its own textureset?
What if I can replace all of the individual screws (completely hypothetically)? Is it worth splitting them into their own 64x64 maps, or would it only apply to assets that would need 256, or 512 etc?
@jStins Great link! I'll have to give these a read.
"Still though, understanding how much is this draw call worth in terms of performance is still an issue. But if they should have a fixed cost per draw call, then surely there's an average amount of draw calls per frame for games that we could find?
Comparing how much a typical scene or asset would need would probably give a good idea. "
Yeah this is beyond my knowledge a bit, I'm not sure if there are any good resources about it anywhere. I expect it's highly variable depending on hardware and engine but I don't know.
You can benchmark this stuff pretty easily on your own though. Create two scenes in the game engine of your choice:
One with one object that has 1,000,000 triangles
Another with 1,000 objects that have 1,000 triangles
This will show you the performance difference of one vs many draw calls, but with the same amount of geometry. Try 100x10,000 and 10,000x100 as well to get a sense of how draw call performance scales.
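If it helps with setting up those test scenes, this small Python sketch lists the object count / triangle count pairs that keep the total geometry constant. It only does the bookkeeping; building the scenes and measuring frame times still happens in the engine. The function name and the fixed total are assumptions taken from the numbers above.

```python
TOTAL_TRIANGLES = 1_000_000

def benchmark_configs(object_counts):
    """Return (objects, triangles_per_object) pairs with constant total geometry."""
    configs = []
    for count in object_counts:
        if TOTAL_TRIANGLES % count != 0:
            raise ValueError(f"{count} does not divide {TOTAL_TRIANGLES} evenly")
        configs.append((count, TOTAL_TRIANGLES // count))
    return configs

configs = benchmark_configs([1, 100, 1_000, 10_000])
for objects, tris in configs:
    print(f"{objects:>6} objects x {tris:>9} triangles")
```

Every row draws the same million triangles, so any difference in frame time between the scenes comes from the number of objects rather than the amount of geometry.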
"So if triangles have this minimum count at which the draw call costs more than the actual rendering, can this translate to textures as well?"
No.
"If I have a ridiculously modular weapon, what's the smallest texture resolution that's worth detaching into its own textureset? What if I can replace all of the individual screws (completely hypothetically)? Is it worth splitting them into their own 64x64 maps, or would it only apply to assets that would need 256, or 512 etc? "
Again, it's important to realize that using a 64 or 256 or 1024 texture doesn't affect rendering speed. Bigger textures = more VRAM usage, but performance doesn't gradually degrade when larger textures or more VRAM is used. Generally speaking, it either fits into VRAM or it doesn't. It's not analogous to geometry, where more triangles = slower to render.
Rendering cost (outside of draw calls, which is more of a preparation cost than rendering), is typically broken down to something like this:
- How many triangles/vertices the asset has
- How large the asset is on screen
- How complex the shading is
- How many pixels you're drawing (screen resolution)
The size of the texture is neither here nor there. It's the number of pixels you're drawing, rather than the number of pixels in the texture map.
It's also worth noting that for something like an FPV weapon, draw calls matter very little. This is because it's a singular asset; there will only ever be one of it on screen. Draw calls are much more important for things that you're likely to see tens or hundreds of at a time. Let's say you have a foliage system that places rock and stick meshes: you might see 1,000 of these at a time, and if each of them had 2 draw calls for some reason this would likely be a big performance hit.
I guess it's hard to wrap my head around that, but it makes sense. It also explains why through my experience of switching textures in games through mods/texture packs to something ridiculously high has always been more or less unnoticeable in terms of frames.
- Vertex count matters more in practice than triangle count
- Realtime lighting and shadows effectively multiply your vertex count in engine, but this is unavoidable
- Smoothing splits double the vertex count along the splits, of course
- As one vertex can only have one material, UV, RGB etc. coordinate each, adding a second UV doubles the vertex count again
- Layering materials on the same mesh should be avoided; split the mesh into the material parts, as layering again causes tons of unnecessary vertices because a vertex can only have one material, so they are doubled again for the new one
- Material instructions / complexity is definitely significant and should not be forgotten
- Forward rendering multiplies lights x vertex count so again draws more vertices; deferred draws depending on the pixels on screen, so know which type of engine you are on
- Vertices up to a certain degree are easily handled by decent GPUs; materials and drawcalls are usually more important to watch out for
- You can usually batch meshes that use the same material in engine easily, so in the end material efficiency is key and the textures matter a little less, as you get 1 drawcall per material and one per mesh.
As an example: if you have one car and the car has one mesh+mat for interior, one mesh+mat for lamps, one mesh+mat for windows, and one frame mesh that uses multiple materials for the details on it, like a chrome trim and whatnot, this is easily 5-10x the vertex and drawcall / shadow caster count, and effectively a performance drain, compared with having it all in one mesh and material, even if the actual model is not any different.
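The smoothing split point from the list is easy to see on a cube. The Python sketch below treats a GPU vertex as a unique (position, normal) pair, which is the usual way a split is described; it ignores UVs and other attributes. A fully smoothed cube needs one vertex per corner, while a hard-edged cube needs one per corner per face.

```python
from itertools import product

# The 8 corner positions of a unit cube.
corners = list(product((0, 1), repeat=3))

# Six faces, each identified by its outward-facing normal.
face_normals = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def corners_of_face(normal):
    """Corners lying on the face with the given axis-aligned normal."""
    axis = next(i for i, n in enumerate(normal) if n != 0)
    side = 1 if normal[axis] > 0 else 0
    return [c for c in corners if c[axis] == side]

# Hard edges: each corner needs a separate vertex for every face normal it uses.
hard_vertices = {(c, n) for n in face_normals for c in corners_of_face(n)}

# Fully smoothed: one shared (averaged) normal per corner, so one vertex each.
smooth_vertices = set(corners)

print(len(smooth_vertices), len(hard_vertices))
```

The cube is the extreme case (every edge is a split, so the count triples from 8 to 24). On a typical asset only the vertices that sit on a split get duplicated.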
Now where I scratch my head the most:
(?) Material transparency is very expensive (in Forward only?)
(?) Drawcall performance is not linear; after a threshold it becomes the main performance bottleneck, as there is a physical limit to what the CPU can handle well
(?) Texture operations effectively cost performance in a material depending on texture size, but texture amount is not really important until you reach the GPU cap (16 at UDK times?) - it's just part of your shader performance budget and an operation like any other
(?) You can save texture operations in shaders by saving different maps in different RGBA channels, which is then 4x more performant for the operations than 4 individual maps?
(??) The texture overhead depends on the throughput/bandwidth of your GPU; if that is too slow you will see bigger FPS losses, and as long as your video memory is not full, overall texture size in your game does not really matter, but when it's full, it's going to be a huge FPS sink unless it automatically drops to the lower mipmap - (I'm really confused about the entire texture thing still, size doesn't matter but also it does?)
For everything else you mention it's a tradeoff between various resource pools (memory, bandwidth, processing capacity). Once a resource pool is full you're bottlenecked and you'll see frames take longer than you want them to.
E.g.
A packed 4-channel texture means fewer texture reads but can be double the memory footprint of a 3-channel texture due to compression quirks. Which is better depends on whether you're short on memory or time.
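One common place this kind of quirk shows up is desktop block compression, although the post doesn't name a format, so treat that as an assumption. BC1 (DXT1) stores RGB in 8 bytes per 4x4 block, while BC3 (DXT5) adds a separate alpha block for 16 bytes per 4x4 block. A quick Python sketch of the top mip level only:

```python
# Bytes per 4x4 texel block for two common block-compressed formats.
BYTES_PER_BLOCK = {"BC1": 8, "BC3": 16}

def compressed_size_bytes(width: int, height: int, fmt: str) -> int:
    """Size of the top mip level only; the mip chain is ignored here."""
    blocks_x = (width + 3) // 4
    blocks_y = (height + 3) // 4
    return blocks_x * blocks_y * BYTES_PER_BLOCK[fmt]

rgb_1k = compressed_size_bytes(1024, 1024, "BC1")
rgba_1k = compressed_size_bytes(1024, 1024, "BC3")
print(rgb_1k, rgba_1k, rgba_1k // rgb_1k)
```

Under those assumptions a 1k RGBA map takes 1 MiB against 512 KiB for RGB, so the fourth channel doubles the footprint rather than adding a third to it.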
1. Is it better to derive the missing normal Z in the shader than to store the full RGB channels for a normal map? I think I will save some memory because I will have one free channel in the normal map, but the shader will have more instructions. Correct me if I'm wrong.
2. Is it possible to have GPU instancing with decals?
2. It's possible, but it depends how your decals are generated and the rendering method used.
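On question 1, the reconstruction relies on tangent-space normals being unit length, so Z can be rebuilt from X and Y as z = sqrt(1 - x^2 - y^2). Below is a Python sketch of the maths only. In practice this lives in the shader, and the function name is made up for illustration.

```python
import math

def reconstruct_normal_z(x: float, y: float) -> float:
    """Rebuild Z of a unit-length tangent-space normal from X and Y in [-1, 1]."""
    # Clamp to zero so compression or rounding error in X/Y can't make the
    # value under the square root negative.
    return math.sqrt(max(0.0, 1.0 - x * x - y * y))

print(reconstruct_normal_z(0.0, 0.0))
print(reconstruct_normal_z(0.6, 0.0))
```

That is a couple of multiplies, a subtract and a square root per pixel in exchange for a free channel. Which side of that trade is better depends on whether memory or shader time is the tighter budget.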
@Shrike: Transparencies in deferred rendering get rendered using forward rendering after the deferred rendering has taken place. You can't do transparencies in deferred rendering because you only store one depth value, which doesn't work for transparency. In forward rendering, all the opaque stuff is drawn near->far, then all the transparencies get drawn far->near. It's likely cheaper overall in deferred as the opaque surfaces can be deferred, but the cost of just drawing the transparent stuff is about the same.
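The draw order described here can be sketched in Python as two sorts. This only illustrates the ordering (opaque near to far, then transparent far to near); the `Drawable` type and the scene contents are hypothetical, and real engines sort on more than distance.

```python
from dataclasses import dataclass

@dataclass
class Drawable:
    name: str
    depth: float        # distance from the camera
    transparent: bool

def draw_order(drawables):
    """Opaque objects near to far, followed by transparent objects far to near."""
    opaque = sorted((d for d in drawables if not d.transparent),
                    key=lambda d: d.depth)
    transparent = sorted((d for d in drawables if d.transparent),
                         key=lambda d: d.depth, reverse=True)
    return opaque + transparent

scene = [
    Drawable("glass", 5.0, True),
    Drawable("wall", 10.0, False),
    Drawable("smoke", 8.0, True),
    Drawable("crate", 2.0, False),
]
order = [d.name for d in draw_order(scene)]
print(order)
```

Drawing opaque surfaces front to back lets the depth test reject hidden pixels early, while transparent surfaces go back to front so they blend over what is behind them.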
http://www.adriancourreges.com/blog/2015/11/02/gta-v-graphics-study/
Big thanks, that clears up a lot of missing pieces!
My car example was more with a RTS in mind or so, for a racing game you will definitely use isolated materials
With vertex colors you could also get the perfect cuts tho if you planned the geo for that
I'm a bit confused by the sentence above; it sounds like one texture with 4 channels takes twice the amount of 4 individual maps with 1 channel used. Does that sentence have the order wrong?
I assume, it would be - RGB > RGBA > 3 maps > 4 maps - in terms of performance?
(Factor 0.5, Factor 1, Factor 1.5, Factor 2)? - All grayscale
It's been way too long since I've done realtime, at UDK times I knew ..
(Does it matter if I export grayscale or can I just take only one RGB output? Should I change the compression even if I only take one output?)
- That's actually interesting regarding additional UV channels. Do you think you could estimate what the memory impact of an extra channel would be for something with say 10,000 verts with the same splits? I imagine it would be a little over 100 kilobytes if it just needs to hold extra coordinates, no?
- So the best set up would be to have as many maps with just RGB fully utilised, keep alphas in an independent map unless it has high fidelity? I would've guessed there's automated systems at this point where the engine would pack maps together to use up the optimal amount of memory, but I guess it can't really know which maps can and can't be packed together (like you wouldn't want the roughness map for terrain to be packed with a gun's metalness).
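For the extra UV channel estimate, the arithmetic is simple enough to sketch in Python. It assumes two components (U and V) per vertex stored as 32-bit floats and no extra vertex splits from the new UV seams. Engines may store UVs at half precision instead, so this is only a rough check.

```python
def uv_channel_bytes(vertex_count: int, bytes_per_component: int = 4) -> int:
    """Memory for one UV channel: two components (U, V) per vertex."""
    return vertex_count * 2 * bytes_per_component

full_precision = uv_channel_bytes(10_000)        # 32-bit floats
half_precision = uv_channel_bytes(10_000, 2)     # 16-bit floats
print(full_precision, half_precision)
```

Under those assumptions it comes to 80,000 bytes (about 78 KiB) at full precision, or half that with 16-bit UVs. That is a bit under the 100 kilobyte guess, although any extra vertices created by seams in the second UV set would add to it.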
Thanks for sharing this great info peeps!
Regarding packing...
Every proprietary engine I've worked on over the last 5 years packs textures at build time and does its best to minimise duplication - the only manual packing I do is if I'm working in UE4
- Imagine CPU as in charge of sending work for GPU to do.
- Draw calls are essentially jobs CPU has to send GPU.
- Creating and switching between a large number of jobs has costs (and we kinda assume each job costs roughly the same to the CPU).
- Hence we try to reduce the number of draw calls (by doing less work or smart batching).
- And the acceptable number of jobs is defined by target hardware.
Now there are tons of reasons for draw call batching to fail and potentially result in worse performance, and what a game artist can control is actually just a small part of it. The reasons vary by game engine, though some common limits apply: for example, if the mesh or materials aren't the same between 2 characters, then the batching is likely going to fail.
In general the most important advice is to make use of the engine's profiling tools. For a few years now GPUs have been quite powerful: you can generate drawcalls on the GPU without the need for the CPU, and you can pull a lot of data as you need it (aka bindless). As was mentioned, we have APIs that were modernized for higher throughput, submission from multiple threads etc.
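As a very simplified picture of that common limit, the Python sketch below groups objects by their (mesh, material) pair and assumes each distinct pair can become a single batched draw. Real engines have many more conditions for batching, and the scene contents are invented for illustration.

```python
from collections import Counter

def batches(instances):
    """Group (mesh, material) pairs, assuming each distinct pair is one batched draw."""
    return Counter(instances)

# Hypothetical scene: many identical rocks plus two soldiers with different materials.
scene = [("rock", "stone")] * 1000 + [("soldier", "camo_a"), ("soldier", "camo_b")]
grouped = batches(scene)

print(len(scene), "objects ->", len(grouped), "draw calls if batching succeeds")
```

The thousand identical rocks collapse into one group, while the two soldiers stay separate because their materials differ, even though they share a mesh.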
Therefore engines can make use of a lot of features depending on their target platforms. They may bake/batch data in different ways.
"it depends" is more true than ever.