Just wondering if anyone has some technical knowledge of instancing in game engines.
Do you only save on loaded mesh data? Are they treated as a normal draw call when rendered etc...
Any other knowledge on the pros and cons of instancing.
I'm not sure if mesh data is the big saver; I think in most engines they actually create new geometry for each instance. I think it's texture data and video memory that are the big savers when it comes to employing instances.
Collision and lighting calculations are a couple of other areas where instancing brings improvements.
Object culling is another big plus for using instancing.
Instanced triangles are always better than non-instanced. I can't copy and paste from the UDN as far as Unreal goes, but it is definitely a big improvement.
Remember that you will always draw the entire static mesh if there is one triangle showing on screen - so for example, if you have a building it would be better to instance windows, detail pieces, columns, etc. as opposed to one giant static mesh.
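For illustration, here's a minimal sketch of that culling point: a mesh's bounds only have to touch the frustum for the whole mesh to be submitted, so splitting a building into pieces lets most of it be culled. All names and numbers below are invented, and the 1D overlap test stands in for a real AABB-vs-frustum check.

```python
# Hypothetical sketch: per-piece culling vs one monolithic static mesh.

def intersects(box, frustum):
    """1D interval overlap, standing in for a real AABB-vs-frustum test."""
    (bmin, bmax), (fmin, fmax) = box, frustum
    return bmax >= fmin and bmin <= fmax

def visible_triangles(pieces, frustum):
    """A piece contributes ALL of its triangles if its bounds touch the frustum."""
    return sum(tris for box, tris in pieces if intersects(box, frustum))

# A building split into 4 modular pieces of 500 triangles each...
modular = [((0, 10), 500), ((10, 20), 500), ((20, 30), 500), ((30, 40), 500)]
# ...versus the same building as one 2000-triangle static mesh.
monolithic = [((0, 40), 2000)]

frustum = (35, 50)  # camera only sees the far end of the building
assert visible_triangles(modular, frustum) == 500     # one piece survives culling
assert visible_triangles(monolithic, frustum) == 2000  # whole mesh must be drawn
```

The numbers fall out directly: with one triangle of the monolithic mesh on screen you pay for all 2000, while the modular version pays only for the visible piece.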
I think the tradeoff you have to watch for is the lighting calculations as far as individual lightmaps vs. vertex lighting, but that's a whole other ball of wax that the eggheads who know more about performance than I do will have to explain.
This is part of the whole modularity wave that game design is getting smarter and smarter about. If you want a lesson in modularity, check out Gears or UT3. Epic is a god amongst men when it comes to modularity in both textures and geometry.
[ QUOTE ]
I'm not sure if mesh data is the big saver, I think in most engines they actually create new geometry for each instance. I think its texture data and video memory that are the big savers when it comes to employing instances.
[/ QUOTE ]
Wuh? Isn't that the exact opposite of what instancing is for?
Surely if you have, say, a tree model and instance it, you can't possibly save any texture memory since it's going to have to load the same texture regardless of if you have 1 or 100 of the trees on screen at once... instancing won't help you save texture memory.
The way I understand it, it will save on mesh memory and collision data since instead of storing every single object uniquely, you just store the initial geometry and then calculate the positions of all the others based off this. So you're only loading one model and duplicating it via code rather than loading a whole bunch of models by treating each one as unique...
I may be wrong, but I'm pretty sure Vig is wronger
From what I understand, it's a question of video card vs. CPU cost.
Again, someone with more understanding of the costs would be able to chime in, but nonetheless - it would be MUCH better to have 1 tree copied 100 times than it would be to have 1 mesh of 100 trees. As MoP said, texture memory isn't the issue.
It also has the benefit of not having to set up render passes for unique objects... you set up the rendering once, and then run it over and over for each new object. So, say, you have 100 objects, instead of having to set up all your constants and states and textures, etc., 100 times, you just set it once and render 100 objects, much quicker for the CPU. Especially true for animated objects, where you won't have to set up all those bone matrices for each object (but they do have to be animating identically, unfortunately, at least in any implementation I know of). Think of it the same way as batching gives you better performance.
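A rough sketch of that batching argument, with mocked-up API calls (none of these names come from a real engine), just counting how much per-object setup each path issues:

```python
# Count the setup calls a renderer would issue per path. All call names
# (set_texture, set_constants, set_transform, draw) are invented mocks.

calls = []

def set_texture(t): calls.append("set_texture")
def set_constants(c): calls.append("set_constants")
def set_transform(m): calls.append("set_transform")
def draw(mesh): calls.append("draw")

def render_naive(mesh, instances):
    for transform in instances:
        set_texture(mesh); set_constants(mesh)  # full state setup per object
        set_transform(transform)
        draw(mesh)

def render_instanced(mesh, instances):
    set_texture(mesh); set_constants(mesh)      # state set once, shared by all
    for transform in instances:
        set_transform(transform)                # only the transform changes
        draw(mesh)

instances = list(range(100))
calls.clear(); render_naive("tree", instances); naive_calls = len(calls)
calls.clear(); render_instanced("tree", instances); instanced_calls = len(calls)
assert naive_calls == 400       # 4 calls per object
assert instanced_calls == 202   # 2 shared + 2 per object
```

The ratio is the point, not the exact numbers: constant and texture setup drops from per-object to once, which is where the CPU win comes from.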
meshes are stored "once", and can then be rendered with many drawcalls at different locations (just a matter of setting the transform matrix) (edit: as prof said above)
meshes can also be stored multiple times, for collision purposes (though for physics I guess a lot of basic primitives are still favored) but also for rendering, allowing a single drawcall for repeated geometry, which is then transformed, like with a single "bone" per object, into worldspace.
mesh-instancing can mean the just-mentioned approach (one large mesh built out of many copies of the mesh data), or newer techniques which allow rendering the same mesh with a single drawcall (or, in OpenGL's case, still many but lower-cost calls) at many places without additional storage cost.
The large-mesh way would use dynamic mesh data that is updated frequently, which is why it's often particle stuff (the number of grass or effect particles visible varies with camera movement).
typically it's called instancing when there is some specialized render path that allows low-cost rendering of "smaller" meshes many times (think bushes, debris...). How that is achieved, compared to the "other regular" meshes, can be different for each engine / hardware capability / technique picked... each engine might do it a bit differently.
Taking the "static meshes" (buildings) in BF2 as an example: they may be stored only once, with simply the lightmap coordinates exchanged for each "instance" inside the map. For me that would be no "instancing", as it's like 2-3 instances of a relatively big mesh, so the drawcall saving from rendering them isn't so big (plus the fact that it's often unlikely all three are visible).
The six quads for the grass or whatever, however, would be the classic case for mass instanced rendering.
This is just an example, may not be exactly how BF2 does it, and reflects my opinion about "instancing". Definitions and so on might vary slightly, but when you look at rendering slides about "instancing" at games conferences, it's mostly about "grass" or "asteroids" and other "detail objects". But of course there's also more advanced stuff like "large crowd" rendering.
The BF2 buildings are a great example, especially like you said about the scale of them. The downside with those big meshes, though, especially in some of the more crowded areas where a LOT of fighting is happening ON the building itself (such as the first flag-cap area at Karkand), is the collision that has to take place against the individual static mesh.
It might make more sense to modularize those buildings a little so that the game isn't doing 100 collision calls a second with all the particles, players, rigid bodies, etc. all hitting the same mesh at once.
This is also a reason why it might be better to break apart your ground meshes, aside from the obvious lighting wins.
well, frankly I'm not sure how BF2 does it exactly, and you should be aware that physics/collision very often has its own representation of the world versus the visual world, so "instancing" can refer to rendering or to physics and might coexist in different forms... the collision static mesh might be made of box primitives, a lower-poly mesh, or whatever; it doesn't have to be tris at all. The physics engine itself will have acceleration structures and its own representation of the mesh to cope with these things, so they're two different topics.
Yep, in ETQW we had different methods of instancing, you could instance visible meshes and collision meshes, but they were handled separately and some cases weren't best for both (I seem to recall something about distance objects were from each other being a factor, along with the amount).
So in some cases it'd be better to instance visual meshes, but not collision, and vice versa.
I honestly don't know the specifics of how it works with our tech, but I know that when something is instanced, you pay for that model 25 times. So basically, if you draw 500 trees, memory-wise it only amounts to 25 trees. Either way, it's always 25 no matter what. So in cases where you want fewer than 25 of a model, you actually don't want to use instancing. This may not be the case in a lot of other engines, but that's how ours works.
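If I'm reading that rule right, it could be sketched like this. The flat 25-copy cost is taken straight from the description above and is obviously engine-specific; the function name and sizes are made up:

```python
# Sketch of the described (engine-specific) rule: instancing always costs
# a fixed 25 copies of the model, so it only pays off past 25 instances.

FIXED_INSTANCE_COPIES = 25  # assumption taken from the post above

def mesh_memory(count, per_model_kb, instanced):
    if instanced:
        return FIXED_INSTANCE_COPIES * per_model_kb  # flat cost, regardless of count
    return count * per_model_kb                      # one full copy per object

assert mesh_memory(500, 100, instanced=True) == 2500    # 500 trees cost only 25 trees
assert mesh_memory(500, 100, instanced=False) == 50000
assert mesh_memory(10, 100, instanced=True) == 2500     # worse than the...
assert mesh_memory(10, 100, instanced=False) == 1000    # ...10 plain copies
```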
I wish I had more info on this, but even if I did, I don't think my NDA would allow me to talk about it. It's fun stuff though...
[ QUOTE ]
Wuh? Isn't that the exact opposite of what instancing is for?
Surely if you have, say, a tree model and instance it, you can't possibly save any texture memory since it's going to have to load the same texture regardless of if you have 1 or 100 of the trees on screen at once... instancing won't help you save texture memory.
[/ QUOTE ]
I'm confused, so having 100 trees with 100 unique 512x512 textures is less memory intensive than having 100 trees that each reference one 512x512? Maybe I don't understand how engines load textures, but even as a disk-space issue it makes sense for textures to be instanced when they can... isn't that the whole point of tiles?
Sorry if I'm a little dense and just not getting it... Until 6mo ago our engine was 2D heh so quite a bit of this is new ground for me... maybe I shouldn't have been so quick to jump in =P
Yeah, I think you're just not getting it... I never said that 100 unique textures is less intensive than 1 texture. I said that whether you instance an object or not has no effect on the texture budget.
In the majority of cases you will be instancing objects sharing the same texture...
Imagine this:
- You have a tree model using 1 texture. You duplicate it 100 times without instancing. You now have 100 times as much model data since each object is stored uniquely. You still only load one texture into memory, as that is shared across all trees.
- You have a tree model using 1 texture. You instance it 100 times. You now don't have as much model data since (as Crazybutcher put it better earlier) you're basically transforming 1 object using code, rather than having to know the explicit location of every vertex. You still only have one texture used on all trees.
This is what I mean by instancing incurring no extra texture memory expense. Obviously if you want to use different textures on different trees, you will have extra texture memory expense regardless of whether you're instancing the trees or not.
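Those two bullet points, reduced to arithmetic. The sizes are made up, but the shape of the result is the point: the texture term is identical either way, only the mesh term changes.

```python
# Made-up sizes: instancing shrinks the mesh budget, not the texture budget.
TREE_MESH_KB = 200
TREE_TEXTURE_KB = 1024  # one 512x512 texture, shared either way

def memory_kb(tree_count, instanced):
    mesh = TREE_MESH_KB if instanced else TREE_MESH_KB * tree_count
    texture = TREE_TEXTURE_KB  # loaded once whether there are 1 or 100 trees
    return mesh + texture

assert memory_kb(100, instanced=False) == 200 * 100 + 1024  # 21024 KB
assert memory_kb(100, instanced=True) == 200 + 1024         # 1224 KB
# The texture term cancels: the entire saving is mesh data.
assert memory_kb(100, False) - memory_kb(100, True) == 200 * 99
```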
ahh, that makes sense, I'm on board now after reading the rest of the thread and your latest example. Your original example actually makes sense too.
Read everything before posting... read THEN post...
Static meshes are cached into video memory, and so can be displayed many times with little extra overhead. Static meshes are a list of vertices stored once in the video memory (as opposed to "once per frame"), so drawing many copies of one static mesh is a relatively simple operation. When the static mesh is to be displayed on the screen, the engine only has to tell the video card where (and at which size, rotation, and with which textures) to do it.
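A toy version of "tell the video card where, at which size and rotation": per instance, only a small transform matrix changes while the cached vertex data stays put. Pure-Python 2D matrix math, no GPU API; both function names are invented for illustration.

```python
# Per-instance placement of one cached "mesh" via a small transform matrix.
import math

def world_matrix(tx, ty, scale, angle_deg):
    """2D affine transform (3x3) combining scale, rotation, and translation."""
    c = math.cos(math.radians(angle_deg)) * scale
    s = math.sin(math.radians(angle_deg)) * scale
    return [[c, -s, tx],
            [s,  c, ty],
            [0,  0,  1]]

def transform(m, v):
    """Apply the matrix to a vertex, as a vertex shader would per instance."""
    x, y = v
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# The same stored vertex, placed by two different instance matrices.
vertex = (1.0, 0.0)
assert transform(world_matrix(10, 0, 1.0, 0), vertex) == (11.0, 0.0)
x, y = transform(world_matrix(0, 0, 2.0, 90), vertex)
assert abs(x) < 1e-9 and abs(y - 2.0) < 1e-9  # rotated 90 degrees, scaled by 2
```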
I'm assuming that each instance still counts as a draw call?
it all depends on your engine. I've worked on a bunch of stuff for different platforms and different engines that used instances, each of them in significantly different ways with different advantages and different drawbacks.
on one project we went instance-crazy, confident that we'd save lots of much-needed memory. Turned out, some way down the line, that every instance was going into a finite pool that we didn't know about until it ran out. Increasing that pool pissed away most of the memory we'd saved. And so on.
thing is, we will get all kinds of "gossip" and opinions, simply because engines may do things differently. As I mentioned in my first post, you can render the same mesh at different locations in numerous ways, and each way has its pros and cons.
hence some of you guys will say x, others y, and others z, and each may be "right" for the engine they worked on and for how "instancing" was defined there... I've implemented all kinds of drawing/batching/instancing rendering techniques, and really there is no "one way/truth"
# rendering "average mesh":
- setup state (textures, shading parameters)
- setup transform matrix (object to world to view to screen)
- draw mesh
- next mesh
- setup state
- .. do the same
typically you would sort/batch/whatever to minimize the "setup" overheads
# instanced mesh "simple" way:
- setup state (textures, shading parameters)
- setup transform
- draw mesh
- setup transform for next mesh
- draw mesh
- ...
less state changes (only transforms basically), still multiple draw calls
# instanced mesh "dynamic data"
- setup state
- setup transform (this time: world to view to screen)
- create large mesh by copying all meshes to a big one
two ways:
1. giving each "mesh" a unique ID which later is used to index a matrix (object to world). think about each mesh instance having a single rigid bone assignment.
2. pre-transforming the mesh data a bit on the CPU already. Think of a billboard: we could pass "center" points pre-transformed to world space, then in the vertex shader add an "offset" (to the corresponding corner) for every vertex, and every 4 vertices would then be a quad. But we might also even pre-transform the full mesh when copying together the big mesh...
- draw mesh
now we have fewer draw calls. We might still have more than one, because the time spent writing one giant mesh and rendering it with a single draw call might be less efficient than, say, always rendering 20 at a time or so.
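A sketch of that "big dynamic mesh" path using method 1: the shared mesh is copied N times into one vertex list, each copy is tagged with an instance ID, and the ID is resolved to a per-instance transform (reduced here to a plain translation) the way a single rigid "bone" assignment would be. All names are illustrative.

```python
# Batch N copies of one mesh into a single vertex buffer, with a per-vertex
# instance ID standing in for a rigid one-bone-per-object assignment.

shared_mesh = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # one triangle
instance_positions = [(0, 0), (10, 0), (0, 10)]      # object-to-world offsets

# Build the batched vertex buffer: (x, y, instance_id) per vertex.
batched = [(x, y, i) for i, _ in enumerate(instance_positions)
                     for (x, y) in shared_mesh]

def vertex_shader(v):
    """What the per-vertex 'bone' lookup would do on the GPU."""
    x, y, i = v
    ox, oy = instance_positions[i]  # matrix lookup reduced to a translation
    return (x + ox, y + oy)

world_space = [vertex_shader(v) for v in batched]
assert len(batched) == 9              # 3 instances x 3 vertices, one draw call
assert world_space[3] == (10.0, 0.0)  # first vertex of instance 1
assert world_space[8] == (0.0, 11.0)  # last vertex of instance 2
```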
# hardware instancing support
there are some low-cost functions in DirectX which allow you to render the same data multiple times with a single draw call, with certain attributes and such changed on each "internal call".
- setup state
- setup transform (world to view to screen)
- setup "instancing attributes"
- draw mesh instanced
That codepath afaik needs certain hardware capabilities under DirectX. In OpenGL a similar effect can be achieved by exchanging transform matrices for low-cost attributes ("pseudo instancing") while still performing many drawcalls (OpenGL's pipeline isn't as sensitive to high drawcall counts as DX's).
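The shape of that call, mocked up (draw_mesh_instanced is an invented stand-in, not a real D3D or GL entry point): one submission, a shared vertex stream, and a per-instance attribute stream the hardware steps through once per internal copy.

```python
# Mock of a hardware-instanced draw: one call, many copies driven by a
# per-instance attribute stream. Not a real graphics API.

draw_call_count = 0

def draw_mesh_instanced(mesh, per_instance_attrs):
    """Stand-in for a hardware-instanced draw: a single submission that
    yields one transformed copy of the mesh per instance attribute."""
    global draw_call_count
    draw_call_count += 1
    return [[(x + ox, y + oy) for (x, y) in mesh]
            for (ox, oy) in per_instance_attrs]

quad = [(0, 0), (1, 0), (1, 1), (0, 1)]
offsets = [(i * 2.0, 0.0) for i in range(6)]  # the "6 quads of grass"

copies = draw_mesh_instanced(quad, offsets)
assert draw_call_count == 1     # six grass quads, one draw call
assert len(copies) == 6
assert copies[5][0] == (10.0, 0.0)
```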
# even more ways...
a geometry shader - although not exactly meant for it, and no clue whether it's efficient - allows outputting more geometry than was input with each drawcall, and you could do custom transforms for each new output.
but anyway, I'm sure other ways exist; the ones mentioned above are just what you will find described in conference slides and so on. And as you can see, there are many ways to achieve the same effect... and there are reasons all of them are used to some degree.