While I have definitions for the words in my head, can someone else state what these mean: Batch & Drawcall? I will put what I think they mean, as I am sure it's wrong.
"Batch"
-If a model is duplicated a number of times and its material isn't changed, then they're all a part of the same 'batch', so long as they aren't grouped or defined by LODs. Guh, does that make sense? Probably not..
"Drawcall"
-Not entirely sure. I want to say that it's when material layers (spec, diffuse, etc) are called to the frame buffer, but I am not certain.
EDIT: Also, vertex normals. Aaargh, wtf! haha I always thought the normal of a triangle was the important one, and now I learn of vertex normals? Anyone have a handy picture demonstrating how a vertex's normal is defined?
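For what it's worth, here is a minimal sketch of one common convention (my own illustration, not any particular engine's code): a vertex normal is the normalized sum of the face normals of the triangles touching that vertex, so bigger faces pull it harder. Hard edges / smoothing groups are exactly the cases where a vertex gets split so each side can carry its own normal.

#include <array>
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };

static Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}

// Accumulate each triangle's (unnormalized) face normal into its three
// vertices, then normalize. The cross product's length is proportional
// to triangle area, so this is the area-weighted variant; tools also
// use angle-weighted or plain averages.
std::vector<Vec3> vertexNormals(const std::vector<Vec3>& pos,
                                const std::vector<std::array<int, 3>>& tris) {
    std::vector<Vec3> n(pos.size(), Vec3{0, 0, 0});
    for (const auto& t : tris) {
        Vec3 fn = cross(sub(pos[t[1]], pos[t[0]]), sub(pos[t[2]], pos[t[0]]));
        for (int i : t) { n[i].x += fn.x; n[i].y += fn.y; n[i].z += fn.z; }
    }
    for (auto& v : n) {
        float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
        if (len > 0) { v.x /= len; v.y /= len; v.z /= len; }
    }
    return n;
}

int main() {
    std::vector<Vec3> pos = {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}};
    std::vector<std::array<int, 3>> tris = {{0, 1, 2}};
    Vec3 n0 = vertexNormals(pos, tris)[0];
    std::printf("%g %g %g\n", n0.x, n0.y, n0.z);  // prints 0 0 1
    return 0;
}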
Not entirely sure...but I think I'm in the right area here:
Batch: a chunk of verts that is sent from the CPU to the GPU. Batches are part of a draw call, and you can have several batches in a single drawcall. Batches are chunks of contiguous verts that don't get broken by smoothing groups, UV boundaries, etc.
drawcall:
typically consists of a "world matrix", the fragment & vertex shaders, render states (blend, alpha test, depth test...), the textures to be used, and the geometry
the geometry is a vertex buffer + indices which make the triangles (1,2,3, 2,3,4 ...). you don't need to use all the vertices in the buffer, so unused ones don't really matter...
it might be that the vertex data is made of different streams that reside in different buffers as well, but let's not overcomplicate things.
the reason those non-material "splits" are nasty is simply storage space: the more "duplicates", the larger the vertex buffer, and the less chance of reusing the same vertex.
the vertex chunk often resides in graphics card memory (static meshes). sometimes it may be copied dynamically (lots of concatenated copies of the same stuff for simple instancing, or some manual vertices from rocket trails, particles, shadow silhouettes...). these kinds of copies are not super fast; most data in games is static.
it is simply the raw data for each vertex, i.e. it is already broken down to the "unique" vertex level. a vertex can of course be indexed multiple times by the triangles that share it, when no split is required...
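As a concrete illustration (a made-up minimal example, not any engine's actual format), here is the vertex buffer + index buffer pairing for a single quad: four unique vertices, six indices, and the two shared corners are indexed twice instead of being stored twice.

#include <cstdint>
#include <vector>

// One "unique" vertex, already flattened: if two triangles need a
// different normal or UV at the same position, the vertex is duplicated
// (that's the "split" being talked about above).
struct Vertex {
    float px, py, pz;   // position
    float nx, ny, nz;   // normal
    float u, v;         // texcoord
};

int main() {
    // A quad: 4 unique vertices in the buffer...
    std::vector<Vertex> vertices = {
        { 0, 0, 0,   0, 0, 1,   0, 0 },
        { 1, 0, 0,   0, 0, 1,   1, 0 },
        { 1, 1, 0,   0, 0, 1,   1, 1 },
        { 0, 1, 0,   0, 0, 1,   0, 1 },
    };
    // ...and 6 indices making 2 triangles. Vertices 0 and 2 are reused
    // by both triangles, which is the storage win indexing buys you.
    std::vector<std::uint32_t> indices = { 0, 1, 2,   0, 2, 3 };
    (void)vertices; (void)indices;
    return 0;
}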
batching:
being able to "render as much as possible" in a single drawcall. that means trying to maximize the triangles being drawn, as every time you "start a new drawcall" it's not as fast as rendering a lot with one call/batch.
so say we have that huge vertex data chunk already stored in vram, so we don't need to send it every frame. batching then basically means building long series of "indices" into those vertices, covering just what we actually need now.
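Roughly like this, sketched with OpenGL-style calls (a fragment only; it assumes a GL context exists, the shared vertex/index buffers are already bound, and all render state is identical between objects):

#include <cstddef>
#include <cstdint>
#include <vector>
#include <GL/gl.h>

// A contiguous range of indices inside the one big index buffer.
struct DrawRange {
    GLsizei count;   // number of indices in this object
    size_t  first;   // offset of its first index
};

// Worst case: one draw call per object, even though nothing about the
// state changes in between. Each call carries fixed CPU/driver overhead.
void drawOneByOne(const std::vector<DrawRange>& objects) {
    for (const DrawRange& o : objects)
        glDrawElements(GL_TRIANGLES, o.count, GL_UNSIGNED_INT,
                       reinterpret_cast<const void*>(o.first * sizeof(std::uint32_t)));
}

// Batched: the ranges were laid out back-to-back when the buffer was
// built, so the whole lot goes out in a single call.
void drawBatched(GLsizei totalCount) {
    glDrawElements(GL_TRIANGLES, totalCount, GL_UNSIGNED_INT, nullptr);
}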
I can recommend reading the pdfs linked to in the very first post (the very first slide of the batchbatchbatch paper starts with "What Is a Batch?")
Another reason to avoid extreme optimization: LODs.
So a few months back, Okkun made a good point to our team as we were making a bunch of LOD models. Some of our artists were focused on reducing poly counts to exactly the target numbers, even if it meant massacring a model. He made a point of asking: if performance was poor, what would be the first way to improve things?
They'd bring the LODs in much closer, where those craptacular models are right in plain view.
So rather than completely destroying an LOD model to save a whopping 34 polygons, it might be better to just leave them in and maintain the proper silhouette and UV borders. Even if several dozen of these models are on screen at the same time, an extra 5000 polys means very little to a modern graphics card. But as Per was pointing out, bad looking art is bad looking art no matter how you slice it. And in the case of some of these LODs, the degradation for a very minimal polygon savings was quite extreme.
I've had long, long discussions with my lead engine coder about this, and he has raised something which I don't think has been mentioned here yet:
Having bevels along edges that could otherwise be hard (as in the first toolbox example on this thread), whilst not increasing the vertex count, does lead to quite a lot more small triangles on screen, especially if applied to everything in an environment.
My lead engine guy says that because gpus can fill multiple pixels in parallel, but only within the same triangle, having lots of triangles that contain very few pixels leads to stalls in filling the pixels and thereby hits your fillrate.
Example - if your triangle is 2 pixels big on screen the maximum number of pixels that can be processed simultaneously is 2, when actually the GPU could be pushing a lot more through.
As our engine was fillrate limited (I believe most are these days), he felt that this was a significant factor and therefore said that we should use hard edges where possible.
Has anyone else heard this? Any thoughts?
yes, that's true, but I think it was mentioned before (I put it into the first post now). thin triangles, or triangles smaller than a pixel or covering just a few pixels, will be bad for the reasons he mentioned.
That boxmesh example here depends of course on the size of the box "on screen". Once it's just background, or always small on screen, the bevels would be overkill, and one can live with a few minor shading artefacts.
Ie it's a question of LOD and "how the model is used". If you have a game with corridors and the box will always be exposed at a reasonable size, there is no reason for a LOD/lower-res model, as the bevel tris will be okay (unless you talk about ultra fine bevels, which would always be too thin).
I've been wanting to actually do a side by side performance comparison test and see what the impact is when dealing with a whole environment, unfortunately it requires building the environment twice. If I get round to it I'll report back!
The start of this thread is pretty old and I haven't spent much time keeping up to date with the cutting edge of 3d engines. Are there any big changes in the way things are done worth knowing about that haven't been mentioned in this thread?
not much should have changed; basically this is more or less a matter of hardware and rendering principles, rather than individual 3d engines' advancements.
We are pretty much fixed to "older" hardware anyway (especially with the consoles). Imo we won't see a "real change" until the all new consoles or all new gpu systems (larrabee or whatever advanced ati/nvi gpus in years to come) are mainstream, ie a few years to go still...
And it's hard to define "mainstream" with tons of wii, iphone, psp, ds or casual PCs having "last-gen" hardware.
The casual market is still huge, and things like handhelds and phones still require low fidelity models.
No, this was simply pointing out that technology is at a point where counting every last triangle isn't really a big deal any more. This is not an excuse to do sloppy work, or to not take appropriate optimization steps, but rather to say that it's better to leave in a few bevels to make the model look better.
Thanks for the reply CB. I'm also interested in how the new shadowing method works and how expensive it is (compared to stencil shadows), if you don't mind explaining.
You have to render the scene's depth from the light's frustum (classic shadow mapping). But as you only require depth, this is very fast (ie mostly vertex bound).
There are numerous tweaks to enhance quality, like rendering multiple (cascaded) shadowmaps for the sun, shearing... which just raises how often a single object might be rendered to depth.
Once the shadowmaps are created, "normal" shaders can sample them (again, different methods exist for soft shadows, removing aliasing... or whatever).
The setup for stencil shadows as such was more complicated (requiring a stencil buffer, requiring volumes/silhouettes...). However for less complex geometry this is still practical, as it gives you pixel-perfect shadows, while the shadowmap approach has to do more tricks to hide aliasing. Then again the shadowmap approach is better suited for soft shadows.
Shadowmapping is more pipeline friendly, as you have a real "shadow texture" to sample and therefore can do more tricks. But it needs more tweaks and advanced methods to get really nice results.
So how expensive it is depends on the quality you want to achieve, but you can get "more" out of it than stencil. And silhouette generation for stencil stuff requires (slow) CPU interaction on pre-SM4 hardware (ie the majority of hardware out there).
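If it helps, the frame structure is roughly this: a hypothetical skeleton with stubbed-out names, purely to show the two passes, not any real engine's API. Cascades just repeat pass 1 once per cascade with a different light projection.

#include <vector>

// Everything below is stubbed; only the pass structure matters.
struct Mesh {};
struct Mat4 {};
struct Texture {};

void bindDepthTarget(Texture&) {}
void bindBackbuffer() {}
void bindTexture(const Texture&) {}
void setViewProjection(const Mat4&, const Mat4&) {}
void drawDepthOnly(const Mesh&) {}
void drawLit(const Mesh&) {}

void renderFrame(std::vector<Mesh>& shadowCasters, std::vector<Mesh>& visible,
                 Texture& shadowMap,
                 const Mat4& lightView, const Mat4& lightProj,
                 const Mat4& camView, const Mat4& camProj) {
    // Pass 1: depth only, from the light's frustum. No colour writes and
    // no pixel shading, which is why this pass is mostly vertex bound.
    bindDepthTarget(shadowMap);
    setViewProjection(lightView, lightProj);
    for (Mesh& m : shadowCasters)
        drawDepthOnly(m);

    // Pass 2: normal shading. The pixel shader transforms each surface
    // point into light space and compares depths against the shadowmap
    // to decide lit vs shadowed (plus whatever filtering tricks the
    // engine uses to hide aliasing / soften the edges).
    bindBackbuffer();
    bindTexture(shadowMap);
    setViewProjection(camView, camProj);
    for (Mesh& m : visible)
        drawLit(m);
}

int main() {
    std::vector<Mesh> casters(3), visible(5);
    Texture shadowMap;
    Mat4 lv, lp, cv, cp;
    renderFrame(casters, visible, shadowMap, lv, lp, cv, cp);
    return 0;
}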
I am given to understand that a lot of outside-the-box coders are seriously considering a revolution in rendering. Apparently a lot of them think that the advent of multi-core processors is going to make the traditional GPU obsolete, and that the standard rendering pipeline will also be phased out in a few more years. As I understand it, the ability to multi-thread opens up new avenues for software-based rendering, avenues that will make software-based rendering comparable to, and in some cases superior to, traditional GPU-supported rendering.
Older rendering techniques that were discarded years ago are now being looked at anew. I believe there are some who are attempting to resurrect voxel rendering. It could be that in the future polygonal modeling will be rendered obsolete.
Of course, this is pure conjecture at this point. It will be years before multi-core processors are common enough to make such development financially viable. And current polygon-based tools and engines are so prevalent that such a major shift in methodology is sure to be slow.
Still, it's probably a good time to be a C programmer.
illustrates how triangulation strategies affect speed. Basically shows the micro-thin triangles issue vs large-area triangles. Bear in mind that the "number of sides" is really high in the stats, something you will not reach in regular modelling. Ie the low numbers you work with have the same performance costs, so don't redo your cylinders.
NOT suggesting to use certain layouts for "caps" of low poly; with so little geometry at hand, interpolation artifacts are much more dominant...
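To picture the difference, here are the two index layouts for an n-sided cap sketched in C++ (function names are mine, purely for illustration): a naive fan where every triangle hangs off vertex 0 and gets thinner as n grows, versus a recursive split that keeps triangles fat. Both produce the same n-2 triangles and identical vertex counts; only the triangle shapes (and thus the normal interpolation) differ.

#include <cstdint>
#include <cstdio>
#include <vector>

// Naive fan: triangles (0,1,2), (0,2,3), ... - with a high side count
// these become long micro-thin slivers.
std::vector<std::uint32_t> fanCap(std::uint32_t n) {
    std::vector<std::uint32_t> idx;
    for (std::uint32_t i = 1; i + 1 < n; ++i)
        idx.insert(idx.end(), { 0u, i, i + 1 });
    return idx;
}

// Recursive split: halve the ring segment each step, so triangles stay
// close to even proportions (the "large area" style layout).
void splitCap(std::uint32_t lo, std::uint32_t hi, std::vector<std::uint32_t>& idx) {
    if (hi - lo < 2) return;               // fewer than 3 verts: done
    std::uint32_t mid = (lo + hi) / 2;
    idx.insert(idx.end(), { lo, mid, hi });
    splitCap(lo, mid, idx);
    splitCap(mid, hi, idx);
}

int main() {
    std::vector<std::uint32_t> a = fanCap(12);
    std::vector<std::uint32_t> b;
    splitCap(0, 11, b);   // vertices 0..11 around the ring
    std::printf("%zu vs %zu indices\n", a.size(), b.size());  // both 30
    return 0;
}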
Interesting read, thanks for posting, but like you said I don't think I'll be redoing any cylinders anytime soon...
That was a great little read.
I guess these are issues the engine programmer needs to deal with rather than the modeler, since the modeler will supply mostly quad based geometry.
However, if you're following the workflow of reimporting high poly geometry into a modeling app (3ds max/maya) and then optimizing it, then the optimizer should perform the recursive area subdivision which is, I think, the way 3ds max does it.
I guess these are issues the engine programmer needs to deal with rather than the modeler, since the modeler will supply mostly quad based geometry.
Modelers need to be aware of what their hidden edges are doing, and not think that engines work in quads, but know that everything will be interpreted into triangles. In Max it's pretty easy to flip hidden edges around, which sometimes falls on the rigging/animation guy. In Maya I think you have to force the edge by creating it or making it visible? It tends to retriangulate the hidden edges on its own... maybe there's a way to force it to stop and I haven't found it yet.
@artsy, I agree you normally don't run into extreme situations like that, ie most triangles in a gamemesh should have similar-sized edges.
For reimport I think usability is more important, ie something like the last triangulation scheme would be sucky to work with.
Would it be easily scriptable to recalculate the edges based on the last method, say on a selection or only on polys with more than 2 tris? Maybe evaluate the mesh kind of like STL Check, highlight any polys that might not be optimal, and give the person the chance to deselect/select them before changing them around?
I'm not suggesting anyone get cracking on this, just wondering if it's possible... seems like it would be...
well, first of all we are talking about more-than-48-sided ngons; I very much doubt you will find those in the real world. I somewhat think these images of the 12-sided cylinders burnt into your head the idea that layout would matter much even at that detail level... which I doubt.
I second the idea of highlighting triangles with extreme proportions, and whose relative area compared to the rest of a mesh is very small... but I would leave it to the person fixing stuff.
While the difference in speed is impressive, the Max Area method of triangulation completely ruins your normals. When I have to choose between a performance boost and correct normal maps, I prefer the latter.
I will remove the link, it creates too many false impressions.
zwebbie, you will never gain a performance boost on regular "ingame" cylinders because they have too few sides... and yes, with so little geometry, interpolation issues are much more important.
WHICH IS BEST
A = standard cone
B = Each loop has half the amount of edges
C = Most uniform method I could come up with
Edit, actually I'm sure 64, 48, 32, 16 would have been more uniform for C, oh well.
i think the b cone would be best.
cone a needs plenty of smoothing groups to look smoothed without over-smoothing the cone end. something like: face a smoothes with b, face b with c and a, c with d and b.
uv space uses the same vertices as the normal mesh, plus 2 more vertices for the seam.
in cone b you can smooth everything as group 1 except the cone head. uv space uses the same as the mesh, plus 2 more vertices for the seam.
cone c works like cone b but uses more vertices.
I never added that one here, but maybe it could help some?
It was an attempt for someone to create a detailed military box from... a box (= cube)...
This is not really original work, but it could help some to understand how to delete some polygons when you have a "good" diffuse and the other shader maps.
http://www.samavan.com/3D/Realistic/Box_A/samavan_Obj_Box_A001.jpg
http://fc02.deviantart.net/fs48/o/2009/337/9/6/965775fcac1a424cde129ca0fad29247.jpg
http://www.samavan.com/3D/Realistic/Box_A/samavan_Obj_Box_A002.jpg
I spent a solid few months optimizing polys, lightmap UV channels, and collision meshes for everything in UT, and the act of stripping 2 million polys out of a level generally improved the FPS by 2 or 3 frames.
Polycount is not the huge issue people were rightly sure it was in the past. The bigger issue now is texture resolution, because all assets carry 3 textures as standard (normal map, diffuse and spec), and that's before you have additional mask/light textures for emissives and reflection or whatever other stuff you are supporting in the shader.
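Back-of-envelope numbers for that (my own rough math, assuming 1024x1024 DXT-compressed maps, not actual production figures):

#include <cstdio>

int main() {
    const double pixels = 1024.0 * 1024.0;
    const double mips   = 4.0 / 3.0;        // full mip chain adds about a third
    // DXT1 is 4 bits/pixel (0.5 bytes), DXT5 is 8 bits/pixel (1 byte).
    double diffuse = pixels * 0.5 * mips;   // DXT1
    double normal  = pixels * 1.0 * mips;   // DXT5, holds up better for normals
    double spec    = pixels * 0.5 * mips;   // DXT1
    std::printf("3 textures: %.1f MB\n",
                (diffuse + normal + spec) / (1024 * 1024));   // ~2.7 MB
    // Compare: 5000 extra triangles, at very roughly one ~32-byte vertex
    // per triangle once splits are counted, is tiny next to that.
    std::printf("5000 tris: ~%.2f MB\n", 5000 * 32.0 / (1024 * 1024));
    return 0;
}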
Shader complexity is also a bigger issue now because it requires longer rendering time.
Section counts are a bigger issue: meshes carrying 2 of each texture require a 2nd rendering pass.
I can't explain things technically enough for people, but the coders have explained to me a couple of times that just beating everything down with optimization on things like polycount doesn't help as much as you'd expect, because different things are CPU or GPU bound.
Mesh count is a big issue now that everything is a static mesh rather than the majority being BSP. BSP is terribly inefficient compared to mesh rendering also.
A mesh is pretty much free for the 1st 600 polys; beyond that, its cost can be reduced dramatically by using lightmaps for generating self-shadowing and so on, rather than vertex lighting.
The reason I was saying I wouldn't take out the horizontal spans on this piece was also largely because, as an environment artist, you have to be thinking about the crimes against scale the level designers will often commit with your work to make a scene work. Just because I know it's a box doesn't mean it won't get used as something else much larger, so I always try to make sure it can hold up, whatever it is, at 4 times the scale!
Butcher mentioned instancing; this is another feature that we relied upon much more heavily to gain performance.
Due to textures / BSP being more expensive now and polycounts cheaper, we made things modular, very very modular.
For instance, I made a square 384/384 straight wall using a BSP tiling texture and generated about 60 modular lego pieces that use the same texture and all fit with themselves and each other, to replace BSP shelling of levels.
This led to lots of optimizations in general and quick, easy shelling of levels, and it gave our levels a baseline for adding new forms to the base geometry.
I doubt I'm changing anyone's opinion here, maybe making a normal map driven next gen game will convince you though.
And by next gen, I just mean the current new technology like ID's new engine, the crysis engine, UE3 etc, because they are normal map driven and the press likes fancy names for simple progression of technology.
r.
DID ANYONE READ ALL THIS?! that fucking blew my mind, that's so insane. all that crazy dedication; no wonder I still think UT is one of the best looking of the "next-gen" era.
Even if I come from the low-spec game dev side (wii & ds), I can only underline kevin johnstone's post. Of course you shouldn't waste polys on senseless edge loops and details, but it's the shaders and sfx that send your performance to stop motion. To get our games on wii running at 60fps, our bottleneck is the shaders, which need to be simplified here and there. The vertex count is just a base issue and something comparably easy to optimize.
In his article, Guillaume Provost suggested that before optimizing, you should first determine whether the mesh is transform or fill bound. Question: how does one estimate the transform and fill cost of a mesh? Are there any tools, scripts or plugins for this purpose?
Greetings. I'm a 3d modeller/level designer/game designer person, aspiring to be one of the above. I just want to make games, in general, so I'm interested in learning all associated trades.
Hence why I decided to become part of this community, where I hope to learn lots of new things, ask for, and in time, give advice.
Good thread dudes.