these long thin triangles are going to be very poor, so we would need something closer to the model on the right.
Yeah, that's what I said before: you can still put density where it's needed and take it away from where it's not. If the spikes were only on the top, you could add slices on the top like you did and still keep the side flat. At a far distance you want to make sure the spikes (or whatever) still show up, so you force enough geometry into the model that even at the lowest tessellation level of 0 it still has enough verts in the right places to displace.
Yeah, but at that point you're better off simply modeling it all. =P
Plus, you're not really gaining LODS down, just up, if you have to start with a model that has a relatively high level of detail.
Anyway I'm not trying to present a TESSELLATION SUCKS or TESSELLATION RULES argument, just learn more about it, as I'm genuinely interested in the possibilities. There certainly seem like a lot of negatives, but I'm sure there is some smart stuff we can come up with to get around them as well.
Pretty much. What is nice, though, is the scalability of it. For any model it works well with, it scales across machines, because the game can measure its FPS and, if it's too low, just change the 2 numbers that control tessellation and do less of it.
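To make the scaling idea concrete, here is a minimal Python sketch of a frame-rate-driven tessellation control. It is only an illustration: the single `factor`, the 30 FPS target, the step size and the limits are all made-up assumptions, not values from any particular engine. The post talks about 2 numbers; the same idea would apply to each of them.

```python
def adjust_tessellation(factor, fps, target_fps=30.0, step=1.0,
                        min_factor=1.0, max_factor=64.0):
    """Return a new tessellation factor based on the measured frame rate.

    All parameters are hypothetical: lower the factor when the frame rate
    is under target, raise it only when there is clear headroom (20%).
    """
    if fps < target_fps:
        factor -= step          # too slow: tessellate less
    elif fps > target_fps * 1.2:
        factor += step          # comfortable headroom: tessellate more
    # Clamp so the factor never leaves the allowed range.
    return max(min_factor, min(max_factor, factor))


print(adjust_tessellation(8.0, 20.0))  # running slow, so the factor drops
print(adjust_tessellation(8.0, 60.0))  # running fast, so the factor rises
```

The 20% headroom band is there so the factor does not flip up and down every frame when the game sits right at the target.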
Yeah, in the longer term this is where we're going to end up, with something that scales better than raw polygons. Whether that is displacement, actual sub-ds, voxels, a combination of all of them or something else entirely, we'll see.
In a game you can't just batch 10,000 objects together. A lot of Nvidia's stuff doesn't really translate to games. Sometimes they give examples of a single character with cool stuff rendering at only 60 FPS. Again, you can't batch a bunch of objects together just because you want to. Say you batched a whole house as 1 piece of geometry and 1 big texture array. Now you're drawing the whole thing when you can only see one room, which basically means you're drawing 10x more stuff to avoid the tiny bit of Lag from sending a few draw calls for just the stuff you can see in the room. Doing 10x more work to save the few micro/nanoseconds it takes to send maybe 20 draw calls for the visible objects is not worth it.
So what you're saying is that when using batched rendering you don't actually batch the majority of your scene? Is this how this works?
Not really, no. Culling determines what can be seen, so you have to balance batching against drawing each object individually. Like I said, for a whole house it doesn't make sense; you will never see all of it. You are also only batching static objects. In an online FPS you usually never see all the players, and even when you do, if they are batched into 1 draw call you can't pick the LOD per player. So if your character up close is 20,000 verts, then with batching you HAVE to use the high-poly model to draw all of them.
So 20,000*32 players total = 1 batch draw call = 640,000
LOD: 20,000*1 (20,000) + 10,000*3 players close by (30,000) + 2,000*28 players far away (56,000) = 106,000 verts. The extra half a million or so is no joke to render, and would be a HUGE waste just to save 32 VERY VERY VERY negligible draw calls. And again, this is the worst case for the game, where you see all the players (which almost never happens). With 1 batch, YOU ARE ALWAYS DRAWING EVERYTHING. With the draw-1-by-1 model with LOD, you only draw the players you can see, so that 106,000 would drop by 1/2 or more.
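Here is the same worst-case count written out as a quick Python sketch, so the numbers are easy to check or change. The vertex counts and the player split are the ones from the post; the function names are just for illustration.

```python
def batched_vertices(players, high_lod_verts):
    """One batch: every player is drawn with the highest-detail model."""
    return players * high_lod_verts


def lod_vertices(groups):
    """Per-object draws: each (player_count, verts_per_player) group uses its own LOD."""
    return sum(count * verts for count, verts in groups)


batched = batched_vertices(32, 20_000)
with_lod = lod_vertices([(1, 20_000), (3, 10_000), (28, 2_000)])

print(batched)             # 640000
print(with_lod)            # 106000
print(batched - with_lod)  # 534000 extra verts for the single batch
```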
Where batching works: 100,000 blades of grass surrounding me. I might be able to work out that I can only see 1/4 of them (25,000), but I don't want to draw 25,000 pieces of grass at 2 triangles each. Instead I'll batch them into chunks of the magic number of 130 triangles (or more) and draw 75 blades (150 triangles) at a time, even if in some chunks I only see 55 blades because the rest are off screen. In this case 25,000*LAG adds up: that small draw-call cost times 25,000 makes my total LAG time huge.
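A rough Python sketch of the grass example, counting draw calls with and without chunking. It assumes the visible blades pack neatly into chunks, which is optimistic: partially visible chunks would add a few more calls. The variable names are made up for the example.

```python
import math

visible_blades = 25_000    # 1/4 of the 100,000 blades, as in the post
tris_per_blade = 2
blades_per_chunk = 75      # chunk size used in the post

# One draw call per blade versus one draw call per chunk.
calls_unbatched = visible_blades
calls_chunked = math.ceil(visible_blades / blades_per_chunk)
tris_per_chunk = blades_per_chunk * tris_per_blade

print(calls_unbatched)  # 25000
print(calls_chunked)    # 334
print(tris_per_chunk)   # 150
```

Whatever the per-call Lag actually is, it gets paid 334 times instead of 25,000 times.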
The 2-3 FPS issue comes down to this: it was not 1 object he stripped. Over the course of a whole level he cut 2 million triangles. So say that at any point in the level you are drawing 1/50th of it. For each rendered frame, 1/50th of the 2 million means he only cut about 40,000 triangles per frame. Had he actually had a single object like a statue that was 5 million tris and cut it down by 2 million, then his FPS (while staring at the object) would go WAY up (by like 100). While he's looking away from the statue, though, it won't matter whether it's 5 million or 3 million, because the CPU would never have told the GPU to draw it since we aren't currently looking at it.
He also does not mention the FPS he started from, which ties into a common misconception among new graphics developers:
A = a drop of 500 FPS. Man, my game must suck, right? Well, I lost .001 seconds of computation.
1000 FPS = .001 seconds to render 1 frame
500 FPS = .002 seconds to render 1 frame
B = a drop of 2 FPS. Man, that's not bad at all, right? Well, I lost about .0024 seconds of time.
30 FPS = .0333 seconds to render 1 frame
28 FPS = .0357 seconds to render 1 frame
So a drop in FPS is relative to how big the FPS was in the first place. In case A it dropped 500, but that was really less TIME than was lost in case B by dropping 2 frames per second. But really it's simply the fact in my first sentence.
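The FPS-versus-time point is easy to check with a few lines of Python. This is just the arithmetic from the two cases above; the function names are made up for the example.

```python
def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def cost_of_drop_ms(fps_before, fps_after):
    """Extra time per frame caused by dropping from one frame rate to another."""
    return frame_time_ms(fps_after) - frame_time_ms(fps_before)


print(round(cost_of_drop_ms(1000, 500), 3))  # case A: 1.0 ms per frame
print(round(cost_of_drop_ms(30, 28), 3))     # case B: 2.381 ms per frame
```

So the "small" 2 FPS drop costs more than twice as much frame time as the 500 FPS drop, which is why comparing frame times is less misleading than comparing FPS.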
Ok then how do you explain that they actually told him that the first 600 tris are pretty much free?
Also, how do you explain that almost every single asset in the UT3 levels bundled with UDK has no LoDs whatsoever, and yet they still manage to maintain performance that no one's complained about, and were pretty much a graphics benchmark back in the day?
What they are basically computing is this: if you batch absolutely nothing, meaning you use 1 draw command for every single object, and your Lag time per command equals the time it takes to draw 600 triangles, then in the Lag between one draw command and the next the GPU is able to draw 600 tris. So why not keep it working the whole time and draw a minimum of 600 tris between each Lag?
It draws the second object, 100tris, and finishes before it gets the next draw command (3rd) to get to the GPU.
Basically, if you draw a 200-triangle object by itself without batching, your GPU goes idle for long enough that it could have drawn 400 more triangles in the Lag time it needed for a new command. So you could just make everything no less than 600 tris. BUT, before going on, remember that pixel and vertex shaders share the stream processors, so the more advanced the shader, the fewer triangles the GPU will be able to push out, because vertex and pixel work compete for the same processors.
If you are batching certain objects, though, then you're combining them to be collectively more than 600. So if you have objects of less than 600 tris, what they are saying is that you should find a way to batch them, because you're basically drawing a couple of low-poly objects for free. If one is 200 tris and one is 400 tris, drawing them both at once is faster than drawing them by themselves, which effectively means you get one of them for "free". So do you want to batch objects and make it super fast, or increase the polys and still draw them 1 by 1? If you increase the polys, your framerate will stay the same; if you batch them, you will get a better framerate, because you're drawing the same # of tris with fewer draw calls.
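A toy Python model of the "free triangles" argument may help here. It assumes, as a big simplification, that every draw call costs at least as much as drawing 600 triangles (the figure quoted in this thread; the real value depends on hardware, drivers and shaders) and it ignores everything else. The names are made up for the example.

```python
FREE_TRIS = 600  # per-draw-call "free" threshold quoted in the thread, not a measured value


def frame_cost(draw_calls, free_tris=FREE_TRIS):
    """Cost of a frame in 'triangle equivalents'.

    draw_calls is a list of triangle counts, one entry per draw call.
    Each call costs its triangle count, but never less than free_tris,
    because the GPU would otherwise sit idle waiting for the next command.
    """
    return sum(max(free_tris, tris) for tris in draw_calls)


print(frame_cost([200, 400]))  # 1200: two small objects drawn 1 by 1
print(frame_cost([600, 600]))  # 1200: same cost after adding polys to both
print(frame_cost([600]))       # 600: the two small objects batched into one call
```

In this model adding polys up to the threshold changes nothing, while batching halves the cost, which is the trade-off described above.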
Well, 1: it's free. 2: What is the polycount of those objects anyway? As you said, for anything less than 1,000 tris it's overkill; once you are close to something like that, who cares. 3: Optimize when you need to. My engine is SO damn unoptimized, but I don't care, because until I want to add some extra effects or something I can still draw a shit ton these days, the graphics hardware is so fast. Take a snapshot of any FPS game and you will see that there really isn't much stuff being drawn. If, say, in Crysis I walk from a jungle into a shack with a coffee cup in it, it makes no sense to have a LOD for the coffee cup, because the only time I see it is close up. If the cup is on a dirt road in a jungle 200 feet away, then maybe I will LOD it.
Without any GPU optimizations at all you can still do A LOT. It's not like this Lag I'm talking about is 1/2 a second per draw command. Those commands get there very fast and the GPU draws stuff pretty fast. High-end cards with 320 processors eat 320 verts at a time.
It also costs money to re-topo and UV a smaller LOD, and you multiply that by however many objects you have. You have to look at everything: the current LOD tri count, the coffee cup idea, whether you can batch items together, whether the object may be drawn 2x for a shadow map. Will any optimization actually improve things drastically enough to justify paying a guy 50,000 a year to make the uber-low-poly LODs?
So whatever items you can batch together can be as low-poly as you want. As in my grass example, you have 2 triangles per grass blade. What you don't want to do is say, "well, this is dumb, 2 triangles when it could draw 598 more for free and I would have the same framerate". Instead, try to batch those small objects up. For static objects you can batch virtually anything you want. Anything that fits within, say, a 3 foot by 3 foot box, like a kitchen table with cups and silverware, you could easily package up together, because chances are that if you're drawing that table:
1. If I can see the kitchen table, then in 98% of the frames I draw when walking by it I will see everything on it. Sure, I might hit some frames where part of the table isn't seen, but F it, I'll draw the plates on that side anyway, because overall I'm much more efficient.
2. It doesn't need LOD: you can never get further than 10 feet away (the next room) and still see it, so LOD0 is all we need.
I very much agree that every particular project deserves its own tests and its own guidelines on what's efficient, but I don't understand why it seems so phantasmagorical that, due to huge GPU speed and slower CPU speed, there's a little "safe time" by which the GPU outruns the CPU. And that "safe time" could be equal to rendering 500 tris, considering the engine does 3 million in 1/60th of a second.
BUT, before going on, remember that pixel and vertex shaders share the stream processors, so the more advanced the shader, the fewer triangles the GPU will be able to push out, because vertex and pixel work compete for the same processors.
Yeah, it totally has to be pointed out that if you're vertex bound, you save tris by any means necessary. If you have a lot of complex rigs or pixel shaders, or extensive dynamic vertex lighting, then definitely cut down those tris.
But most game engines nowadays are fill-rate driven, so it's shader optimization that gives you the biggest bang for your buck.
and take the same time to render if we apply the logic from the NVIDIA slide from GDC that we all agree is correct. And that is what we're talking about.
Took us 3 pages. I'm sorry if I wasn't exactly on par with the technical terms or wasn't clear enough, but could we please now all agree that if you can generally say that a 600-tri object comes for "free", then it generally makes no sense to LoD it?
This is a very good point. Also worth noting: if all those objects combined are more than 600 tris and they are still batched, you still save a lot of time by eliminating a draw call.
Also, artists generally do not have direct control over batching, so unless specifically instructed it seems logical to treat every single object as unbatched and at least not bother with LoD on a low tri count object.
That's why I said that in games batching is not as popular. It's more for trees/grass/particles/terrain.
Also, 2005 again: that's 6 years ago. The bus between CPU and GPU is faster and GPUs can draw more triangles, but I honestly have no idea what the triangles-per-draw-call gap is now. It may be more triangles or fewer. My guess is probably fewer, because again the GPU is also running shaders. Some GPUs have 32 processors, some 320, and if you spend 500 bucks you can get 512. A single draw call might mean you only lose 300 triangles if it lags. So if you can effectively render 5 million and you lose 300 for each draw call, you can still get away with maybe 3 or 4 million tris. Even unoptimized you're still going to do a lot, so the LOD questions and optimizations might not really show up as "this guy did this/that", or "why did we stop doing LOD". And if you start cutting out tris and optimizing, where are you going to spend the time you saved? You can't go in and re-model your tank and just expect to plop 10,000 more verts on it now that you can afford it.
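To see what "lose 300 triangles per draw call" does to a 5 million triangle budget, here is a small Python sketch. The 5 million and 300 figures are the guesses from the post; the draw-call counts are arbitrary values picked for illustration.

```python
def remaining_budget(peak_tris, tris_lost_per_call, draw_calls):
    """Triangles left per frame after paying a fixed loss for every draw call."""
    return max(0, peak_tris - tris_lost_per_call * draw_calls)


for calls in (1_000, 3_000, 5_000):
    print(calls, remaining_budget(5_000_000, 300, calls))
# 1000 4700000
# 3000 4100000
# 5000 3500000
```

Under those assumptions it takes somewhere between roughly 3,000 and 7,000 draw calls per frame to land in the "3 or 4 million" range mentioned above.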
Also 2005 again, thats 6 years ago. The bus between cpu/gpu is faster, gpu's can draw more triangles, but I honestly have no idea what the gap is of triangles to draw call. It may be more triangles or less.
I'd have to refer to the NVIDIA presentation once again, so this is coming from a hardware manufacturer.
My guess is probably less due to the fact that again the GPU is also running shaders.
In the Unreal Tournament 3/Epic case we were discussing a fillrate-driven pipeline, so pixel shaders were already pretty complex back then, and I don't see them becoming dramatically more complicated. If anything, we've learned to use them more efficiently and cut corners here and there.
Also, on a bigger scale: in a real game scene of 4,000,000 triangles, does saving 20,000 tris by LODing 100 "benches" to save 200 tris each really seem worth the effort? It's 0.5% of the overall tri count, and that's even if we neglect the fact that you could very well be CPU bound for at least 600 tris per object, if not three times as many. Cutting 2-3 million per level didn't help Unreal much back then, and I hardly see it really saving anyone now.
Also, the consoles haven't changed much since 2006, so they could be considered a pretty safe base point for calculating the "free" threshold.
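The bench example above is quick to verify in Python. The numbers are the ones from the post; the function name is just for illustration.

```python
def lod_saving_fraction(objects, tris_saved_each, scene_tris):
    """Share of the scene's triangles removed by LODing the given objects."""
    return objects * tris_saved_each / scene_tris


saving = lod_saving_fraction(100, 200, 4_000_000)
print(f"{saving:.1%}")  # 0.5%
```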
The more I think about it, the sphere-test isn't actually all that bad, the poles are still generally within a reasonable density limit. But here is a much better example, and likely more realistic testcase anyway.
A long cylinder! With traditional modeling, we would get a model that looks like the low on the left, but for displacement, these long thin triangles are going to be very poor, so we would need something closer to the model on the right.
I tested your meshes in Marmoset using the tessellation that was recently implemented.
The sphere tests were not that different, even though the quad sphere subdivided much more evenly.
As expected, the cylinders were like night and day. The quad cylinder gave vastly superior results.
[Images: side and pole views, two sets]
Both tests had a Displacement Bias of 0.5, a Displacement Scale of 0.05 and a Tessellation amount of 512 (max)
*edit* With regard to optimisation pros and cons, I think it depends a lot on the target hardware. For something like a high-end PC, saving 2 or 3 FPS might not be worth the effort, but on a console that could be 10% of your FPS, which makes the time spent vs performance gained a much more attractive trade.
Why? Because a pixel shader executes once per fragment instead of per pixel. But a fragment is just a pixel! Or alternatively, what the hell is a fragment?!
Er, fragments are used because what the GPU renders might be going into an offscreen buffer, like reflection maps, shadow buffers, textures, and so on, so there are no pixels in there, strictly speaking. The result might not even be treated as an image but as a 2D array of some kind of data.
As for the inefficiency, every GPU today has groups of four fragment rendering pipelines that share some circuits. This means that rendering a triangle will always use at least 4 pipelines. Now if your triangles are smaller than 1 fragment then you'll waste 75% of your GPU rendering capacity.
This is why it still makes sense to use LOD because the further an object gets from the camera the smaller its triangles become.
If your 500 polygon object gets backface culled to 250, and then it takes up less than ~33x33 pixels on the screen, then you're starting to waste GPU power. And if it gets smaller than ~15x15 pixels, you should consider using sprites or other kinds of impostors on top of the LOD system.
All this is especially true for consoles where most games use a resolution of 1280x720 or usually even less (COD5 on PS3 is 960 x 540)
Oh and draw calls work very differently on consoles, too, but I'm not a programmer so I can't tell you more than that. There's less API overhead and CPU involvement.
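The screen-size thresholds mentioned above (33x33 and 15x15 pixels for 250 visible polygons) can be sanity-checked with a short Python sketch. It treats each polygon as one triangle and assumes the object fills its square on screen, both simplifications; the function names are made up for the example.

```python
def pixels_per_triangle(side_px, visible_tris):
    """Average screen pixels per visible triangle for a side_px x side_px footprint."""
    return side_px * side_px / visible_tris


def wasted_fraction(covered_fragments, group_size=4):
    """Share of a group of fragment pipelines doing no useful work."""
    return 1 - covered_fragments / group_size


print(pixels_per_triangle(33, 250))  # 4.356, about one group of 4 per triangle
print(pixels_per_triangle(15, 250))  # 0.9, less than one pixel per triangle
print(wasted_fraction(1))            # 0.75, the 75% waste for a 1-fragment triangle
```

So 33x33 is roughly the point where each triangle covers only about 4 pixels, and 15x15 is where triangles drop below a pixel each.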
Replies
Also, how else do you explain that stripping 2 million triangles improved performance by 2-3 FPS?:
http://www.polycount.com/forum/showpost.php?p=762412&postcount=5
Time -->
CPU: [1st draw command][2nd draw command][3rd draw command]
GPU: IDLE [100tri object] IDLE [draw 3rd object]
It draws the second object, 100 tris, and finishes before the next draw command (the 3rd) gets to the GPU.
Yes, I'm sorry I wasn't clear enough with the term and considered a single object per batch to still be batching. I appreciate you educating me. :)