
LoDs: From how many tris does it make sense?

d1ver polycounter lvl 14
Mkay, so I've just watched Helder's vid on exporting stuff to CE3 and it got me thinking again about a subject I'd let go for a while.

The backstory:
A couple of years ago I wrote a paper summarizing all the technical knowledge I had managed to gather (most of which came from Polycount, and especially the "Too Much Optimization" thread). I remember that it was very long-winded and I apologize for that once again. But here's an excerpt from it that, as far as I remember, met no objections.
Exactly the way your engine draws your object triangle by triangle, it draws the whole scene object by object. In order for your object to be rendered, a draw call must be sent. Since hardware is created by humans, it's pretty much bureaucratic. :) You can't just go ahead and render everything you want. First you've got to have some preparation done. The CPU (central processing unit) and GPU (graphics processing unit) share the duties somewhat like this: while the GPU goes ahead and just renders stuff, the CPU gathers information and prepares the next batches to be sent to the GPU. What's important for us here is that if the CPU is unable to supply the GPU with the next batch by the time it's finished with the current one, the GPU has nothing to do. From this we can conclude that rendering an object with a small number of tris isn't all that efficient. You'll spend more time preparing for the render than on the render itself, and waste the precious milliseconds your graphics card could be crunching some sweet stuff.

BatchBatchBatch.jpg
A frame from NVIDIA's 2005(?) GDC presentation

The number of tris a GPU can render before the next batch is ready to be submitted varies significantly, but here are some examples. For UE3, UDN says between 1000 and 2000 triangles. While working with the BigWorld engine we set the barrier at 800, even though some of the programmers said that it could be around 1000.

So the thing is: making objects that are unreasonably low poly is inefficient and brings no technical benefit.
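
A rough sketch of the argument in the excerpt, in Python. It assumes CPU submission and GPU rendering overlap perfectly, so each draw call costs whichever side is slower. Both constants are made-up illustration values (picked so the break-even lands at 1000 tris, inside the 800-2000 range quoted above), not measurements from any engine.

```python
# Toy model: CPU submission and GPU rendering run in parallel,
# so a draw call occupies max(cpu time, gpu time).
# Both constants are hypothetical illustration values, not measurements.
CPU_SUBMIT_NS = 30_000   # assumed CPU cost to prepare one draw call
GPU_NS_PER_TRI = 30      # assumed GPU cost per triangle


def batch_time_ns(tris, cpu_submit_ns=CPU_SUBMIT_NS, gpu_ns_per_tri=GPU_NS_PER_TRI):
    """Time one draw call occupies when CPU and GPU work in parallel."""
    return max(cpu_submit_ns, tris * gpu_ns_per_tri)


def break_even_tris(cpu_submit_ns=CPU_SUBMIT_NS, gpu_ns_per_tri=GPU_NS_PER_TRI):
    """Triangle count below which the GPU finishes before the next call is ready."""
    return cpu_submit_ns / gpu_ns_per_tri


for tris in (200, 500, 1000, 2000):
    print(tris, "tris ->", batch_time_ns(tris), "ns")
print("break-even:", break_even_tris(), "tris")
```

Under these assumptions a 200-tri and a 1000-tri object cost the same per draw call. Whether a real engine behaves like this is exactly what the rest of the thread argues about.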

From personal Experience:
At the studio I used to work at I actually approached the programmers with this stuff and said that, if this is correct, then why do we make LoDs for objects that are around 500-700 tris? And they said that I was totally correct and swore on the first episode of Star Wars to our graphics director. And then the whole studio dropped doing LoDs for a lot of stuff, saving tons of time and money on LoDing and rigging, and as far as I know the project is alive and well and they didn't have to go back and redo stuff. Though I could be wrong, since I left about 6 months after that. The project is World of Tanks, btw, so you know it's not a little indie game that doesn't need to worry about resources.

So after I mentioned this in the paper, someone said that it's incorrect since overdraw still exists. OK, I realize that if a pixel on screen is covered by multiple triangles then every triangle will be rendered and sampled together for this one pixel. But if, in this particular draw call for this particular object, rendering all its triangles still doesn't take longer than the time until the next draw call is submitted, then what the fuck do we care that it all gets sampled into one pixel?
Or am I getting this thing completely wrong? 'Cause I was pretty much staggered by the tri count of the prop in the video.

TLDR;
Does CryEngine really care if the bench is 202 tris, 102, 502 or 1002? And does it really make sense to LoD an asset of 200 tris?
Also, if some of you guys have programmers in your studios you can ask, it would be really nice to hear some more opinions "from the other side". Thanks in advance.


and just in case here's the original paper - pretty old by now,
polycount thread
and the Too Much Optimization Thread

Replies

  • Stromberg90
    Stromberg90 polycounter lvl 11
    I don't have any tech knowledge of this, but I think in the video it's more of showing the workflow of getting something into cryengine with LODs and everything else you might need.

    So I would not take it that seriously; there might be other things in the video that could be changed to improve performance.

    However this is not me saying that you are not right, this was just my thoughts on the video.
  • EarthQuake
    With modern hardware, not only does it not make sense to make an LOD under 500 or so tris, it doesn't generally make sense to OPTIMIZE lower than 500 tris. The draw call for the object is where your performance bottleneck is at that point.

    Big thread: http://www.polycount.com/forum/showthread.php?t=50588&page=4

    So think about that, the next time you optimize that crate or barrel down to 120 tris, making it look needlessly jagged and lowpoly.
  • d1ver
    d1ver polycounter lvl 14
    Thanks for the reassurance, EarthQuake!
    Stromberg90, I totally get you man, but the text below the video boldly states:
    All the assets featured within this video are included in the CryENGINE 3 Free SDK, version 1.0 [2456]
    which basically means that that's the way Crytek works and I got really spooked that I might've got this thing wrong for a long time now.
  • Stromberg90
    Stromberg90 polycounter lvl 11
    d1ver: I forgot about that, I did read it myself when I watched that tutorial.
    So it might be the way they work, and it's easy to get uncertain about such things when a big company like Crytek does it.
    I think it might have to do with the way of thinking that less is always better, and even though EQ has said in his post that going below 500 tris does not make sense, I will still find myself making models with less than 500 tris.

    Thinking can be your worst enemy :)
  • [HP]
    [HP] polycounter lvl 17
    Well, it really depends. This is a very "dynamic" subject, as in, there are no set-in-stone rules for this; there's only you as an artist and your common sense.

    Because it really depends on the silhouette of your object, even if your object is only 300/400 tris, but there's a lot of geometry that you could clean up and get rid of, and the player will never notice from a distance and cut the tri count in half, then why not?

    Also, it depends on the engine you are working with. In CryEngine it's recommended that your assets have at least one LOD; it really helps the streaming system do a better job. For example, when you enter a new area, instead of having to load a 500kb asset, it will load smaller chunks first and then load the bigger ones (LOD2, LOD1 and LOD0).

    Hope that makes any sense.


    Ps. one of the guidelines that we use tho, is, if you're going to be creating LODs for your object, then each one must be at the very least 50% of the tri-count of the previous LOD; it's an all around good rule of thumb.
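
A tiny sketch of that rule of thumb as a check, in Python. It reads the guideline as "each LOD has at most half the triangles of the previous one", which is an interpretation of the wording above rather than a documented CryEngine rule; `lod_chain_ok` and `max_ratio` are hypothetical names.

```python
def lod_chain_ok(tri_counts, max_ratio=0.5):
    """True if every LOD has at most max_ratio of the previous LOD's tris.

    tri_counts is ordered LOD0, LOD1, LOD2, ...
    """
    return all(nxt <= prev * max_ratio
               for prev, nxt in zip(tri_counts, tri_counts[1:]))


print(lod_chain_ok([2000, 1000, 400]))  # each step halves or better
print(lod_chain_ok([2000, 1500]))       # LOD1 only drops 25%
```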

    PS2. And yes, that's an actual asset from Crysis 2, and since it's a "generic" prop, the artist decided to create 2 LODs, because he didn't know whether it was going to be used in a simple scene or a complex scene, so he decided to play it safe and create 2 LODs regardless.
  • Stromberg90
    Stromberg90 polycounter lvl 11
    [HP] wrote: »
    Because it really depends on the silhouette of your object, even if your object is only 300/400 tris, but there's a lot of geometry that you could clean up and get rid of that the player will never notice from a distance and cut the tri count in half, then why not?

    Well, isn't the thing here that when an object has such a low tri count, it puts more strain on the engine to change the LOD than to just use the original object?
  • [HP]
    [HP] polycounter lvl 17
    Yes and no; it's all a matter of balance.

    Imagine the streaming system as a pipe (a buffer). This pipe can only stream a certain amount of information per frame. When you are loading a new scene and the majority of your assets have LODs, the LODs will be very small in size compared to the main asset (fewer tris and fewer materials, because yes, remember that you can cut out materials with LODs as well), so you can stream in larger numbers of assets per frame. It will stream from the smallest LOD to the biggest, in inverse order: LOD3 -> LOD2 -> LOD1 -> LOD0.
    If you have no LODs, you risk seeing your asset popping in out of nowhere, because that pipe is very small and big files take longer to stream in. So remember that a level consists of hundreds of assets, not only your own; you need to think outside the box.
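
The pipe picture can be sketched in a few lines of Python. This is a simplified model, not CryEngine's streaming code: it assumes a fixed per-frame budget and strictly sequential loading. The 500 KB figure comes from the post above; the asset count, the 50 KB coarse-LOD size and the budget are hypothetical.

```python
import math


def stream_order(requests):
    """Sort (asset, lod_index, size_kb) requests coarsest LOD first."""
    return sorted(requests, key=lambda r: r[1], reverse=True)


def frames_to_stream(sizes_kb, budget_kb_per_frame):
    """Frames needed to push the given sizes through a fixed per-frame budget."""
    return math.ceil(sum(sizes_kb) / budget_kb_per_frame)


# Hypothetical level: 200 assets, 1000 KB of streaming budget per frame.
full_meshes = [500] * 200   # every asset needs its 500 KB LOD0 before it shows up
coarse_lods = [50] * 200    # assumed size of the coarsest LOD

print(frames_to_stream(full_meshes, 1000), "frames until everything is visible without LODs")
print(frames_to_stream(coarse_lods, 1000), "frames until everything is visible with coarse LODs")
```

The sketch only covers mesh data. As the following replies point out, texture mips may dominate the real streaming cost.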
  • Stromberg90
    Stromberg90 polycounter lvl 11
    Thanks. For giving such clear info for someone like me, that has no real knowledge of this :)
  • d1ver
    d1ver polycounter lvl 14
    Hey, man, thank you very much for the reply
    [HP] wrote: »
    Well, it really depends, this is a very "dynamic" subject, as in, there's no set on stone rules for this, there's only you as an artist and your common sense.

    Because it really depends on the silhouette of your object, even if your object is only 300/400 tris, but there's a lot of geometry that you could clean up and get rid of, and the player will never notice from a distance and cut the tri count in half, then why not?

    My retort would be, why spend your time and company's money on something that has no technical relevance and something, as you said, player would never notice?
    Also, it's a bit simpler when you're talking about a bench, but I bet you guys have some military vehicles in your games where there are a lot of objects below 1k tris skinned to a pretty complex rig. Making and skinning 3 LoDs of those, and then fixing them when someone requests alterations, could be a real pain.
    [HP] wrote: »
    Also, it depends on the engine you are working with, in CryEngine, it's recommended that your assets have at least one LOD, it really helps the streaming system doing a better job; like when you enter a new area, instead of having to load a 500kb asset, it will load smaller chunks first and then load the bigger ones (LOD2, LOD1 and LOD0)

    Hope that makes any sense.

    This is a totally viable argument and 500k is some amount of memory, but I'm pretty sure that the textures are the bottleneck here, since a difference between Mip0 and Mip1 could be ten times that value at times.
    That's really visible in UE btw, how they stream their mips.
    I also have to second what Stromberg90 said, it actually does put more strain as far as I know.
    [HP] wrote: »
    Imagine the streaming system like a pipe, (buffer) this pipe can only stream a certain amount of information per frame, when you are loading a new scene, and the majority of your assets have LOD's, then the LOD's will be very small in size compared to the main asset (less tris and less materials, because yes, remenber that you can cut out materials with LOD's as well), therefor you can start stream in bigger quantities of assets per frame, it will start streaming the smallest LOD to the biggest in inverse order, LOD3 -> LOD2 -> LOD1 -> LOD0.
    I'm really curious if material LoDs have to be attached to particular meshes and if they aren't able to work dynamically, like dropping the spec and normal contribution based on a distance parameter?
  • d1ver
    d1ver polycounter lvl 14
    Hey, and I decided to do a little test of how much raw mesh data could weigh, so we would have a more informed discussion.

    Exported to obj. stuff is as follows:
    2028 tris - 94.9kb
    972 tris - 44.3kb
    432 tris - 19.7kb
    192 tris - 8.91kb

    Exported to .ase (the one you used for UDK before FBX support):
    2028 tris - 263kb
    972 tris - 126kb
    432 tris - 56.7kb
    192 tris - 25.9kb

    Under 1k tris the difference in the worst (.ase) case is 70k, which seems to be much less than the difference between the biggest texture mips, so I'm uncertain how much of a footprint they would leave.
    Once again, there's no argument against LoDs in general, but LoDing stuff that's below a thousand tris could be irrational, and skipping it has a lot of bonuses. That's the only argument I'm trying to present here.
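
For reference, the per-triangle cost implied by those numbers can be worked out directly. This Python snippet only uses the figures from the post and assumes 1 kb = 1024 bytes. Both are text export formats, so this is not the size the mesh takes in GPU memory.

```python
# File sizes from the post above, keyed by triangle count (values in kb).
obj_kb = {2028: 94.9, 972: 44.3, 432: 19.7, 192: 8.91}
ase_kb = {2028: 263, 972: 126, 432: 56.7, 192: 25.9}


def bytes_per_tri(table, bytes_per_kb=1024):
    """Average file bytes per triangle for each mesh in the table."""
    return {tris: kb * bytes_per_kb / tris for tris, kb in table.items()}


for name, table in (("obj", obj_kb), ("ase", ase_kb)):
    for tris, cost in bytes_per_tri(table).items():
        print(f"{name}: {tris} tris -> {cost:.1f} bytes/tri")
```

Both formats scale roughly linearly with triangle count, so the saving from a LoD is proportional to the triangles removed.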
  • dpadam450
    dpadam450 polycounter lvl 12
    As a graphics programmer AND (almost) pro artist:

    The question of ALL those triangles mapping to 1 pixel is incorrect. LOD is not for overdraw, it is for putting less work on the GPU. Since shapes further away are harder to make out in real life, people observed that a lower-polygon model is not noticeable at far distances. Your overdraw won't happen for a few reasons: 1. there is generally fog; 2. if there is no fog, they might not even draw a model when it's so far away that it maps to 5x5 pixels; 3. there is something now called "imposters".

    Imposters: Coolest shit to me. Imagine you are looking at a building that is 20 houses down on your street and its a dead end street that you look at it face on. You step forward 1 step. The building draws exactly the same and only grows by an extra 1 pixel border on each side. So they generate snapshots 360 degrees around that building at startup, and at large distances, based on the viewing angle, it will take one of those snapshots and put it on a quad that always faces the player. As you move the quad stays facing you but the shader determines a new angle to the player and uses one of the other snapshots that was taken. So why take the time to render and shade it when its going to look the same the very next frame.
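
A minimal Python sketch of the snapshot lookup described above. It assumes the snapshots were taken at evenly spaced angles around the vertical axis and that only the horizontal (x, z) position matters. The function name, the 16-snapshot default and the angle convention are hypothetical, not taken from any particular engine.

```python
import math


def impostor_frame(camera_xz, object_xz, num_snapshots=16):
    """Index of the pre-rendered snapshot closest to the current viewing angle."""
    dx = camera_xz[0] - object_xz[0]
    dz = camera_xz[1] - object_xz[1]
    angle = math.atan2(dz, dx) % (2 * math.pi)   # 0 .. 2*pi around the object
    step = 2 * math.pi / num_snapshots           # angle covered by one snapshot
    return round(angle / step) % num_snapshots


print(impostor_frame((10, 0), (0, 0)))
print(impostor_frame((0, 10), (0, 0)))
```

Stepping a little closer along the same direction leaves the angle unchanged, so the same snapshot is reused and nothing has to be re-rendered.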
    I'm really curious if material LoDs have to be attached to particular meshes and if they aren't able to work dynamically, like dropping the spec and normal contribution based on a distance parameter?
    For this, 2 things. One everyone is usually using deferred rendering, so in that case they will still use a per-pixel normal, whether it comes from the model or from a normal map. Secondly, at far distances, your model is so small that normal mapping 20x20 pixels is not a very costly thing. Again in deferred rendering, every pixel on the screen is normal mapped anyway.
    instead of having to load a 500kb asset, it will load smaller chunks first and then load the bigger ones (LOD2, LOD1 and LOD0)
    This is actually partly false. It depends really on the setup. Having to load in a small one to know that you are going to load in the next best one in roughly 5 to 30 seconds kind of sucks. You are going to do that for so many models and have to destroy/create them on the GPU (fragmenting it) and possibly hit the hard drive.
    Well isn't the thing here that when a object has such a low tri count, that it puts more strain on the engine to change the LOD than just using the original object?
    Not if its already on the graphics card, then there is 0 difference.


    One thing you guys might not understand as well though, is shadows take time as well. These objects you speak of could be rendered at least 1 time for player and 1 time for a shadow map. So multiply those vertex numbers you have by 2.
  • d1ver
    d1ver polycounter lvl 14
    Hey, dpadam, thanks for joining the discussion.
    dpadam450 wrote: »
    The question of ALL those triangles mapping to 1 pixel is incorrect. LOD is not for overdraw, it is for putting less work on the GPU.
    Thanks for clearing up the overdraw part. And isn't the point of this whole discussion that the GPU doesn't really care if it's 300 tris or 800 tris because of the CPU draw call submission time? So that's basically when a LoD doesn't really do much good.
    dpadam450 wrote: »
    Not if its already on the graphics card, then there is 0 difference.
    So you basically have to hold lod0 + lod1 in memory, which is actually a memory loss, provided the lod0 render time is less than the CPU draw call submission time.
    dpadam450 wrote: »
    One thing you guys might not understand as well though, is shadows take time as well. These objects you speak of could be rendered at least 1 time for player and 1 time for a shadow map. So multiply those vertex numbers you have by 2.

    I suppose the shadow pass is a separate one, so do we really have to multiply the number by two? Don't the rules of a single draw call apply to the shadow pass? If so, then there also shouldn't be much difference between 300 and 900 tris for an object.
  • dpadam450
    dpadam450 polycounter lvl 12
    Thanks for clearing out the overdraw part. And isn't it the point of this whole discussion that GPU doesn't really care if it's 300tri or 800tri because of CPU Drawcall submission time? So that's basically when LoD doesn't really do much good.
    No, because either way you do it, you store a model on the GFX card and say "draw this model with these textures"; with an LOD model you're still saying "draw this model with these textures". Both do the same CPU-to-GPU talk.
    So you basically have to hold lod0 + lod1 in memory which is actually a memory loss, provided lod0 render time is less then CPU drawcall submission time.
    Drawing nothing is always faster than drawing something. A diagram:
    A = lag time from cpu to gpu on the "Draw model command"
    B = draw time gpu takes LOD0
    C = draw time gpu takes LOD1

    [A][B]
    [A][C]
    C wins.

    What you are specifically talking about is something like rendering grass:
    A = lag time from cpu to gpu on the "Draw model command"
    B = Draw time for 1 blade of grass
    C = Draw time for 1000 blades of grass put into 1 model

    Lets draw 1000 blades of grass:
    draw each blade as a quad, one at a time:
    [A][B][A][B][A][B].............................................[A][B]

    drawn all at once, combined as 1 model:
    [-A-][C]
    winner. You took the combined [B]'s from the first thing we did, but you got rid of ALL those damn [A]'s. It still takes the same amount of drawing power on the GPU's part, but it didn't lag out waiting for communication from the CPU.
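
The grass example can be put into numbers with a short Python sketch. It follows the serial picture in this post, where every draw call pays the lag A before the GPU draws, and it assumes C is simply 1000 times B. The values for A and B are hypothetical, in microseconds.

```python
# Serial model from the post: each draw call pays lag A, then the GPU draws.
# A_US and B_US are made-up illustration values, not measurements.
A_US = 30   # assumed lag per "draw model" command
B_US = 1    # assumed GPU time to draw one blade of grass


def separate_draws_us(blades, a=A_US, b=B_US):
    """Every blade is its own draw call: pay A and B once per blade."""
    return blades * (a + b)


def batched_draw_us(blades, a=A_US, b=B_US):
    """All blades combined into one model: pay A once, draw time adds up."""
    return a + blades * b


print(separate_draws_us(1000), "us drawn one by one")
print(batched_draw_us(1000), "us drawn as one model")
```

The GPU draw time is identical in both cases; only the repeated lag disappears.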
  • d1ver
    d1ver polycounter lvl 14
    So you're saying that the [A] and [B] processes don't go simultaneously? While the GPU renders, the CPU waits and is not busy gathering the next batch or preparing the next draw call?

    In that case I would have to refer to this pic:
    BatchBatchBatch.jpg
    and the links at the top.

    What we can conclude from there is that:

    [-A---] = lag time from cpu to gpu on the "Draw model command"
    [-B--] = draw time gpu takes LOD0
    [-C-] = draw time gpu takes LOD1

    While
    [-A---] we
    [-B--] or we
    [-C-]

    It's still [-A---] length, is it not?
  • dpadam450
    dpadam450 polycounter lvl 12
    I suppose the shadow pass is a separate one, so do we really have to multiply the number by two? Doesn't the rules of single drawcall apply to the shadowpass? If so then there also shouldn't be much difference between 300 or 900 tris for an object.

    Again who cares about draw calls, to draw any model whether it is 1,000 tris or 1,000,000 tris. If you have to draw a tank, no matter what you have to send the draw command and the texture. The amount of work to draw the high poly is still 100 times more.

    Say you have only 1 model with no LOD and it's 5,000 polys. You render 10,000 because it needs a shadow as well. You can use 2 draw commands and 10,000 polys, or 2 draw commands and draw LOD1 at 1,000 + 1,000 = 2,000 polys, 5x faster, AND you can put more high-poly models up close because you saved 8,000 polys to put on whatever is really close to you.
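
The arithmetic in this post, written out in Python. It assumes the model is drawn in exactly two passes (one for the player's view, one for a single shadow map), which is the case described above; real engines may use more passes.

```python
def polys_per_frame(model_polys, passes=2):
    """Polygons submitted when the same model is drawn once per pass."""
    return model_polys * passes


no_lod = polys_per_frame(5000)   # LOD0 in both passes
lod1 = polys_per_frame(1000)     # LOD1 in both passes

print("without LOD:", no_lod)
print("with LOD1:  ", lod1)
print("saved:      ", no_lod - lod1)
```

The draw call count is 2 in both cases, so the "5x" only holds if GPU time scales with polygon count, which is the point being disputed.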
  • dpadam450
    dpadam450 polycounter lvl 12
    Yea they do/can/could but the diagram again:

    A is draw command to GPU
    L = lag to go across the motherboard BUS on a wire
    b = time to draw 3D model 'b'

    In this diagram those 3 things share the same timeline. As the first line hits [b], the next A is coming in at line 2, but it is not received till just after [b] is done, due to lag from the CPU/GPU bus (based on bandwidth). Denoted as a *: at * the GPU is just idle. The command is coming, but it takes time. And sitting on that CPU/GPU bus is what everyone tries to stay away from. If we talk less, we can get more.

    A is executing right after itself, but [b] finishes so damn fast that, because of [ L ], [b] is done before the lag time for the next message has passed, and the GPU wastes time sitting idle.

    1 [ A ][L][b]*
    2             [ A ][L][b]*
    3                         [ A ][L][b]

    vs
    [ A ][L][bbb] (3 b's stored as one model)
    Winner again.
  • equil
    caring about triangles is so 1995. I've found that using shader lods (switching to a simpler shader model at distances) gives a much more noticeable performance improvement than using very low poly lods.

    the thing to keep in mind is that drawing small triangles is more expensive than drawing large ones (well, per surface area at least). why? because a pixel shader executes once per fragment instead of per pixel.
    but a fragment is just a pixel! or alternatively what the hell is a fragment?!

    i don't know, but apparently these guys do, because they made a diagram:
    quadfragmentmerging_diagram.png

    so yeah, higher triangle densities -> more fragments to shade. but! even the mesh on the right would be less expensive to draw if the model on the left had a shader that was 8 times as complex. considering the crazy normal map voodoo we do nowadays that's probably not impossible either.
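
A small Python sketch of why small triangles cost more per covered pixel. It assumes the common GPU behaviour of shading pixels in 2x2 quads, so any quad a triangle touches is shaded in full. That is an assumption about typical hardware, not something stated in the post. The rasterizer is deliberately naive (pixel-center test, no fill rule, non-negative coordinates).

```python
import math


def covered_pixels(tri):
    """Pixels whose centers lie inside (or on the edge of) a 2D triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri

    def edge(ax, ay, bx, by, px, py):
        # Signed-area test: tells which side of edge a->b the point p is on.
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()          # degenerate triangle covers nothing
    sign = 1 if area > 0 else -1
    pixels = set()
    for py in range(math.floor(min(y0, y1, y2)), math.floor(max(y0, y1, y2)) + 1):
        for px in range(math.floor(min(x0, x1, x2)), math.floor(max(x0, x1, x2)) + 1):
            cx, cy = px + 0.5, py + 0.5
            w0 = sign * edge(x1, y1, x2, y2, cx, cy)
            w1 = sign * edge(x2, y2, x0, y0, cx, cy)
            w2 = sign * edge(x0, y0, x1, y1, cx, cy)
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                pixels.add((px, py))
    return pixels


def shaded_lanes(tri):
    """Shader invocations if every touched 2x2 quad is shaded in full."""
    quads = {(px // 2, py // 2) for px, py in covered_pixels(tri)}
    return 4 * len(quads)


big = ((0, 0), (8, 0), (0, 8))
tiny = ((0.1, 0.1), (0.9, 0.1), (0.5, 0.9))
for name, tri in (("big", big), ("tiny", tiny)):
    print(name, len(covered_pixels(tri)), "pixels,", shaded_lanes(tri), "shader invocations")
```

In this model the one-pixel triangle still pays for a whole quad, while the large triangle wastes only the quads along its edges.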

    tl;dr: blame your programmer if things run slow.
  • d1ver
    d1ver polycounter lvl 14
    dpadam450,
    1[-L-]******
    2

    [-L-]
    3
    [-L-]

    and take the same time to render if we apply the logic from the NVIDIA slide from GDC that we all agree is correct. And that is what we're talking about.

    Also I'm pretty sure, that this kind of logic is incorrect
    Again who cares about draw calls, to draw any model whether it is 1,000 tris or 1,000,000 tris. If you have to draw a tank, no matter what you have to send the draw command and the texture. The amount of work to draw the high poly is still 100 times more.
    and that there's no such linear relation between the number of polys and the render time, but rather the other way around: a draw-call-dependent relation. I've heard it personally from programming teams on big projects, and there is a ton of third-party evidence linked in the first post to confirm that.

    Hey, equil, thanks for the info. It was pretty much a commonly accepted knowledge around here that the shaders are the first thing to optimize, but it seems that since the "Too Much Optimization" thread moved to the wiki we kinda go back to the same discussion all over again.)
  • dpadam450
    dpadam450 polycounter lvl 12
    A fragment is simply a pixel that got written to, and a single pixel has multiple fragments hitting it. So you draw the fragments of the wall behind your computer screen first and those are run through the pixel shader and put as pixels, then you draw your PC screen and those fragments get turned into pixels you already wrote. So a fragment is basically just the pixel before it is actually written to the screen.
    even the mesh on the right would be less expensive to draw if the model on the left had a shader that was 8 times as complex.
    That's correct. Pixel shaders and vertex shaders share the same processors on the GPU. So if you take work away from your pixel shader, you can do a lot more calculations in the vertex shader.

    But this idea is beyond false:
    caring about triangles is so 1995
    We can do more, but if that was true, we wouldn't have LOD, would we? And more recently, in the past 3 or 4 years, deferred rendering popped up, and the amount of memory and processing speed needed to perform it is through the roof, which means they had to borrow it by keeping vertices low. Then SSAO came out too, which again takes a lot of pixel shader time, so keep the vertices low. Next game you play, look at how low poly and cheap some of the stuff is.
  • dpadam450
    dpadam450 polycounter lvl 12
    Also I'm pretty sure, that this kind of logic is incorrect
    Again who cares about draw calls, to draw any model whether it is 1,000 tris or 1,000,000 tris. If you have to draw a tank, no matter what you have to send the draw command and the texture. The amount of work to draw the high poly is still 100 times more.
    and that there's no such linear relation between the number of polys and the render time.
    Well, I'm just telling you how it works. And just think of this: any processor, whether it's your CPU core or the GPU, has a clock rate measured in GHz. You can only perform GHz/30.0 operations a frame if you want to hit 30 FPS. Are you suggesting that drawing triangles is not linear? That the first portion of triangles in each frame renders faster than the last half?

    What you posted about batching is irrelevant to LOD, which is all that I am talking about. You can draw 100 of the same animated character in 1 draw call; do you want those to be drawn at 10,000 polys or 1,000 polys? The 10,000 is going to be much slower.

    The term batch is meant for similar objects to be drawn at once with as few draw calls as possible, which is why we use texture arrays and texture atlases: if we have 10 models, we would have to bind 10 textures individually, but if the textures are in an array, we tell the GPU to bind all 10 at once with 1 call.
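
As a small illustration of the atlas idea, here is how a model's own UVs could be remapped into its tile of a shared atlas, in Python. It assumes a square grid atlas with equally sized tiles and no padding between them; `atlas_uv` and `tiles_per_row` are hypothetical names.

```python
def atlas_uv(u, v, tile_index, tiles_per_row=4):
    """Map a model's 0..1 UV into its tile of a square grid atlas."""
    col = tile_index % tiles_per_row
    row = tile_index // tiles_per_row
    scale = 1.0 / tiles_per_row          # each tile covers this much of the atlas
    return ((col + u) * scale, (row + v) * scale)


# The center of tile 5 in a 4x4 atlas (second column, second row).
print(atlas_uv(0.5, 0.5, 5))
```

With every model sampling the same atlas texture, models that share a material can be submitted together instead of rebinding a texture per model.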

    There are a lot of designs, terms, etc., and I can't just tell you how to write a gfx engine in this thread, but on the thing you're asking about, should we have LOD and how much: yes, and as much as you want, need, or want to pay for. The only thing you lose in your case of characters and props (not trees or grass or instances that can be batched) is that you need an extra portion of memory for each LOD you have.
  • dpadam450
    dpadam450 polycounter lvl 12
    One last thing. Yes, I have read 99% of NVIDIA's stuff, and Crytek's and Killzone's, and that one you posted about 8x overdraw is not a paper about LOD; they are showing something completely different, and it does not prove, disprove or set any standard for LOD. But the last, most important thing, aside from what I said about extreme LOD always being the FASTEST (not necessarily the best, because of memory or streaming), is that the term batch does not mean a 3D model with 130 tris.

    They are specifically talking about my grass example: if you put them all in one model (effectively 1,000x the memory, because it was 1 blade in the first example), you are better off drawing them all at once than 1 by 1. Typically you perform "culling" and decide what is actually in front of the player and will be seen by them if you actually call draw on it. If it's behind you and you call draw, still nothing draws. So you can try and figure out that, of 1,000 grass blades, 8 are the only ones seen by the player that will project to your screen. They are saying that drawing 8 batches of blades, one at a time, is SLOWER than drawing all 1,000 at once as 1 3D model, even if you do not see all the blades at once.

    Hopefully that helps and makes sense.
  • d1ver
    d1ver polycounter lvl 14
    Ok,dpadam450, here's the original paper which basically states that the most extreme LoD is not faster but you just have your GPU idle most of the time. I wish I dug it out before - could've saved us all a lot of time.
    Here's the link:
    http://origin-developer.nvidia.com/docs/IO/8230/BatchBatchBatch.ppt?q=docs/IO/8230/BatchBatchBatch.ppt
    - good stuff there

    And just to make a point as clear as possible, screens from the NVIDIA presentation:
    batchbatchbatch.jpg
    Once again, sorry I didn't bring this up at first and save us all some time.
  • Computron
    Computron polycounter lvl 7
    What are the criteria for batching in something like UDK or Cry3? I thought all instances would still be one batch?
  • dpadam450
    dpadam450 polycounter lvl 12
    d1ver, CLOSE! lol, but here is what you are actually seeing in the first graph. You are misinterpreting this as LOD, when it is about what I am talking about and referred to as "L" in my diagram: the slow communication between CPU and GPU.

    Sadly there is no reference for time, I assume this is how much it can do in 1 second of time. Assume that we use 10 as the base object, say a teapot (typical of nvidia).

    They are saying that a GPU has so much power. The most triangles it will ever draw is however many it can fit in its RAM and then just use 1 draw call. When they sold their gfx cards years ago they would market them by this, how many optimal triangles, no pixel shaders, and using only 1 model, could it draw in a second.

    So at the left they are drawing each 10-triangle teapot one by one, as many times as possible in 1 second. Which means, effectively, because of L = lag from the CPU and GPU talking to each other, they really didn't draw shit.

    On the far end of the graph they have a batch of 1500 tris (10 tris * 150 teapots), so they put 150 teapots together and drew 150 at once. Because you reduced L, you are giving the GPU less time to be idle, so it can draw more triangles. That's all that says.

    Time
    >
    1 + 1 + 1 + 1 + 1 + 1
    150 + 150 + 150 + 150 (drew a shitload more cuz I put 150 together).

    This does not IMPLY, please understand, that the actual processing power to draw 150 teapots was LESS than drawing 1. The GPU did actually do 150x more work. All you are seeing is that you aren't letting it take very, very tiny microsecond breaks after each job it does. If there was NO lag to talk to the GPU, just instantaneous, then those numbers would be exactly the same. LOD is not solving L (lag); LOD is solving the idea that the GPU can only do so much in GHz, so let's use most of our power on close objects, and less power on stuff far away, because we won't see tiny details and creases etc. Batching is about combining like objects and taking care of them all at once, to solve the issue that our GFX card is a super good worker, but he takes breaks; force him to always be working. That paper is not using, doing, or talking about LOD.
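
The shape of that graph can be reproduced with a two-limit model in Python: the CPU can only submit so many batches per second, and the GPU can only draw so many triangles per second. Both limits are hypothetical round numbers, chosen purely for illustration so that the crossover sits at 130 triangles per batch.

```python
# Two hard limits; both values are made up for illustration only.
MAX_BATCHES_PER_SEC = 100_000      # assumed CPU submission limit
MAX_TRIS_PER_SEC = 13_000_000      # assumed GPU triangle limit


def tris_per_second(tris_per_batch):
    """Throughput is capped by whichever limit is hit first."""
    return min(MAX_BATCHES_PER_SEC * tris_per_batch, MAX_TRIS_PER_SEC)


def gpu_busy_fraction(tris_per_batch):
    """Share of the GPU's triangle capacity actually used."""
    return tris_per_second(tris_per_batch) / MAX_TRIS_PER_SEC


for size in (10, 50, 130, 1500):
    print(size, "tris/batch ->", tris_per_second(size), "tris/s,",
          f"GPU {gpu_busy_fraction(size):.0%} busy")
```

Left of the crossover, throughput grows with batch size because the GPU is waiting on the CPU. Right of it, the GPU is the limit and every extra triangle costs real time. Both positions in the thread fit this picture, each on one side of the crossover.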
  • dpadam450
    dpadam450 polycounter lvl 12
    As to your question about world of tanks, money vs performance. Here is what they probably did:




    Considering a GPU can draw a ton of triangles, LOD will again still be faster, but if it's saving them .00001 seconds by using another LOD, then unless they really want to push limits to add a ton more trees or something else, they just decided it wasn't worth the cost and that the game already looked as good as it needed to. If you can draw 3 million triangles in your scene and you cut 300, you're only saving micro/nanoseconds.
  • metalliandy
    metalliandy interpolator
    Interesting thread :)
  • d1ver
    d1ver polycounter lvl 14
    I'm not interpreting anything as a LoD here at all. What the graph says is that the GPU is idle when rendering less than 130 tris per batch. If it takes a separate draw call, then your single 100-tri object is a single batch. And then making an object of less than 130 tris technically makes little sense. So making a LoD for this kind of object is downright irrational.

    And yes, 20 10-tri objects are slower to render than a single 200-tri object. But 20 10-tri and 20 130-tri objects are practically the same; that's what it states.
  • d1ver
    d1ver polycounter lvl 14
    And ok, now I'm sorry I didn't get out this slide right away 'cause apparently following links is too boring:)

    batchpersecond.jpg

    We increase the tri count of our scene x20 (from 10 to 200 tris per batch) and still render it in a single second.
  • dpadam450
    dpadam450 polycounter lvl 12
    oh man, it's not even funny anymore.
    I guess I'm trying to answer your questions. You asked them, I gave you technical information as a GFX programmer, I have done the tests, and I'm just trying to share what I know and how I understand what that slide is saying.
    And yes 20 10-tris objects are slower to render than a single 200 tris object.
    Correct. You batched them together AND are drawing the same amount of vertices. So your render time is the same, you got rid of the Lag time.
    But 20 10-tris and 20 130-tris are practically the same - that's what it states.
    Practically -> yes. Because you said practically, we are on the same page.

    In your example you have 20 draw calls, so the performance difference (as small as it is) would be 20*(130-10) = 2,400 tris saved if you used a 10-triangle LOD instead of 130. So yes, drawing 2,400 tris is kind of negligible when you can draw 3 million, but again, to me, hell no, that's not negligible.
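    For anyone who wants to plug in their own numbers, the arithmetic above can be sketched as follows. The function name is made up for illustration, and the 3 million figure is just the budget mentioned in this thread.

```python
def tris_saved(objects: int, tris_high: int, tris_low: int) -> int:
    """Triangles no longer drawn when every object swaps to its lower LOD."""
    return objects * (tris_high - tris_low)


# 20 objects dropping from 130 tris to a 10-tri LOD.
saved = tris_saved(20, 130, 10)

# Share of a 3-million-triangle scene budget that this saving represents.
share_of_budget = saved / 3_000_000
```

    With these inputs the saving is 2,400 tris, which is 0.08% of a 3-million-tri budget. Whether that is worth the authoring time is exactly the judgment call being argued here.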

    That's all I'm trying to help you out with. If you have to draw 20 objects, whether they are high poly or low, and whether you batch the high polies into 1 object or batch the low polies into 1 object, you WILL save performance by drawing fewer triangles.

    Another place this really occurs is in terrain rendering, since terrain is a grid of chunks. You can render only the 8 tiles you see and make 8 draw calls, or you can put all of your 64 tiles into one 3D model and call render once. What the diagram is saying is that if each tile is the same size (say 100 verts), then by drawing the 8 one at a time, I WILL cause LAG.

    So it isn't saying I should increase each tile chunk to 200 verts; it is saying I should at least treat them as groups of tiles. So group 2 tiles together, test whether the combined pair can be seen by the player, and draw them both if it can. If 1 of those original chunks can't be seen, too bad: I batched them together, and even though that chunk never draws to the screen, I still drew it because it won't be any slower, AND in the best case, when they both are on the screen, it is FASTER. While drawing the first grouped tile (200 verts now, since I grouped them), my next terrain tile will receive its draw command while the 130th triangle is drawing, and there will be no pause (LAG) before drawing the 2nd tile.
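    A minimal sketch of the tile-grouping idea, assuming tiles are simply numbered and consecutive numbers share a group. The function name, the numbering scheme, and passing visibility in as a set are all hypothetical; a real engine would test bounding volumes against the view frustum.

```python
from __future__ import annotations

VERTS_PER_TILE = 100  # tile size used in the example above


def draw_stats(visible_tiles: set[int], tile_count: int, group_size: int) -> tuple[int, int]:
    """Return (draw_calls, verts_drawn) when tiles are batched in fixed groups.

    Tiles are numbered 0..tile_count-1 and consecutive numbers share a group.
    A whole group is drawn if any one of its tiles is visible.
    """
    draw_calls = 0
    verts_drawn = 0
    for start in range(0, tile_count, group_size):
        group = range(start, min(start + group_size, tile_count))
        if any(tile in visible_tiles for tile in group):
            draw_calls += 1
            verts_drawn += len(group) * VERTS_PER_TILE
    return draw_calls, verts_drawn


visible = set(range(8))                      # the 8 tiles the player can see
one_by_one = draw_stats(visible, 64, 1)      # one draw call per visible tile
in_pairs = draw_stats(visible, 64, 2)        # tiles grouped two at a time
whole_terrain = draw_stats(visible, 64, 64)  # all 64 tiles as one model
straddling = draw_stats({1, 2}, 64, 2)       # visible tiles split across two pairs
```

    Pairing halves the draw calls for the same verts when the visible tiles line up with the groups, and the straddling case shows the cost: two pairs get drawn for two visible tiles, so half of those verts never reach the screen.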

    Does that make more sense?
  • dpadam450
    dpadam450 polycounter lvl 12
    Hopefully some people do read this thread, because if any of you have seen DX11 tessellation (if you haven't, then YouTube it), it will REPLACE the need for artists to make LOD models completely. AND it also allows for extremely high poly models as well. 2 bonuses.
  • EarthQuake
    Tessellation/displacement is far from a silver bullet. You have various content management and implementation issues, increased texture usage(again, which is really the killer) and all for what can be reasonably accomplished with just using a little more geometry in the first place.

    Tessellation is good for LODs yeah, but not really all that advantageous for a variety of other reasons.
  • dpadam450
    dpadam450 polycounter lvl 12
    I haven't taken time to implement it and nobody is really writing papers on it, it will be years before we can say that all people need at least a DX11 card and replace LOD as I suggest.

    I have no real idea how well tessellation is going to be adopted and what more it will do. It can definitely do more than just a little more geometry, so on certain things it does look awesome.

    The biggest issue I have with it is that it tessellates uniformly, so you might have flat spots getting wasted tessellation while the bumpy parts that pop out get the same exact tessellation. We'll see, it's way too early, but on certain objects, though, it's badass.
  • Jonathan
    IMO, here are some things to keep in mind:

    When/where/what to LOD depends on:

    Platform(s): Does the platform get stalled by texture fetches, or on that platform do stalled threads get swapped out to perform ALU operations (useful on 360).

    Shader Complexity: if we have to render extra pixel quads on the GPU because of those small triangles, how expensive was the original shader on the object?

    World Size: Are we drawing a huge scene, or a smaller scene, with more complex objects?

    Engine: Where are you currently limited in your engine? CPU, GPU, mixture, etc? "Hey, the framerate is bad, cut the art!" <----Horrible advice, you fix the problem, not the perceived problem.

    As a rather generalized rule-of-thumb you could say anything under 400 triangles that is not instanced a lot in a scene doesn't need an LOD, however, there are other considerations, like streaming for example.

    At the end of the day, do your own homework, and work with PIX, GPAD, or any other performance analysis tool to get correct numbers, and avoid forum chatter (though Polycount has a lot of great and useful information). :)

    Applying your experience with engine ______ to engine ______ can be very bad, as you often will find your original ideas were incorrect.
  • EarthQuake
    dpadam450 wrote: »
    I haven't taken time to implement it and nobody is really writing papers on it, it will be years before we can say that all people need at least a DX11 card and replace LOD as I suggest.

    I have no real idea how well tessellation is going to be adopted and what more it will do. It can definitely do more than just a little more geometry, so on certain things it does look awesome.

    The biggest issue I have with it is that it tessellates uniformly, so you might have flat spots getting wasted tessellation while the bumpy parts that pop out get the same exact tessellation. We'll see, it's way too early, but on certain objects, though, it's badass.

    Here are my thoughts on the topic, I haven't implemented nor have the skills to do so either, but just from general experience and some light research on the topic:

    It's not going to be something you'll throw on every object. It has some really great uses, but it's not a universal system. It works much better in some situations than in others. Here are some basic pros/cons as I understand them.

    Pros:
    A. Lods - obvious
    B. Physical mesh storage, obviously your meshes will be lighter and take up less disk space, arguable how much this actually matters. I'm sure there is some sort of RAM impact here but I can't say exactly what that is.
    C. Really good for lumpy stuff, organic stuff that doesn't need high-precision surfaces.
    D. Great for dynamic effects, like fading from one texture to another to "animate" mesh distortion.
    E. Could be good for animation/rigging, less complex "cage" meshes mean easier rigs

    Cons:
    A. Getting a high level of detail requires a massive amount of tessellation. 1:1 pixel to poly DISP would be 1 million quads for a 1024x1024 texture. Many magnitudes higher than a 50% or 100% gain in triangles that would likely get you the end result(a detailed and smooth silhouette) that you're looking for.
    B. Getting a consistent amount of detail requires a very even mesh flow, with even sized square quads(similar to sculpting apps).
    C. Traditional polygon methods are going to be more accurate retaining detail, ie: if you have an overly simplified cage mesh, retaining what is "important" is going to be very difficult. With traditional modeling, you just model in what is important, silhouette wise.
    D. If your cage needs to be quite detailed to retain the important information, the perceived detail gain is going to be quite low, and chances are you can just use a little more geometry and a much less complex shader to get the same result.
    E. Generally poor for hard surface work, where precise planar and primitive shapes and intersections are very important.
    F. Creating content is more difficult, you can't use hacks like "floating geometry" and get a clean displacement.
    G. Will not actually replace normal maps, you'll still need to use this in addition to NMs, which means more VRAM usage, possibly even a 16bit height-map for quality displacement.
    H. Synchronizing your source -> baked result with the in-game result, and having the right height scale while staying within the bit-depth of the image format is tricky.
    I. Possibly problematic with animation, too light of a cage mesh means your rigging control may not be fine enough.
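
    As a rough check on con A, here is the arithmetic as a sketch. It assumes one quad per texel means a full grid of texture_size x texture_size quads and that each quad is two triangles. The 4K and 8K triangle counts are hypothetical mesh sizes picked for comparison, and the function name is made up.

```python
def displaced_quads(texture_size: int, texels_per_quad: int = 1) -> int:
    """Quads needed so each quad spans texels_per_quad x texels_per_quad texels."""
    quads_per_side = texture_size // texels_per_quad
    return quads_per_side * quads_per_side


# 1:1 texel-to-quad displacement for a 1024x1024 map.
full_detail_quads = displaced_quads(1024)
full_detail_tris = full_detail_quads * 2  # assuming two triangles per quad

# Hypothetical alternative: a 4K-triangle mesh given a 100% triangle increase.
base_tris = 4_000
doubled_tris = base_tris * 2

# How much heavier full 1:1 tessellation is than simply doubling the mesh.
ratio = full_detail_tris / doubled_tris
```

    Under these assumptions the 1:1 case is about 1.05 million quads, or roughly 260 times the triangles of the doubled mesh. Tessellating at 4 texels per quad side would cut the quad count by 16, which is where the "you don't need 1:1" counter-argument comes in.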


    I'm sure there are more pros/cons, and I would love to hear more from the programmer side of it, as I'm sure there are things I'm misunderstanding or not accounting for.

    PS: I can move this into a dedicated tessellation thread as well, if it's annoying anyone being here.

    Now specific to the LOD discussion, I think Tessellation sounds really great in theory, because you get: 4K, 16K, 64K, 256K type LODs for free, however if all you really needed was 8K instead of 4K in the first place, the massive tessellation thing is just overkill. So you get all these extra levels of LOD, but how many do you *really* need?
  • equil
    there's the uv seam hole issue too. anyway, thought i could link to this somewhat levelheaded mindset toward tessellation. the understandable stuff is on page 30-40. http://www.microsoft.com/download/en/details.aspx?displaylang=en&id=23111
  • dpadam450
    dpadam450 polycounter lvl 12
    Yea, I'm against tessellation for now. It's great for some things like rocks, because it's phenomenal at that. Your cons are a tad bit off AND correct, in that:

    A. Texture size can be whatever you want, and you don't need to tessellate it down to a 1:1 scale at run time. Most pixels in the 1024 map won't be used. A 1024 map at best would come out to 1 megabyte, most likely though for precision you will need it to be 2 megabytes. These can't be compressed like DDS/DXT because of how those work.

    B. Compared to the memory you need for the displacement map, the saved memory from the high poly is not that much.

    E. Is actually false, because it maintains planar hard surfaces, when you bake your displacement map, it will be 0. So the vertex will be shifted by 0, and planar surfaces will remain planar, if your high and low poly planes are on top of each other.

    F. False. Your floating geometry is not going to bake down as a displacement on the other geometry. It would be treated as separate geometry, just like with a normal map. You shouldn't trace floating geometry into the normal map of the geometry it is floating on. That's if I understand correctly that floating geometry is, say, a safety on a gun.

    G. True. But it's only natural that tech gets added on top of other tech. It costs more power to render better looking stuff.

    H. I don't think this is an issue. Even in the best case, where you can bake to a texture with just an R channel, that gives you 256 height values, which will work for some models. And for the ones where that is not enough, how far off is it reasonable to be? It would take judgement. A 16-bit map, though, sounds for sure like it would be all that is needed, and enough to re-create the exact high-poly model you started with.

    I. Not sure what the term cage mesh is meaning. But I really doubt anything would happen with animation, you have a low poly triangle, it gets transformed/rotated, everything is fine. You chop up that triangle and displace them, everything should still be fine. Youtube nvidia alien tessellation demo, its pretty sweet.
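
    The memory and precision figures in points A and H work out like this, assuming an uncompressed single-channel map with no mipmaps. The function names are made up for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def height_map_bytes(size: int, bits_per_texel: int) -> int:
    """Bytes for an uncompressed, single-channel, square height map."""
    return size * size * bits_per_texel // 8


def height_levels(bits_per_texel: int) -> int:
    """Number of distinct height values a texel can store."""
    return 2 ** bits_per_texel


size_8bit = height_map_bytes(1024, 8)    # one byte per texel
size_16bit = height_map_bytes(1024, 16)  # two bytes per texel
levels_8bit = height_levels(8)
levels_16bit = height_levels(16)
```

    So a 1024 map is 1 MB at 8 bits and 2 MB at 16 bits, with 256 and 65,536 height steps respectively. A full mip chain would add roughly a third on top.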

    You won't need a uniform mesh of same sized quads though, and you can still make the more important stuff more dense. As far as LOD, imposters are sweet, you can draw everything as a quad. I have yet to design my system for them and am going to push them as much as possible since I'm doing all the artwork as well (don't have time to do a bunch of LOD).
  • EarthQuake
    dpadam450 wrote: »
    Yea, I'm against tessellation for now. It's great for some things like rocks, because it's phenomenal at that. Your cons are a tad bit off AND correct, in that:

    A. Texture size can be whatever you want, and you don't need to tessellate it down to a 1:1 scale at run time. Most pixels in the 1024 map won't be used. A 1024 map at best would come out to 1 megabyte, most likely though for precision you will need it to be 2 megabytes. These can't be compressed like DDS/DXT because of how those work.

    Yeah, I would be curious to experiment with relatively low displacement maps, certainly that would work for some things. For other things like accurate hard surface work, you would need likely 1:1 ratio though.
    B. Compared to the memory you need for the displacement map, the saved memory from the high poly is not that much.
    Well, this assumes that you're comparing 1million displaced to 1million raw polies, which would never be the case, straight up displacement is always going to be much MUCH less efficient than a well constructed lowpoly. This is something that is correct on paper, but would never be the case in real use.
    E. Is actually false, because it maintains planar hard surfaces, when you bake your displacement map, it will be 0. So the vertex will be shifted by 0, and planar surfaces will remain planar, if your high and low poly planes are on top of each other.
    There are a variety of reasons why displacement is poorly suited for accurate hard surface work, especially when we're looking at the way we do things currently. Again, you need evenly sized, square mesh-flow and a very high (1:1) level of tessellation to get crisp and accurate shapes. With displacement, a lumpy rock is going to be much easier to create content for and get working ingame than a complex first person weapon.
    F. False. Your floating geometry is not going to bake down as a displacement on the other geometry. It would be treated as separate geometry just like a normal map does. You shouldn't trace floating geometry into the normal map of the geometry it is floating on. If I understand that floating geometry is say a safety on a gun.
    No, floating geometry as in floated "fake" indents on top of a mesh, that works great with normal maps because you simply ray-trace and sample the direction of the normal, not the distance/depth. This is a huge time saver with hard surface work.

    Like so:
    floaters101.jpg
    G. True. But its only natural that tech gets added on top of each other. It costs more power to render better looking stuff.
    Right, this was thrown in here to address the common misconception that displacement would replace normal maps.
    H. I don't think this is an issue, even if again best case you can bake to a texture with just an R channel, that gives you 256 height values, that will work for some models, and even the ones where that is not enough, is it reasonable to be off by, how much? It would take judgement, a 16-bit though, sounds for sure like it would be all that is used and enough to re-create the exact high-poly model you started with.
    From the experiments I've done, syncing up a bake to the display can be a major pain. It's sort of like the whole tangents issue with normal maps (and various apps using various tangent bases); essentially this would come down to just scaling a multiplication factor, probably on a per-material basis or something, but it's still an issue.

    I'm not saying this is a bit-depth problem, it's a normalization problem.
    I. Not sure what the term cage mesh is meaning. But I really doubt anything would happen with animation, you have a low poly triangle, it gets transformed/rotated, everything is fine. You chop up that triangle and displace them, everything should still be fine. Youtube nvidia alien tessellation demo, its pretty sweet.
    Cage as in the sub-division or sculpting "cage" mesh, the lowpoly source mesh.

    I'm not saying there would be issues with animation (ie: it wouldn't work), but say, with traditional triangles you would have a 12K mesh, and your "cage" with a sub-divided workflow would be 4K. Your weighting is going to be significantly less detailed in that case, which would affect the quality of animation. Again, I'm not really sure if that would actually be a problem, but it potentially could be.
    You won't need a uniform mesh of same sized quads though, and you can still make the more important stuff more dense. As far as LOD, imposters are sweet, you can draw everything as a quad. I have yet to design my system for them and am going to push them as much as possible since I'm doing all the artwork as well (don't have time to do a bunch of LOD).
    Actually, you do, to the extent that you want an even texel density in... well, pretty much all assets, aside from assets/portions of assets that require additional density. It would be much the same. This is a huge change from traditional modeling, where you give detail to the areas that need it and not to those that don't. Now you need even, grid-shaped polyflow to ensure quality displacement, much like you would create a mesh for a sculpting app.

    Essentially your texel density needs to be considered not only in regards to uvs, but geometry as well.

    A quick example:
    disp.jpg

    The regular sphere on the left would be a poor mesh for displacement, with its uneven and irregular poly distribution. The quad-ball on the right would be ideal. Certainly there would be exceptions to this "rule" but needing evenly sized, square quads would be what you would generally shoot for in a system predicated on tessellation + displacement.


    Personally I think even when we're all running systems that can handle tessellation on everything, it still won't make sense to do it. It will make more sense to simply model a reasonable amount of detail for your screen resolution, have some sort of automated LOD system, and spend those extra resources on things like lighting, animation, physics, ai, etc etc. Really, the whole idea that we need to push a lot more triangles in games and thus the need for displacement is misplaced. That need just doesn't really exist. Something like realtime radiosity is going to have a far greater visual impact.

    Then of course, use it for the relatively small amount of assets where it can do really cool stuff, for dynamic effects and such.
  • dpadam450
    dpadam450 polycounter lvl 12
    I thought u might be right about the sphere being an issue but I just did a displacement map with both of those, and the shapes came out 98% identical.

    I tried to post an attachment, but every time I have tried on this forum, nothing attaches. And the file met the resolution and filesize req's.
  • motives
    motives polycounter lvl 18
    You people throwing slides around like it's nobody's business should just listen to dpadam450 instead.

    That will probably make you happier than finding yourself sitting at the end of a 3 year project and suddenly having to max LOD every single object in your game in the last 2 months because you are running your game at 24 ms a frame and can't figure out why... :)
  • d1ver
    d1ver polycounter lvl 14
    motives wrote: »
    You people throwing slides around like it's nobody's business should just listen to dpadam450 instead.

    That will probably make you happier than finding yourself sitting at the end of a 3 year project and suddenly having to max LOD every single object in your game in the last 2 months because you are running your game at 24 ms a frame and can't figure out why... :)

    Hey, man I would totally love to as soon as I hear an answer to a simple "why?". There's no place for doing things "just 'cause" in a production environment where every second is a waste of money and opportunity, so there's nothing wrong with getting to the bottom of things.
    And once again what's wrong with throwing around slides of one of the biggest graphics hardware companies, who has an army of engineers working on making games look better, also working on those presentations as well to educate people how to make better games?
    'Cuz their sole purpose is to make games look their best so they sell more units.
    I think this makes a viable argument, and I would still love to hear why the last slide does not prove the point. And I also in no way want to disregard anything dpadam450 says, it just feels like we're talking apples and oranges here. I see one thing in the slide - he sees another. Btw, dpadam450, sorry about the "not funny" thing - it actually is all in good fun and I appreciate your effort to help educate us, but I can't believe things just 'cause you said so.
    I'd love to be proven wrong, because that's the only case in which I haven't wasted my and everyone else's time with this. So, motives, if you have something to say on the subject - fire away! I would be very grateful. srsly:)
  • motives
    motives polycounter lvl 18
    I think dpadam is doing a better job explaining than i could ever do. Especially in english.

    My main point however is that everything counts.
    No matter what your programmer might seem to think, there is no free.

    one object with a sloppy lod or an unreasonable number of shaders on the last lod will probably slip under the radar, but when you are cramming a 64 player map with vehicles, destruction, 500+ objects, persistence and all thinkable bells and whistles into a PS3, you most certainly don't want unlodded objects.

    So what i have to say about the subject is that nothing is free, and that fact won't change no matter how many slides get presented online :)

    This sums it up in a nice way i thought:

    "At the end of the day, do your own homework, and work with PIX, GPAD, or any other performance analysis tool to get correct numbers, and avoid forum chatter (though Polycount has a lot of great and useful information). "
  • d1ver
    d1ver polycounter lvl 14
    Hey, Johan, I totally agree with you. Everything does count and in no way should you just drop LoDs. Make the best you can with what you've got and waste nothing. I fully support this approach and in no way am I advocating sloppy work. In fact I would be the one to optimize stuff even when it makes no difference to anyone.

    But aren't you afraid, that by strictly following the general logic you might not allow for a more elegant solution?
    and that fact won't change no matter how many slides get presented online
    I think you realize that it's not about slides, but about information in them. If you're willing to disregard information, never mind if it's rational and objective, due to your previous beliefs, then you're on a dangerous track of missing out on useful innovation.

    I very much agree that every particular project deserves its own tests and its own particular guidelines on what's efficient, but I don't understand why it seems so phantasmagorical that, due to huge GPU speed and slower CPU speed, there's a little "safe time" by which the GPU outruns the CPU. And that "safe time" could be equal to rendering 500 tris, considering the engine does 3 million in 1/60th of a second.

    And that's what the last picture shows:
    you render ~60k batches of 10 tris per second, and then, changing only the tris per batch,
    you still render ~60k batches of 200 tris per second.
    Which basically means you render 600,000 tris and 12,000,000 tris on the same machine in the same amount of time.
    This is factual evidence that in the best case you could get 20 times(!) more stuff rendered for free. And it's not me who conducted the tests, but NVidia. So if you're willing to disregard that kind of info, then yes, we all could refer to the general "do what's better for your project" and "always do LoDs for everything" mindset, and we definitely won't have a productive discussion here. But I just have to get to the bottom of this.
    Chances are that I misunderstood the presentation in some way and I most sincerely wish someone would call me out on it, until I have no counter argument left, so the truth will prevail. And in that case I'll be the first one to bang my head against the floor and beg your pardon.
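    The numbers quoted from the slide can be checked with a few lines. This sketch assumes the ~60k batches per second figure holds at both batch sizes, as read off the slide above. The constant and function names are made up for illustration.

```python
BATCHES_PER_SECOND = 60_000  # approximate figure quoted from the slide


def tris_per_second(tris_per_batch: int) -> int:
    """Triangle throughput while the batch rate, not the GPU, is the limit."""
    return BATCHES_PER_SECOND * tris_per_batch


small_batches = tris_per_second(10)
large_batches = tris_per_second(200)
gain = large_batches / small_batches

# Per-batch submission time implied by that batch rate, in microseconds.
implied_batch_cost_us = 1_000_000 / BATCHES_PER_SECOND
```

    That gives 600,000 vs 12,000,000 tris per second, a 20x gain, and an implied cost of about 16.7 microseconds per batch. The arithmetic only holds while the batch rate is the bottleneck. Once the GPU itself is saturated, extra triangles stop being free.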
  • motives
    motives polycounter lvl 18
    No, I'm sure the information is correct.
    It's just, you also have to account for min-spec PC and consoles, so the fact that a reasonably good GFX card can push X amount of triangles in X amount of time won't help you when developing for multiple platforms.
  • EarthQuake
    dpadam450 wrote: »
    I thought u might be right about the sphere being an issue but I just did a displacement map with both of those, and the shapes came out 98% identical.

    I tried to post an attachment but everytime I have tried on this forum, nothing attaches. And the file met the resolution and filesize req's.

    I dont know what the issue would be, you could use something like dropbox though and just post links.

    Now, this really depends on the content of your displacement map. If all you're doing is baking a smooth sphere to a lowpoly sphere, you really shouldn't see any difference at all. But once you start adding any sort of detail, the geometry around the poles is most likely going to cause you some problems.

    I don't have a DX11 card here so I can't play with it much, but I have whipped up some quick example content. I've modified my standard sphere mesh here to use a bit less geometry but be closer to the detail level of the quadball (it's still a higher amount of tris though, because of the poles).

    So here is a rar with low obj, normal and height: http://dl.dropbox.com/u/499159/balltest01.rar

    balltest01.jpg
    balltest02.jpg

    I've got another idea for a test to stress precise pixel-level type hard surface work too, that I will probably get up a bit later.

    This also shows the problems with normalization. When baked, the min/max height values need to be scaled to fit within the bit depth of the image, so now you'll have to fiddle with your shader until you find the "correct" multiplication factor that matches the look of the high. This is a pretty large inconvenience, especially when you compare it to a basic normal map workflow, where the normal is simply an absolute direction value (or relative to the mesh's tangents, but you get the point).
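    To show where that multiplication factor comes from, here is a minimal sketch assuming the baker maps heights linearly onto the integer range of the image. This is an assumption for illustration, not how any particular baking tool works, and the function names are hypothetical.

```python
def encode_heights(heights: list[float], bits: int = 8) -> tuple[list[int], float, float]:
    """Quantize heights into integer texels, returning (texels, scale, bias).

    scale and bias are the values a shader would need to undo the mapping.
    """
    lowest, highest = min(heights), max(heights)
    levels = (1 << bits) - 1          # largest value a texel can hold
    scale = highest - lowest
    if scale == 0:
        return [0] * len(heights), 0.0, lowest
    texels = [round((h - lowest) / scale * levels) for h in heights]
    return texels, scale, lowest


def decode_height(texel: int, scale: float, bias: float, bits: int = 8) -> float:
    """What the displacement shader computes: normalized texel * scale + bias."""
    levels = (1 << bits) - 1
    return texel / levels * scale + bias


# Example heights in model units: an indent, the base surface, and a spike.
texels, scale, bias = encode_heights([-0.5, 0.0, 1.5])
restored_base = decode_height(texels[1], scale, bias)
```

    If the baker throws away scale and bias, the shader has to guess them, which is the fiddling described above. If it stores them next to the texture, decoding lands within half a quantization step of the original height.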
  • Teemu
    With my limited knowledge I do think it matters most of the time especially if you're vertex bound. However if you are already bound by batch count then having LoDs is not going to help much at all.

    Let's say you have 400 draw calls and each call equals an individual 3d object. Let's also say that this equals 1,000,000 tris drawn.
    Now if you had lods for those models you'd still have a batch count of 400. That is because you still want to draw all those 400 objects, meaning that your batch count doesn't go down. However, because each object now has fewer vertices, your tri count could be for example 500,000 instead of the previous 1 million. Fewer vertices helps if you're vertex bound, so that would result in faster rendering.

    So you would end up with something like:
    without LoDs:
    400 batches, 1,000,000 tris
    with LoDs:
    400 batches, 500,000 tris

    The point of the slides (at least how I understand them) is that you should combine objects into bigger, higher poly chunks in order to reduce batch count. It makes sense to combine a bookshelf and its books instead of having all the books as separate objects. That will reduce batch count and everyone's happy.
    However, if you change a 1000 vert model into a 100 vert model, you're not saving a batch. You're just sending a smaller vert count object to the GPU to render, but it's still being sent. The batch count stays the same if you only changed the vert count.
    Lower vertex count generally reduces GPU render time unless you're bound by something else.

    But in practice I really would question the need to create 3 LoDs for a small and very lowpoly object. I'd also consider the work / benefit ratio instead of simply LoDing every single object in the game.
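    The difference between the two optimizations can be sketched with a tiny scene model. The SceneObject class, the helper names, and the per-object tri counts are all hypothetical; only the 400 batches and 1,000,000 tris totals come from the example above.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    tris: int


def scene_cost(objects: list[SceneObject]) -> tuple[int, int]:
    """Return (batches, tris), assuming one draw call per object."""
    return len(objects), sum(obj.tris for obj in objects)


def apply_lod(objects: list[SceneObject], keep_fraction: float) -> list[SceneObject]:
    """Swap every object for an LOD that keeps a fraction of its triangles."""
    return [SceneObject(obj.name, int(obj.tris * keep_fraction)) for obj in objects]


def merge(objects: list[SceneObject], name: str) -> list[SceneObject]:
    """Combine several objects into one mesh: same triangles, one batch."""
    return [SceneObject(name, sum(obj.tris for obj in objects))]


# 400 objects at an assumed 2,500 tris each, i.e. 1,000,000 tris in total.
scene = [SceneObject(f"prop_{i}", 2_500) for i in range(400)]
without_lods = scene_cost(scene)
with_lods = scene_cost(apply_lod(scene, 0.5))

# A bookshelf plus 20 separate books, with made-up tri counts.
shelf = [SceneObject("shelf", 500)] + [SceneObject(f"book_{i}", 50) for i in range(20)]
shelf_separate = scene_cost(shelf)
shelf_merged = scene_cost(merge(shelf, "shelf_with_books"))
```

    LODs change the second number and leave the first alone, while merging does the opposite. Which one helps depends on whether you are batch bound or vertex bound.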
  • passerby
    passerby polycounter lvl 12
    seems people are just trying to optimize for the hell of it without understanding what resource/performance budget they're aiming for.
  • d1ver
    d1ver polycounter lvl 14
    motives wrote: »
    No, im sure the information is correct.
    It's just, you also have to account for min-spec PC and consoles so the fact that a reasonably good GFX card can push X amount of triangles in X amount of time wont help you when developing for multiple platforms.

    You are absolutely right that we have to account for the minimum spec PC. And if you look at the diagrams, these things were tested on specs as low as a 1 GHz CPU with GeForce 2 to GeForce FX 5800 cards and Radeon equivalents. And they show a free 10-200 tri gap even there. Are your games even tested for compatibility with such hardware nowadays?
    So I'm pretty sure that a secure bottom level could be found and 500-1000 tris doesn't seem too far fetched.

    Hey, Teemu, thanks for joining in.
    Yeah, it totally has to be pointed out that if you're vertex bound, save tris by any means necessary. If you have a lot of complex rigs or pixel shaders, or extensive dynamic vertex lighting, then definitely cut down those tris.
    But most game engines nowadays are fill-rate driven, so it's shader optimization that gives you the biggest bang for your buck.
    And, once again, the old quote from Kevin Johnstone that basically says how unimportant tri count was in the UT3 optimization process. If you haven't, Read It Please:
    http://www.polycount.com/forum/showpost.php?p=762412&postcount=5
  • dpadam450
    dpadam450 polycounter lvl 12
    So if you're willing to disregard that kind of info
    It is just that I know what Nvidia is telling me based on that paper. It makes sense to me. The best example is the terrain idea I gave you because that is mainly where it is going to be used.

    In a game you can't just batch 10000 objects together. A lot of Nvidia's stuff doesn't really translate to games. Sometimes they give examples of a single character with cool stuff rendering at only 60 FPS. Again, you can't batch so many objects together just because you want to. Assume you batched a whole house as 1 piece of geometry and 1 big texture array. Well, you're drawing the whole thing when you can only see one room, which is basically saying you're drawing 10x more stuff to compensate for a tiny bit of lag from sending a few draw calls for just the stuff you see in the room. Doing 10x more work to save only the few micro/nanoseconds it takes to send maybe 20 draw calls for only the objects you see is not worth it.
    in the best case you could get 20 times
    Yes, best case. In their demo they used all the same model. People always write tech papers like this, and they will render all spheres or teapots. In games, though, you usually go: grab teapot texture, draw teapot, grab wall texture, draw wall, grab the china (plates, cups) texture, draw batch of china (plates, cups) all at once <- yes, 1 batch. You can't always do this, though.

    And you can't disregard rendering to shadows either. I know the common idea is "it's not 1995, just draw shit", but GPUs are maxed out and they are definitely limited. A stream processor can run either a vertex or a pixel shader and switch at run time, so if you save by drawing fewer triangles, you get more pixel shader computation time.

    And compared to 2005, believe me, a draw call is not that slow anymore. Even unoptimized, at one point I had like 100K+ commands sent to the GPU and still drew 3-4 million polys at a playable framerate.
    needed to be scaled to fit within the bit depth of the image, now you'll have to fiddle with your shader until you find the "correct" multiplication factor
    Well, the number to scale by should be computed automatically when the program bakes. If it isn't, it should be, or a very simple utility could just displace a single vertex a few times and check until it matches the position of the high poly exactly (or very closely). But yeah, tessellation isn't much worth talking about, and how often do you have a mine with all those alien extrusions? Just model the spikes as you said.
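The "displace a vertex a few times and check" utility could look something like the bisection below. This is only a sketch under simplifying assumptions: displacement is taken to be linear (`height_sample * scale` along the normal), the true scale is assumed to lie inside the search range, and `find_displacement_scale` is a hypothetical name rather than a function from any baker.

```python
def find_displacement_scale(height_sample, target_distance,
                            lo=0.0, hi=100.0, tol=1e-6):
    """Bisect for the scale s where height_sample * s reaches target_distance.

    height_sample:   value read from the displacement map at the test vertex
    target_distance: distance from the low-poly vertex to the high-poly surface
    lo, hi:          assumed bounds that bracket the true scale
    """
    if height_sample <= 0.0:
        raise ValueError("need a non-zero height sample to calibrate against")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if height_sample * mid < target_distance:
            lo = mid  # displaced vertex falls short: scale up
        else:
            hi = mid  # displaced vertex overshoots: scale down
    return (lo + hi) / 2.0


# Map stores 0.5 at the test vertex; the high poly sits 2.0 units away.
scale = find_displacement_scale(0.5, 2.0)
print(round(scale, 4))
```

In this purely linear toy case the answer is just `target_distance / height_sample`, so the loop is overkill. It is written as a search only to mirror the try-and-check idea, which would still work if the shader applied a bias or a non-linear remap.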
  • EarthQuake
    The more I think about it, the sphere test isn't actually all that bad; the poles are still generally within a reasonable density limit. But here is a much better example, and likely a more realistic test case anyway.

    A long cylinder! With traditional modeling, we would get a model that looks like the low on the left, but for displacement these long, thin triangles are going to be very poor, so we would need something closer to the model on the right.

    http://dl.dropbox.com/u/499159/cyltest01.rar

    cyltest03.jpg
    cyltest_04.jpg
  • dpadam450
    dpadam450 polycounter lvl 12
    I have nothing to do at work since we are waiting to ship a game, so I got to kill time on this thread.
    It makes sense to combine a bookshelf and its books instead of having all the books as separate objects. That will reduce batch count and everyone's happy.
    However if you change a 1000 vert model into a 100 vert model you're not saving a batch. You're just sending a smaller vert count object to the GPU to render but it's still being sent. Still the batch count is the same if you only changed the vert count.
    Lower vertex count generally reduces GPU render time unless you're bound by something else.

    But in practice I really would question the need to create 3 LoDs for a small, very low-poly object. I'd also consider the work/benefit ratio instead of simply LoDing every single object in the game.

    That's everything. If you're sending 1 batch anyway, draw fewer triangles, because your high poly and low poly will look the same at a distance. Save the time and give it to your pixel shaders.
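A minimal sketch of distance-based LoD selection, to show that the draw call count stays the same while the triangle count drops. The distance thresholds and triangle counts are invented for illustration, and `pick_lod` is a hypothetical helper, not any engine's API.

```python
def pick_lod(distance, lods):
    """Return the triangle count of the first LoD that covers this distance.

    lods: list of (max_distance, triangle_count), sorted by max_distance.
    """
    for max_distance, triangle_count in lods:
        if distance <= max_distance:
            return triangle_count
    return lods[-1][1]  # beyond every threshold: use the coarsest LoD


# Hypothetical LoD chain for one object.
LODS = [(10.0, 1000), (50.0, 400), (float("inf"), 100)]

for distance in (5.0, 30.0, 500.0):
    tris = pick_lod(distance, LODS)
    # Whatever LoD is picked, the object still goes out as one batch.
    print(f"distance {distance:>5}: 1 draw call, {tris} triangles")
```

The saving is entirely on the vertex side: the batch is submitted either way, so the benefit only shows up when the GPU is actually bound by vertex work.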