[Technical Talk] - FAQ: Game art optimisation (do polygon counts really matter?)

CrazyButcher · Nov 2007

after some heated discussion came up in the "what are you working on thread" about how anal one should be about shaving those tris down, I guess its better to move the topic here.

this was a presented model, its fairly low already, and has beveled edges in a fashion that allows UV mirroring, and gives nice smooth normals (therefore is good for baking and runtime interpolation).

the proposed version, while taking less tris, will have some issues on interpolation side of things. (paintover!!)

now the major issue was "less = better", which is not always right. Basically the way driver/api works, we speak about batches being the limiting factor in a single frame. That is drawcalls. The less the better, the trianglecount per batch doesnt really matter, and once certain thresholds makes no difference at all. now a very good paper explaining the sideeffects and phenomena is this paper by nvidia
http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf
be aware that "high-end" card was a gf5 back then, and even the gf2 was saturated at 130 tris, ie makes no difference if less triangles per batch.a slightly newer version of that paper (wow radeon9600)
http://http.download.nvidia.com/develope...ptimization.pdf
shows how basically there is no difference at all if you send 10 or 200 triangles. And this "threshold" where it doesnt matter anymore, is constantly rising (those numbers are from 2004/2005). todays engines are all about crunching as much in as little drawcalls as possible, the triangle count of single objects becomes less an issue. there is certain types of objects that are rendered using "instancing" setups or "skinning", which are supposed to have less vertices, then other stuff. but even there technical variety is just too huge.

in short, make it look good, dont go insane about taking away tris, unless you do PSP/NDS or whatever mobile, or RTS instanced units. Any modern card can crunch so many triangles that this is not the real limiting factor.

if you run the crysis sandbox editor, you will see there is some more than million tris per frame, and it runs rather smooth still (editor is quite faster than game). In EnemyTerritory the whole 3d terrain is rendered fully... what matters is the total drawcall per frame, and not just meshes but postfx and so on are part of that, yes even gui elements. A lot of limiting comes from shading the pixels, and no matter how you lowres your box will be, if texture + size on screen remains the same, the costs thru shading will be identical. The few extra vertices that need to be calculated are negligable.

There are reasons it takes crytek, epic and other guys with tons of experience to get the most out of hardware, they do a lot of performance hunting, profile how cpu/gpu are stressed at a scene. I mean Rorshach wasnt telling you guys this for no reason, after all their studio works on the real "limits", and get more out of the hardware than most. Hence some "veterans" view will surprise you, simply as they are better informed work for the "top studios". I dont want to diss anyone's opinion, its not like people made them up themselves, its just times move on very quickly. Modern cards can crunch some insane workload, but its all about the feeding mechanism...

so yes you might do some mmo which runs on every crap hardware, but even there hardware principles stay the same and looking at steam's hardware survey, there is a lot of capable hardware around...

After all this should not imply that optimization is bad, or not needed, if you can get the work done with less, and not sacrifice quality, then do it. But there is a certain level were the work/effect payoff just isnt there anymore. (this will differ for older platforms (ps2, mobiles)

----

there is another caveat so, that is the problem with "micro / thin" triangles. If you have triangles that hardly ever occupy much pixels on screen, it kills parallism of the GPU. Which should be taken into account, ie do not go overboard with micro-bevelling.

make sure to read Kevin Johnstone's posts
http://boards.polycount.net/showpost.php?p=762412&postcount=5

MoP · Nov 2007

nice writeup and info here crazybutcher, but i notice you're only referring to "runtime" performance of stuff.
what about loading things into memory, or collision calculations etc.
if you have 100 duplicates of an object and it's not instanced, that is going to add up quite quickly on the memory side of things if you have extra polygons.
any more info on stuff like that rather than just brute force rendering of tris would be cool

you also have to bear in mind stencil shadows, more silhouette polys lead to greater stress there. not every engine uses lightmaps.

good thread tho

CrazyButcher · Nov 2007

yeah mop you are right, stencil shadows are definetely ugly case where more tris = more work. But I dont think stencilshadows will have much future, personal opinion so

or if moved to GPU to do silhouette extraction the tris count will not be soooo important anymore.

loading into memory, well vertexcounts and texture memory are the limiting factor. I remember having made as similar thread once, stating that vertex weigh signficantly less,too. A triangle in stripping weighs just 4 or 2 bytes, and in non strip case three times as much.

collision is software stuff and might use dedicated lowest lod, geometry. Its a whole different story I would not want to touch, but yes here also less is better. However often dynamic objects are approximated with primitives such as boxes, spheres... to keep costs down. And static environment stuff is also fairly optimized to cope with a few more tris.

about "instancing". Engines will instance stuff anyway, I mostly meant higher-techniques that do the rendering in less drawcalls. You will always load that box into memory only once, and have a low weight representation for every time you use it (pos,rot,scale in world + handle to actual geometry). Its just that non-instanced rendering means a drwacall every time you render the box at another position.

of course you might do total unique box variatns, like mesh deformed permanently, but that would be exactly like modelling two boxes.

After all we are talking about a very optimized model already, its not like the box is 1000 tris. Its just the question that the tradeoff of quality/speed for < 300 or so, simply isnt worth it, considering the rendering pipeline.

Kevin Johnstone · Nov 2007

I spent a solid few months of optimizing polys, lightmap UV channels, collish meshs for everything in UT and the act of
stripping 2million polys out of a level generally improved the FPS by 2 or 3 frames.

Polycount is not the huge issue people were rightly sure it was previously. The bigger issue now is texture resolution because all assets carry 3 textures as standard, normal map, diffuse and spec and thats before you have additional mask light textures for emmisives and reflection or whatever other stuff you are supporting in the shader.

Shader complexity is also a bigger issue now because it requires longer rendering time.
Section counts are a bigger issue , meshs carrying 2 of each texture and thus requiring a 2nd rendering pass.

I can't explain things technically enough for people but the coders have explained to me a couple times that the problem with just beating everything down with optimization on things like polycount doesn't affect things as much because of different things being CPU or GPU bound.

Mesh count is a big issue now that everything is a static mesh rather than the majority being BSP. BSP is terribly inefficient compared to mesh rendering also.

A mesh is pretty much free for the 1st 600 polys, beyond that its cost can be reduced dramatically by using lightmaps for generating self shadowing and so on rather than vertex lighting.

The reason also I was saying I wouldnt take out the horizontal spans on this piece was also largely because as an environment artist you have to be thinking about
the crimes against scale the level designers will often make
with your work to make a scene work.

Just because I know its a box , doesn't mean it wont get used as something else much larger so I always try to make sure
it can hold up, whatever it is, when 4 times the scale!

Butcher mentioned instancing, this is another feature that we relied upon much more heavily to gain performance.
Due to textures / BSP being more expensive now and polycounts cheaper we made things modular, very very modular.

For instance, I made a square 384/384 straight wall using a BSP tiling texture and generated about 60 modular lego pieces that use the same texture and all fit with themselves and each other to replace BSP shelling out of levels.

This led to lots of optimizations in general and quick easy shelling of levels and it gave our levels a baseline for
addition of new forms to the base geometry level.

I doubt I'm changing anyone's opinion here, maybe making a normal map driven next gen game will convince you though.
And by next gen, i just mean the current new technology like ID's new engine, crysis engine, UE3 etc because they
are normal map driven and the press likes fancy names
for simple progression of technology.
r.

MoP · Nov 2007

Heh, Rorshach, in our engine you can't scale static models so everything stays the same size unless someone explicitly exports a different version of the model file

And I would expect a good level designer would know not to madly scale up an object obviously designed to be used as a small prop

But yes good info on all fronts, cheers guys.

CrazyButcher · Nov 2007

I should add that triangles are stored as "indexed" lists. that means you get a huge array of all vertices, and a triangle is made from either 3 or 1 index into that array.

say for a quad your have verts = [A,B,C,D]. And now you store triangle info (index starting at 1):
as list: [1,2,3, 2,3,4]
as strip: [1,2,3, 4]

each index normally weighs as much as a half uncompressed pixel. And thats it, triangles are really just adding some more indices to that index list. And their memory is sooo tiny compared to the rest... (unless for collision stuff, or stencils, there you need per face normals as well)

The mentioned texturememory used mentioned by ror, is the one thing to really optimized for, as you can just crunch x megs of texmemory into your graphics card a frame.

The other thing that may reside in memory are static meshes. Note that static here means we dont want to change vertices individually "by hand". But we change thru shaders (bones), or thru spatial placement in total. Which is nearly all vertices we normally see on a frame. The opposite is data that is generated/manipulated more fundamentally every frame, think particles.
Now a game vertex is like 32 or 64 bytes mostly. Which means a 512x512 compressed texture gives you about 8192 "small" vertices or 4096 "fat" vertices. Their weight depends on how accurate and how much extra data you need per vertex (second UVs, vertex color...), let it be a bit less than 4k if is even fatter vertex format.

Now the third texturememory eating thing are the special effects textures, which are uncompressed and can suck up quite some megs depending on window resolution.

Now the "other" memory costs are on the application side, in regular RAM, like the collision info, and "where object x is", relationships... that memory is mostly never a big problem on PC. Just on console you need to be more clever about not loading everything into RAM. As they are more limited.

So we have X memory of what fits into the graphics card which is "texturememory" "framebuffer memory + effect textures" "static meshes" and "shaders" (which are like ultra tiny compared to rest).

Now of course say we have a giant world and want to move around, it will be impossible to preload every texture/mesh into graphicsmemory. So we must cleverly unload/load some while we move around, so noone notices. The amount we can send over without hickups is not a lot (keep in mind that the rest of your scene is still rendered). So one must be very clever about swapping out textures. Hence the modularity and "reuse", rorschach mentioned, is even more important.
not only per-frame will it allow batching of "same state" objects. but even in the long run means less swapping occurs.

now what happens if I have more active textures in the scene than my graphics card can actually load per frame. then the driver will kickin and send a cached copy over (very ugly stalls). The driver also optimizes moments when to reload stuff, as most loading from RAM to Videoram is done asynchronously (ie you call the function doesnt mean it happens right now, but you let driver do it when he wants to). So now we got the driver in the memory equation as well. Now some clever strategies the driver optimizations gurus at AMD/NVidia do might create hiccups in "non common" situations. But what is "common"? If some new major game comes out and has a very specific new way to tackle a problem, we see new drivers magically appearing making smoother rides, of course they might optimize a lot more in those drivers, but anyway...

you get a brief idea of complexity of all this, and why the most common reply on this boards regarding polycounts and whatever is "it depends"

Kevin Johnstone · Nov 2007

Mop: sure, but theres always cases where they need to fill in gaps and anything will do when you can scale things.

Clearly a box prop isnt the best example, i was trying to detail a general attitude toward environment asset creation
to substitute the 'everything must go' all purpose optimization attitude.

Also another reason for extra polys in UE3 is smoothing groups. We try to use one smoothing group because it renders a better normal map.

The optimization at all costs method led me to finding out that reducing polycount and controling the smoothing with more smoothing groups costs as much as using more polys to create a better single smoothing group because
the engine doesn't actually have smoothing groups, it just doubles the edge count where the smoothing group changes.

Which costs as much or more than adding additional chamfered edges to support a cleaner smoothing group.

I still go back and forth on this issue myself but generally
the consensus at Epic is that its better to use more polys and process out a normal map that will render well with light hitting it from any angle.

If you go the purist optimization route ( as I did at the beginning of the project) and optimize with smoothing groups to control things and have half the polycount, you
end up with normals that look good only when hit by light from certain angles and its still just as expensive.

Again, I doubt anyone who has a different view is going to be changed by this information. I didn't change my
opinions until I had it reproved to me dozens of times.

r.

MoP · Nov 2007

Yep, I totally agree that for normalmapped stuff it's way better to have some extra polys and keep a single smoothing group, than to use separate smoothing angles or groups, since as you say they pretty much amount to the same vertex memory anyway, and the former gives a better normalmap.

One of our senior programmers told me that some graphics cards can have a harder time rendering long thin tris (such as those you'd get by having thin bevels instead of smoothing groups) at long range, but I don't know to what extent or how much this would impact performance, not much it seems.

Rick Stirling · Nov 2007

Great to see this all written down with some proper technical backing to it - while there is no excuse whatsoever to piss away polygons for the sake of using them, and those verts do add up, its ALWAYS textures, textures, textures that are the main bottleneck.

As for collision information, in most cases there is little issue with optimising that mesh down. I know we don't use the full resolution mesh for collision. In many/most cases, the LOD is used for collision. For characters its pretty much a series of boxes, because let's be honest - we might have modelled fingers and the inside of the mouth, but when it comes to collision all you care about is the hand and the head.

This bit interested me:

[ QUOTE ]
A mesh is pretty much free for the 1st 600 polys, beyond that its cost can be reduced dramatically by using lightmaps for generating self shadowing and so on rather than vertex lighting.

[/ QUOTE ]

Is that because you don't have to store that data on a per vert basis?

Joao Sapiro · Nov 2007

amzing info here guys, keep them coming

i have a question :

since smoothing groups are basically a detaching of faces ( hence the increase of vert count since the verts were duplicated ) if you have one continuous mesh and one with smoothing groups wich one would be faster to render ? my assumption is the continuos since there isnt any overlaping vertex, but i would like to know more about implementation of smoothing groups on assets , when are they a must do and when its better to make god smoothing via polygon.

i dont make sense.

Kevin Johnstone · Nov 2007

You only learn when and where to break the rules once you've
spent months doing it. I am sorry that this sounds like a cheap answer but its the truth.

I've seen my processing , with the cage, with multiple smoothing groups, then switching to 1 smoothing group on a basic non chamfered wall shape so the smoothing is REALLY
stretched and showing lots of horrible black to white gradients in max.

When I take that ingame, the smoothing forces the engine to bend the normals so when the level is rebuilt you get a
LOT more normals popping out.

Rick: that might be it, i don't remember all the technical reasons for each thing working as it does, i remember
more what works simply from habit now as there are so many
more rules and what not to bear in mind.

k.

EarthQuake · Nov 2007

More split edges(smoothing groups) are going to give you a higher # of verts, this will always be slower. How much slower in reality it actually is i have no idea, probablly not much.

CrazyButcher · Nov 2007

[ QUOTE ]
A mesh is pretty much free for the 1st 600 polys, beyond that its cost can be reduced dramatically by using lightmaps for generating self shadowing and so on rather than vertex lighting.

[/ QUOTE ]

do you mean a unique prebaked AO map?. In theory if you use a second UV for unique map, thats 2 shorts = 4 bytes, which is as much as a vertexcolor. And therefore more expensive (you send same pervertex but still need to sample AO map). it has to do with internal ut3 specific setups.

looking at the ut3 demo shaders, I actually found that you guys probably have some very complex baked lighting stuff, which is better to store into textures. It seems to be more than just a color value. In fact if the effect is done per vertex 3 x float4 are sent, compared to the single float2 for texture coordinate. Which is a lot more, that is "not normal" for oldschool lightmapping

but probably some fancy quality thing you do. havent really reverse engineered the effect, but as per-vertex effect its indeed very fat. But maybe you mean realtime shadows and not baked stuff at all. ... edit: after some more diving into it, its directional lightmapping like in hl2.

anyway this example shows that "it depends", and a magic value like the 600, has to do with vertex formats effects, ie very engine specific.

what the batch article by nvidia showed however, is that there are engine independent "limits", ie say less than 300 today makes no difference if its 1 tris or 300. (the numbers back then were around 200 I simply guessed the 300 for todays cards)

EarthQuake · Nov 2007

I think thats referring to using lightmaps as opposed to stencil shadows? Not actually ambocc type lightmaps.

hobodactyl · Nov 2007

Really cool thread! Per I had a question:

[ QUOTE ]
I never use cages for rendering normal maps anymore. It turns out to be horrible workflow-wise. They most often break when you start editing the original mesh and in the case of 3ds max restrict the processing rays to straight paths instead of bending them nicely creating a smoother result. Instead of ray cages, you're mostly better off simply adding more geo to the object. You can always make that geo count visually as well as cleaning up your normal map results.

[/ QUOTE ]

I was confused by you saying you never use cages for rendering normal maps; I thought that was the only way to render normal maps? Sorry if this is a retarded question; do you just mean you don't use Max's cages?

EarthQuake · Nov 2007

You can use a cage, or you can simply use the "offset" function in max.

oXYnary · Nov 2007

Can we sticky this or add it toa PC wiki or something?

One question:
[ QUOTE ]

edit: lesson learned is: Keep your UVmaps as continuous as possible. If you're mapping a box, all six sides should be connected. Also keep in mind that if you bevel instead of using smoothing groups, it will still increase your vert count if there's a UV seam on the beveled area.

[/ QUOTE ]

So anytime you have a texture seam it will detach the vertices in the engine?

Xenobond · Nov 2007

[ QUOTE ]
Can we sticky this or add it toa PC wiki or something?

One question:
[ QUOTE ]

edit: lesson learned is: Keep your UVmaps as continuous as possible. If you're mapping a box, all six sides should be connected. Also keep in mind that if you bevel instead of using smoothing groups, it will still increase your vert count if there's a UV seam on the beveled area.

[/ QUOTE ]

So anytime you have a texture seam it will detach the vertices in the engine?

[/ QUOTE ]

Yes. UV splits & smoothing group edges will split the vertices. I remember reading a pretty good article about this in a gd mag some years ago. I'll try and dig up that article on gamasutra.

Rick Stirling · Nov 2007

A C&P from a half written tech doc I was working on about uvs (and smothing groups) breaking the tri-strips

[ QUOTE ]

Many artists take the number of polygons in the model as the basis for model performance, but this is only a guideline. The real factor is the number of vertices in the model. As an artist your 3d software will count the number of verts in the model, however this is rarely the same number of verts that a game engine thinks there are.

Put simply, certain modeling techniques break the triangle stripping routine, making the vert count in the game engine be higher than the one reported in your 3d software. These attributes physically break the mesh into separate parts, and thus break triangle stripping algorithms.

The most common of these are:
Smoothing groups
Material IDs
UV seams

[/ QUOTE ]

Xenobond · Nov 2007

Haha. Why am I not surprised.

http://www.ericchadwick.com/examples/provost/byf1.html
http://www.ericchadwick.com/examples/provost/byf2.html

Part 2 talks more on the whole uv/smoothing/mat splits issue.

CrazyButcher · Nov 2007

eric hosts that gamedesign paper, too. I am sure we are just minutes or hours away he posts the links again

you get a fixed set of vertex attributes. Think positon,color,normal + some extras like UV channels, tangent stuff. Simply due to pipelining each vertex has no knowledge about the triangle he is part of nor other stuff (okay untrue for latest geometry shaders). So the vertex cannot have 2 normals, or 2 uvs for the same UV channel, hence the split. There might be more splits that are not visible to you (like mirrored UVs might be connected in max, but broken for tangentspace stuff). Whenever such split occurs all other attributes are copied over, so the normal will stay the same, color... but costs are raised by a full new vertex.
A good deal of "viewport" performance depends on taking the internal 3d data (which is organized different) to those graphics hardware vertices. Hence pure modelling apps or "less complex" on vertex/triangle level, can shortcut more, and benefit from speed.

monkeyscience · Nov 2007

Before you go optimizing smoothing groups, consult your friendly neighborhood engine programmer, shared vertices may or may not be used at all by the engine and your model exporter may ignore your hard work. The engineering term is Vertex Indexing and there are some reasons not to always use it. ALL vertex data has to be the same for a vertex to be shared. Position, normal, uvs, any shader parameters all have to match. If this doesn't happen often enough, indexing is wasteful.

Also, graphics hardware still renders triangles and with no regard to shared data. Pur's example meshes would get rendered as 4, 4, and 6 triangles, or 12, 12, and 18 vertices. Indexing is only a way to compress data in memory and help transfer rates of meshes to the GPU. If transfer rate isn't the limiting factor but the computed vertex count is, smoothing groups won't help. Neither will converting to quads or triangle strips. This usually happens with expensive vertex shaders like skeletal animation skinning or stencil shadow edge finding.

For everything else though, vertex count optimization just won't get you as far as it used to. Most normal mapped games with fancy-pants shaders are fill rate or texture lookup limited. The three little computations to figure out where a triangle ends up on the screen is just prep work for the potentially thousands of pixels that need to be computed.

"Fill rate limited" btw means fill rate is way slower than other work the graphics card is doing so its best to start optimizing there. It does NOT mean all other optimization work should be neglected. That's common n00b programmer talk.

If you do optimize polycount, do it only on shit that matters. Optimize either your half million poly models or models that will be visible in large counts all at once. Spending time on a 10 poly reduction to a tool chest is only justified if somewhere in your game there's a big stack of tool chests visible all at once and you actually shave thousands of polys from that scene.

JKMakowka · Nov 2007

Awesome info, thanks guys!

One thing is still confusing me a bit... how does an engine differentiate between "regular" triangles and quadstrips?
CB already explained that they are stored much more efficiently, but how can I influence that?
Sorry if that is a stupid question

Kevin Johnstone · Nov 2007

Per: 'Just make good art' lol

In the end this stuff gets so damn anus bleedingly technical
that 'Just make good art' 'and leave me the hell alone!' is
really what this thread will boil down to for anyone attempting to see it through

Bottom line for me at this point is that UT3 is out and
you can see exactly what I did there to work around things.
Though obviously there's a lot of things I messed up as some
of that stuff is 3 years old to me now and pretty embarassing.

One key thing I feel I will have to point out about editing
UT3 environment assets is that lightmaps are crucial.

Lightmaps are a uniquely unwrapped 2nd set of UV coordinates that you unwrap for the engine to calculate self
shadowing on objects and reduce the cost of anything over
600 tri's

They are required because most meshs having optimized texture UV layouts to reuse mirrored sections and if
the engine used those to calculate the self shadowing it would look like ass because it would try to apply shadows
on both sides of a mirrored section when only 1 was in darkness.

The lightmap UV's need huge amounts of space around each chunk because the resolution of the lightmap will be
32x32 or 64x64 generally instead of the 1024x1024 resolution
that the actual textures are.

You also need to leave a large space around the edge of the unwrap, i texel i am told. This is because when the lighting
is rebuilt all those assets, 32 or 64 lightmap squares are compiled into large 1024 or 2048 sheets of lightmap information so if
you do not leave a space around the perimeter of the lightmap UV's diferent lightmaps when compiled on the big
sheet will bleed subtly into each other and create s sublte
shadow gradient artifact leaving out from edges where the
bleed occurs.

You also need to split the UV's in the lightmap in each location where the normals are mirrored so it doesnt bleed
between each mirrored half.

When mirroring normals on the unwrap you need to have the center point be mirrored over the X axis horizontally,
like a rorschach rather than mirroring vertically like a calender page.

This is because the normals are calculated from the combination of 3 tangents in code.

r.

CrazyButcher · Nov 2007

JK: you dont need to worry about quads, strips and all that, the exporter or engine pipeline tools will take care of that. I just wanted to show the principle of how its sent to graphics card (ie as vertex indexed lists)

MonkeyScience: when would you actually not used indexed lists? I can only think of very chunked meshes, like particle billboards, or classic BSP brush sides, with a very low "sharing" ratio, but other than that, its kinda unlikely to not benefit from reuse I think. Also the performance papers I read suggest to use indexed primitives. (like this one, even a bit aged, I think triangle lists are the most optimized way of rendering, http://ati.amd.com/developer/gdc/PerformanceTuning.pdf ). Of course the lists have to be optimized for "order" to make best use of vertex cache, but drawing non indexed will take out the benefit of vertex cache completely.
so for most "artist" created triangles, I think it will always be indexed lists, no?

hobodactyl · Nov 2007

Per: Thanks for the quick response! I thought that might have been what you were talking about since I'd seen it in Mudbox. I can see how that would be more time-efficient.

MoP · Nov 2007

Stickied because this thread is really good.

eld · Nov 2007

(on the titles: triangle optimization is not, but vertex is always

)

I always work my optimizations with vertices, and I always consider other things, drawcalls, splits, and when you have to do more draw calls due to different materials/textures.

For console especially it's memory versus the vertexdrawing power, then you have to, to a slight degree keep fillrate in mind. (overlapping geometry)

as on baking with a cage, as mentioned above, if you're having a hard time with the cagebake, and your fingers are itching for the bevel buttons, Just do a combination, bake one with a cage, and one without for the straight rays,

It's just a texture, so you can combine the parts of the different renders that you like, so that you get correct edge-normals and then straight renders on surface-details that might've gone perspective-skewed.

and now again, memory, which should be a big part in this thread too, since you usually only have a small part of the 360 memory to work with

not even half of it in some cases.

reuse surfaces, dont mirror, but ROTATE!

eld · Nov 2007

Per, for me optimization means, looking the same, but costing less, as in, I wouldn't sacrifice the looks for a bit of juice, but there's alot that can be done without actually removing any looks but making it cost less, THAT's optimization.

It's about knowing how stuff works aswell, knowing a bit of tech as an artist.

There's always a visible hardware barrier, and we're always hitting it, way more in some games than others, and games on modern consoles are still struggling to maintain framrate.

While you're fully correct per, I can still see the headache that can come from a big team with only a few persons thinking about optimizations

Noren · Nov 2007

[ QUOTE ]
In case of 3ds max (the cage ) restricts the processing rays to straight paths instead of bending them nicely creating a smoother result.

[/ QUOTE ]

Hi Per, can you elaborate on this please? Sounds wrong to me, but I might have misunderstood you.

Ruz · Nov 2007

in the first post the optimised version still looked pretty good, so what the problem.
surely its also about modelling 'just' enough detail to support the extra detail you are trying to bring out with the normal map.
more about efficient modelling really.
If you can make something look good with 1000 polys, why make it with 1200.

JordanW · Nov 2007

ruz I'm not sure that the optimized version is an actual mesh with those changes made i think it's a paintover, so the implications that would be seen from the inaccurate normals are not shown.

Noren · Nov 2007

Per: I'm a 3dsmax user myself. That's why I got curious in the first place because my experience with the cage has been different.
And even now, if I render the testcase you proposed I get two exactly identical normalmaps.Max 8 and Supersampling activated. I used a simple box here (one SG) and almost always use the cage except for occasions like described by eld. So it can very well be that something slipped under my radar here and I would be very interested if someone could provide an example of cage vs. no cage not matching up. (Cage just pushed, of course, not manipulated further ).
A big plus for me with the cage is, that if you happen to work with smoothingroups it will still interpolate the castingrays and you don't wind up with missing parts in the map, while the normals are still correct.

eld · Nov 2007

[ QUOTE ]
...the non-cage output is going to be significantly better in most cases. Someone may have time to provide some screenshot examples...

[/ QUOTE ]

You can use both though, as a combined result ( it's a texture afterall

), as non-cage renders will usually shoot and miss its target on corners and such, but cages will usually do crazy renders on a a big flat surface that has to have details rendered onto.

Ruz · Nov 2007

per ,don't get downhearted.
What I said was that you should be modelling just enough detail to support the model you are making.
My point was that you shouldn't add more detail just for the sake of it. I did n't say anything about rendering speed.
personally I would keep taking out loops until I thought it was degrading too much in quality. Its about commmon sense a lot of the time.

you guys seem to be talking mainly about high end , next gen stuff like unreal engine/ doom engine.

What about MMO's or similar . I am sure that in the grand scale of things, polycount might have more of an impact.

CrazyButcher · Nov 2007

yes paintover, and Ruz the discussion is more about sacrificing quality for a "few tris", which is not worth it. The discussion is about those very last ultra bits of optimizing. It should not imply that optimizing at all isnt needed, its just that there is a grey zone where the amount of lowered quality or time spent with it, isnt worth the benefit in speed. So its not about "adding more", but "removing too much".
And those "few tris", are with time actually getting more and more. The hardware is still similar for MMOs as well, after all the performance pdfs mentioned, are like 3 years old, which should mean, thats the PC low-end of today.

EarthQuake · Nov 2007

[ QUOTE ]
per ,don't get downhearted.
What I said was that you should be modelling just enough detail to support the model you are making.
My point was that you shouldn't add more detail just for the sake of it. I did n't say anything about rendering speed.
personally I would keep taking out loops until I thought it was degrading too much in quality. Its about commmon sense a lot of the time.

you guys seem to be talking mainly about high end , next gen stuff like unreal engine/ doom engine.

What about MMO's or similar . I am sure that in the grand scale of things, polycount might have more of an impact.

[/ QUOTE ]

The example was obviously not for a low-end mmo, it was for a cuurent generation project. It would be too much work to cover every single platform, every single engine, every hardware level in one thread. We're talking about current tech here, mostly how current generation hardware handles rendering. Of course if you're making a model for warcraft3 you're not going to want to follow these guidelines, so take some of your own advice and use *common sense*.

Ruz · Nov 2007

yeah I sometimes kinda forget that you are 'removing' polys rather than adding them.
It just confused me in the example that because the optimised verison of the box still had a decent bevel along the edges and I thought it would still look correct with a normal map on it.
To me that extra row of loops adds nothing to the silhouette, but what do I know I am character artist:)
TBH I would experiment and if it looked ok I would trust my instinct to say yeah that looks right, the silhouettes ok and there are no weird shading artefacts which there should n't be becaseu the box has beveled edges.

Ged · Nov 2007

interesting read. Do you guys think these principles could be true for online 3D? Director with havok, flash cs3, java etc? or are those engines not powerful enough to experience benefits from good hardware? Im just thinking of all the really low poly online games out there and was wondering wether this is due to performance or bandwidth or software or just the developers?

Some of this thread has just gone over my head but Im currently working on a Director 3D game and its my first 3D game, so this is a relevent topic as theres just 2 of us making the assets.

EarthQuake · Nov 2007

I'm going to go out on a limb and say no, definately not. Those sort of projects are designed to run in a webbrowser, on a large range of hardware and would likely not benifit from the optimizations of a current gen rendering engine.

CrazyButcher · Nov 2007

Ged I think its a mix of everything that results in low-end graphics. The last time I looked at Director3D, it was like Directx7 or 8 renderer below, and it didnt use any "modern" features, read "modern" being 4 year old already... Those bigger content apps normally dont go with the modern hardware, and do a lot more cpu stuff, and hit those batching limits earlier. They are mostly meant to run on "anything" that is some integrated chip sets, with age old drivers. Though I am not sure how good flash or the others are. I know certain java libraries that make use of performance enhancing capabilities do exist, but I dont know how widespread the stuff is.
It would be good to just test the engine with dummy assets of differnt resolution, and see how it behaves on target hardware.

Mark Dygert · Nov 2007

This reminds me of the color pallet optimization discussions. Do we use 16 shades of brown and make the most of it or do we pick stock colors and hope people like all rainbow levels. Now no one cares what pallet your textures use. We're breaking down the barriers that tie artists hands.

I think its important to keep the post as it was presented. It's not a license to waste, but approval to stop over working something to the point it hurts the end result. It's also a call to take the game as a whole into account when modeling one tiny aspect of it. I think people in general, (beginners especially) will over estimate the time that will be allowed per asset. Yes you can make a loverly dumpster out of 250tris and 2mo to work on it. Or you could make an entire alley with 25k tris and those same 2mo.

You want to be careful and not run the other way and never optimize. Being neat and tidy can be a boost to production time, especially if that asset is going to be worked on by other people. passing on something that is easy to work on can be pretty critical when the bugs start rolling in. I always hate having to go back into other peoples files, label materials and sleuth around a file for 20min before I can start fixing things. Spend 20min organizing up front to save someone else 20min of headache. Technically its a wash but people won't mind working on your files if they aren't a nightmare. At that point its not an issue of game resources but production time, which for me is king over all.

The market of games I work on is much lower then the low end mentioned in those PDF's and as such we still have to keep to the old idea of optimize until it hurts, but just a little. It will be a few years before I can toss polys to the wind and not care. I thank Microsoft for pushing quality video cards and making it a center piece of a good vista PC. It will only quicken the death of this timely tradition that keeps me from creating more.

rooster · Nov 2007

i think you made a great point in pnp vig, that polygons and draw calls aren't the only resource, time is THE resource

JKMakowka · Nov 2007

Ok maybe a bit OT, but what about animation costs (e.g. transformation costs). More vertixes would certainly mean higher transformation costs or is that mostly limited by the number of bones (and level of weights) anyways?
And what about vertex animations (.md3) and those new Dx10 geometry shaders?

Of course given the fact that the object isn't fillrate limited anyways.

Edit: hmm to clarify: I think I read some where that DX9 and below hardware only does vertex animations and all the bones and vertex weighting is done on the CPU (and then transferred as vertex animations to the GPU), while on DX10 hardware with geometry shaders the GPU can do that. Is that right?

eld · Nov 2007

Vig, while it is true that time is the most expensive thing, optimizations and such knowledge is a skill just like the art itself is,

a great artist with technical knowlege can do those optimizations quick, and if those are done for each single prop then there's something to gain from it.

The optimizations I do for work doesn't take much extra time, it's nearly always just a quick plan on how to make the object, and a thoughtprocess when in the creating.

It even helps quite alot with the artistical side too!

Bruno Afonseca · Nov 2007

interesting material. but i think you guys are just getting each other wrong.

timing is part of a game artist's skill too, along with optimizing. you just gotta find the balance between it.

CrazyButcher · Nov 2007

JKM: since the first shadermodel 1 cards (geforce3...) it was possible to do skinning on the GPU. Basically with higher shader models, it became more efficient to do (can do more bones and more weights).
This in fact is done, so what you heard is wrong

geometry shaders are mostly good for "generating vertices", which was not possible before. That can also be used to generate 6 copies of a mesh and render in all 6 cubemaps at once, for example. Geometry shaders can also be used to generate shadow silhouettes for stencil shadows. For those shadows it was indeed necessary to do only CPU before, simply as on the CPU was able to detect silhouette edges. hence doom3 is very stressful for the CPU as well. Most games however dont do stencil shadows, and benefit from GPU vertex processing for it. There are some workarounds that can do silhouette detection on older GPU hardware, too, but not so common I think.

for GPU skinning on sm1/2 and even most sm3 hardware, the bones are stored in "uniform/constant" memory, of a vertex shader. Typically sm1 had limits like 25 bones, and sm2 is like up to 75 bones. Then you must feed per vertex the bone index and a weight. typically that is like 2 shorts per assignment. Vertex shaders will then be written for a certain number of maximum weigts per vertex (say 2 or 3), and all vertices (regardless of their actual weights used) will be transformed the same way. Hence if you know the weights per vertex the engine allows, there is no reason at all to not use them at full extent. Typical would be 2 or 3 max weights.
the bones matrices are computed by CPU, before and sent as those "constants". The less max weights per bone = less instructions in shader + less per-vertex data to be stored. The less bones in total = less "constants" to be sent every time the model is rendered.

vertex animation aka morphing is a bit different story, and requires another per-vertex attribute stream that is either also preloaded and "fixed" (think morph targets), or dynamically changed every frame (aka md3). The latter is particularly ugly as it means sending pervertex data every frame, which is supposed to be avoided.

Skinning basically allows all mesh data to be preloaded and stored in vidmem, and only the bones' matrices must be resend. Hence its the preferred way.
However there is several higher techniques possible for animation, that store matrices in textures (sm3 vertex shaders can access textures, but kinda slow), or use renderto vertexstream, stuff and so on. However not in the common case. ut3 and crysis still use just the constants, as I would say nearly everyone else.
On consoles with dedicated vertex processing hardware (like what SSE was supposed to be for CPU), skinning might be done in software for load balancing. Like PS3's Cell has 7 streaming units, that can work with the GPU directly and "help out". Or when real complex vertex stuff is done (unlikely so), or stencil shadowing...

Wells · Dec 2007

this thread is incredibly informative.

thanks for taking the time and effort.

i'm learning!

adam · Dec 2007

Just so I may recap on a couple of points made early on:

Vert count between 2 polys, in-engine, will increase if:
-The 2 poly's share seperate smoothing group
-The 2 poly's are a part of two different UV islands

Vert count between 2 polys, in-engine, will stay 'the same' to the application's count if:
-The 2 poly's share the same smoothing group
-The 2 poly's are a part of the same UV island

Is this correct?

Rick Stirling · Dec 2007

Adam, I believe that is is correct, and I believe you can also add in shaders/materials. If you apply a different shader to each polygon that will break it into 2 objects.

Rick Stirling · Dec 2007

I *think* that when it comes to the shader, that breaks the polygons into a different Drawcall, instead of a batch, but I'm willing to stand corrected.

As to the smoothing groups adding extra verts, if you get your normal maps nailed you can often forego the smoothing groups and set your entire object to a single SG. Also, in the past we'd use smoothing groups on hard edges to stop the polygon shading leaking round (cuffs, jacket hems, hard edged machinery). Since adding this group adds extra verts and (will usually) break your batching, it's (usually) cheaper just to chuck in those extra polygons that a bevel will give you.

Usually cheaper, but when you are dealing with deformable objects (skin/morph), you've got more transforms to compute, so it's a toss up there.

[Technical Talk] - FAQ: Game art optimisation (do polygon counts really matter?)

Replies