Hi friends! Not sure if I'm in the right place here, since my topic is heavily technical, but it is of course about game art. Anyway, this is pure self-advertisement. I write articles about...
"Fakes & Tricks of games which impressed me"
It's about crazy ideas or awesome tech fakes which made a cool effect. I write about this in my blog and collected everything I can remember. Feel free to comment on my stuff, explain what I didn't understand, or tell me what stuff impressed you.
Works great in Firefox 28, Chrome 34, broken in IE 10 (Error: Unsupported video type or invalid file path). Scrubbing looks great in Firefox, but I get a mess of block artifacts in Chrome, so much I can't see the characters anymore.
Thank you guys! I added the MIME types and the correct codec to the HTML tag. But IE still doesn't work. Have to check tomorrow. Thanks for checking it out!
On a side note, I host my own videos as well, I use a batch tool called ffmpeg. Do a search, you should find a post I made about it. Could save some time & effort.
I updated the webM test. It should now run under FF/Chrome/Safari and IE. Also the videos adjust to the browser window size if it gets smaller than 500px.
I put testing notes in your blog comment section, still some troubles. You're welcome to use my setup if you like, works in all the browsers I've tested, I could package up the files in a zip for you.
This is so amazing. Those animations showing the pipeline (especially from verts to pixels, but all of them really) are going to be so useful when I explain to people how it all works. What a great set of visuals for reference
I always silently read your articles like a lurker, but just wanted to say thank you and keep up the good work! Love the way you explain complex subject matter.
Your articles were already really useful and interesting to read, but this new one is the best in my opinion. The animations made my day by the way.
Thank you very much.
I've been quietly reading your articles for a while now. They are always interesting, but you've really outdone yourself with this one. Great job and thanks!
Hey, that was a great article. It helped me a lot to clear up some obscure points/thoughts I had about those draw calls.
But one point still remains: what are cycles (I can guess the basics), how do they affect FPS,
and how does everything come together in the end?
I remember listening to a upupDOwnDown podcast where someone mentioned something like they managed to get down to 8 cycles per FPS, which was good!?
Draw calls + shader + cycle = mindfuck, at least for me.
Kurt Russell Fan Club
Hehe thank you! Glad you like the anims :,)
alfalfasprossen
Good to hear man!
Eric Chadwick
Thanks Eric!
Ravenslayer
Thank you for taking the time to read through all this stuff
Computron & throttlekitty
u 2
Bek
Thanks! I changed all the stuff you mentioned!
cman2k
Thank you! Time to get epaper as standard so we can have animation in normal books too
Obscura
Thank you for your kind words! I hope you won't be displeased when the next ones are simpler again :,)
Scorpyx
:,)
jStins
That's great to hear! I hope you'll like the future articles too, even if they aren't that complex.
Mrfred
Good point... I have no clue. As far as I know you can do 1 or 2 things per cycle, but what this means... I'm not sure. Before I say something wrong, I'll keep my mouth closed. But thanks for mentioning the issue - I'll keep an eye on it and maybe I can add a chapter about this.
Great work indeed, some things I'd rename/add/change a little, hence plenty feedback sent to Simon
@Mrfred, "down to 8 cycles per FPS"
A cycle is normally one "pump" through the processor. For example it can do one arithmetic operation per cycle (sometimes even more) per core. "Down to 8 cycles" may mean that a certain subset of shader code is now taking only 8 cycles on the GPU.
Now if you take a modern GPU (for example the GTX 980), it has 2048 cores and is clocked at 1126 MHz, which means it runs 2048 x 1126 million cycles per second (at least, given there is also a boost clock that is higher).
A very popular operation is "a * b + c", also called fused multiply add (FMA), which only takes one cycle to do 2 operations. When you do lighting you basically add light contributions based on some weight on top of your color. Or when you move/rotate vertices around you also make use of that operation a lot.
That explains why the card can do ~4.6 teraflops (1.126 GHz clock * 2048 * 2).
Different kinds of operations take different numbers of cycles. Memory fetching (texture access), for example, can take a lot of cycles on average (several hundred), although the GPU will try to cache those fetches to bring the numbers down as well.
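To make the arithmetic above concrete, here is a minimal sketch that redoes it. The core count and clock are the figures quoted in the post; the 60 fps frame rate and the function names are my own assumptions for illustration, and the result is a theoretical peak, not a measured number.

```python
# Figures quoted in the post for the GTX 980 (base clock, boost ignored).
CORES = 2048
CLOCK_HZ = 1126e6        # 1126 MHz
OPS_PER_FMA = 2          # a * b + c counts as one multiply plus one add


def peak_flops(cores: int, clock_hz: float, ops_per_cycle: int) -> float:
    """Theoretical peak: every core retires one FMA on every cycle."""
    return cores * clock_hz * ops_per_cycle


def cycles_to_seconds(cycles: int, clock_hz: float) -> float:
    """Wall-clock time a given number of cycles takes on one core."""
    return cycles / clock_hz


def cycles_per_frame(clock_hz: float, fps: float) -> float:
    """Cycles a single core gets while one frame is being produced."""
    return clock_hz / fps


if __name__ == "__main__":
    print(f"peak: {peak_flops(CORES, CLOCK_HZ, OPS_PER_FMA) / 1e12:.2f} TFLOPS")
    print(f"8 cycles last {cycles_to_seconds(8, CLOCK_HZ) * 1e9:.2f} ns")
    # 60 fps is an assumed target, not something stated in the thread.
    print(f"cycles per core per frame at 60 fps: {cycles_per_frame(CLOCK_HZ, 60):,.0f}")
```

So "8 cycles" is a tiny slice of time for one piece of shader code; it only matters for FPS because that code runs for a huge number of vertices or pixels every frame.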
uhh lovely work again
one minor critique, found the two column text a bit confusing, feels unnatural to read.
the firefly lighting I would suspect is simply screenspace volume intersection via stencilbuffer. Sort of the same that doom3 used for shadows (extruded shadow volume intersected with scene), but here just the spheres are intersected with the rest and the intersection (masked pixels via stencil) gets brightened via additive/modulative blending.
The intersection happens purely in image space using the stencil buffer and depth testing. The sphere gets drawn without backface culling; front faces "increment" and back faces "decrement". Whenever the depth buffer causes the back faces to be rejected (the sphere is "sunk into the scene"), our stencil value is non-zero. If both front and back faces are behind or in front of the depth, the stencil remains 0.
Eye -> F (frontFace) B (backface) | depthbuffer = stencil value
different scenarios and stencil op/result, no op if behind depth
-> F +1 B -1 = 0
-> F +1 | B = 1
-> | F B = 0
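The three scenarios above can be replayed per pixel with a small simulation. This is only a sketch of the idea, not the game's code: the `Pixel` type, the depth values and the "smaller depth is closer to the eye" convention are assumptions made up for illustration.

```python
from typing import NamedTuple


class Pixel(NamedTuple):
    front: float   # depth of the sphere's front face at this pixel
    back: float    # depth of the sphere's back face at this pixel
    scene: float   # depth already stored in the depth buffer


def stencil_after_sphere(p: Pixel) -> int:
    """Front faces add 1, back faces subtract 1, but only where the face
    passes the depth test (no stencil op if the face is behind the scene)."""
    stencil = 0
    if p.front < p.scene:   # front face visible -> increment
        stencil += 1
    if p.back < p.scene:    # back face visible -> decrement
        stencil -= 1
    return stencil


# Smaller depth = closer to the eye (assumed convention).
cases = {
    "sphere floats in front of the wall": Pixel(front=2.0, back=4.0, scene=9.0),
    "wall cuts through the sphere": Pixel(front=2.0, back=4.0, scene=3.0),
    "sphere is behind the wall": Pixel(front=2.0, back=4.0, scene=1.0),
}

if __name__ == "__main__":
    for name, pixel in cases.items():
        value = stencil_after_sphere(pixel)
        state = "lit" if value != 0 else "unlit"
        print(f"{name}: stencil = {value} -> {state}")
```

Only the middle case leaves a non-zero stencil, which is exactly the set of pixels that would get brightened by the blend pass.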
Deferred shading needs too much shader capability, and given the "flatness" of the intersection, it probably is just a simple color output. Proper deferred shading would use a shader to read additional information stored in render targets and do full lighting within those "masked" pixels. However, this effect here doesn't need any of that render target processing/access, just good old blend funcs.
CrazyButcher
Thank you! I changed the 2-column text back to 1 column. I liked it more because it looked more like a brochure/catalog, but yes, readability is more important.
Regarding the lights: That sounds very plausible! Thanks for the explanation! I'll add it as soon as the server runs again. Currently the server is down.
You could say that this is a very early/basic version of deferred shading (flat color), and that this evolved to what is now known as deferred shading, with the advances in shader programmability and render target access.
Would however cut the text that suggests it could be done like... For example the depthbuffer statement and such seem a bit far fetched.
I just checked something which confuses me. Independent of the technique used, the z-buffer would be needed for it (at least when the depth buffer and stencil buffer are combined somehow). Right? What I already noticed is that the z-buffer in Zelda has only 1/2 the size of the actual resolution. So, if the z-buffer is used for checking the "collision" with the wall, shouldn't the edges of the light sphere be less crisp? I have an example image which I would like to show. On the left is the light, and the edges look very sharp. The right side shows the z-buffer, which is pretty pixelated in comparison:
p.s. I found the full resolution stencil buffer! Will prepare images
OK, here it is. Draw call by draw call. Interesting: I see in the overdraw view that 3 spheres get rendered per light. Some "fall out" of the stencil buffer again (because they're totally hidden by some geometry). You already mentioned that 2 spheres are used (front faces and back faces), but what could the 3rd one do?
Just a guess, but the third draw is to actually draw the yellow glow. The first two draws are just there to make the stencil in which to draw stuff.
Meaning:
1. Draw the inside-facing sphere with an inverted depth-sorting ( greater ), meaning: draw the fragments only if they ARE HIDDEN by other geometry.
The pixels effectively marked in the stencil buffer are now pixels that lie within the volume of the sphere IF they are not in front of the front-facing sphere faces.
We need to check that also:
2. Draw the outside-facing sphere with a normal depth-sorting ( less_or_equal), meaning: draw the fragments only if they are not hidden by other geometry.
This will make sure that our stencil will not allow us to draw on top of geometry that is actually blocking our view to the light-sphere.
Both stencils together mark pixels that lie "inside the volume of the sphere"
Note, that we only altered our stencil, so the last draw was to the stencil buffer! Drawing to the stencil buffer does not allow us to actually draw pixels to the frame buffer.
Therefore:
3. Using the stencil we created, draw a yellowish-transparent sphere. It will only be visible where the stencil buffer allows it. A depth-sorting function is technically not required because we already sorted that out in the stencil buffer.
Why use the third step and not draw the outside-sphere to the frame-buffer with adequate depth-sorting and the inside-facing-sphere-stencil buffer in effect? Yeah, well... the problem might be that you are drawing transparent stuff now. When drawing transparent stuff you might want to disable the depth-buffer, because other stuff you draw afterwards might get a wrong depth-buffer to check against then.
But what could that be? I mean, they want everything to have the yellow-glow on top, IF they are inside the sphere radius? if not, they could draw other stuff, that should not get the glow from the lights AFTER they draw the light spheres. Because they never altered the Z-buffer (writing to the stencil buffer doesn't alter the z-buffer) it still has only the raw geometry z-values and was not affected by the transparent light-spheres.
So, if there is a reason to draw stuff not affected by the glow, that would make sense, if not, it might be a waste of a draw call, but something tells me they didn't pay much attention to that anyway
edit: Though it looks more like the glow is coming from one big quad drawn across the screen. Then drawing a third sphere wouldn't really make sense i guess...
edit2: Judging from the fact that the second sphere seems to be drawn in full into the stencil buffer, it looks like it is not using the existing stencil as a mask (maybe that was not possible in their renderer). In that case the third call could be there to merge the two masks into one.
Simon, all the screenshots you present come from the emulator, right? So unless someone looks up what the original hardware was capable of, we don't know 100%. As you say, it makes no sense if the depth buffer was lower res. Maybe the emulator decided to visualize the depth buffer in lower res to make it pop out more? Maybe converting the depth buffer into a texture-accessible form indeed lowers the resolution on the hardware, could be. But the regular depth test is full-res for sure.
About the lights: In my original description I made use of -1 stencil ops, and assumed wrapping support, which means 0 - 1 becomes 255. Current hardware can also do different stencil ops depending on face winding, so it can render the sphere at once and do the +1 or -1 accordingly. As your screenshots show, this likely wasn't the case back then. We only see increments and "resets".
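A small sketch of why the wrapping matters, assuming an 8-bit stencil. The helper names are hypothetical; in OpenGL terms the two behaviours correspond to the wrapping ops (GL_INCR_WRAP / GL_DECR_WRAP) versus the clamping ops (GL_INCR / GL_DECR).

```python
STENCIL_MAX = 0xFF  # largest value of an 8-bit stencil


def incr_wrap(value: int) -> int:
    """Increment with wrap-around: 255 + 1 becomes 0."""
    return (value + 1) & STENCIL_MAX


def decr_wrap(value: int) -> int:
    """Decrement with wrap-around: 0 - 1 becomes 255."""
    return (value - 1) & STENCIL_MAX


def incr_clamp(value: int) -> int:
    """Increment that saturates at 255."""
    return min(value + 1, STENCIL_MAX)


def decr_clamp(value: int) -> int:
    """Decrement that saturates at 0."""
    return max(value - 1, 0)


def apply(ops, start: int = 0) -> int:
    """Run a sequence of stencil ops on one pixel and return the result."""
    value = start
    for op in ops:
        value = op(value)
    return value


if __name__ == "__main__":
    # Back face rasterised before the front face on a pixel that should end at 0.
    print("wrapping:", apply([decr_wrap, incr_wrap]))     # 0 -> 255 -> 0
    print("clamping:", apply([decr_clamp, incr_clamp]))   # 0 -> 0 -> 1 (wrong)
```

With clamping ops the result depends on the order in which faces hit the pixel, which is one reason a renderer without wrapping might fall back to increments and a reset pass.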
Hardware (today as well) often has a different internal representation of the depth-buffer, to accelerate things further. That means some hardware has to go through additional work to allow it to be sampled like a texture.
So what I think happens is +1 for the front faces, +1 for the back faces, then a "reset" to 1 for anything that is greater than 1. That explains the 3 passes and the final stencil being 0 or 1. Then comes the fullscreen quad doing the lighting, as alfalfasprossen originally suspected.
alfalfasprossen, the stencil buffer normally lives alongside the depth buffer, so you could write to the color, stencil and depth buffers at the same time if you want to (they are often stored together: 24-bit depth, 8-bit stencil). The alpha/depth storing imo has nothing to do with it; it's just a different way to mask the lit pixels.
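As a rough illustration of "stored together", here is how a 24-bit depth value and an 8-bit stencil value could share one 32-bit word. The bit layout (depth in the upper 24 bits) and all names are assumptions for the sketch; real hardware formats differ and are often opaque.

```python
from typing import Tuple

DEPTH_BITS = 24
STENCIL_BITS = 8
DEPTH_MAX = (1 << DEPTH_BITS) - 1
STENCIL_MASK = (1 << STENCIL_BITS) - 1


def pack(depth: float, stencil: int) -> int:
    """Quantise depth in [0, 1] to 24 bits and store it above 8 stencil bits."""
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be in [0, 1]")
    depth_bits = round(depth * DEPTH_MAX)
    return (depth_bits << STENCIL_BITS) | (stencil & STENCIL_MASK)


def unpack(word: int) -> Tuple[float, int]:
    """Split a packed word back into (depth, stencil)."""
    return (word >> STENCIL_BITS) / DEPTH_MAX, word & STENCIL_MASK


def write_stencil(word: int, stencil: int) -> int:
    """Replace only the stencil bits; the depth bits stay untouched."""
    return (word & ~STENCIL_MASK) | (stencil & STENCIL_MASK)


if __name__ == "__main__":
    word = pack(1.0, 0)
    word = write_stencil(word, 1)
    print(hex(word), unpack(word))
```

The `write_stencil` helper mirrors the point made earlier in the thread: marking pixels in the stencil does not disturb the depth values the scene geometry wrote.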
I'm just wondering about drawcall 1159, 1160 and 1161.
1159: A sphere (hidden by geometry) is painted into the stencil buffer.
1160: Another sphere is created but doesn't brighten up the stencil buffer. This would speak against the +1 +1 theory (unless the new sphere is sorted out by some depth test).
1161: A 3rd sphere is created and masks out everything which has been created until then.
But with a depth test in addition to CrazyButcher's theory, it should be at least one way to explain it. Right?
A slower version for only these draw calls:
Btw: Thanks for taking all the time helping me with that stuff!
The second sphere doesn't add to the stencil (brighten it up) because it fails the depth test (the outward-facing sphere is hidden by geometry). The third sphere will then kill the stencil, because most likely the operation is something like (if stencil < 2: stencil = 0).
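One way to read the three draw calls, sketched per pixel. This is an interpretation of the posts in this thread, not confirmed behaviour of the game: the depth functions per pass, the "smaller depth is closer" convention and the function name are all assumptions.

```python
def three_pass_stencil(front: float, back: float, scene: float) -> int:
    """Final stencil value of one pixel covered by the light sphere.

    front/back are the depths of the sphere's faces, scene is the depth
    buffer value. Smaller depth means closer to the eye (assumed).
    """
    stencil = 0
    # Pass 1: inward-facing sphere, depth func "greater".
    # Marks the pixel if the back face is hidden by geometry.
    if back > scene:
        stencil += 1
    # Pass 2: outward-facing sphere, depth func "less_or_equal".
    # Marks the pixel if the front face is not hidden by geometry.
    if front <= scene:
        stencil += 1
    # Pass 3: the guessed reset, "if stencil < 2: stencil = 0".
    if stencil < 2:
        stencil = 0
    return stencil


if __name__ == "__main__":
    sphere = {"front": 2.0, "back": 4.0}
    for name, scene in [("sphere fully hidden", 1.0),
                        ("geometry inside sphere", 3.0),
                        ("sphere in front of everything", 9.0)]:
        print(f"{name}: stencil = {three_pass_stencil(scene=scene, **sphere)}")
```

The "sphere fully hidden" case matches what the capture of draw calls 1159 to 1161 seems to show: the first pass marks the pixel, the second adds nothing, and the third wipes it again.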
Oh right, I haven't done any graphics programming with stencil buffers for a few years. Had to look it up in the OpenGL documentation, and probably should've read it with more care.
Thank you very much and thx for taking the time reading them
Thanks for the great articles, as always.
Thanks guys! Good to hear!
webM
I'm doing a small webM test on my blog. If you want, you can help me by visiting it and telling me, if there are any problems. Thanks!
http://simonschreibt.de/webm/
#49 Render Hell 1.0
Wow, that's a fantastic explanation!
Thank you
Excellent article.
edit: some grammar stuff / typos I noticed:
at the start of book 3 you have "No it gets interesting!" (Should be "Now")
Also, just before section 2 for batching you have "a technique called Batching seams to be useful." (Should be seems)
Section 5, skinned meshes has "intersting" (interesting)
Book 4: "Maybe a atlas texture could help?" (an atlas)
"Talk to you programmers" (your)
http://www.3dcg-arts.net/art/1311
This trick is awesome! I have to make it the 51st article! But first:
NEW Game Art Trick
#50 Zelda Wind Waker - Hyrule Travel Guide
I did a little update to the text, what do you think? Better now? Is it OK that I left the deferred part in the article?
http://simonschreibt.de/gat/zelda-wind-waker-hyrule-travel-guide#update1
p.s. Alex has written something about this issue in my comment section: http://simonschreibt.de/gat/zelda-wind-waker-hyrule-travel-guide/#comment-584
I completely reworked the section about the lighting in the Zelda game. Thanks for your help, guys! What do you think?
Zelda: Stencil Buffer Lighting
Thank you for the hint, here it is:
NEW Game Art Trick
#51 Rei Ayanami – Inner eyes
I reckon with the right normals, you could get a shaded version too.
Edit: On second thoughts, I don't think you could.
https://www.youtube.com/watch?v=A4QcyW-qTUg
damn clever trick - thanks for posting, SimonT
http://id-r-mcgregor.blogspot.de/2015/03/concave-dragons.html?spref=fb
You're welcome! Good to hear that you found them useful