I want to learn more about SDF ray marching, so I started experimenting with reflections as a starting point.
The goal is to have a nicely lit cornell box at the end, with mirror-like and glossy reflections, direct lighting with area shadows, and hopefully indirect lighting as well.
Got the whole scene to reflect and to be reflected.
So far I'm getting really good performance, and I'm curious to see how much worse it gets once I start adding a lot more complexity. Currently I'm getting nearly 500 fps on a 2080 Ti at 1080p, with the scene filling the whole screen.
Next time I'll add reflections on reflected objects (so if I make the ball a mirror, for example, its reflection on the wall will still shade as a mirror), and I'll try adding roughness.
The current code in the custom node looks funny, as it only has really short lines. I should also handle materials in a better way... I'll think about this.
Some functions, such as MAP (the scene description) and GetNormal, are added to a custom ush file so the custom node is less cluttered and there is some automation - I don't need to sample things one by one.
There might be a better way of doing this with SDFs - just like you get soft shadows by taking advantage of the falloff of the SDF - but I'm not sure yet. What I currently do is send multiple rays in randomized directions outwards from the reflection vector, in a cone shape. The cone angle depends on the roughness value.
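To make the cone idea concrete, here is a minimal sketch of how such a sample direction can be picked. This is not my exact node code: the hash, the cone angle scaling and the TraceColor helper are only stand-ins for illustration, and only MAP is assumed from the ush described above.

// Assumed to exist in the custom ush (scene SDF): x = distance, y = object ID.
float2 MAP(float3 p);
// Hypothetical helper: marches MAP along the ray and shades whatever it hits.
float3 TraceColor(float3 ro, float3 rd);

// Cheap per-pixel hash, just for illustration.
float Hash(float2 p)
{
    return frac(sin(dot(p, float2(12.9898, 78.233))) * 43758.5453);
}

// Random direction inside a cone around R; the cone angle grows with roughness.
float3 ConeSampleDir(float3 R, float roughness, float2 seed)
{
    float u = Hash(seed);
    float v = Hash(seed + 17.0);

    float coneAngle = roughness * 0.5;                 // max half-angle in radians, arbitrary scale
    float cosTheta  = lerp(1.0, cos(coneAngle), u);    // bias samples towards the cone center
    float sinTheta  = sqrt(saturate(1.0 - cosTheta * cosTheta));
    float phi       = 6.2831853 * v;

    // Orthonormal basis around the reflection vector.
    float3 up = abs(R.z) < 0.999 ? float3(0, 0, 1) : float3(1, 0, 0);
    float3 T  = normalize(cross(up, R));
    float3 B  = cross(R, T);

    return normalize(T * cos(phi) * sinTheta + B * sin(phi) * sinTheta + R * cosTheta);
}

// Average several rays inside the cone. V points from the surface towards the camera.
float3 GlossyReflection(float3 pos, float3 N, float3 V, float roughness, float2 pixelSeed, int numSamples)
{
    float3 R = reflect(-V, N);
    float3 sum = 0;
    for (int i = 0; i < numSamples; i++)
    {
        float3 dir = ConeSampleDir(R, roughness, pixelSeed + i * 3.7);
        sum += TraceColor(pos + N * 0.01, dir);        // small offset so we don't re-hit the surface
    }
    return sum / numSamples;
}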
I noticed some changes in how custom ush files work in the current versions of Unreal. It used to be that if you used an external ush file, so you don't embed your code into an existing one, the engine wouldn't recompile the shaders when you open the editor. What happens now is that it recompiles upon opening the editor even if you use your own ush. BUT... if you edit your ush while the editor is open, save it, and then press enter in your custom node to recompile it, it will recognize the changes in your ush live. BUT... if you do this live ush editing and then close the editor and open it again, it will recompile all shaders on the next open. The live ush update is cool, but the editor close-open recompilation is soooo stupid. What the fuck.
I also mocked up the same test with Unreal's ray tracing enabled, for reference and performance measurement, and man... its performance is far worse than what my ray marcher was producing. When I set max roughness to 1 and go fullscreen, the framerate is around 40 even with 1 sample per pixel. I'll post some comparisons later; I need to add a virtual point light to the ray marcher first, so the two scenes are visually closer to each other.
This is kinda weird to me, because there is only one big difference between the two: I use a mathematical scene description, while the RTX one uses polygonal meshes. I know that real-time ray tracers use a BVH (bounding volume hierarchy) as an acceleration structure, but I use "sphere tracing" (reference below), so ray tracing is accelerated in both scenes.
https://www.scratchapixel.com/lessons/advanced-rendering/rendering-distance-fields
https://www.scratchapixel.com/images/upload/distance-fields/sphere-tracing-examples.png
I also found that using a higher mip level on the cubemap reduces noise, but it also reduces shadow detail and makes it softer. Which makes sense, because there is less detail in the cubemap at lower resolution. So ideally, for high quality sky lighting with rich shadow detail, you need a certain cubemap resolution, but unfortunately you will also need many more samples to make it less noisy.
Here is a ray marched skylight with multiple samples. Runs nicely - I get around 120 fps with 32 samples per pixel. Here is a little breakdown of what happens:
- We do the usual visibility and normal calculation routine.
- We offset the ray origin a tiny bit along the normal direction, so we don't immediately fall back into our hit distance threshold. Just a very small step.
- We run a for loop for the occlusion, one iteration per sample. I generate a normalized per-pixel noise vector with an increasing z coordinate on each sample, so I get per-pixel variance in each iteration.
- We run a nested loop for the occlusion marching. For each sample, we march along the normal direction minus our random direction vector, so we are marching in random directions inside a 180 degree cone outwards from the previously hit surface.
- Sphere tracing is used again to speed up the marching somewhat. If we hit our geometry, we don't do anything other than quitting this sample's loop. If we don't hit anything after a certain distance, we sample the skylight.
- When all the loops are finished, we divide the result by the number of samples taken, so we get the average.
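Here is a rough sketch of that loop, for illustration only. MAP is assumed from the custom ush mentioned earlier; the cubemap bindings, the hash-based noise and all the constants (offsets, step counts, thresholds) are placeholders I made up, and the mip level input corresponds to the noise versus shadow detail trade-off mentioned above.

// Assumed from the custom ush: scene SDF, x = distance, y = object ID.
float2 MAP(float3 p);

// Hypothetical cubemap bindings (in practice passed into the custom node).
TextureCube  SkyCube;
SamplerState SkyCubeSampler;

// Cheap per-pixel / per-sample noise direction; the sample index feeds the third hash component.
float3 RandomDir(float2 uv, float sampleIndex)
{
    float3 p = frac(sin(float3(dot(uv, float2(12.9898, 78.233)),
                               dot(uv, float2(39.346, 11.135)),
                               sampleIndex * 7.13 + 1.0)) * 43758.5453) * 2.0 - 1.0;
    return normalize(p);
}

float3 StochasticSkylight(float3 hitPos, float3 N, float2 uv, int numSamples, float maxDist, float mipLevel)
{
    float3 result = 0;
    float3 ro0 = hitPos + N * 0.02;                  // small offset so we don't instantly re-hit the surface

    for (int s = 0; s < numSamples; s++)
    {
        // Random direction in the hemisphere around the normal (normal nudged by a random unit vector).
        float3 dir = normalize(N + RandomDir(uv, (float)s) * 0.999);

        // Sphere trace the occlusion ray.
        float t = 0.0;
        bool hit = false;
        float3 ro = ro0;
        for (int i = 0; i < 64; i++)
        {
            float d = MAP(ro).x;
            if (d <= 0.001) { hit = true; break; }   // occluded: this sample contributes nothing
            t += d;
            ro += dir * d;
            if (t > maxDist) break;                  // escaped: treat as visible sky
        }

        if (!hit)
            result += SkyCube.SampleLevel(SkyCubeSampler, dir, mipLevel).rgb;
    }

    return result / numSamples;                      // average of all samples
}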
If we don't use per-pixel ray direction variance, we get an image that has zero noise, but all the samples are fully visible. We get sharp shadows with some direction offsets - it kinda looks like having several directional lights with sharp shadows and different rotations. It looks bad at low sample counts. So bad that the noise is better! The noisy version needs fewer samples to look decent, at least in the case of the skylight.
I'll try something similar with localized lights sometime, like a stochastic sphere light. The stochastic directional light is probably the least noisy, because unless the sun is huge, it has the smallest penumbra radius. Also, if we wanted to be fully realistic, we should use a gigantic sphere light that is very far away, but I don't think that would be efficient, and I'm also not sure we would get a considerably better result than averaging many samples of random directions inside a given radius.
One more thought about this. Ideally, in a more complex and realistic system, if we use a cubemap as a stochastically sampled ray traced light source and we implement bounces (global illumination), a directional light would not be needed, because the cubemap would provide directional lighting too, as a side effect of the HDR lighting over many samples. Even my skylight in the previous post already has this. The ambient occlusion is a simplification: that's where the bounce should happen. AO as it is, is the lack of a bounce. The darker the AO is, the more bounces you would need to compute the correct lighting value. So you could even use an AO map to precompute the number of bounces needed, with a given maximum.
I used the classic approach where the Menger sponge code lives in a custom ush file included from Common.ush, for ease of use in the custom node. Then it's ray marched in the custom node, and normals are calculated with 3 extra samples at the end, after the marching loop has ended - either because an intersection was found, or because max t was reached.
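For reference, a normal from 3 extra samples is basically forward differences against the distance at the hit point - something like this (the epsilon value and the reuse of the hit distance are simplifications, not necessarily the exact node code):

// Assumed from the custom ush: scene SDF, x = distance.
float2 MAP(float3 p);

// Normal from three extra MAP samples (forward differences), reusing the distance at the hit point.
float3 GetNormalForwardDiff(float3 p, float centerDist)
{
    const float e = 0.001;  // too small causes precision noise, too large rounds off corners
    float dx = MAP(p + float3(e, 0, 0)).x - centerDist;
    float dy = MAP(p + float3(0, e, 0)).x - centerDist;
    float dz = MAP(p + float3(0, 0, e)).x - centerDist;
    return normalize(float3(dx, dy, dz));
}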
I think this is a really good video for newcomers to ray marching and sphere tracing: https://www.youtube.com/watch?v=Cp5WWtMoeKg A lot of my examples use sphere tracing. This only works if you use exact distance functions. Hacks also work, but you will need to reduce the step size by a lot if you use a lot of distortion and such. The AO and skylight example was purely sphere traced, even in the occlusion loop. At this scope, it's definitely more expensive than my earlier reflection tests.
Only simple shading for now, and precision is set to low so you can see far without losing a lot of frames. This is a scale of 2.5 (the Mandelbox shows different patterns at different scales). It runs totally fine fullscreen at 1080p on a 2080 Ti, but I probably wouldn't expect good framerates on anything below a GTX 10 series. You can download the executable from here: https://drive.google.com/open?id=1xBsO264p6G8oNs5QKok1fXtHH27vn1EW
Controls:
wasd - movement
escape - quit
I would suggest not going inside the geometry, because it's a little bit epileptic lol. If you do go inside, you can easily get out by moving backwards or forwards until you are out. The code is heavily based on various shadertoys.
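For context on the scale parameter mentioned above, a common formulation of the Mandelbox distance estimator (box fold plus sphere fold) looks like this; it is not necessarily the exact variant used in this build, and the radii and iteration count below are typical values rather than mine:

float MandelboxDE(float3 p, float scale)
{
    const float minRadius2   = 0.25;
    const float fixedRadius2 = 1.0;
    const int   iterations   = 12;

    float3 z  = p;
    float  dr = 1.0;

    for (int i = 0; i < iterations; i++)
    {
        // Box fold: reflect each axis back into the [-1, 1] slab.
        z = clamp(z, -1.0, 1.0) * 2.0 - z;

        // Sphere fold: invert points that get too close to the origin.
        float r2 = dot(z, z);
        if (r2 < minRadius2)
        {
            float f = fixedRadius2 / minRadius2;
            z *= f; dr *= f;
        }
        else if (r2 < fixedRadius2)
        {
            float f = fixedRadius2 / r2;
            z *= f; dr *= f;
        }

        z  = z * scale + p;
        dr = dr * abs(scale) + 1.0;
    }

    return length(z) / abs(dr);   // distance estimate
}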
Here is a build of this state: https://drive.google.com/file/d/1Dxa9h8zbTj3i4g8rGu3VZ1ZMFr6I3jW1/view I added some UI so you can tweak the settings to get a better image or more fps. Controls are the same except for one thing: now you need to click and drag to rotate the camera. This build needs a much stronger video card than the previous one to get a smooth image with interactive performance. The default setting, which is 16 AO rays per pixel, gives me about 10 fps at 1080p on the 2080 Ti. Preview:
I'm leaving this project now, it's time to move on.
At first I thought this was wrong. The thin, long box is the light source, and this is a ray marcher I made that uses a stochastic (or Monte Carlo) light sampling method, similar to my recent examples:
Then I figured that it actually makes sense, because the light is small in that direction. To confirm this, I set up the same scene with RTX and a rect light, which gave me this:
But you will need the same high number of samples as with my ray marcher.
This RTX rect light doesn't seem to handle this case very well... It doesn't work well when one of the axes of the light is short. Enabling "stochastic rect light" in the console makes it much worse - the result is totally wrong then.
Yeah, I should add a bounce. If it wasn't such a complex shape, it would even run in real time. The earlier test in the simple scene gave me much better performance. I could probably make some simplifications too, by sacrificing accuracy.
Reflections in reflections. This was very easy: we just take the reflection vector again when we hit something inside the reflection. We can keep an int for counting bounces and set a hard limit if we want, or we can use an arbitrary max number of reflection ray march steps and just stop when we reach that, or when we hit the sky after a certain distance from the center of the scene. I used a high max step count to make it seem infinite.
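A minimal sketch of that bounce loop is below. MAP and GetNormal come from the ush, while ShadeHit, IsMirror and SampleSky are stand-ins for the actual shading and material lookup, and the thresholds are placeholders.

// Assumed from the custom ush.
float2 MAP(float3 p);
float3 GetNormal(float3 p);
// Hypothetical helpers: shade a hit, check whether its material is a mirror, sample the sky.
float3 ShadeHit(float3 p, float3 n, float objectID);
bool   IsMirror(float objectID);
float3 SampleSky(float3 dir);

// Keep reflecting until we hit a non-mirror surface, leave the scene, or reach the bounce limit.
float3 TraceWithReflections(float3 ro, float3 rd, int maxBounces, int maxSteps, float maxDist)
{
    for (int bounce = 0; bounce <= maxBounces; bounce++)
    {
        float t = 0.0;
        bool hit = false;
        float2 s = 0;

        for (int i = 0; i < maxSteps; i++)
        {
            s = MAP(ro);
            if (s.x <= 0.001) { hit = true; break; }
            t += s.x;
            ro += rd * s.x;
            if (t > maxDist) break;
        }

        if (!hit)
            return SampleSky(rd);               // left the scene: sky / cubemap

        float3 n = GetNormal(ro);
        if (!IsMirror(s.y) || bounce == maxBounces)
            return ShadeHit(ro, n, s.y);        // stop on a non-reflective surface or at the limit

        // Reflect and continue from just above the surface.
        rd = reflect(rd, n);
        ro += n * 0.01;
    }
    return 0;                                    // unreachable, keeps the compiler happy
}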
I'm trying to keep things organized and clean for this one. I'm using the float2 "Map" approach again, similar to the earlier Cornell box example, where Map is the scene description with a position input and every object lives inside Map. The X component contains the distance, and Y contains the object ID, which I use to select materials. There is also "CalcNormal", which calls Map 3 times with offsets. Now I'm wondering whether the RayMarch function should automatically calculate normals and return them in a field of a struct. My area light tracing approach doesn't require normals at all: I trace multiple rays towards random points inside the area light, the values from the rays that hit the light are averaged, and the ones hitting objects are ignored. So it's basically the average visibility of the light shape, and the normal calculation would be an unnecessary call there. However, the visibility and reflection traces do require returning normals. Any suggestions? Should I make a tracing function that doesn't return normals and call the normal calculation manually when it's needed, or should I make 2 tracing functions, one returning normals and one that doesn't?
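For illustration, a stripped-down version of the float2 Map convention with object IDs could look like this; the primitives, IDs and albedo values are made up, not the actual Cornell box scene.

float sdSphere(float3 p, float r) { return length(p) - r; }

float sdBox(float3 p, float3 b)
{
    float3 q = abs(p) - b;
    return length(max(q, 0.0)) + min(max(q.x, max(q.y, q.z)), 0.0);
}

// x = signed distance, y = object ID.
float2 Map(float3 p)
{
    float2 res  = float2(p.y, 1.0);                                    // ID 1: floor plane at y = 0
    float2 ball = float2(sdSphere(p - float3(0, 1, 0), 1.0), 2.0);     // ID 2: sphere
    float2 box  = float2(sdBox(p - float3(2.5, 0.75, 0), 0.75), 3.0);  // ID 3: box

    if (ball.x < res.x) res = ball;
    if (box.x  < res.x) res = box;
    return res;
}

// Material selection by ID after the trace.
float3 GetAlbedo(float objectID)
{
    if (objectID < 1.5) return float3(0.8, 0.8, 0.8);   // floor
    if (objectID < 2.5) return float3(0.9, 0.2, 0.2);   // ball
    return float3(0.2, 0.4, 0.9);                       // box
}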
Some planning of optimizations. I also figured that the diffuse lighting should only be applied after we finish each reflection bounce and have accumulated the positions that serve as ray origins inside the reflections for tracing the ground lighting in them. I could also use a bigger struct instead of float2 for Map; this way I could apply material properties more efficiently.
I'm having some issues getting Unreal to compile my code when I use a struct as the output type of a function. For example, I made this struct:
struct SDFHitResult
{
    float hit;
    float3 hitpos;
    float objectID;
    float t;
};
This compiles fine, but if I use it like this:
SDFHitResult SphereTrace(float3 ro, float3 rd, int maxsteps, float prec)
{
    float hitmask = 0.0;
    float3 hitpos = 0.0;
    float ID = 0.0;
    float t = 0.0;
    for (int rm = 0; rm < maxsteps; rm++)
    {
        float2 cursample = MAP(ro);
        if (cursample.x <= prec)
        {
            hitmask = 1.0;
            ID = cursample.y;
            hitpos = ro;
            break;
        }
        t += cursample.x;
        ro -= rd * cursample.x;
    }
    return SDFHitResult(hitmask, hitpos, ID, t);
}
It doesn't compile. If I change the output type of the function to float instead, it compiles, so it's something about the struct being used as the output type. Any ideas?
Welp... I guess I'll need to use native types, and maybe go with 2 vector3 inouts. That is disappointing. The funny thing is that I tried the struct with the function in Shadertoy, and it worked there...
The reflections and shadows look nice. I was also experimenting with area lights and shadows, but this is some next-level stuff. Fun stuff. I think you need to wrap your functions inside a struct to be able to reference other structs.
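Roughly, I mean something like this inside the Custom node (untested sketch; the scene inside Map is a placeholder, and CamPos/CamDir stand for whatever inputs the node actually has):

struct SDFHitResult
{
    float  hit;
    float3 hitpos;
    float  objectID;
    float  t;
};

// Wrapper struct: member functions get a scope, and they can use the struct types above.
struct SDFLibrary
{
    float2 Map(float3 p)
    {
        return float2(length(p) - 1.0, 1.0);   // placeholder scene: a unit sphere, ID 1
    }

    SDFHitResult SphereTrace(float3 ro, float3 rd, int maxsteps, float prec)
    {
        SDFHitResult r = (SDFHitResult)0;
        for (int i = 0; i < maxsteps; i++)
        {
            float2 s = Map(ro);
            if (s.x <= prec)
            {
                r.hit = 1.0; r.objectID = s.y; r.hitpos = ro;
                break;
            }
            r.t += s.x;
            ro  += rd * s.x;
        }
        return r;
    }
};

// Usage at the end of the Custom node body:
SDFLibrary lib;
SDFHitResult res = lib.SphereTrace(CamPos, CamDir, 128, 0.001);
return res.hit;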
Hi @stororokw, thanks for taking a look. Can you explain the reason for this? It doesn't seem to make much sense, especially since the code works in Shadertoy without wrapping it into a struct. Is this something HLSL specific, or maybe Unreal specific?
Thanks!
For lights and reflections, I go beyond simple DF shadows, because I'm going for a more accurate and realistic look, no matter if it's more expensive. What I do is fire multiple rays towards the light source (towards random points inside the light) and average the result. This gives me realistic soft shadows whose penumbra actually depends on the light size. For glossy reflections, it's the same idea. Multi-bounce reflection is just an extension of simple reflection: the reflection ray doesn't stop, it bounces and keeps going when it hits a reflective surface, until it hits something non-reflective or reaches user-set limits.
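A rough sketch of the area light visibility part, for illustration: MAP is assumed from the ush, the hash and all the constants are placeholders, and lightID is whatever object ID the emissive box has in the scene description. The returned average visibility then gets multiplied by the light color and the usual N dot L term.

// Assumed from the custom ush: scene SDF, x = distance, y = object ID.
float2 MAP(float3 p);

// Cheap hash for random points on the light, illustration only.
float2 Hash2(float2 p)
{
    return frac(sin(float2(dot(p, float2(127.1, 311.7)),
                           dot(p, float2(269.5, 183.3)))) * 43758.5453);
}

// Average visibility of a rectangular area light centered at lightPos,
// spanned by its half-extent axes lightU and lightV.
float AreaLightVisibility(float3 shadedPos, float3 N, float3 lightPos, float3 lightU, float3 lightV,
                          float lightID, float2 pixelSeed, int numSamples)
{
    float visible = 0.0;
    float3 ro0 = shadedPos + N * 0.02;                    // offset to avoid immediately re-hitting the surface

    for (int s = 0; s < numSamples; s++)
    {
        // Random point on the light rectangle.
        float2 r = Hash2(pixelSeed + s * 2.3) * 2.0 - 1.0;
        float3 target = lightPos + lightU * r.x + lightV * r.y;

        float3 ro   = ro0;
        float3 dir  = normalize(target - ro);
        float  maxT = length(target - ro) + 1.0;

        // Sphere trace towards the light.
        float t = 0.0;
        for (int i = 0; i < 64; i++)
        {
            float2 scene = MAP(ro);
            if (scene.x <= 0.001)
            {
                // Rays that reach the light count; rays hitting anything else are ignored.
                if (abs(scene.y - lightID) < 0.5) visible += 1.0;
                break;
            }
            t += scene.x;
            ro += dir * scene.x;
            if (t > maxT) break;
        }
    }
    return visible / numSamples;
}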
Also, I checked out your sketchbook. It has some cool stuff in it. I would like to ask about the BVH, if you don't mind. How much of a speed-up did you get after adding the BVH for the dragon mesh? I would assume it's several hundred percent, since only a few triangles would be intersected per pixel. How much depth do you need in the BVH to gain the needed extra performance, and at what point does the BVH become too heavy (because of too many levels)? Also, do you know if other acceleration structures are worse than a BVH, or why this one is preferred over the others? I'm assuming an octree would work just as well. Or maybe a simple axis-aligned uniform grid?
It is specific to Unreal Engine's material functions and how it generates the HLSL code; if you use a UE4 global shader it works fine.
From what I remember, it just copies the code into the material expression, and based on the error message I am guessing that when it translates the expression it can't compile function definition constructs without some scope, but I am unsure. I think it's best to go through with the debugger and step through FHLSLMaterialTranslator and UMaterialExpressions to be absolutely sure, but who has time to compile a debug build from source.
How many samples are you taking for the area light shadows? When I tried, even 8 samples was noisy.
Without the BVH it takes 1000+ ms, and with the BVH 4-8 ms. My implementation just splits the triangles into 2 halves, so the depth is log_2(n), where n is the number of triangles. The dragon has 800k triangles, so it has a depth of 16. Anything above a depth of 4 (16 primitives) and you see performance increases.
For performance it might be good to have a 4-ary BVH, since you can intersect 4 AABBs at once and the tree has less depth, but I haven't tried it yet. So I don't know if having too many nodes is slowing it down.
I tried using a KD-tree, which is similar to an octree, but for me it was slower in my test scenes. Maybe my implementation sucked; I didn't implement the mailboxing technique. I don't think the axis-aligned uniform grid would be faster in the general case because of the 'teapot in a stadium' problem - you would need a very dense grid for that to work.
I prefer the BVH because it's conceptually simple and easier to implement.
@stororokw - Thanks for the detailed explanation. Do you know of any downsides to using structs like this in Unreal? Should I still approach the problem using a struct, or do you have another suggestion?
Yes, 8 samples can look grainy depending on the light size. These ray tracers are supposed to be denoised afterwards. However, I'm noticing some very destructive side effects of the denoising when I directly compare the non-denoised, high sample count ray marched images to the low sample count, denoised RTX ones (comparison images above). They might not be as visible in an actual, textured scene though.
https://en.wikipedia.org/wiki/Bounding_volume_hierarchy
https://en.wikipedia.org/wiki/Octree
https://en.wikipedia.org/wiki/K-d_tree
Some of these could accelerate complex sphere traced scenes too, because you could reduce the number of steps needed near objects, or when the ray direction is nearly parallel to the closest surface. Or simply increase the number of objects you can use. Or sphere tracing a triangle mesh would become possible.
There shouldn't be any downsides. Another approach you can take is to basically inject HLSL code into the MaterialTemplate. It is described here: http://blog.kiteandlightning.la/ue4-hlsl-shader-development-guide-notes-tips/. You have two material functions, one to define your structs and one to use them. The upside is that your code can mostly stay the same as on Shadertoy. Material Function 1:
return 1;
}
struct A
{
    float3 a;
};
A Func()
{
    A a = (A)0;
    a.a = float3(0.0f, 1.0f, 0.0f);
    return a;
Material Function 2:
A a = Func();
return a.a;
It produces the following Unreal HLSL code:
/* Uniform material expressions.*/
MaterialFloat3 CustomExpression0(FMaterialPixelParameters Parameters)
{
    return 1;
}
struct A
{
    float3 a;
};
A Func()
{
    A a = (A)0;
    a.a = float3(0.0f, 1.0f, 0.0f);
    return a;
}
MaterialFloat3 CustomExpression1(FMaterialPixelParameters Parameters)
{
    A a = Func();
    return a.a;
}
Did you use UE4's denoiser, or your own implementation?
Thank you very much, this looks somewhat more convenient. When I use vanilla RTX, I use the built-in denoiser, but based on my results I will turn it off for shadows in the future. It also only works up to 4 spp according to the documentation, and when I use multiple spp in reflections, it seems like it really only works with 1 spp... In my ray marching examples, I don't use any denoiser; I just use a high enough sample count to get a mostly smooth image. The brute force skylight requires a lot of samples, but you can lower the cubemap resolution, which reduces the amount of detail in it, so fewer samples can yield a more noise-free result. In some cases, this can lead to loss of shadow detail.
I tried out this last method in a quick test, and it worked. Thanks again. Maybe I don't know enough, but wrapping a function in a struct sounds weird and hacky to me, so I preferred doing it some other way if possible.
This is not useful in simple scenes with a few primitives, but it could help with the Mandelbox, because it's pretty heavy by default. Here is an idea: keep a volume texture copy (like a cache) of the SDF with some offset from the surface, and use it as a sort of sampler LOD. The ray takes large uniform steps until it hits the proxy, then switches to full quality sphere tracing of the actual SDF inside it. I'm not sure this order is optimal, since sphere tracing can make large jumps in sparse spaces - maybe the reverse, so sphere trace the rough shape and then switch to small uniform steps? The basic idea is to reduce the number of evaluations of the expensive SDF. Surfaces nearly parallel to the ray are still a problem I guess, but I'd need to see how much. Using a small safe distance could help somewhat. It would also be possible to use sphere tracing on both, and it would still be cheaper, because roughly half of each ray would be in the safe zone unless you stare at a wall from up close, so most of the samples would be taken from the cached version. It should also switch back and forth between the 2 samplers based on where the ray currently is.
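A sketch of that last variant (sphere tracing against both fields and switching based on where the ray is) could look something like this. MAP is the expensive SDF from the ush; the SDFCache texture, the WorldToCacheUV mapping and every constant are assumptions for illustration, and the cache is assumed to store the distance minus a safety offset so it stays conservative.

// Expensive, exact SDF from the ush.
float2 MAP(float3 p);
// Hypothetical baked proxy of the SDF (dilated by a safety offset).
Texture3D    SDFCache;
SamplerState SDFCacheSampler;

// Maps a world position into the cache volume covered by [boxMin, boxMax].
float3 WorldToCacheUV(float3 p, float3 boxMin, float3 boxMax)
{
    return (p - boxMin) / (boxMax - boxMin);
}

// March the cheap cached field until we get close to the surface, then switch to the exact SDF.
float2 TwoLevelTrace(float3 ro, float3 rd, float3 boxMin, float3 boxMax, int maxSteps, float prec)
{
    const float switchDist = 0.1;     // inside this band of the proxy we only trust the real SDF
    float t = 0.0;

    for (int i = 0; i < maxSteps; i++)
    {
        float3 p  = ro + rd * t;
        float3 uv = WorldToCacheUV(p, boxMin, boxMax);

        // Coarse distance from the cache (cheap), conservative because of the bake offset.
        float dCoarse = SDFCache.SampleLevel(SDFCacheSampler, uv, 0).r;

        float d;
        if (dCoarse > switchDist)
            d = dCoarse;              // far from everything: big cheap steps
        else
            d = MAP(p).x;             // near the surface: pay for the exact SDF

        if (d <= prec)
            return float2(t, 1.0);    // hit: distance along the ray and a hit flag
        t += d;
        if (t > 100.0) break;         // left the scene
    }
    return float2(t, 0.0);            // miss
}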
Awesome. Yep, using a 1k background, a mip level of 3-4 gave the best balance for me too. How is the performance? I've heard that sampling the global SDF isn't cheap. Wouldn't it be more efficient to sphere trace the samples, since the medium you are tracing through is a distance field? Or is the SDF too low quality? Or would there be too many close-but-missed samples?
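Something like this is roughly what I mean by sphere tracing it (assuming GetDistanceToNearestSurfaceGlobal - which I believe is what the DistanceToNearestSurface node calls - is usable in a custom node; the start offset, threshold and step scale are just guesses, in Unreal units):

// One occlusion ray against the global distance field, sphere traced instead of fixed steps.
float GlobalSDFOcclusionRay(float3 worldPos, float3 dir, int maxSteps, float maxDist)
{
    float t = 10.0;                                        // start a bit off the surface (cm)
    for (int i = 0; i < maxSteps; i++)
    {
        float d = GetDistanceToNearestSurfaceGlobal(worldPos + dir * t);
        if (d <= 1.0)
            return 0.0;                                    // occluded along this direction
        t += max(d * 0.6, 5.0);                            // conservative step, the global field is coarse
        if (t >= maxDist)
            break;
    }
    return 1.0;                                            // nothing hit within maxDist
}

Averaging this over a handful of random hemisphere directions would then give the AO term.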
@RadiusGordello What kind of render targets would you create for this purpose? Would you only store the final result in one channel, or create multiple buffers? I guess it depends on the context. So far, I'm applying my ray marched scenes to the screen as a post process effect. But using a render target would allow a smaller resolution, that's right.
Is it possible to sample the reflection captures, instead of a global cubemap?
Hmmm... I tried sampling the global distance field to do some AO, and something strange happens. I do it as a post process, and it appears and works in the material editor viewport, but when I apply it to a post process volume, I get a black screen. I tried the surface material domain too, but I get the same result: black in the level, but working in the material editor viewport.
That is some weird shit... I added a mesh with a material utilizing DistanceToNearestSurface, and now the post process works as long as that mesh is on screen. How did you get around the pixelation of the global SDF when you move objects around in the scene? Increasing the mesh SDF resolution doesn't help.
Not sure if this helps at all, but per-asset Mesh Distance Field resolution/scale has no effect on the Global Distance Field. To increase that, you'll want to check & modify r.AOGlobalDFResolution and r.AOInnerGlobalDFClipmapDistance
Okay, thanks, I'll check them out. When I display the SDF or do some traces and I move a mesh around, the mesh surface crosses the voxel boundaries, which obviously leads to unwanted hits. I get this even with some initial position offset along the world normal. I'll be interested to see how much the global SDF resolution can be increased, and how much it will cost.