PSSM and CSM

I’ve been hard at work during the last 4 days with global light shadowing. One might think this would be extremely simple, but there is a hitch with global lights: they have NO position! So how does one render shadows from a source when there is no source?

The solution is found in two closely related algorithms, one called PSSM (Parallel-Split Shadow Maps) and the other CSM (Cascaded Shadow Maps). First, though, I should explain what a shadow map actually is. A shadow map is a buffer of the scene rendered from the light’s perspective, which basically means that every light source that needs to cast shadows has to render every object visible to the light into a depth buffer. The shadow map simply stores depth as seen from the light’s point of view. This of course means that a non-shadow-casting light source is significantly faster than a shadow-casting one.
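The blog doesn’t show code, so here is a minimal, hypothetical sketch of the depth-comparison idea in Python (all names are mine, and the real test of course runs per-pixel on the GPU in a shader):

```python
def render_shadow_map(occluder_depths_per_texel):
    """For each shadow-map texel, keep the depth of the occluder closest
    to the light -- the depth test the GPU performs when the scene is
    rendered from the light's point of view."""
    return [min(depths) if depths else float('inf')
            for depths in occluder_depths_per_texel]

def in_shadow(shadow_map, texel, fragment_depth, bias=0.001):
    """A fragment is shadowed if something else sits closer to the light.
    The small bias avoids self-shadowing ("shadow acne")."""
    return fragment_depth > shadow_map[texel] + bias

# One-row "shadow map": texel 0 sees occluders at depths 2.0 and 5.0,
# texel 1 sees nothing at all.
shadow_map = render_shadow_map([[2.0, 5.0], []])
print(in_shadow(shadow_map, 0, 4.0))  # True: behind the nearest occluder
print(in_shadow(shadow_map, 1, 4.0))  # False: nothing in front of it
```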

Now that we’ve gotten that out of the way, we can go back to global lights. As you may or may not have realized, in order to render a shadow map you need the position of a light source and the direction in which it emits light (except for a point light, which only has a position). The problem is that a global light only has a direction, so there is no point of view from which one can render the scene. The resolution to this is rather non-intuitive: split the camera view frustum into different sections (hence Cascades and Parallel-splits). The light source is then rendered from outside the scene bounding box, just to ensure every shadow-casting object gets into the buffer. To render the entire scene into the shadow buffer without needing an enormous buffer, one renders the different splits into different textures, using a different projection transform for each. So the first buffer surrounds the area closest to the camera, the next split or cascade covers a larger area, and so forth, until the maximum given distance is reached. This way, shadows very far away are rendered with a very low effective resolution, seeing as the viewport for that area is very big, so each object doesn’t get much space in it.
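The split-plane distances themselves are typically computed with the “practical split scheme” from the PSSM paper, which blends a logarithmic distribution (good perceptual shadow quality near the camera) with a uniform one. A small sketch of that arithmetic, with names of my own choosing:

```python
def pssm_splits(near, far, num_splits, lam=0.5):
    """Practical split scheme: blend logarithmic and uniform split
    distances, weighted by lam in [0, 1]. Returns num_splits + 1
    plane distances from the near plane to the far plane."""
    splits = []
    for i in range(num_splits + 1):
        f = i / num_splits
        log_d = near * (far / near) ** f      # logarithmic distribution
        lin_d = near + (far - near) * f       # uniform distribution
        splits.append(lam * log_d + (1.0 - lam) * lin_d)
    return splits

# Four cascades between a near plane at 1.0 and a far plane at 100.0.
print(pssm_splits(1.0, 100.0, 4))
```

These are exactly the distances the pixel shader later compares the fragment depth against when choosing which cascade texture to sample.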

Nebula used to use this algorithm in version 2.0, without deferred rendering, so re-implementing it in Nebula 3 with DirectX 11 is a bit of a challenge. I was so in the dark I actually started to think I’d just been lucky with everything I’d ever done right, because however hard I tried, I couldn’t get the shader to switch between the shadow maps based on the depth of the pixel. I used a float array to send split distances from the CPU to the shader, but the comparison ALWAYS failed. After about 3 days of trying to get that part to work, I realized I should try comparing against hard-coded values instead of the variable. To my amazement, that worked perfectly fine. By this point, I honestly started to wonder whether Nebula didn’t properly set float arrays on the shader. I debugged the pixel shader in PIX, and the values were completely fine. The funny thing was that the compiled shader description clearly showed that the float array, consisting of 5 floats, was 80 bytes in size. 80! Last time I looked, a float was 4 bytes, and 5 * 4 = 20, not 80. That was when I realized what was wrong: for some reason, the compiler (or driver) seemed to treat an array of floats as equivalent to an array of float4, yet could nonetheless fetch the values without swizzling. When I changed the array into an ordinary float4 (obviously losing one value), the depth test worked perfectly! I’m going to be a bit careful here, because this could be simple ignorance on my part, similar to the hull-domain shader problem, but there might be some sort of compilation problem going on, seeing as an array of floats apparently had the same buffer size as a float4 array of equal length, and fetching the values always returned incorrect ones (I believe I got infinity, but it’s somewhat hard to tell at shader level).
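For what it’s worth, the 80-byte size is consistent with HLSL’s documented constant-buffer packing rules: every element of an array in a cbuffer starts on its own 16-byte register, so a float[5] spans five registers instead of 20 tightly packed bytes (some tools report 68 bytes, leaving the last element unpadded, and 80 once the buffer is rounded up to a whole register). A small sketch of the arithmetic, with a helper name of my own:

```python
def hlsl_cbuffer_array_bytes(elem_bytes, count, last_padded=True):
    """HLSL cbuffer packing: each array element is aligned to a 16-byte
    register boundary. Depending on the tool, the last element is either
    reported at its natural size (68 for float[5]) or padded out to a
    full register (80 for float[5])."""
    if count == 0:
        return 0
    size = (count - 1) * 16 + elem_bytes
    if last_padded:
        size = (size + 15) // 16 * 16
    return size

print(hlsl_cbuffer_array_bytes(4, 5))  # 80, matching the observation
print(5 * 4)                           # 20, a tightly packed C array
```

This also explains why the CPU-side upload failed: writing 20 tightly packed bytes fills only the first register and a quarter of the second, so elements 1 through 4 read back garbage in the shader. Padding each float out to 16 bytes on the CPU side (or using a float4) lines the data up again.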

Right, back to PSSM. It currently looks like utter crap. It looks so bad, in fact, that I’m not even going to show you, because of shame and all that. There are two problems at the moment. The first one is that all geometry gets rendered with the camera inverted, so everything gets rendered backwards, resulting in very strange shadows; the funniest effect is that an object which casts shadows ends up shadowed by itself. The other problem (which may or may not be related to the first) is that the seams between the different split maps are clearly visible, because of the radial pattern that appears when comparing depths in screen space. If you’ve had the patience to read this far with the knowledge that you won’t be seeing any pictures, I salute you! Oh, and we’ve started this year’s game projects, which are being made in Nebula; both can be followed here: http://focus.gscept.com/gp2012-1/ and here: http://focus.gscept.com/gp2012-2/.

HBAO and ESM

Lately, I’ve been hard at work with the stuff I like the most: shading! First, I re-implemented the old Nebula exponential shadow maps (which look great, by the way) in DX11 for pointlights and spotlights. The directional light is a little trickier, but I will be all over that shortly. I also took the chance to replace the DSF depth buffer with a higher-precision depth buffer. The old DSF buffer stored an 8-bit normal id, an 8-bit object id and 16 bits of depth, whilst the new buffer stores 32 bits of pure depth, removing all the halo problems.
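To illustrate what the old layout cost in precision, here is a hypothetical packing of the DSF-style layout (the bit order is assumed for the example; only the 8+8+16 channel sizes come from the post):

```python
def pack_dsf(normal_id, object_id, depth01):
    """Pack an 8-bit normal id, an 8-bit object id and a depth value
    quantized to 16 bits into one 32-bit word (bit layout assumed)."""
    d16 = int(round(depth01 * 0xFFFF)) & 0xFFFF
    return ((normal_id & 0xFF) << 24) | ((object_id & 0xFF) << 16) | d16

def unpack_depth(packed):
    """Recover the depth; only ~16 bits of precision survive."""
    return (packed & 0xFFFF) / 0xFFFF

p = pack_dsf(3, 7, 0.123456789)
print(unpack_depth(p))  # close to 0.123456789, but quantized to 1/65535 steps
```

A full 32-bit float depth buffer keeps far finer gradations, which is why the reconstruction halos disappear.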

Bringing shadows to life sure wasn’t easy, but it was well worth the while. Nebula used to be limited to 4 shadow-casting lights per frame; I thought I could boost that to 16 (newer hardware can handle it). I’ve also begun working on the global light shadow algorithm, and I thought I’d start by making the PSSM method work. The reason I chose PSSM is that major parts of it have already been implemented, but also because it seems like a very sound concept.

To handle AO, and to set the variables for it, I decided to make a new server for it, which sits right next to the light and shadow servers (seeing as it has to do with lighting). The method for computing screen-space ambient occlusion is called HBAO (Horizon-Based Ambient Occlusion), which basically samples the depth buffer using an offset (random) texture. It uses the current sample point together with the sampled point to compute not only the depth difference, but also the difference in angle, giving a strong occlusion effect if the angles between two surface tangents differ a lot. More about the algorithm can be found here: http://www.nvidia.com/object/siggraph-2008-HBAO.html.
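The horizon idea can be sketched in one dimension: march away from a sample, track the steepest elevation angle (the “horizon”), and turn it into occlusion. This is a heavily simplified, hypothetical sketch in Python; real HBAO marches along several randomized screen-space directions in the depth buffer and also accounts for the surface tangent:

```python
import math

def horizon_occlusion_1d(heights, i, radius):
    """1D horizon-based occlusion sketch. heights[] stands in for the
    eye-space depth buffer; the steeper the highest horizon angle seen
    within `radius` steps of sample i, the more occluded it is."""
    best = 0.0
    for step in range(1, radius + 1):
        j = i + step
        if j >= len(heights):
            break
        # Elevation angle from sample i toward sample j.
        angle = math.atan2(heights[j] - heights[i], step)
        best = max(best, angle)
    return math.sin(best)  # 0 = open horizon, towards 1 = blocked

flat = [0.0] * 8
wall = [0.0, 0.0, 5.0, 5.0, 5.0, 0.0, 0.0, 0.0]
print(horizon_occlusion_1d(flat, 0, 4))  # 0.0: flat ground, no occlusion
print(horizon_occlusion_1d(wall, 0, 4))  # large: sample sits next to a wall
```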

The introduction of the AOServer also allows for setting the AO variables live (whenever there is an interface available). Here are two pics showing the awesome new graphics.


One picture shows a real-time AO pass, and the other shows a scene with 3 shadow-casting lights. Pretty nice, right?


I’ve also found a couple of things I want to do with Nody. First, I changed it so one can create different types of render targets: if one wants to write to a 1-, 2-, 3- or 4-channel texture, one should have no problem doing so. I’m also considering the ability to add and manipulate render targets between frame shaders. For example, the AO pass uses the DepthBuffer from the main frame shader, and the main frame shader uses the AO buffer from the HBAO frame shader. I also want to be able to add an MRT or an RT directly to the output node, and then decide how many channels one wants to use. Once that is decided, one should no longer be able to add new render targets or MRTs; this is to avoid silent errors which might occur if the render targets have their positions switched. Also, attaching render targets to the output nodes should allow you to pick from ANY frame shader, instead of just the current one.

It would also be awesome to have the ability to use compute shaders in Nody, as well as nodes which let you code everything freely.