Being wrong, again

So, I was wrong again: you can’t blend normals. This should have been obvious to me, since blending two normals can null them out entirely. Take a normal, let’s say it’s (0.5, 0.5, 0.5) pre-normalization, and blend it with the vector (-0.5, -0.5, -0.5) using a blend factor of 0.5. What do you get? That’s right: zero. You get zero, resulting in complete and utter darkness. This is not good, for tons of reasons, many of which I shan’t explain. Just try constructing a TBN matrix where one vector is (0, 0, 0). Not so easy! So I removed it, and sure, objects which use alpha (mainly transparent surfaces) won’t get lit correctly, since the lighting should take place per alpha object instead of in screen space. On the other hand, it’s hardly visible: alpha objects often consist of glass, and many layers of glass produce an additive effect because of reflections in the light and so on, making the rendering error hard to see.
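
To make the failure case concrete, here is a minimal sketch using the values from above; nothing engine-specific, just the arithmetic:

// Minimal sketch: two opposing normals blended with factor 0.5 collapse to a
// zero vector, which cannot be normalized.
float3 n0 = normalize(float3( 0.5f,  0.5f,  0.5f));
float3 n1 = normalize(float3(-0.5f, -0.5f, -0.5f));
float3 blended = lerp(n0, n1, 0.5f);    // = (0, 0, 0)
float3 n = normalize(blended);          // 0 / 0, produces NaN and breaks the TBN matrix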


The good news is that I finally got CSM to work! It wasn’t easy, and I’ll explain why. As you may or may not know, Nebula saves the depth of each pixel as the length of that pixel’s view-space position. This allows us to take any arbitrary geometry in the scene and calculate a surface position lying in the same direction we are currently rendering. Why is this important? Well, saving depth in this manner lets us take, say, a point light and the shaded surface position it should light, determine the distance between the two, and thus light the surface. Great! We can literally reconstruct our world-space positions by taking the depth buffer and multiplying it with a normalized vector pointing from our camera towards our pixel. Now comes the hard part: what do you do when you have no 3D geometry to render, but instead a full-screen quad, as is the case with our global light? After finding this guide: http://mynameismjp.wordpress.com/2010/09/05/position-from-depth-3/, I tried it out and got some good and some not so good results.
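
Before getting to the full-screen-quad case: for ordinary light geometry (a point-light volume, say), the reconstruction described above looks roughly like this. All the names here are placeholders of mine, not Nebula’s actual shaders:

// Hedged sketch: depth is stored as length(view-space position), so a surface
// position can be rebuilt from the eye position and a ray through the pixel.
// DepthTexture, EyePos, LightPos, LightRange and pixelWorldPos are placeholders.
float depth = DepthTexture.Sample(DefaultSampler, UV).r;   // length of the view-space position
float3 ray = normalize(pixelWorldPos - EyePos);            // pixelWorldPos comes from the rasterized light geometry
float3 surfacePos = EyePos + ray * depth;                  // reconstructed world-space position

float3 lightVec = LightPos - surfacePos;                   // vector from surface to the point light
float attenuation = saturate(1.0f - length(lightVec) / LightRange);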

[Screenshots: the same scene with world-space positions reconstructed using the method above, showing both the good and the not so good results]

What you see is the same scene, using the above-mentioned algorithm to reconstruct the world-space position. If you’re lazy, this is the general concept: in the vertex shader, compute a vector pointing from the camera (that’s (0, 0, 0) for a full-screen quad) to each frustum corner. Then, in the pixel shader, normalize this ray and take camera position + normalized ray * sampled depth, where the sampled depth is the length of the view-space position vector. I thought the artifact could be the result of a precision error or some such, but the fix proved it wasn’t. Instead of using a ray to recreate the position, I simply let my geometry shader (not the GS, but the shader used to render deferred geometry) output the world-space position to a buffer holding these values. Sampling that texture instead, the big black blob disappeared. Somehow the method used for calculating the world-space position must lack accuracy, so I pondered. Normalizing the vector could not be correct. Why? Well, consider that I want a vector going from the frustum origin to each pixel on the far plane. Those vectors do not all have the same length, since the distance straight ahead is shorter than the distance to a corner (which follows from the Pythagorean theorem), so normalizing them throws that difference away. Currently the shadows work, and I’ve baked them into an extended buffer, using the alpha component for the depth and the RGB components for the world-space position, to save using another render target. I’m not satisfied yet though, since the explanation on mynameismjp.wordpress.com must have some validity to it, but for now I’ll keep the data in the render target.
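
For reference, the distinction I keep circling around can be sketched like this. The names are placeholders and this is just how I currently read the two interpretations, not verified engine code:

// Hedged sketch of the two interpretations (placeholder names throughout).
// frustumRay is the interpolated vector from the camera to the far-plane corner.

// Variant A: the depth buffer holds length(view-space position),
// so the ray must be normalized before scaling.
float depth = DepthTexture.Sample(DefaultSampler, UV).r;
float3 posA = EyePos + normalize(frustumRay) * depth;

// Variant B: the depth buffer holds linear view-space Z (the case the linked
// guide assumes), so the un-normalized far-plane ray is scaled by z / zFar instead.
float3 posB = EyePos + frustumRay * (depth / ZFar);   // only valid if the buffer actually stores view-space Z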

Deferred shading revisited

I haven’t really had much time to write here, been busy with lots of stuff.

When testing the rendering pipeline we’ve hit a couple of annoying glitches. The first was that UVs and normals for FBX meshes were corrupt. The fix for the UVs was pretty straightforward, just flip them in Y (why?! I have no idea…), but the fix for the normals wasn’t quite as intuitive. First of all, I would like to start off by saying that the target platform for our fork of the engine is PC, just so you know. Many of the rendering methods previously in the engine focused heavily on compressing and decompressing data in order to save graphics memory and such, but since problems occur oh so easily and debugging compressed data is oh so hard, I’ve decided to remove most of the compression in order to get a good visualization of the rendering.

The first trick, which is ingenious, is to compress normals (bumped normals of course) into an A8R8G8B8 texture, packing the X value into the first two components (A and R) and the Y value into the other two (G and B). Z can always be recreated as z = sqrt(1 - x * x - y * y), since a normal has to be unit length. Anyway, the debug texture for such normals is a brilliant red and blue-green mess which is impossible to decode by sight, so what I’ve done is to break the compression and use the normals raw. Well, then another problem arose: raw normals would need a texture with the format R32G32B32, one float per component, right? Well yes sir, you are correct, but too bad you can’t render to such a texture! Using a simple A8R8G8B8 and just skipping the A value would give such poor precision that artifacts would be everywhere. Instead, I had to pick an A32R32G32B32 texture as my render target. Wasteful? Yes! Easy to debug? Yes! Beautiful and precise normals? Hell yes! I’d say with a score of +1, it’s a go!
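
For the curious, here is a hedged sketch of that kind of two-channel packing. The helper names and the positive-z assumption are mine, not Nebula’s actual code:

// Hedged sketch of packing a normal's X and Y across four 8-bit channels
// (16 bits per component) and rebuilding Z on decode. Names are illustrative.
float2 EncodeFloatToTwoChannels(float v)     // v in [0,1]
{
    float2 enc = frac(float2(1.0f, 255.0f) * v);
    enc.x -= enc.y / 255.0f;                 // strip the low bits from the high byte
    return enc;
}

float DecodeFloatFromTwoChannels(float2 enc)
{
    return dot(enc, float2(1.0f, 1.0f / 255.0f));
}

float4 PackNormalXY(float3 n)
{
    float2 xy = n.xy * 0.5f + 0.5f;                    // map [-1,1] to [0,1]
    return float4(EncodeFloatToTwoChannels(xy.x),      // x split across two channels
                  EncodeFloatToTwoChannels(xy.y));     // y split across the other two
}

float3 UnpackNormalXY(float4 enc)
{
    float2 xy = float2(DecodeFloatFromTwoChannels(enc.xy),
                       DecodeFloatFromTwoChannels(enc.zw)) * 2.0f - 1.0f;
    float z = sqrt(saturate(1.0f - dot(xy, xy)));      // assumes z is positive
    return float3(xy, z);
}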

Right, we have two enormous textures to render normals to (one for opaque, one for alpha), what else can we do?

Well, Nebula was aiming for a very broad range of consoles, ranging from the DS to the PC to the PS3. I’m just shooting from the hip here, but that might be the reason why they implemented the light pre-pass method of deferred shading. The pre-pass method requires two geometry passes: one for rendering normals and depth, and a second for gathering the lit buffer together with the diffuse, specular and emissive colors. That’s nice, but it requires two geometry passes (which can get really heavy with lots of skinned characters). The other, more streamlined method is to render normals, depth, specular and diffuse/albedo to four textures using MRT (multiple render targets), generate light using the normals and depth, and then compose everything with a full-screen quad. Yes! That sounds a lot better! The only problem is that we need four render targets, something which is only supported on relatively modern hardware, and not on some consoles.
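
To make the MRT variant concrete, here is a rough sketch of what the single geometry pass might write. The struct names, semantics, resources and G-buffer layout are my own placeholders rather than Nebula’s actual shaders:

// Hedged sketch of a single geometry pass writing to four render targets.
Texture2D DiffuseTexture;
Texture2D SpecularTexture;
SamplerState DefaultSampler;

struct VertexOut
{
    float4 Position     : SV_Position;
    float3 Normal       : NORMAL;
    float2 UV           : TEXCOORD0;
    float3 ViewSpacePos : TEXCOORD1;
};

struct GBuffer
{
    float4 Albedo   : SV_Target0;   // diffuse/albedo color
    float4 Normal   : SV_Target1;   // bumped normal, stored raw (no compression)
    float4 Specular : SV_Target2;   // specular color
    float4 Depth    : SV_Target3;   // length of the view-space position
};

GBuffer psGeometry(VertexOut input)
{
    GBuffer output;
    output.Albedo   = DiffuseTexture.Sample(DefaultSampler, input.UV);
    output.Normal   = float4(normalize(input.Normal), 0.0f);
    output.Specular = SpecularTexture.Sample(DefaultSampler, input.UV);
    output.Depth    = float4(length(input.ViewSpacePos), 0.0f, 0.0f, 0.0f);
    return output;
}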

Anyway, deferred shading does not inherently deal with alpha. That does NOT mean you can’t light alpha deferred!

The solution is to render all alpha objects to their own normal, specular, albedo and depth buffers, light them separately (which requires another light pass using the alpha buffers as input), and then, in a post-effect, gather both the opaque color and the alpha color and interpolate between them! Easy peasy! The way I do it is:


/// retrieve and light alpha buffers
float4 alphaLight = DecodeHDR(AlphaLightTexture.Sample(DefaultSampler, UV));
float4 alphaAlbedoColor = AlphaAlbedoTexture.Sample(DefaultSampler, UV);
float3 alphaSpecularColor = AlphaSpecularTexture.Sample(DefaultSampler, UV).xyz;
float4 alphaColor = alphaAlbedoColor;
// normalize the light color and divide by its largest component to get a tint for the specular term
float3 alphaNormedColor = normalize(alphaLight.xyz);
float alphaMaxColor = max(max(alphaNormedColor.x, alphaNormedColor.y), alphaNormedColor.z);
alphaNormedColor /= alphaMaxColor;
alphaColor.xyz *= alphaLight.xyz;                                     // modulate albedo with the diffuse light
float alphaSpec = alphaLight.w;                                       // specular intensity is packed in the light buffer's alpha
alphaColor.xyz += alphaSpecularColor * alphaSpec * alphaNormedColor;  // add the tinted specular contribution

/// retrieve and light solid buffers
float4 light = DecodeHDR(LightTexture.Sample(DefaultSampler, UV));
float4 albedoColor = AlbedoTexture.Sample(DefaultSampler, UV);
float3 specularColor = SpecularTexture.Sample(DefaultSampler, UV).xyz;
float4 color = albedoColor;
float3 normedColor = normalize(light.xyz);
float maxColor = max(max(normedColor.x, normedColor.y), normedColor.z);
normedColor /= maxColor;
color.xyz *= light.xyz;
float spec = light.w;
color.xyz += specularColor * spec * normedColor;

alphaColor = saturate(alphaColor);
color = saturate(color);

// blend the opaque and alpha results using the alpha buffer's coverage
float4 mergedColor = lerp(color, alphaColor, alphaColor.a);


A simple lerp serves to blend between these two buffers, and the result, mergedColor, is written to the buffer.

Sounds good, eh? Well, there are some problems with this as well! First of all, what about the background color? Since we light everything deferred, we also light the background, and that lit background ends up in our final result through the gather method stated above, which gives an unexpected result: some sort of incorrectly lit background which changes color when the camera moves (because the normals are static, but the angle to the global light changes). So, how do we solve this? Well, with stencil buffering of course! Every piece of geometry draws to the stencil buffer, so we can quite simply skip lighting and gathering any pixel outside our rendered geometry, without having to render the geometry twice! And by clearing the buffer the gather shader writes to with our preferred background color, we can have any color we like!

So that’s solved then: alpha and opaque objects with traditional deferred shading, plus custom background coloring, sweet!

Oh, and I also added bloom. Easy enough: render the bright spots to a downsized buffer, blur it into an even more downsized buffer, blur it again, and again, and then sample it, et voilà, bloom!
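
For completeness, here is a hedged sketch of what such a bright pass and blur pass might look like. The threshold, weights and names are assumptions of mine, not the actual engine shaders:

// Hedged sketch of the bloom chain described above (all names and constants
// are placeholders). Each pass draws a full-screen quad into a smaller target.
Texture2D ColorTexture;              // the lit scene
Texture2D BloomTexture;              // the previous (downsized) bloom target
SamplerState DefaultSampler;

float BrightThreshold;               // luminance cutoff for the bright pass
float2 BlurDirection;                // (1,0) for the horizontal pass, (0,1) for the vertical
float2 PixelSize;                    // 1 / render target resolution

// 1. Bright pass: suppress everything below the luminance threshold.
float4 psBrightPass(float2 UV : TEXCOORD0) : SV_Target
{
    float4 color = ColorTexture.Sample(DefaultSampler, UV);
    float luminance = dot(color.xyz, float3(0.299f, 0.587f, 0.114f));
    return color * saturate(luminance - BrightThreshold);
}

// 2. Separable Gaussian blur: run once horizontally and once vertically per downsized target.
float4 psBlur(float2 UV : TEXCOORD0) : SV_Target
{
    const float weights[5] = { 0.227f, 0.195f, 0.122f, 0.054f, 0.016f };
    float4 sum = BloomTexture.Sample(DefaultSampler, UV) * weights[0];
    for (int i = 1; i < 5; i++)
    {
        float2 offset = BlurDirection * PixelSize * i;
        sum += BloomTexture.Sample(DefaultSampler, UV + offset) * weights[i];
        sum += BloomTexture.Sample(DefaultSampler, UV - offset) * weights[i];
    }
    return sum;
}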

So, conclusion: what did we win from this, and what did we lose? We got better normals and lighting at the cost of some graphics memory. We removed half our draw calls by removing a complete set of geometry passes. We optimized our per-pixel lighting with stencil buffering, which in turn gave us the ability to use a background color. And we managed to incorporate alpha into all of this, without any hassle or expensive rendering. All in all, we won!

Also, here are some pictures to celebrate this victory:

Deferred rendering with alpha. This picture shows a semi-transparent character (scary) with fully transparent regions (see knee-line) and correct specularity

Bloom!

Lighting using the A8R8G8B8 texture format (low quality)

Lighting using the A32R32G32B32 texture format (high quality)