New Domain and Name

As you may have noticed, we have moved the page to a new domain and have started writing and collecting documentation. We are still working on a release, but due to changed priorities it got delayed a bit. I hope to finally wrap up the release in the upcoming month, but this time no promises. One of the main things holding everything back is demo content: we don’t have any suitable demo game/project to release along with the SDK, and since the current documentation is either a bit outdated or incomplete, it would be rather hard to get started if you haven’t worked with any incarnation of Nebula before.

Apart from the new domain, we also decided to change the name of our fork. The last release of the Nebula3 SDK from Radon Labs and our version have diverged so considerably with respect to tools, file formats and the whole pipeline that we figured we should use a new name to avoid confusion. Our version will be called Nebula Trifid from now on, keeping to the space theme. For those that are interested, the Trifid Nebula is a nebula located in the Sagittarius constellation.

//Johannes

 

Physically based lighting

Seeing as we’re aiming for a bleeding-edge engine, there is no need to skimp on anything. A little bird whispered in my ear that there are other ways of performing lighting than the standard Blinn-Phong method commonly used, and since we’re on a pretty flexible budget when it comes to graphics performance, I thought I should give it a good looksie.

Physically based lighting basically takes more into account than regular lighting. It also provides a more ‘real’ representation of the world in terms of reflected light (albedo) and surface roughness/gloss. Couple that with the original cheat called normal maps and you’ve got yourself some pretty good-looking effects. Basically, all materials have been given a new roughness map which allows a graphics artist to author the surface complexity of a model. This allows lighting to properly respond to the surface instead of just applying a uniform specular reflectiveness. The shader code (mostly taken and translated from http://www.altdevblogaday.com/2011/08/23/shader-code-for-physically-based-lighting/) looks like this:

// NH, NL, HL and NV are the saturated dot products N.H, N.L, H.L and N.V,
// roughness here holds the decoded specular power (see below) and PI is the usual constant
float normalizationTerm = (roughness + 2.0f) / 8.0f;
float blinnPhong = pow(NH, roughness);
float specularTerm = normalizationTerm * blinnPhong;
float cosineTerm = NL;

// Schlick's approximation of the Fresnel term
float base = 1.0f - HL;
float exponent = pow(base, 5.0f);
float3 fresnelTerm = specColor.rgb + (1.0f - specColor.rgb) * exponent;

// visibility approximation
float alpha = 1.0f / (sqrt((PI / 4) * roughness + (PI / 2)));
float visibilityTerm = (NL * (1.0f - alpha) + alpha) * (NV * (1.0f - alpha) + alpha);
visibilityTerm = 1.0f / visibilityTerm;

float3 spec = saturate(specularTerm * cosineTerm * fresnelTerm * visibilityTerm) * lightColor.xyz;

As you can see, this code is way more complex than the standard formula. Instead of using a constant value for specular power, we use the roughness, which allows us to have per-pixel roughness authored by a graphics artist. The only downside is that roughness is somewhat unintuitive in terms of encoding/decoding. To decode roughness, which is a value in the range [0..1], I use this formula (taken from Physically-based Lighting in Call of Duty: Black Ops):

float specPower = exp2(10 * specColor.a + 1);

This gives our specular power a range of [2, 2048]. Since our method uses the Blinn-Phong algorithm for the distribution, the specular power goes well beyond the [0..1] range, but it is easier to compute than the more advanced, and more intuitive, Beckmann algorithm (which actually operates in the [0..1] range). The result can be seen in the picture below:

Screenshot from 2013-12-12 13:25:13

Note the specular light given off by the local lights, which was previously non-existent.

Part of performing physically based lighting is to also use reflections, with proper ‘roughing’ of the reflections. Reflections affect both the specular light (since it’s actually a reflection, go figure) and the final color of the surface. To account for this in our completely deferred renderer, the environment maps on reflective objects take roughness into account and select a specific mip level in the environment map based on the roughness. The awesome tool at https://code.google.com/p/cubemapgen/ can take an ordinary cube map and generate mips where each mip is a BRDF approximation (there are actually several different algorithms, but for the sake of clarity we’ll stick to Blinn-Phong BRDFs). We can also tell it to generate each mip with a gloss falloff, resulting in a very good-looking mip chain for our cube maps.

You may have come across this image http://seblagarde.files.wordpress.com/2011/07/reference_top_ref_bottom_mipchain.jpg showing a series of cube maps with different levels of reflectiveness, which is exactly what we are doing and what we want. Just to clarify, this is all precomputed using an original cube map and is not done in real time! The more interesting part is that what is visible in your environment cube map is irrelevant; what is relevant is that the average color of the cube map fits your scene in terms of colors and lighting. In the pictures below, we have the same model with roughness ranging from 0 to 1.
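
To make the lookup itself concrete, here is a minimal GLSL-style sketch of the roughness-driven fetch described above. EnvironmentMap, NumEnvMips and the linear roughness-to-mip mapping are illustrative assumptions, not the actual Nebula shader code; the exact mapping depends on how the mip chain was generated.

// Hypothetical sketch: pick a pre-filtered mip based on per-pixel roughness
uniform samplerCube EnvironmentMap;   // cube map with a BRDF-filtered mip chain (e.g. from CubeMapGen)
uniform int NumEnvMips;               // number of mips in that chain

vec3 SampleEnvironment(vec3 viewDir, vec3 normal, float roughness)
{
    vec3 reflectDir = reflect(-viewDir, normal);

    // roughness 0 hits the sharp base level, roughness 1 the blurriest mip
    float mip = roughness * float(NumEnvMips - 1);
    return textureLod(EnvironmentMap, reflectDir, mip).rgb;
}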

Roughness set to 0.0

Roughness set to 0.5

Roughness set to 1.0

As you can see, the roughness changes the surface look of the object dramatically, although it still uses the exact same shader. Also note that using the skydome cube map as the reflective cube map is a bit ugly, since it’s half bright and half dark. That’s all!

 

// Gustav

OpenGL

I’ve been working hard on an OpenGL renderer. The main reason is that we want to be able to move away from DirectX and Windows, for oh so many reasons. However, one of the major problems is the handling of shaders. With DirectX, we can use the quite flexible FX framework, which lets us encapsulate shaders, render states and variables into a very neat package. From a content management perspective, this is extremely flexible since render states can be implemented as a separate subsystem. This is the reason why I’ve been developing the FX framework I’ve been talking about.

Well, it works, and as a result we now have a functioning OpenGL renderer. The only downside is that it’s much slower than the DirectX renderer. I’m currently investigating whether this is driver related, shader related or just stupidly implemented in Nebula. However, it’s identical in every other aspect, meaning we’ve crossed one of the biggest thresholds towards getting a working version on Linux.

These are the results:

DirectX 11.0 version, ~60 FPS.

OpenGL 4.0 version, ~20 FPS.

It also just dawned on me that the “Toggle Benchmark with F3!” text is not showing in the DirectX 11 version. Gotta look into that…

Anyway, this seriously got me thinking about what actually demanded the time. I made some improvements in the AnyFX API and reduced the number of GL calls from 63k per frame down to 16k, but it made no difference in FPS. What you see here is approximately 2500 models which are individually drawn (no instancing). The DirectX renderer only updates its constant buffers and renders, using the Effects11 framework as the backend for shader handling. The OpenGL renderer does the exact same thing: it only updates vital uniforms and uniform buffers, and renders. As a matter of fact, I used apitrace to watch what was taking so much time. What I observed was that each draw call took about 20 microseconds, which multiplied by 2500 comes out to 0.05 seconds per frame, which amounts to 20 FPS. The methods for updating the buffers take only a fraction of the time; however, the GPU waits an ENORMOUS amount of time before even starting the rendering process, as can be seen in this picture.

glprofile

The blue lines describe the CPU load; the width of a line or section corresponds to the time it takes. We can see that each call costs some CPU time. However, we can also see that we start issuing rendering commands way earlier than the GPU actually starts to get any work done. We can also very easily observe that the GPU isn’t busy with anything else, so there is no apparent reason (as far as I can tell) why it doesn’t start immediately.

Crazy. We’ll see if the performance is as low on Linux as it is on Windows. Keep in mind that this is done on an ATI card. On the Nvidia card I have access to, I got ~40 FPS, but even so, the performance wavered between ~20 FPS and ~40 FPS seemingly at random. Weird. It could have something to do with SLI, but I’m not competent to say.

// Gustav

Tone mapping

While I’ve been working on AnyFX, I’ve also looked into tone mapping. Now, Nebula encodes and decodes pseudo-HDR by just down-scaling and up-scaling, something which works rather well in most cases. However, HDR was applied screen-wide and had no adaptation to brightness or color, and this is exactly what tone mapping solves.

I was a bit stunned by how easy it was to perform tone mapping. We need to downscale the color buffer to a 2×2 image, and then calculate the average luminance value into a 1×1 texture. We then simply copy the 1×1 texture to be used the next frame.

One way to perform the downscaling could be a sequence of post effects which perform a simple 2×2 box average. However, since what we are doing is essentially mipmapping, we could just as well generate mips instead. The only downside is that we generate one more level than we want, but it’s still much more efficient than having to run a series of consecutive downscale passes.

When the average luminance has been calculated, we just use that value with a tone mapping operator to perform the effect. We perform the eye adaptation part when we perform the 2×2 -> 1×1 downscale. The HLSL code for the operator is:

static const float g_fMiddleGrey = 0.6f;
static const float g_fMaxLuminance = 16.0f;

//------------------------------------------------------------------------------
/**
	Calculates HDR tone mapping
*/
float4
ToneMap(float4 vColor, float lumAvg, float4 luminance)
{
	// Calculate the luminance of the current pixel
	float fLumPixel = dot(vColor.rgb, luminance.rgb);

	// Apply the modified Reinhard operator (Eq. 4)
	float fLumScaled = (fLumPixel * g_fMiddleGrey) / lumAvg;
	float fLumCompressed = (fLumScaled * (1 + (fLumScaled / (g_fMaxLuminance * g_fMaxLuminance)))) / (1 + fLumScaled);
	return float4(fLumCompressed * vColor.rgb, vColor.a);
}

We use a constant for the middle gray area and the maximum amount of luminance. This could be parametrized, but it’s not really necessary. We calculate the average luminance using the following kernel:

//------------------------------------------------------------------------------
/**
	Performs a 2x2 kernel downscale and the eye adaptation blend.

	ColorSource, PreviousLum, Luminance, DefaultSampler and TimeDiff are shader globals.
*/
void
psMain(float4 Position : SV_POSITION,
	float2 UV : TEXCOORD0,
	out float result : SV_TARGET0)
{
	float2 pixelSize = GetPixelSize(ColorSource);
	float fAvg = 0.0f;

	// source should be a 512x512 texture, so we sample the 8th mip of the texture
	float sample1 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(0.5f, 0.5f) * pixelSize, 8), Luminance);
	float sample2 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(0.5f, -0.5f) * pixelSize, 8), Luminance);
	float sample3 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(-0.5f, 0.5f) * pixelSize, 8), Luminance);
	float sample4 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(-0.5f, -0.5f) * pixelSize, 8), Luminance);
	fAvg = (sample1 + sample2 + sample3 + sample4) * 0.25f;

	// drift the adapted luminance from the previous frame towards the new average over time
	float fAdaptedLum = PreviousLum.Sample(DefaultSampler, float2(0.5f, 0.5f)).r;
	result = clamp(fAdaptedLum + (fAvg - fAdaptedLum) * (1 - pow(0.98f, 30 * TimeDiff)), 0.3f, 1.0f);
}

The 0.98f can be adjusted to modify the speed at which the eye adaptation occurs. We can also adjust the factor by which we multiply the TimeDiff variable; here we use 30, but we could use any value. Modifying either value will affect the speed of the adaptation.

Also, in order to make the pipeline more streamlined, we first downsample the color buffer (which has a screen-relative size) to 512×512 before we perform the mipmap generation. This ensures that there will be a 2×2 level, and since 512 = 2^9, that level must be the 8th mip.

512×512 Level 0
256×256 Level 1
128×128 Level 2
64×64 Level 3
32×32 Level 4
16×16 Level 5
8×8 Level 6
4×4 Level 7
2×2 Level 8

So, to conclude: first we downscale from the variable screen size to 512×512, then we generate mipmaps on the render target, calculate the average luminance and use the time difference to blend between different levels of luminance. We then use the luminance value with the operator described above. For this to be handled properly, we tone map both the bloom and the final result; if we perform bloom without the tone mapping, we get an overabundance of bloom. The difference can be seen below:

Full tonemapping
Blur tonemapping disabled
Final tonemapping disabled

// Gustav

Release soon.

The time has finally come to work on the public release of our Nebula3 version. In the true spirit of when Andre still worked at Radon Labs, we will release our work on N3 under the same 2-clause BSD license. We were thinking of integrating some of the things we are working on first, but if you continue down that road you will never release anything, so we decided to go ahead, release, and keep working after that. There are still some things in the pipeline, such as a fully working OpenGL4 port (using the AnyFX framework Gustav is working on), a fully working Havok integration (mostly done) and a rewrite of the network layer.

Currently most of the work left is cleaning up random code, adding proper copyright/license headers, revamping the build system a bit so that it is more newcomer friendly and, above all, creating some nice demos with content we have created here. Should be done by next week hopefully, so be ready!

And in other news, glad midsommar (happy midsummer)! ;D

AnyFX progress

Designing a programming language, even one as simple as AnyFX, is hard work. It’s hard work because there are so many little things you miss during initial development and planning. Anyways, here is the progress with AnyFX so far. This video shows a shader implemented in AnyFX for OpenGL which uses vertex, hull, domain, geometry and pixel shading. It uses the hull and domain shaders to tessellate and displace the cube into a sphere. The geometry shader is used as a single-pass wireframe renderer, as described and implemented here: http://prideout.net/blog/?p=48. In the video you can see how I dynamically change the tessellation inner and outer factors using an AnyFX variable.

anyfx

The next step on the list is compute shaders. When they work properly and I’m satisfied with how they are handled, I’m going to start integrating this into Nebula.

AnyFX, what the fuzz?

As a part of my studies, I’ve been developing a very simple programming language, very similar to Microsoft FX for effects. The difference between AnyFX and Microsoft FX is that AnyFX is generic, meaning it will work with any back-end implementation. The language works by supplying all the other stuff BESIDES the code which we need to render. This means that we actually put back-end specific implementations in the shader bodies. Why, you may ask? Well, it would be risky and poorly optimized if we were to define our own language for intrinsics, function calling conventions etc., and directly translate this to graphics assembly. Instead, we rely on the vendor-specific back-end compilers to do the heavy work for us. As such, we can super easily port our old HLSL/FX or GLSL shaders by simply copying all of the functionality in the function bodies straight into an AnyFX file. However, this does mean we potentially have to provide several files to support different shader libraries. We could implement a language with several function bodies, one for each implementation, but it wouldn’t look like C anymore, and the code could get messy in a hurry. Sounds confusing? Well, here’s an example:

//------------------------------------------------------------------------------
// demo.fx
// (C) 2013 Gustav Sterbrant
//------------------------------------------------------------------------------

// This is an example file to be used with the AnyFX parser and API.
profile = glsl4;

// A couple of example variable declarations
sampler2D DiffuseTexture;
sampler2D NormalTexture;

state OpaqueState;
state AlphaState
{
    DepthEnabled = true;
    BlendEnabled[0] = true;
    SrcBlend[0] = One;
    DstBlend[0] = One;
};

// A variable block containing a set of variables; this will be instantiated only once in the effects system.
// This block of variables will be shared by all other .fx files compiled during runtime with the same name and the [shared] qualifier.
varblock Transforms
{
    mat4 View;
    mat4 Projection;
};

mat4 Model;

varblock Material
{
    float SpecularIntensity = float(1.0f);
    vec4 MaterialColor = vec4(1.0f, 0.0f, 0.0f, 1.0f);
};

//------------------------------------------------------------------------------
/**
    Simple vertex shader which transforms basic geometry.

    The function header complies (and has to comply) with the AnyFX standard, although the function body is written in a specific target language.

    This body is written in GLSL.
*/
void
vsStatic(in vec3 position, in vec2 uv, out vec2 UV)
{
    gl_Position = Projection * View * Model * vec4(position, 1.0f);
    UV = uv;
}

//------------------------------------------------------------------------------
/**
    Simple pixel shader which writes the diffuse color.

    Outputs are routed with qualifiers such as [color0], which binds the output to render target 0.

    We also apply a function attribute which tells OpenGL to perform early depth testing.
*/
[earlydepth]
void
psStatic(in vec2 UV, [color0] out vec4 Color)
{
    Color = texture(DiffuseTexture, UV);
}

//------------------------------------------------------------------------------
/**
    Two programs; they share shaders but not render states, and also provide an API-available data field.
*/
program Solid [ string Mask = "Static"; ]
{
    vs = vsStatic();
    ps = psStatic();
    state = OpaqueState;
};

program Alpha [ string Mask = "Alpha"; ]
{
    vs = vsStatic();
    ps = psStatic();
    state = AlphaState;
};

 

So, what’s fancy here? Well, first of all, we can define variables for several shader programs (yay!). The programs combine vertex shaders, pixel shaders, optional hull, domain and geometry shaders, together with a render state. A render state defines everything required to prepare the graphics card for rendering: depth testing, blending, multisampling, alpha-to-coverage, stencil testing and so on. Basically, for you DX folks out there, this is a Rasterizer, DepthStencil and BlendState combined into one simple object.

You may notice that we write all the variable types with the GLSL type names. However, we could just as well do this using float1-4, matrix1-4×1-4, i.e. the HLSL style; the compiler will treat them equally. You may also notice the ‘profile = glsl4’ line, which just tells the compiler to generate GLSL code as the target. By generate code in this case, I mean the vertex input methodology (which differs between most implementations). It’s also used to transform the [earlydepth] qualifier into the appropriate GLSL counterpart.

We can also define variable blocks, called ‘varblocks’, which handle groups of variables as buffers. In OpenGL this is known as a Uniform Buffer Object, and in DirectX it’s a Constant Buffer. We also have annotations, which allow us to insert meta-data straight into our objects of interest; we can for example insert strings telling what type of UI handle we want for a specific variable, or a feature mask for our programs.

Since textures are very special in both GLSL and HLSL, we define a combined object called sampler2D. We can also define samplers, which are handled by DirectX as shader-code-defined objects and by OpenGL as CPU-side settings. In GLSL we don’t need both a texture and a sampler object to sample from a texture, but in HLSL4+ we do, so in that case the generated code will quite simply put the sampler object in the code. Finally, we can define qualifiers for variables, such as the [color0] you see in the pixel shader, which means the output goes to the 0th render target. AnyFX currently supports a plethora of qualifiers, but only one qualifier per input/output.
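
To make the varblock idea a bit more concrete: under the glsl4 profile, a block like Transforms above would presumably boil down to an ordinary GLSL uniform block, something along these lines (an illustration of the mapping, not verified compiler output):

// What the Transforms varblock could map to in generated GLSL (illustration only)
layout(std140) uniform Transforms
{
    mat4 View;
    mat4 Projection;
};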

Anyways, to use this, we simply do this:

 

this->effect = AnyFX::EffectFactory::Instance()->CreateEffectFromFile("compiled");
this->opaqueProgram = this->effect->GetProgramByName("Solid");
this->alphaProgram = this->effect->GetProgramByName("Alpha");
this->viewVar = this->effect->GetVariableByName("View");
this->projVar = this->effect->GetVariableByName("Projection");
this->modelVar = this->effect->GetVariableByName("Model");
this->matVar = this->effect->GetVariableByName("MaterialColor");
this->specVar = this->effect->GetVariableByName("SpecularIntensity");
this->texVar = this->effect->GetVariableByName("DiffuseTexture");

 

Then:

 

// this marks the use of AnyFX, first we apply the program, which enables shaders and render states
this->opaqueProgram->Apply();

// then we update our variables, seeing as our variables are global in the API but local internally, we have to perform Apply first
this->viewVar->SetMatrix(&this->view[0][0]);
this->projVar->SetMatrix(&this->projection[0][0]);
this->modelVar->SetMatrix(&this->model[0][0]);
this->matVar->SetFloat4(color);
this->specVar->SetFloat(1.0f);
this->texVar->SetTexture(this->texture);

// finally, we tell AnyFX to commit all changes done to the variables
this->opaqueProgram->Commit();

 

Aaaand render. We have some restrictions, however. First, we must run Apply on our program before we are allowed to set the variables. This fits nicely into many game engines, since we first apply all of our shader settings, then apply our per-object variables, and lastly render. We also run the Commit command, which updates all variable buffers in a batched manner. This way, we don’t need to update the variable block once per variable, seeing as this might seriously stress the memory bandwidth. When all of this is said and done, we can perform the rendering. We need to perform Apply first because each variable may have different binding points in the shaders: in OpenGL, each uniform has a location in a program, and since different programs may use any subset of all the declared variables, the locations are likely to differ. In HLSL4+, we use constant buffers for everything, so there Commit is vital, since the constant buffers need to be updated at some point.

All in all, the language allows us to move functionality to compile time. For OpenGL, we can perform compile-time linking by simply testing if our shaders will link together. We can also obfuscate the GLSL code, so that nobody can simply read the raw shader code and manipulate it to cheat. However, during startup, we still need to compile the actual shaders before we can perform any rendering. In the newer versions of OpenGL, we can pre-compile program binaries and then load them at runtime. This could easily be implemented straight into AnyFX if needed, but I’d rather have the shaders compiled by my graphics card so that the vendor driver can perform its specific optimizations. Microsoft seems to be discontinuing FX (for reasons unknown), but the system is still really clever and useful.

And also, as you may or may not have figured out, this is the first step I will take to finish the OpenGL4 render module.

When I’m done with everything, and it’s integrated and proven to work in Nebula, I will write down a full spec of the language grammar and qualifiers, and release it as open source.

Summer time!

So the game projects are over, phew! The criticism we got this year was far less than before, which is good since we spent a good deal of time working on the tools. Of course we had some bugs during the development phase, and some problems which we want to address. Apart from that, it’s time to look forward again and see what we want to change and improve. My area of interest, as you may know by now, is rendering and shading, so this is the list of things I want to fix at the moment:

  • Materials currently need to be inserted in the materials list, and also EVERYWHERE we want the material to be rendered. It would be much neater if the material could have some attribute by which we sort and render, so that we may say “Render all materials which require lighting here”, “Render all materials which require subsurface scattering here” and so on. This would make it super simple to add a new material to Nebula.
  • Rendering is threaded, but seems to be somewhat glitched since we get random deadlocks. It’s also very cumbersome to get animation stuff, attachments, shader variables, skeleton stuff etc since we have to perform a thread sync every time we request something. So, I would like to rethink the threading part, so that we may use the power of multithreaded rendering, while at the same time maintain simple access to graphics data. My current idea is to implement a separate context which allocates resources (since texture, mesh and shader allocation is going to take time), while still maintaining the immediate render context on the main thread. Animation and visibility queries should still be in jobs, since it works extremely smoothly and fast. This also simplifies Qt applications since the WinProc will be in the main thread! Also, if possible, we could have another thread which only performs draw calls, so that the actual drawing doesn’t directly affect game performance. This last part is similar to: http://flohofwoe.blogspot.se/2012/12/coregraphics2.html
  • Changing materials on models is quite a hassle, and it shouldn’t be, since the material is only used to batch render objects; we should be able to switch materials and have the new material applied on the next render.
  • All models should be flat in hierarchy. This is because we want a model to be split by its materials, meaning we can easily find and set a variable for a ‘node’ in a model. Previously, we had to traverse the model’s node hierarchy to find a node, but this shouldn’t have to be the case, since we can flatten the transforms and pre-multiply them directly into the mesh. As such, we can remove a lot of complexity from model updates by simply having a combined transform for all nodes, letting all nodes be primitive groups in a mesh and letting each node have its own material. This still allows us to have the super clever models used in Nebula, where we can have several materials for the same model.
  • Dynamic meshes. If we are going to have a deferred rendering context which allocates resources, then we should be able to load static meshes in the thread, and create dynamic meshes in our main thread, then we can also modify the dynamic meshes however we want. As such, we could super simply implement cloth, blend shapes, destruction etc.
  • We should be able to import statically animated meshes. A very clever way to do this is to take all the animated transform nodes and convert them to a joint hierarchy, and then simply rigid-bind the meshes to their joints. This is very similar to how rigid binding is done at the moment, but in this case the artists need not create a skeleton beforehand (credit for this goes to Samuel Lundsten, our casual graphics artist).
  • Model entities are a bit too generic. This is very flexible with the current multithreaded solution, since we can simply send a GraphicsEntityMessage to the server-side ModelEntity and everything works out fine. However, whenever we need a specialized model, for example a particle effect, a character (and, in the future, a cloth/destructible model), we are likely to have specialized functionality, for example particle effect start/stop, character animation play and so on, you get the idea. It’s a bit more intuitive to have different types of entities for this instead of using generic model entities. Also, in my opinion, a character is ALWAYS part of a model entity, be it initialized or not, which doesn’t look that nice. Another problem is that since particle models, just like any models, have a model hierarchy, we need to traverse all particle nodes in order to start/stop the particle effect. This could be generalized much better if we had a ParticleModelEntity which handles this for us.
  • OpenGL renderer. I don’t really like the DirectX API, in any shape or form. It’s only of use for Windows and Xbox, and on Windows we can still use OpenGL anyway, so there is really no reason for it unless you want to develop games exclusively for Xbox. Besides, the current-gen graphics API is only available on Xbox One, so the only reason we would want to keep using DirectX would be to develop games for a set-top box. We will of course still have the DirectX 11 renderer available if we want it. Right, OpenGL! I’m almost done with AnyFX right now, and currently the only back-end implementation available is for GL4 (coincidence? I think not!). The shading was the major cog in our wheels, since we had no FX counterpart for OpenGL, which I hope AnyFX will supply.
  • Integrate AnyFX. Whenever I’m done with AnyFX, meaning it should be thoroughly tested with geometry shading, hull and domain shading, samplers and perhaps even compute shaders, I’m going to integrate it with Nebula in order to accomplish the goal defined above.
  • Compute shading. Whoa, this would be soooo awesome to integrate. With this, we could have GPGPU particles, or perhaps a Forward+ renderer (drool).
  • Frame shader improvement. It could prove useful to be able to declare depth-stencil buffers separately. This isn’t really THAT important, but it could be handy to render some stuff to a separate depth buffer without requiring it to be paired with the actual render targets.
  • Implement some system which lets us use two sources for shaders. This is useful when having an SDK, with all the useful shaders from the engine, and then being able to import new shaders from an auxiliary source, your project for example. The same goes for materials; materials should perhaps be defined as a resource where each material defines its shaders, the variations for each shader and its parameters. That way, we can simply just add a new material. Currently, the materials are listed in one big XML file, which is not so neat, seeing as we might have tons of different materials, and finding a certain material takes a considerable amount of time since just the current materials take up tons of space. With this implemented, we can also split materials into SDK materials and project materials. As for frame shaders, they specify a very specific render path, and in my opinion this should be exclusive to each project.
  • Enabling and disabling rendering effects is neither pretty nor working properly. The frame shaders define a complete render path, and each frame shader has a mutually exclusive set of resources. So if we want to implement some AO algorithm, we have to put it straight into the main frame shader, which is both ugly and perhaps unwanted (when considering different specs). If we want, for example, low-medium-high AO quality, we have to define three different frame shaders, each with their own variations. This is perhaps not so nice. While I love the frame shader render path system, I dislike the inflexibility to control it during runtime. A way to handle this would be to define where during the render path we want to perform some render algorithm by supplying the render path with an algorithm handler, for example:

    <Algorithm handler="Lighting::AOServer" output="AOBuffer">
        <Texture name="Depth" value="DepthBuffer"/>
    </Algorithm>

    The render system would then call Lighting::AOServer::Instance() with a set of predefined functions, which are used to prepare the rendering of the algorithm, which is then written to the output. As such, we can simply call the AOServer and adjust our AO settings accordingly. We could also have batches within the Algorithm tag, which enables us to render geometry with a specific adjustable server handling it. The Texture tag defines an input to the handler, which works by supplying a symbolic name (“Depth” in this case) to which we bind a texture resource (here it’s the render target “DepthBuffer”). Since the frame shader only deals with textures, this is the only thing we have to define in order to run our algorithm.

  • Visibility is currently resolved a little goofily. Visibility has to be resolved after we perform OnRenderBefore() on our models, which in turn is called whenever we prepare for culling. This is a bit weird, since OnRenderBefore is REQUIRED in order for a model to use a transform, but OnRenderBefore only gets called if an object is visible. This means that when an object gets created, it sits at the origin until OnRenderBefore gets called. However, since this only gets called when the camera sees the object, our shadow casters never get this feedback, and as such they stay untransformed until they are visible to the camera! Keep in mind that the global bounding box gets updated constantly; we don’t have to turn the camera towards the origin just to get objects to be visible, but we do have to see them with the actual camera! Admittedly, this is very clever, since it means objects outside the visible area never get their transforms updated, and that is completely legit if we don’t use shadows. However, when we do have shadows, we need to update models for all objects visible to the camera and to all shadow-casting lights! The current solution is a bit hacky; it works, but it can be done much nicer.
  • Point light shadows. As of right now, point lights cannot properly cast shadows. I should implement a cube map shadow renderer which utilizes geometry shading to render to a cube in a single draw pass (sketched right below this list). This also means that point light shadows cannot be baked into the big local light shadow buffer, but instead need their own buffers.
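
For that last item, the single-pass cube idea would look roughly like the following GLSL geometry shader. This is a hedged sketch of the standard layered-rendering approach (the CubeViewProjection array and the world-space input positions are assumptions), not code that exists in Nebula yet.

// Hypothetical single-pass cube shadow geometry shader (GLSL, layered rendering)
#version 400
layout(triangles) in;
layout(triangle_strip, max_vertices = 18) out;

// one view-projection matrix per cube face, fed from the CPU side
uniform mat4 CubeViewProjection[6];

void main()
{
    // emit the incoming triangle once per cube face
    for (int face = 0; face < 6; face++)
    {
        // gl_Layer routes the primitive to the matching face of the cube map render target
        gl_Layer = face;
        for (int i = 0; i < 3; i++)
        {
            // gl_in positions are assumed to be world space, output by the vertex shader
            gl_Position = CubeViewProjection[face] * gl_in[i].gl_Position;
            EmitVertex();
        }
        EndPrimitive();
    }
}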

Phew, quite a list! In the next update I’ll probably be posting something more about AnyFX, since it will be the first thing I check off the list.

 

// Gustav

Water 2.0

I got a request some time ago to make a video showing the water. So I did!

http://vimeo.com/66406007

I also took the time to make some additions which we got as requests from the students. The first was depth fog: whenever something is really far away from the water plane, it gets fogged, so as to simulate really deep water.

I also added, per request, a foam effect. The foam works just like the fog: based on a distance and the per-pixel difference in depth, we apply either the foam texture or the fog color. The foam shows up in the video as the white outline around the house, which helps give a more realistic interaction between the water and other solid objects.

You can also see that objects beneath the water become more visible the closer they are to the water surface. This also gives a more realistic effect, since it looks like the water thins out and obscures the underlying object less and less until it reaches the surface.

Currently, the water has two color settings, one for the water color and one for the fog color. The fog color applies to any geometry beneath the water, based on a distance factor. This is also visible in the video when the big island patch gets submerged more and more: it turns purple because the fog color is purple, while the water surface remains blue.
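
To make the behaviour concrete, here is a minimal GLSL-style sketch of the depth-difference idea described above; the names, the linear falloffs and the parameter set are placeholders for illustration, not the actual Nebula water shader.

// Hypothetical sketch of depth-based water fog, foam and opacity (illustration only)
vec4 ShadeWater(vec3 waterColor, vec3 fogColor, vec3 foamSample,
                float sceneDepth, float waterDepth,
                float fogDistance, float foamDistance)
{
    // how far below the water plane the underlying geometry is
    float depthDiff = max(sceneDepth - waterDepth, 0.0);

    // geometry far below the surface fades towards the fog color
    float fog = clamp(depthDiff / fogDistance, 0.0, 1.0);
    vec3 color = mix(waterColor, fogColor, fog);

    // geometry just below the surface gets foam instead
    float foam = 1.0 - clamp(depthDiff / foamDistance, 0.0, 1.0);
    color = mix(color, foamSample, foam);

    // shallow water obscures the underlying object less, deep water more
    float alpha = clamp(depthDiff / fogDistance, 0.0, 1.0);
    return vec4(color, alpha);
}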

Oh, and you can also see the fake reflections. I should really implement a way to get real reflections instead of having that screen-space cheat…

 

Current status

Since the game projects started, we haven’t quite had the time to work on the tech or to post updates here, but let me instead give you a brief overview of what’s going on.

Obviously, we’ve been finding bugs and fixing them, nothing too fancy or weird, except for one special thing. It so happens, on some computers, that we get thread deadlocks. We have yet to understand why this happens, since Nebula doesn’t have cross-dependent threads in any way. This seems to happen whenever we start to animate characters; however, it doesn’t happen if we only animate characters. Here is where the fun comes in. I’ve set up a test environment with a computer identical to the ones getting this problem and tested it by creating a render application which only rendered characters. I thought it might be some kind of performance problem, so I just put tons of characters in the scene and pushed it down to 2-8 FPS. It never happened. There seems to be something else amiss here, and even though we’ve found a solution, we’re still uncertain how good it is.

The solution was to, quite simply, remove the render thread. This doesn’t mean we remove the rendering, we just put the render system on the main thread. We hesitate to do this since there must be a reason for the render thread in the first place, be it for the PS3, or just to handle multi-core PCs better. One theory as to why this is a problem is that the window handle and the window event loop are actually on another thread than the actual application, something which has caused random deadlocks before. However, in the case of the previous WinProc problem, the thread actually hung inside the procedure instead of somewhere else.

This got me thinking that maybe it is time to redesign the render system, not because it’s wrong or poorly implemented, but because we could benefit from another approach to rendering with multiple threads, instead of having a render system back end which is enormous and complex. Instead, we could make good use of modern graphics hardware to implement several render devices, and move the parallelism to only perform draw calls. Note that this will not be possible on older hardware, but seeing as we are 100% focused on the PC platform, this shouldn’t be a problem for us. This is similar to what Andre Weissflog has already implemented in the official Bigpoint branch of Nebula.

We might also want to rethink how models work. The system today with model hierarchies is really nice, and provides a very accurate 1:1 representation of the Maya content in Nebula. The only problem is its performance. If we have a complex model with a huge hierarchy, the update time per model can be murder on the cache, since it has to recursively traverse each model node and perform updates. Instead, it would be much neater to just have a flat list of model nodes, which can be processed in a much more data-driven manner. Having said that, the majority of the render system works nicely, is easily maintainable, and should carry over easily into this new design idea.

We’ve also been considering redesigning the application layer to be slightly less flexible but more data-driven in design. We’re a bit suspicious that game applications run so much slower than render applications. Obviously this is expected to some degree, since we have lots of game mechanics to compute and handle, but the application system doesn’t quite scale well with lots of entities. Both redesign considerations are of course projects for the future, and will not be started in the near future.

What I am currently working on, instead of finding and fixing bugs in the tools our game groups and internship guys are using, is a way to let Nebula, and any other game engine for that matter, make use of effects for shaders. Much like the Microsoft Effects system, effects provide a really clean and flexible way to create shader pipelines by combining shader programs and render states into a manageable construct which can be written in a file, compiled, read into your game engine and then used through a very simple interface. The only problem is that, up to this point in time, nothing with the same level of functionality has been provided for OpenGL.

I give to you AnyFX. Well, I don’t really, because it’s not done yet, but I can at least give you the concept.

AnyFX strives to be a language which looks like C, and thus also resembles HLSL and GLSL, but which also handles shader linkage, render states and variable handling. At the same time, I wanted a simple way of porting shaders to AnyFX, as well as to maintain the same level of optimization and language-specific intrinsics without having to write an extremely complex compiler, since that is way above my level of knowledge and not within a reasonable time frame. AnyFX works by writing all wrapping code in the AnyFX language standard, while all code inside the function scope is target-language native. To avoid further confusion, variables can be declared as either GLSL or HLSL types, although identical data types will internally be represented the same. AnyFX comprises two parts: a compiler and an API. An AnyFX file has to be compiled into a binary blob, during which it gets evaluated for cross-shader linkage and goes through semantic and static analysis. The compiler then outputs target-language code, which only covers variable declarations and function headers, nothing more, nothing less.
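
As a hedged illustration of what that output could look like: for the glsl4 profile, the vsStatic shader from the AnyFX example earlier on this page might end up wrapped roughly like this (a sketch of the idea, not actual AnyFX compiler output):

// Hypothetical GLSL4 the compiler could emit for vsStatic (illustration only)
#version 400
layout(location = 0) in vec3 position;
layout(location = 1) in vec2 uv;
out vec2 UV;

uniform mat4 Model;
layout(std140) uniform Transforms
{
    mat4 View;
    mat4 Projection;
};

void main()
{
    // the function body is passed through untouched; only the surrounding
    // declarations and the entry point are generated
    gl_Position = Projection * View * Model * vec4(position, 1.0f);
    UV = uv;
}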

The API can load a binary compiled AnyFX file, at which point it loads all structures, such as render states, sampler states and programs. The back end is fed this information and is implemented depending on the target platform. So, for example, if I build an AnyFX file with the glsl4 profile, the back end will implement a GLSL4Effect, and if I build an AnyFX file with the hlsl5 profile, I get an HLSL5Effect. The front end can simply call Apply to apply the effect, meaning it will set the render state, and Commit, which basically just updates constant variable buffers if present. Just like things work currently in Nebula, we can simply integrate AnyFX to replace Effects11, but more importantly, to have an actual effects interface for OpenGL.

This time you get no videos or pictures I’m afraid, because to be honest, there is nothing new and cool to show off.