Graphics – New design philosophy

DOD vs OOP

Data oriented design has been the new thing ever since it was rediscovered, and for good reason. The funny part is that in practice it is a regression in technology, back to the good old days of C, although the motivations may be different. So here is the main difference between OOP and DOD:

With OOP, an object is a singular instance of its data and methods. As an OOP class gets more members, its size increases, and with it the stride between consecutive elements in memory. In addition, an OOP solution tends to allocate an instance of an object at the moment it is required. OOP is somewhat intuitive to many modern programmers, because it attempts to explain the code in clear text.

The DOD way is very different. It’s still okay to have classes and members, although care should be taken as to how those members are used. For example, if some code only requires members A and B, then it’s bad for the cache if there are members C and D sitting between each element. So how do we still use an object-like mentality? Say we have a class A with members a, b, and c. Instead of treating A as individual objects, we have a new class AHub, which is the manager of all the A instances. The AHub contains a, b and c as individual arrays. So how do we identify individual A objects? Well, they become an index into those arrays, and since those arrays are uniform in length, each index becomes a slice across them. It’s also fine if, for example, a is itself another class or struct. There are many benefits to a design of this nature (a minimal sketch of such a hub follows the list below):

1. Objects become integers. This is nice because there is no need to include anything to handle an integer, and the implementation can easily be obfuscated in a shared library.
2. No need to keep track of pointers and their usage. When an ID is released nothing is really deleted; instead, the ID is just recycled. However, there are ways to check whether an ID is still valid.
3. Ownership of objects is super clear. The hub class is invariably the ONLY class responsible for creating and releasing IDs.
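To make this concrete, here is a minimal sketch of what such a hub might look like (AHub, Id and the member types are made up for illustration; Nebula's actual hubs are of course more involved):

// Minimal sketch of the hub pattern described above. Each "object" is just an
// index into the parallel member arrays, and released ids are recycled.
#include <vector>
#include <cstdint>

typedef uint32_t Id;
struct Vec3 { float x, y, z; };

class AHub
{
public:
    Id Create()
    {
        Id id;
        if (!this->freeIds.empty())
        {
            id = this->freeIds.back();       // recycle a released id
            this->freeIds.pop_back();
        }
        else
        {
            id = (Id)this->as.size();        // hand out a fresh index
            this->as.push_back(0.0f);
            this->bs.push_back(0);
            this->cs.push_back(Vec3());
        }
        return id;
    }

    void Release(Id id)
    {
        this->freeIds.push_back(id);         // nothing is deleted, the slot is reused
    }

    // members stored as individual arrays: code touching only 'a' never drags
    // 'b' or 'c' through the cache
    std::vector<float> as;
    std::vector<int>   bs;
    std::vector<Vec3>  cs;

private:
    std::vector<Id> freeIds;
};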

To show how different the code can end up being, here is an example:

// OOP method
Texture* tex = new Texture();
tex->SetWidth(...);
tex->SetHeight(...);
tex->SetPixelFormat(...);
tex->Setup();
tex->Bind();

// DOD method
Id tex = Graphics::CreateTexture(width, height, format);
Graphics::BindTexture(tex);

Now, one of the things which may be slightly less comfortable is: where do you put all those functions?! Since all you are playing with are IDs, there are no member functions to call. So where is the function interface? Well, the hub of course! Since the hubs are the interfaces to your objects, they are also responsible for the operations, and since they are interfaces they can just as well be singleton instances. And because the hubs can be singletons, you can have free functions declared in a namespace which belong to no class at all. So in the above example, we have the line:

Id tex = Graphics::CreateTexture(width, height, format);

but since textures are managed by some singleton, what Graphics::CreateTexture really does is this:

namespace Graphics
{
Id CreateTexture(int width, int height, PixelFormat format)
{
    return TextureHub::CreateTexture(width, height, format);
}
} // namespace Graphics

Now the benefit is that all functions can go into the same namespace, Graphics in this case, and the programmer does not need to keep track of what the hub is actually called.

Resources

In Nebula, textures are treated as resources, and they go through a different system to be created, but the process is exactly the same. Textures are managed by a ResourcePool, like all resources, and the ResourcePools are also responsible for implementing the behavior of those resources. With this new system, smart pointers are not really needed that much, but one of the few cases where they are still in play is for those pools. The resources have a main hub, called the ResourceManager, which contains a list of pools (which are also responsible for loading and saving). There are two families of pools: stream pools and memory pools. Stream pools can act asynchronously, and fetch their data from some URI, for example a file. Memory pools are always immediate, and take their information from data already in memory.

Textures, for example, can come either from a file, like a .dds asset, or from a buffer mapped and loaded by some other system, like LibRocket. Memory pools have a specific pair of functions to create a resource: ReserveResource, which creates a new empty resource and returns the Id, and UpdateResource, which takes a pointer to some update structure which is then used to fill in the data.

The way a resource is created is through a call to the ResourceManager, which gets formatted like so:

Resources::ResourceId id = ResourceManager::Instance()->ReserveResource(reuse_name, tag, MemoryVertexBufferPool::RTTI);
struct VboUpdateInfo info = {...};
ResourceManager::Instance()->UpdateResource(id, &info);

reuse_name is a global resource Id which ensures that consecutive calls to ReserveResource will return the same Id. tag is a global tag, which will delete all resources under the same tag if DiscardByTag is called on that pool. The last argument is the type of pool which is supposed to reserve this resource. In order to make this easier for the programmer, we can create a function within the CoreGraphics namespace as such:

namespace CoreGraphics
{
Resources::ResourceId 
CreateVertexBuffer(reuse_name, tag, numVerts, vertexComponents, dataPtr, dataPtrSize)
{
    Resources::ResourceId id = ResourceManager::Instance()->ReserveResource(reuse_name, tag, MemoryVertexBufferPool::RTTI);
    struct VboUpdateInfo info = {...};
    ResourceManager::Instance()->UpdateResource(id, &info);
    return id;
}
} // namespace CoreGraphics

ReserveResource has to go through and find the MemoryVertexBufferPool first, so we can eliminate that too by just saving a pointer to the MemoryVertexBufferPool somewhere, perhaps in the same header. This is completely safe since the list of pools must be initialized first, so their indices are already fixed.
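A sketch of what that could look like (GetPool here is a made-up lookup, not the actual ResourceManager API):

// Sketch: resolve and cache the pool pointer once, so CreateVertexBuffer does
// not have to search the ResourceManager's pool list on every call.
namespace CoreGraphics
{
static MemoryVertexBufferPool* vboPool = nullptr;

void
SetupVertexBufferInterface()
{
    // the pool list is initialized before any resources are created,
    // so caching the pointer here is safe
    vboPool = ResourceManager::Instance()->GetPool<MemoryVertexBufferPool>();
}
} // namespace CoreGraphics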

Now all we have to do to get our functions is to include the CoreGraphics header, and we are all set! No need to know about nasty class names, everything is just in there, like a nice simple facade. Extending it is super easy: just declare the same namespace in some other file, and add new functions! Since we are always dealing with singletons and static hubs, none of this should be too complicated. It’s back to functions again! Now we can choose to have those functions declared in one header per use case, for example all texture-related functions could live in the texture.h header, or they could be exposed in a single include. Haven’t decided yet.
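For example, a hypothetical vertexbuffer.h could simply reopen the namespace and declare its own set of free functions (the parameter list here is a placeholder):

// vertexbuffer.h (hypothetical file): extends the CoreGraphics facade with more
// free functions; no class, no inheritance, just the same namespace reopened.
namespace CoreGraphics
{
Resources::ResourceId CreateVertexBuffer(/* reuse_name, tag, numVerts, components, data, size */);
void BindVertexBuffer(const Resources::ResourceId id, int slot, int offset);
} // namespace CoreGraphics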

One of the big benefits is that while it’s quite complicated to expose a class to, for example, a scripting interface, exposing a header with simple functions is very simple. And since everything is handles, the user never has to know about the implementation, and is only exposed to the functions they might want.

Handles

So I mentioned that everything is returned as a handle, but a handle can contain much more information than just a plain integer. The resource Id is one such example; it contains the following:

1. The first (leftmost) 32 bits are the unique id of the resource instance within the loader.
2. The next 24 bits are the resource id as specified by reuse_name for memory pools, or by the path to the file for stream pools.
3. The last 8 bits are the id of the pool itself within the ResourceManager. This allows an Id to be immediately recognized as belonging to a certain pool, and the pool can be retrieved directly if required.

This system intrinsically keeps track of the usage count, since the number of times a resource is in use is indicated by the unique resource instance ids that have been handed out, and once all of those ids are returned, the resource itself is safe to discard.
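To make the layout concrete, here is a sketch of how such a 64-bit handle could be packed and unpacked; the function names are made up, only the 32/24/8 bit split comes from the list above:

// Sketch: pack/unpack a 64-bit resource handle using the 32/24/8 split above.
#include <cstdint>

inline uint64_t
PackResourceId(uint32_t instance, uint32_t resource, uint8_t pool)
{
    return ((uint64_t)instance << 32) |
           (((uint64_t)resource & 0x00FFFFFF) << 8) |
           (uint64_t)pool;
}

inline uint32_t InstanceFromId(uint64_t id) { return (uint32_t)(id >> 32); }
inline uint32_t ResourceFromId(uint64_t id) { return (uint32_t)((id >> 8) & 0x00FFFFFF); }
inline uint8_t  PoolFromId(uint64_t id)     { return (uint8_t)(id & 0xFF); }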

All the graphics side objects are also handles. If we for example want to bind a vertex buffer to slot 0 with offset 0, we do this:

CoreGraphics::BindVertexBuffer(id, 0, 0);

Super simple, and that function will fetch the required information from id, and send it to the render device. While this all looks good here, there is still tons of work left to do in order to convert everything.

Tool updates

Apart from finally releasing the core engine on github, we have of course been busy with the tools, working on streamlining the ui and making it more usable in general. Both major editors have received facelifts like dockable widgets and attribute editors that consume less space. The content browser has received support for browsing (and playing) sounds, previewing ui layouts, and a resource browser for textures.

[Screenshot: the content browser]
The level editor has support for layers and for selecting which ui layouts and sounds to preload, and game entities can be converted to other entity classes or to environment entities and vice versa. On top of all that, it is possible to drag and drop items between the two programs.

[Screenshot: the level editor]

Nebula trifid now on Github

After messing around for all too long we finally moved our main development to github. It’s still rough around the edges and probably close to impossible to build without knowing what you are doing, but we are working on that.
The demo content will follow suit soon as well, so that it is easier to get started in general.

Physically based lighting revisited

I’ve been working closely with one of our graphics artists to iron out the faults with the PBR rendering. We also thought it would be a good idea to include a way to utilize IBL as well, since it seems to be the way to go with modern lighting pipelines. Last time I just gathered my code from http://www.massimpressionsprojects.com/dev/altdevblog/2011/08/23/shader-code-for-physically-based-lighting/ which more or less describes how to implement PBR in realtime, so go there if you wish to implement PBR in your engine. Instead, I thought I should explain how to combine the PBR with the newly implemented IBL techniques we use!

So to start off, for rendering with IBL you need two textures, or cube maps to be more precise. The first is an ordinary environment map, used for reflections. The other is called an irradiance map, and describes the light being radiated from the surrounding environment; an irradiance map can be generated from an environment map using for example 'cubemapgen' (https://code.google.com/p/cubemapgen/). The irradiance map is sampled differently from the environment map: whereas the reflections should be just that, a reflection on the surface, the irradiance acts more like diffuse light. So to sample reflections and irradiance, we currently use this code (and I doubt it’s gonna be subject to change):

Geometry

	mat2x3 ret;
	vec4 worldNorm = (InvView * vec4(viewSpaceNormal, 0));
	vec3 reflectVec = reflect(worldViewVec, worldNorm.xyz);
	float x = dot(-viewSpaceNormal, normalize(viewSpacePos.xyz)) * MatFresnelDistance;
	vec3 rim = FresnelSchlickGloss(specularColor.rgb, x, roughness);
	ret[1] = textureLod(EnvironmentMap, reflectVec, (1 - roughness) * EnvNumMips).rgb * rim;
	ret[0] = textureLod(IrradianceMap, worldNorm.xyz, 0).rgb;
	return ret;

I’m using a matrix here because for some reason subroutines cannot have two ‘out’ arguments and subroutines cannot return structs…

You might also notice that we input a view space normal, so we actually transform it into a world space normal first. That’s a bit expensive, but all of the lighting is done in view space, so doing this transform once here is probably cheaper than doing it for each light. So, without further ado, let’s dive into what the code does!

	vec3 reflectVec = reflect(worldViewVec, worldNorm.xyz);

Calculate the reflection vector by reflecting the view vector around the world space normal; this gives us the vector to use when sampling the reflections.

	float x = dot(-viewSpaceNormal, normalize(viewSpacePos.xyz)) * MatFresnelDistance;

This is basically the N dot V term we calculate for use in the Fresnel calculation on the line below.

	vec3 rim = FresnelSchlickGloss(specularColor.rgb, x, roughness);

Calculate Fresnel using a modified algorithm which takes into account the roughness of the surface. This function looks like this:

vec3
FresnelSchlickGloss(vec3 spec, float dotprod, float roughness)
{
	float base = 1.0 - saturate(dotprod);
	float exponent = pow(base, 5);
	return spec + (max(vec3(roughness), spec) - spec) * exponent;
}

By this point we have what we want! So we just sample our textures using the data!

	ret[1] = textureLod(EnvironmentMap, reflectVec, (1 - roughness) * EnvNumMips).rgb * rim;
	ret[0] = textureLod(IrradianceMap, worldNorm.xyz, 0).rgb;

Now we’re done with sampling the reflections. What is left now is to somehow get this into the rendering pipeline for further processing. We use the roughness to select the mipmap, where EnvNumMips denotes the number of mips present in this specific environment map.

This is a typical fragment shader used in Nebula:

shader
void
psUber(in vec3 ViewSpacePos,
	in vec3 Tangent,
	in vec3 Normal,
	in vec3 Binormal,
	in vec2 UV,
	in vec3 WorldViewVec,
	[color0] out vec4 Albedo,
	[color1] out vec4 Normals,
	[color2] out float Depth,	
	[color3] out vec4 Specular,
	[color4] out vec4 Emissive) 
{
	Depth = calcDepth(ViewSpacePos);
	
	vec4 diffColor = texture(DiffuseMap, UV) * vec4(MatAlbedoIntensity, MatAlbedoIntensity, MatAlbedoIntensity, 1);
	float roughness = texture(RoughnessMap, UV).r * MatRoughnessIntensity;
	vec4 emsvColor = texture(EmissiveMap, UV) * MatEmissiveIntensity;
	vec4 specColor = texture(SpecularMap, UV) * MatSpecularIntensity;
	
	vec4 normals = texture(NormalMap, UV);
	vec3 bumpNormal = normalize(calcBump(Tangent, Binormal, Normal, normals));

	mat2x3 env = calcEnv(specColor, bumpNormal, ViewSpacePos, WorldViewVec, roughness);
	Specular = calcSpec(specColor.rgb, roughness);
	Albedo = calcColor(diffColor, vec4(1), AlphaBlendFactor) * (1 - Specular);	
	Emissive = vec4(env[0] * Albedo.rgb + env[1], 1) + emsvColor;
	
	Normals = PackViewSpaceNormal(bumpNormal);
}

Inputs and outputs

Here is another interesting detail. What actually is a specular map when using PBR? In our engine, we define it as just that: reflective color in RGB. To simplify authoring for our graphics artists, all textures can be adjusted using simple scalar values. The same actually goes for the Fresnel effect mentioned earlier, albeit that’s not actually physically correct. Anyways, we also use subroutines, so all functions called ‘calcXXX’ are calling some subroutine function. What we are interested in here is env, and what we feed to Emissive and Albedo. We can see that we cheat a bit with the albedo color by using 1 - Specular. This isn’t really energy conserving since it doesn’t take the Fresnel effect into account. For emissive, we simply do this:

	Emissive = vec4(env[0] * Albedo.rgb + env[1], 1) + emsvColor;

This calculation combines the two previously calculated values: env[0] is the diffuse irradiance, and env[1] is the specular reflection. Later in the pipeline, when we have calculated the light, we simply add this value to the total color value of said pixel. The next part will cover how lights are calculated using the above data.

Lights

	vec3 viewVec = normalize(ViewSpacePosition);
	vec3 H = normalize(GlobalLightDir.xyz - viewVec);
	float NH = saturate(dot(ViewSpaceNormal, H));
	float NV = saturate(dot(ViewSpaceNormal, -viewVec));
	float HL = saturate(dot(H, GlobalLightDir.xyz));
	float NL = saturate(dot(ViewSpaceNormal, GlobalLightDir.xyz));
	vec3 spec;
	BRDFLighting(NH, NL, NV, HL, specPower, specColor.rgb, spec);
	vec3 final = (albedoColor.rgb + spec) * diff;

So yes, these are the good old calculations normally required to do lighting. We calculate the H vector a bit differently than the usual LightDir + ViewVec, and this is because we actually have the view vector inverted: viewVec is the vector FROM the camera to the point, so the formula becomes GlobalLightDir + (-viewVec). The same must be applied when calculating the N dot V product, since it’s supposed to represent the angle between the normal and the view vector as seen from the point on the surface. This is important to get right, since without it the Fresnel attenuation will fail and you will get overly strong specular highlights. I was struggling with getting the specular right and it turned out that the solution was simple, so make sure that these dot products are both saturated and, well, correct! The function then writes the result to an output parameter, in this case called spec. diff in the final calculation is the diffuse color of the light. Roughness is converted to specular power, but it’s a rather simple process; we simply do:

float specPower = exp(13 * roughness + 1);

This gets it into a range of 2-8192, which allows us to get rather strong specular highlights if the roughness is low. Also note that the specular power is only relevant when we feed it into the BRDF function, and not before; in the previous instances we actually just use the raw roughness value.

The pros and cons

PBR materials require way more authoring and obviously loads of knowledge on the artist’s side, albeit the results are loads better since the lighting doesn’t have to be built into the object itself for every scene. Basically, you will need at least 4 texture inputs per object:

Albedo – Color of the surface, or in lighting terms it’s the direct reflected color of a surface. Channels = RGBA (A is used for alpha blending/testing)

Specular/Reflectiveness – Color of the reflectivity; for most materials this is going to be a whitish hue, but for some metals, for example gold or copper, the reflectivity has a goldish hue. Channels = RGB, each channel represents that color’s reflectivity.

Roughness/Glossiness – Value of surface roughness. This is a value between 0-255 for artists, or 0-1 in a shader, and corresponds to the microsurface density. Depending on which convention you pick, 1 can mean glossy or rough. Channels = Red only

Normals – Self explanatory.

However, in order to get the values just right, we also provide a set of intensity sliders for each texture, which makes it simple for an artist to tune the result by scaling the texture values with a simple multiplication. This ensures that roughness and specularity match the wanted values.

Why multidraw is inflexible.

I’ve been working more on the performance stuff in Nebula. In the current version, I added a feature to AnyFX with which one can determine whether a uniform buffer should be synchronized or not. While this may result in corruptions during rendering due to stomping data in transit, it also gives an enormous boost in performance. One easy way to work around the stomping is to increase the buffer size so that one can render at least one frame before needing to start writing to the buffer from the beginning again. In the current state, we have all per-object data in a uniform buffer holding 8192 sub-buffers (a massive amount). This gives us the ability to issue 8192 draws each frame without destroying the data, which should be sufficient. If one needs synchronized buffers, one can just remove the 'nosync' qualifier from the varblock. Anyway.
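As a sketch of the kind of unsynchronized ring buffer this implies, here is what it could look like in plain OpenGL (this is not the AnyFX varblock machinery itself; drawIndex, bindingPoint and perObjectData are assumed to exist in the surrounding code):

// Sketch: a persistently mapped uniform buffer with 8192 sub-buffers. Writes
// simply wrap around, so data written for a draw is not overwritten until
// 8192 draws later, which is why no synchronization is needed in practice.
struct PerObject { float mvp[16]; float color[4]; };   // illustrative per-object data
const GLsizeiptr numSubBuffers = 8192;
const GLsizeiptr subBufferSize = 256;                  // padded to GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT

GLuint ubo;
glGenBuffers(1, &ubo);
glBindBuffer(GL_UNIFORM_BUFFER, ubo);
glBufferStorage(GL_UNIFORM_BUFFER, numSubBuffers * subBufferSize, nullptr,
                GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);
char* mapped = (char*)glMapBufferRange(GL_UNIFORM_BUFFER, 0, numSubBuffers * subBufferSize,
                                       GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT);

// per draw: write into the next slot and bind that range, no glBufferSubData and no fences
GLsizeiptr slot = drawIndex % numSubBuffers;
memcpy(mapped + slot * subBufferSize, &perObjectData, sizeof(PerObject));
glBindBufferRange(GL_UNIFORM_BUFFER, bindingPoint, ubo, slot * subBufferSize, subBufferSize);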

In my quest I also checked out glMultiDraw*, which is nice in principle, but there are many things that bother me. To start with, using glMultiDraw we are only really able to do slightly more flexible instanced rendering. Yes, there is the gl_DrawIDARB GLSL shader variable which lets us address variables in a shader based on which one of the multiple draws we are in, and yes, this is very simple to implement. But! And this is one major but. It is very non-intuitive to put every per-object variable in arrays. While we can define unsized arrays and use shader storage buffers, which are awesome, it requires the engine to, in its core, redefine how it sets variables. For example, some variables have to be set indexed into an array, while others may be set as individual uniforms. Well, we can’t really do any individual uniforms anymore, so everything has to go into arrays. This will cause some major confusion since both are syntactically allowed, but their behavior is completely different. To illustrate, this is how most rendering pipelines work in my experience:

for each material
{
  ApplyMaterial()                         // apply material by setting shaders, tessellation patch size, subroutines
  ApplyMaterialConstants()                // shared variables like time, random value and whatnot
  for each visible model using material
  {
    ApplyModel()                          // set vertex buffer, index buffer
    for each visible model instance using material
    {
      ApplyObjectConstants()               // apply per-object variables, model transform, color etc
      Draw()
    }
  }
}

This allows us to apply some variables for each draw. Now, if we use the multi draw principle, we instead get this flow, which is better in terms of overhead (a raw OpenGL sketch of the MultiDraw() step follows the pseudocode):

for each material
{
  ApplyMaterial()                         // apply material by setting shaders, tessellation patch size, subroutines
  ApplyMaterialConstants()                // shared variables like time, random value and whatnot
  for each visible model using material
  {
    ApplyModel()                          // set vertex buffer, index buffer
    for each visible model instance using material
    {
      ApplyIndexedObjectConstants()               // apply per-object variables, model transform, color etc
      AccumulateDrawData()                       // add draw data to buffer, i.e. base vertex, base index, number of instances etc.
    }
    MultiDraw()
  }
}
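To make the MultiDraw() step above concrete, here is a sketch of one way to realize it with the indirect variant in raw OpenGL; the engine-side names (indirectBuffer, numIndices and so on) are assumptions, while the GL structure and calls are standard:

// Sketch: accumulate one indirect command per visible instance, then issue them
// all in a single call. In the shader, gl_DrawIDARB tells us which command is
// currently executing, so per-object data can be fetched by index from a buffer.
struct DrawElementsIndirectCommand
{
    GLuint count;          // number of indices for this draw
    GLuint instanceCount;  // usually 1 per object in this scheme
    GLuint firstIndex;     // base index into the index buffer
    GLuint baseVertex;     // base vertex
    GLuint baseInstance;   // can double as the per-object index
};

std::vector<DrawElementsIndirectCommand> commands;
// AccumulateDrawData(): commands.push_back({ numIndices, 1, baseIndex, baseVertex, objectIndex });

// MultiDraw(): upload the accumulated commands and issue them all at once
glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirectBuffer);
glBufferData(GL_DRAW_INDIRECT_BUFFER,
             commands.size() * sizeof(DrawElementsIndirectCommand),
             commands.data(), GL_DYNAMIC_DRAW);
glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr,
                            (GLsizei)commands.size(), 0);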

Now, if we have a uniform which isn’t in an array (because a developer, graphics artist(!!!) or shader generation tool may have created a variable the ordinary way), then we will only get the variable from the last call in the list. It can be especially difficult for a shader generation tool to know whether to make an ordinary uniform or to group it into a shader block. Even though this is a user error, it still makes more sense to use ordinary variables as before. The way one would implement this in GLSL to be failsafe would be like this:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
  PerObject perObjectData[];
};

Then we set the variables in an indexed fashion. The problem here lies in the fact that the individual uniforms will not be visible to the user. Instead, we have to reflect this buffer in the engine code so that we have a matching buffer, and then update into that. This will not be as simple as setting the value of a variable, which is one of the major downsides. Another way of doing the same thing would be like this:

layout(std430) buffer PerObjectBuffer
{
  mat4 Model[];
  mat4 InvModel[];
  vec4 Diffuse[];
  int ObjectId[];
};

This would work if it were syntactically okay, but unfortunately it isn’t.

In that case one would also need to define every variable as an array to avoid undefined behavior. Furthermore, putting all variables into buffers as structs, like in the first example, removes the transparency of simply setting a value on a variable. Why, you may ask? Because shader storage buffers can have many members, but only the last may be an unsized array. We could expose buffer members as variables (just like with uniform blocks), but exposing a member which is a struct would basically only give us a member defined as a block of memory. Setting this block of memory from the engine requires a matching struct on the engine side which we send to the GPU, so we also have to serialize the struct types in order to reflect them on the engine side. Summarized, doing this:

variable->SetMatrix(this->modelTransform);

is much simpler to understand than

struct PerObject
{
  matrix44 Model;
  matrix44 InvModel;
  float4 Diffuse;
  int ObjectId;
} foo;
foo.Model = this->modelTransform;
variable->SetStruct(&foo);

Why? Well what if we need to set more than one of the members (for example Model, then InvModel at some later stage)? There is no way to apply this to a generic case! The thing we could do is to simulate a struct as a list of variables, by making:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
   int MaterialType;
   PerObject data[];   
};

Appear like this on the client end:

buffer PerObjectBuffer
{
   int MaterialType;
   mat4 Model[];
   mat4 InvModel[];
   vec4 Diffuse[];
   int ObjectId[];
};

By making the indexed variables here appear as if they were ordinary buffer members, we could use a clever device to offset them when we write to them (using SetVariableIndexed), while still having them exposed as ordinary variables. The thing to consider is that the offset of an indexed variable grows by the size of the struct for each index, on top of a constant offset for the non-indexed members and for the member’s own position within the struct. Confused? Well, let’s make an example. Say we are going to set Diffuse for index 0: the offset is then int + mat4 + mat4, i.e. the non-indexed head plus Diffuse’s position after Model and InvModel. If we set it for index 1, the offset is int + (mat4 + mat4 + vec4 + int) * 1 + mat4 + mat4. The generic formula is (non-indexable members size) + (indexable element size) * index + (member offset within the struct). In this case, the non-indexable part is simply an int, while the indexable element is mat4 + mat4 + vec4 + int.
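As a quick sketch of that formula in code (illustrative sizes only, ignoring the std430 alignment padding a real implementation would have to respect via reflected offsets):

// Sketch: byte offset of PerObjectBuffer.data[index].Diffuse following the
// formula above. Real code should use offsets reflected from the compiler,
// since std430 adds alignment padding that is ignored here.
const size_t sizeOfInt  = 4;
const size_t sizeOfVec4 = 16;
const size_t sizeOfMat4 = 64;

const size_t headSize        = sizeOfInt;                                        // MaterialType
const size_t elementSize     = sizeOfMat4 + sizeOfMat4 + sizeOfVec4 + sizeOfInt; // one PerObject
const size_t diffuseInStruct = sizeOfMat4 + sizeOfMat4;                          // after Model, InvModel

size_t DiffuseOffset(size_t index)
{
    return headSize + index * elementSize + diffuseInStruct;
}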

This being said, OpenGL has a lot of features and is flexible and so on. The only problem is that it’s somewhat complicated to wrap if one wants to expose all of the advanced features. It should also be mentioned that there is a significant difference between uniforms and buffers. Uniforms are unique, meaning there can be only one uniform with a given name per program. Buffer members are not uniforms, but rather structure members, meaning they are not unique. If we have this code for example:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
   int MaterialType;
   PerObject data[];   
};

layout(std430) buffer SomeOtherBuffer
{
   PerObject data[];   
};

Then PerObject will be used twice, meaning Model will appear twice in the program context. This completely ruins our previous attempts at making storage buffers transparent as variables. Conclusion: we need to change how the engine handles variables. It’s probably simpler to just use uniform buffers and have the members as arrays, since that is already implemented and won’t be weird like this. We can then implement shader storage buffers as simple structures which we can read/write data from/to when using for example compute shaders.

New Domain and Name

As you may have noticed, we have moved the page to a new domain and have started writing and collecting documentation. We are still working on a release, but due to changed priorities it got delayed a bit. I hope to be able to finally wrap up the release in the upcoming month, but this time no promises. One of the main things holding everything back is demo content: we don’t have any suitable demo game/project to release along with the SDK, and since the current documentation is either a bit outdated or incomplete, it would be a bit hard to get started if you haven’t worked with any incarnation of Nebula before.

Apart from the new domain we also decided to change the name of our fork. The last release of the Nebula3 SDK from Radon Labs and our version have diverged so considerably with respect to tools, file formats, and the whole pipeline that we figured we should use a new name to avoid confusion. Our version will be called Nebula Trifid from now on, keeping to the space theme. For those that are interested, the Trifid Nebula is a specific nebula located in the Sagittarius constellation.

//Johannes

 

Tone mapping

While I’ve been working on AnyFX, I’ve also looked into tone mapping. Now, Nebula encodes and decodes pseudo-hdr by just down-scaling and up-scaling, something which works rather well in most cases. However, HDR was applied screen-wide and had no adaptation of brightness or color, and this is exactly what tone mapping solves.

I was a bit stunned by how easy it was to perform tone mapping. We need to downscale the color buffer to a 2×2 image, and then calculate the average luminance value into a 1×1 texture. We then simply copy the 1×1 texture to be used the next frame.

One way to perform the downscaling could be a sequence of post effects, each performing a simple 2×2 box average. However, since what we are doing is essentially mipmapping, we could just as well generate mips instead. The only downside is that we generate one more level than we want, but it’s still much more efficient than having to run a series of consecutive downscale passes.
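In plain OpenGL terms (a sketch, not Nebula's actual render target code), the whole downscale chain then collapses into a single call on the source texture:

// Sketch: let the driver build the mip chain of the luminance source;
// 'downscaledColorBuffer' is an assumed texture handle for the color copy.
glBindTexture(GL_TEXTURE_2D, downscaledColorBuffer);
glGenerateMipmap(GL_TEXTURE_2D);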

When the average luminance has been calculated, we just use that value with a tone mapping operator to perform the effect. We perform the eye adaptation part when we perform the 2×2 -> 1×1 downscale. The HLSL code for the operator is:

static const float g_fMiddleGrey = 0.6f;
static const float g_fMaxLuminance = 16.0f;
//------------------------------------------------------------------------------
/**
	Calculates HDR tone mapping
*/
float4
ToneMap(float4 vColor, float lumAvg, float4 luminance)
{
	// Calculate the luminance of the current pixel
	float fLumPixel = dot(vColor.rgb, luminance);
	// Apply the modified operator (Eq. 4)
	float fLumScaled = (fLumPixel * g_fMiddleGrey) / lumAvg;
	float fLumCompressed = (fLumScaled * (1 + (fLumScaled / (g_fMaxLuminance * g_fMaxLuminance)))) / (1 + fLumScaled);
	return float4(fLumCompressed * vColor.rgb, vColor.a);
}

We use a constant for the middle gray area and the maximum amount of luminance. This could be parametrized, but it’s not really necessary. We calculate the average luminance using the following kernel:

//------------------------------------------------------------------------------
/**
	Performs a 2x2 kernel downscale
*/
void
psMain(float4 Position : SV_POSITION0,
	float2 UV : TEXCOORD0,
	out float result : SV_TARGET0)
{
	float2 pixelSize = GetPixelSize(ColorSource);
	float fAvg = 0.0f;

	// source should be a 512x512 texture, so we sample the 8'th mip of the texture
	float sample1 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(0.5f, 0.5f) * pixelSize, 8), Luminance);
	float sample2 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(0.5f, -0.5f) * pixelSize, 8),  Luminance);
	float sample3 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(-0.5f, 0.5f) * pixelSize, 8), Luminance);
	float sample4 = dot(ColorSource.SampleLevel(DefaultSampler, UV + float2(-0.5f, -0.5f) * pixelSize, 8), Luminance);
	fAvg = (sample1+sample2+sample3+sample4) * 0.25f;

	float fAdaptedLum = PreviousLum.Sample(DefaultSampler, float2(0.5f, 0.5f));
	result = clamp(fAvg + (fAdaptedLum - fAvg) * ( 1 - pow( 0.98f, 30 * TimeDiff) ), 0.3, 1.0f);
}

The 0.98f can be adjusted to modify the speed at which the eye adaptation occurs. We can also adjust the factor by which we multiply the TimeDiff variable. Here we use 30, but we could use any value. Modifying either value will affect the speed of the adaptation.

Also, in order to make the pipeline more streamlined, we first downsample the color buffer (which has a screen-relative size) to 512×512 before we perform the mipmap generation. This ensures there will be a 2×2 level, and with simple math (log2(512) - log2(2) = 8) we know it must be mip level 8:

512×512 Level 0
256×256 Level 1
128×128 Level 2
64×64 Level 3
32×32 Level 4
16×16 Level 5
8×8 Level 6
4×4 Level 7
2×2 Level 8

So to conclude: first we downscale from the variable screen size down to 512×512. Then we generate mipmaps on that render target. We calculate the average luminance and use the time difference to blend between different levels of luminance. We then take the luminance value and apply the operator described above. In order for this to be handled properly, we apply it to both the bloom and the final result; if we perform bloom without the tonemapping, we get an overabundance of bloom. The difference can be seen below:

[Comparison screenshots: full tonemapping / blur tonemapping disabled / final tonemapping disabled]

// Gustav

Release soon.

The time has finally come to work on the public release of our Nebula3 version. In the true spirit of when Andre still worked at Radonlabs, we will release our work on N3 under the same 2-clause BSD license. We were thinking of integrating some of the things we are working on first, but if you continue down that road you will never release anything, so we decided to go ahead and just release and keep working after that. There are still some things in the pipeline, such as a fully working OpenGL4 port (using AnyFX, which Gustav is working on), a fully working Havok integration (mostly done) and a rewrite of the network layer.

Currently most of the work left is cleaning up random code, adding proper copyright/license stuff, revamping the build system a bit so that it is more newcomer friendly and, above all, creating some nice demos with content we have created here. Should be done by next week hopefully, so be ready!

And in other news, glad midsommar (happy midsummer)! ;D

AnyFX progress

Designing a programming language, even one as simple as AnyFX, is hard work. It’s hard work because there are so many little things you miss during initial development and planning. Anyways, here is the progress with AnyFX so far. This video shows a shader implemented in AnyFX for OpenGL, using vertex, hull, domain, geometry and pixel shading. It uses hull and domain shaders to tessellate and displace the cube into a sphere. The geometry shader is used as a single-pass wireframe renderer, as described and implemented here: http://prideout.net/blog/?p=48. In the video you can see how I dynamically change the tessellation inner and outer factors using an AnyFX variable.

[Video: AnyFX OpenGL tessellation demo]

The next step on the list is compute shaders. When they work properly and I’m satisfied with how they are handled, I’m going to start integrating this into Nebula.

AnyFX, what the fuzz?

As a part of my studies, I’ve been developing a very simple programming language, very similar to Microsoft FX for effects. The difference between AnyFX and Microsoft FX is that AnyFX is generic, meaning it will work with any back-end implementation. The language works by supplying all the other stuff BESIDES the code which we need to render. This means that we actually put back-end specific code in the shader bodies. Why, you may ask? Well, it would be extremely dangerous and poorly optimized if we were to define our own language for intrinsics, function calling conventions etc., and directly translate this to graphics assembly. Instead, we rely on the vendor-specific back-end compilers to do the heavy work for us. As such, we can very easily port our old HLSL/FX shaders or GLSL shaders by simply copying all of the functionality in the function bodies straight into an AnyFX file. The downside is that this potentially requires us to provide several files in order to support different shader libraries. We could implement a language with several function bodies, one for each implementation, but it wouldn’t look like C anymore, and the code could get messy in a hurry. Sounds confusing? Well, here’s an example:

//------------------------------------------------------------------------------
// demo.fx
// (C) 2013 Gustav Sterbrant
//------------------------------------------------------------------------------

// This is an example file to be used with the AnyFX parser and API.
profile = glsl4;

// A couple of example variable declarations
sampler2D DiffuseTexture;
sampler2D NormalTexture;

state OpaqueState;
state AlphaState
{
DepthEnabled = true;
BlendEnabled[0] = true;
SrcBlend[0] = One;
DstBlend[0] = One;
};

// a variable block containing a set of variables, this will instantiated only once in the effects system
// this block of variables will be shared by all other .fx files compiled during runtime with the same name and the [shared] qualifier
varblock Transforms
{
mat4 View;
mat4 Projection;
};

mat4 Model;

varblock Material
{
float SpecularIntensity = float(1.0f);
vec4 MaterialColor = vec4(1.0f, 0.0f, 0.0f, 1.0f);
};

//------------------------------------------------------------------------------
/**
Simple vertex shader which transforms basic geometry.

The function header here complies (and has to comply) with the AnyFX standard, although the function code is written in a specific target language.

This language is compliant with GLSL
*/
void
vsStatic(in vec3 position, in vec2 uv, out vec2 UV)
{
gl_Position = Projection * View * Model * vec4(position, 1.0f);
UV = uv;
}

//------------------------------------------------------------------------------
/**
Simple pixel shader which writes normals and diffuse colors.

Here, we use multiple render targeting using input/output attributes.

We also apply a function attribute which tells OpenGL to perform early depth testing
*/
[earlydepth]
void
psStatic(in vec2 uv, [color0] out vec4 Color)
{
Color = texture(DiffuseTexture, uv);
}

//------------------------------------------------------------------------------
/**
Two programs, they share shaders but not render states, and also provide an API-available data field.
*/
program Solid [ string Mask = "Static"; ]
{
vs = vsStatic();
ps = psStatic();
state = OpaqueState;
};

program Alpha [ string Mask = "Alpha"; ]
{
vs = vsStatic();
ps = psStatic();
state = AlphaState;
};

 

So, what’s fancy here? Well, first of all, we can define variables shared by several shader programs (yay!). Each program combines vertex shaders, pixel shaders, eventual hull, domain and geometry shaders, together with a render state. A render state defines everything required to prepare the graphics card for rendering; it includes depth testing, blending, multisampling, alpha-to-coverage, stencil testing etc. Basically, for you DX folks out there, this is a Rasterizer, DepthStencil and BlendState combined into one simple object. You may notice that we write all the variable types with the GLSL type names. However, we could just as well do this using float1-4, matrix1-4x1-4, i.e. the HLSL style; the compiler will treat them equally. You may also notice the 'profile = glsl4' line, which just tells the compiler to generate GLSL code as the target. By generate code in this case, I mean the vertex input methodology (which is different between most implementations). It’s also used to transform the [earlydepth] qualifier to the appropriate GLSL counterpart.

We can also define variable blocks, called 'varblock', which handle groups of variables as buffers. In OpenGL this is known as a Uniform Buffer Object, and in DirectX it’s a Constant Buffer. We also have fancy annotations, which allow us to insert meta-data straight into our objects of interest. We can for example insert strings telling what type of UI handle we want for a specific variable, or a feature mask for our programs, etc. Since textures are very, very special in both GLSL and HLSL, we define a combined object, called sampler2D. We can also define samplers, which are handled by DirectX as shader-code objects and by OpenGL as CPU-side settings. In GLSL we don’t need both a texture and a sampler object to sample from a texture, but in HLSL4+ we do, so in that case the generated code will quite simply put the sampler object in the code. We can also define qualifiers for variables, such as [color0] as you see in the pixel shader, which means that the output will go to the 0’th render target. AnyFX currently supports a plethora of qualifiers, but only one qualifier per input/output.

Anyways, to use this, we simply do this:

 

this->effect = AnyFX::EffectFactory::Instance()->CreateEffectFromFile("compiled");
this->opaqueProgram = this->effect->GetProgramByName("Solid");
this->alphaProgram = this->effect->GetProgramByName("Alpha");
this->viewVar = this->effect->GetVariableByName("View");
this->projVar = this->effect->GetVariableByName("Projection");
this->modelVar = this->effect->GetVariableByName("Model");
this->matVar = this->effect->GetVariableByName("MaterialColor");
this->specVar = this->effect->GetVariableByName("SpecularIntensity");
this->texVar = this->effect->GetVariableByName("DiffuseTexture");

 

Then:

 

// this marks the use of AnyFX, first we apply the program, which enables shaders and render states
this->opaqueProgram->Apply();

// then we update our variables, seeing as our variables are global in the API but local internally, we have to perform Apply first
this->viewVar->SetMatrix(&this->view[0][0]);
this->projVar->SetMatrix(&this->projection[0][0]);
this->modelVar->SetMatrix(&this->model[0][0]);
this->matVar->SetFloat4(color);
this->specVar->SetFloat(1.0f);
this->texVar->SetTexture(this->texture);

// finally, we tell AnyFX to commit all changes done to the variables
this->opaqueProgram->Commit();

 

Aaaand render. We have some restrictions, however. First, we must run Apply on our program before we are allowed to set the variables. This fits nicely into many game engines, since we first apply all of our shader settings, then apply our per-object variables, and lastly render. We also run the Commit command, which updates all variable buffers in a batched manner. This way, we don’t need to update the variable block once per variable, seeing as this might seriously stress the memory bandwidth. When all of this is said and done, we can perform the rendering. We need to perform Apply first because each variable will have different binding points in the shaders. In OpenGL, each uniform has a location in a program, and since different programs may use any subset of all declared variables, the locations are likely to be different. In HLSL4+, we use constant buffers for everything, and there Commit is vital since, if we only use constant buffers, we need to update them at some point.

All in all, the language allows us to move functionality to compile time. For OpenGL, we can perform compile-time linking by simply testing if our shaders will link together. We can also obfuscate the GLSL code, so that nobody can simply read the raw shader code and manipulate it to cheat. However, during startup, we still need to compile the actual shaders before we can perform any rendering. In the newer versions of OpenGL, we can pre-compile program binaries and then load them at runtime. This could easily be implemented straight into AnyFX if needed, but I’d rather have the shaders compiled by my graphics card so that the vendor driver can perform its specific optimizations. Microsoft seems to be discontinuing FX (for some unknown reason), but the system is still really clever and useful.
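For reference, this is roughly what the program binary path looks like in raw OpenGL (a sketch using the standard glGetProgramBinary/glProgramBinary calls, not something AnyFX does today):

// Sketch: save a linked program's driver-generated binary and restore it later,
// skipping GLSL compilation on subsequent runs.
#include <vector>

GLint length = 0;
glGetProgramiv(program, GL_PROGRAM_BINARY_LENGTH, &length);
std::vector<char> binary(length);
GLenum format = 0;
glGetProgramBinary(program, length, nullptr, &format, binary.data());
// ... store 'format' and 'binary' on disk ...

// later: recreate the program without compiling the GLSL source again
GLuint restored = glCreateProgram();
glProgramBinary(restored, format, binary.data(), (GLsizei)binary.size());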

And also, as you may or may not have figured out, this is the first step I will take to finish the OpenGL4 render module.

When I’m done with everything, and it’s integrated and proven to work in Nebula, I will write down a full spec of the language grammar and qualifiers, and release it open source.