AnyFX, what the fuzz?

As a part of my studies, I’ve been developing a very simple programming language, very similar to Microsoft FX for effects. The difference between AnyFX and Microsoft FX is that AnyFX is generic, meaning it will work with any back-end implementation. The language works by supplying everything we need to render BESIDES the actual shader code. This means that we put back-end specific code in the shader bodies. Why, you may ask? Well, defining our own language for intrinsics, function calling conventions and so on, and translating it directly to graphics assembly, would be error-prone and likely poorly optimized. Instead, we rely on the vendor-specific back-end compilers to do the heavy lifting for us. As such, we can easily port our old HLSL/FX or GLSL shaders by simply copying the function bodies straight into an AnyFX file. The downside is that we may have to provide several files in order to support different shader languages. We could implement a language with several function bodies, one for each implementation, but it wouldn’t look like C anymore, and the code could get messy in a hurry. Sounds confusing? Well, here’s an example:

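//------------------------------------------------------------------------------
// demo.fx
// (C) 2013 Gustav Sterbrant
//------------------------------------------------------------------------------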

// This is an example file to be used with the AnyFX parser and API.
profile = glsl4;

// A couple of example variable declarations
sampler2D DiffuseTexture;
sampler2D NormalTexture;

state OpaqueState;
state AlphaState
{
DepthEnabled = true;
BlendEnabled[0] = true;
SrcBlend[0] = One;
DstBlend[0] = One;
};

// a variable block containing a set of variables, this will be instantiated only once in the effects system
// this block of variables will be shared by all other .fx files compiled during runtime with the same name and the [shared] qualifier
varblock Transforms
{
mat4 View;
mat4 Projection;
};

mat4 Model;

varblock Material
{
float SpecularIntensity = float(1.0f);
vec4 MaterialColor = vec4(1.0f, 0.0f, 0.0f, 1.0f);
};

//------------------------------------------------------------------------------
/**
Simple vertex shader which transforms basic geometry.

The function header here complies (and has to comply) with the AnyFX standard, although the function code is written in a specific target language.

This language is compliant with GLSL
*/
void
vsStatic(in vec3 position, in vec2 uv, out vec2 UV)
{
gl_Position = Projection * View * Model * vec4(position, 1.0f);
UV = uv;
}

//------------------------------------------------------------------------------
/**
Simple pixel shader which writes normals and diffuse colors.

Here, we use multiple render targeting using input/output attributes.

We also apply a function attribute which tells OpenGL to perform early depth testing
*/
[earlydepth]
void
psStatic(in vec2 UV, [color0] out vec4 Color)
{
Color = texture(DiffuseTexture, UV);
}

//------------------------------------------------------------------------------
/**
Two programs, they share shaders but not render states, and also provide an API-available data field.
*/
program Solid [ string Mask = "Static"; ]
{
vs = vsStatic();
ps = psStatic();
state = OpaqueState;
};

program Alpha [ string Mask = "Alpha"; ]
{
vs = vsStatic();
ps = psStatic();
state = AlphaState;
};

 

So, what’s fancy here? Well, first of all, we can define variables shared by several shader programs (yay!). A program combines a vertex shader, a pixel shader, and optionally hull, domain and geometry shaders, together with a render state. A render state defines everything required to prepare the graphics card for rendering: depth testing, blending, multisampling, alpha-to-coverage, stencil testing and so on. Basically, for you DX folks out there, this combines the Rasterizer, DepthStencil and Blend states into one simple object.

You may notice that we write all the variable types with the GLSL type names. However, we could just as well use float1-4 and matrix1-4x1-4, i.e. the HLSL style. The compiler treats them equally. You may also notice the ‘profile = glsl4’ line, which just tells the compiler to generate GLSL code as the target. By generating code in this case, I mean the vertex input methodology (which differs between most implementations). The profile is also used to transform the [earlydepth] qualifier into the appropriate GLSL counterpart.

We can also define variable blocks, called ‘varblock’, which handle groups of variables as buffers. In OpenGL this is known as a Uniform Buffer Object, and in DirectX it’s a Constant Buffer. We also have fancy annotations, which allow us to insert meta-data straight into our objects of interest. We can for example insert strings telling what type of UI handle we want for a specific variable, or a feature mask for our programs.

Since textures are very, very special in both GLSL and HLSL, we define a combined object, called sampler2D. We can also define samplers, which are handled by DirectX as objects defined in shader code, and by OpenGL as CPU-side settings. In GLSL we don’t need both a texture and a sampler to sample from a texture, but in HLSL4+ we do, so in that case the generated code will quite simply put the sampler object in the code.

We can also define qualifiers for variables, such as [color0] as you see in the pixel shader, which means that the output will be to the 0’th render target. AnyFX currently supports a plethora of qualifiers, but only one qualifier per input/output.

Anyways, to use this, we simply do this:

 

this->effect = AnyFX::EffectFactory::Instance()->CreateEffectFromFile("compiled");
this->opaqueProgram = this->effect->GetProgramByName("Solid");
this->alphaProgram = this->effect->GetProgramByName("Alpha");
this->viewVar = this->effect->GetVariableByName("View");
this->projVar = this->effect->GetVariableByName("Projection");
this->modelVar = this->effect->GetVariableByName("Model");
this->matVar = this->effect->GetVariableByName("MaterialColor");
this->specVar = this->effect->GetVariableByName("SpecularIntensity");
this->texVar = this->effect->GetVariableByName("DiffuseTexture");

 

Then:

 

// this marks the use of AnyFX, first we apply the program, which enables shaders and render states
this->opaqueProgram->Apply();

// then we update our variables, seeing as our variables are global in the API but local internally, we have to perform Apply first
this->viewVar->SetMatrix(&this->view[0][0]);
this->projVar->SetMatrix(&this->projection[0][0]);
this->modelVar->SetMatrix(&this->model[0][0]);
this->matVar->SetFloat4(color);
this->specVar->SetFloat(1.0f);
this->texVar->SetTexture(this->texture);

// finally, we tell AnyFX to commit all changes done to the variables
this->opaqueProgram->Commit();

 

Aaaand render. We have some restrictions however. First, we must run Apply on our program before we are allowed to set the variables. This fits nicely into many game engines, since we first apply all of our shader settings, then set our per-object variables, and lastly render. We need to perform Apply first because each variable will have different binding points in different shaders. In OpenGL, each uniform has a location in a program, and since different programs may use any subset of all declared variables, the locations are likely to differ. We also run the Commit command, which updates all variable buffers in a batched manner. This way, we don’t need to update the variable block once for each variable, which might seriously stress the memory bandwidth. In HLSL4+, we use constant buffers for everything, so Commit is vital there: if we only use constant buffers, we need to update them at some point. When all of this is said and done, we can perform the rendering.
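To make the batching idea a bit more concrete, here is a minimal sketch of how a variable block could defer its upload until Commit. This is not the actual AnyFX implementation; VarBlock, GpuBuffer and their methods are made-up names for illustration, and the "GPU" is just a counter so we can see how many uploads happen.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a GPU-side uniform/constant buffer.
// A real back end would issue a buffer update call here instead.
struct GpuBuffer
{
    std::vector<unsigned char> data;
    int uploads = 0;

    void Upload(const std::vector<unsigned char>& src)
    {
        data = src;
        ++uploads;
    }
};

// CPU-side shadow copy of a variable block. Setting a variable only
// touches the shadow copy and marks the block as dirty.
class VarBlock
{
public:
    explicit VarBlock(std::size_t size) : shadow(size, 0) {}

    void Set(std::size_t offset, const void* value, std::size_t size)
    {
        assert(offset + size <= shadow.size());
        std::memcpy(shadow.data() + offset, value, size);
        dirty = true;
    }

    // Uploads the whole block at most once, and only if something changed.
    void Commit(GpuBuffer& target)
    {
        if (!dirty) return;
        target.Upload(shadow);
        dirty = false;
    }

private:
    std::vector<unsigned char> shadow;
    bool dirty = false;
};
```

Setting ten variables and then calling Commit results in a single upload, and calling Commit again without any changes does nothing, which is the whole point of batching.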

All in all, the language allows us to extend functionality to compile-time stuff. For OpenGL, we can perform compile-time linking by simply testing if our shaders will link together. We can also obfuscate the GLSL code, so that nobody can simply read the raw shader code and manipulate it to cheat. However, during startup, we still need to compile the actual shaders before we can perform any rendering. In the newer versions of OpenGL, we can pre-compile program binaries, and then later load them in the runtime. This could easily be implemented straight into AnyFX if needed, but I’d rather have the shaders compiled by my graphics card so that the vendor driver can perform its specific optimizations. Microsoft seems to be discontinuing FX (for some reason unknown), but the system is still really clever and useful.

And also, as you may or may not have figured out, this is the first step I will take to finish the OpenGL4 render module.

When I’m done with everything, and it’s integrated and proven to work using Nebula, I will write down a full spec of the language grammar, qualifiers and release it open source.

Summer time!

So the game projects are over, phew! The critique we got this year was far less than it was before, which is good since we spent a good lot of time working on the tools. Of course we had some bugs during the development phase, and had some problems which we want to address. Apart from that, it’s time to look forward again to see what we want to change and improve. My area of interest, as you may know by now, is rendering and shading, so this is the list of things I want to fix at the moment:

  • Materials currently need to be inserted in the materials list, and also EVERYWHERE we want the material to be rendered. It would be much neater if the material could have some attribute by which we sort and render, so that we may say “Render all materials which require lighting here”, “Render all materials which require subsurface scattering here” and so on. This would make it super simple to add a new material to Nebula.
  • Rendering is threaded, but seems to be somewhat glitched since we get random deadlocks. It’s also very cumbersome to get animation stuff, attachments, shader variables, skeleton stuff etc since we have to perform a thread sync every time we request something. So, I would like to rethink the threading part, so that we may use the power of multithreaded rendering, while at the same time maintain simple access to graphics data. My current idea is to implement a separate context which allocates resources (since texture, mesh and shader allocation is going to take time), while still maintaining the immediate render context on the main thread. Animation and visibility queries should still be in jobs, since it works extremely smoothly and fast. This also simplifies Qt applications since the WinProc will be in the main thread! Also, if possible, we could have another thread which only performs draw calls, so that the actual drawing doesn’t directly affect game performance. This last part is similar to: http://flohofwoe.blogspot.se/2012/12/coregraphics2.html
  • Changing materials on models is quite a hassle, and it shouldn’t be, since the material is only used to batch render objects. We should be able to switch materials and have the new one applied on the next render.
  • All models should be flat in hierarchy. This is because we want a model to be split by its materials, meaning we can easily find and set a variable for a ‘node’ in a model. Previously, we had to traverse the node hierarchy for the model to find a node, but this shouldn’t have to be the case since we can flatten transforms and pre-multiply all transforms directly into the mesh. As such, we can remove a lot of complexity with model updates by simply having a combined transform for all nodes, letting all nodes be primitive groups in a mesh and letting each node have its own material. This still allows us to have the super clever models used in Nebula, where we can have several materials for the same model.
  • Dynamic meshes. If we are going to have a deferred rendering context which allocates resources, then we should be able to load static meshes in the thread, and create dynamic meshes in our main thread, then we can also modify the dynamic meshes however we want. As such, we could super simply implement cloth, blend shapes, destruction etc.
  • We should be able to import statically animated meshes. A very clever way to do this is to use all the animated transform nodes and convert them to a joint hierarchy, and then to simply rigid bind the meshes to their joints. This is very similar to how rigid binding is done at the moment, but in this case the artists need not create a skeleton beforehand (creds for this go to Samuel Lundsten, our casual graphics artist).
  • Model entities are a bit too generic. This is very flexible with the current multithreaded solution, since we can simply send a GraphicsEntityMessage to the server side ModelEntity and everything works out fine. However, whenever we need a specialized model, for example a particle effect or a character (and in the future a cloth/destructible model), we are likely to have specialized functionality, for example particle effect start/stop, character animation play and so on, you get the idea. It’s a bit more intuitive if we have different types of entities for this instead of using generic model entities. Also, in my opinion, the way a character is ALWAYS a part of a model entity, be it initialized or not, doesn’t look that nice. Another problem is that since particle models, just like any models, have a simple model hierarchy, we need to traverse all particle nodes in order to start/stop the particle effect. This could be handled much more cleanly if we had a ParticleModelEntity which does this for us.
  • OpenGL renderer. I don’t really like the DirectX API, in any shape or form. It’s only usable on Windows and Xbox, and on Windows we can still use OpenGL anyway, so there is really no reason to use it unless you want to develop games exclusively for Xbox. Besides, the current gen graphics is only available on Xbox One, so the only reason we would want to keep using DirectX would be to develop games for a set-top box. We will of course still have the DirectX 11 renderer available if we want. Right, OpenGL! I’m almost done with AnyFX right now, and currently the only back-end implementation available is for GL4 (coincidence? I think not!). The shading was the major cog in our wheels, since we had no FX counterpart for OpenGL, which I hope AnyFX will supply.
  • Integrate AnyFX. Whenever I’m done with AnyFX, meaning it should be thoroughly tested with geometry shading, hull and domain shading, samplers and perhaps even compute shaders, I’m going to integrate it with Nebula in order to accomplish the goal defined above.
  • Compute shading. Whoa, this would be soooo awesome to integrate. With this, we could have GPGPU particles, or perhaps a Forward+ renderer (drool).
  • Frame shader improvement. It could prove useful to be able to declare depth-stencil buffers separately. This isn’t really THAT important, but it would let us render some stuff to a separate depth buffer without requiring it to be paired with the actual render targets.
  • Implement some system which lets us use two sources for shaders. This is useful when having an SDK, with all the useful shaders from the engine, and then be able to import new shaders from an auxiliary source, your project for example. The same goes for materials, and materials should perhaps be defined as a resource where each material defines their shaders, variations for each shader and parameters for each material. As such, we can simply just add a new material. Currently, the materials are listed in one big XML file, and this is not so neat, seeing as we might have tons of different materials, and finding a certain material takes a considerable amount of time since just the current materials take up tons of space. With this implemented, we can also split materials into SDK materials and project materials. As for frame shaders, we specify a very specific render path, and in my opinion, this should be exclusive for each project.
  • Enabling and disabling rendering effects is neither pretty, nor working properly. The frame shaders define a complete render path, and each frame shader has a mutually exclusive set of resources. So if we want to implement some AO algorithm, we have to put it straight in the main frame shader, and this is both ugly and perhaps unwanted (when considering different specs). So, if we want to have for example low-medium-high AO quality, we have to define three different frame shaders, each with their own variations. This is perhaps not so nice. While I love the frame shader render path system, I dislike how inflexible it is to control during runtime. A way to handle this would be to define where during the render path we want to perform some render algorithm by supplying the render path with an algorithm handler, for example:

    <Algorithm handler="Lighting::AOServer" output="AOBuffer">
    <Texture name="Depth" value="DepthBuffer"/>
    </Algorithm>

    The render system would then call Lighting::AOServer::Instance() with a set of predefined functions, which are used to prepare the rendering of the algorithm, whose result is then written to the output. As such, we can simply call the AOServer and adjust our AO settings accordingly. We could also have batches within the Algorithm tag, which enables us to render geometry with a specific adjustable server handling it. The Texture tag defines an input to the handler, which works by supplying a symbolic name (“Depth” in this case) to which we bind a texture resource (here it’s the render target “DepthBuffer”). Since the frame shader only deals with textures, this is the only thing we have to define in order to run our algorithm.

  • Visibility is resolved a little goofily at the moment. This is because visibility has to be resolved after we perform OnRenderBefore() on our models, which in turn is called whenever we prepare for culling. This is a bit weird, since OnRenderBefore is REQUIRED in order for a model to use a transform, but OnRenderBefore only gets called if an object is visible. This means that when an object gets created, it automatically sits at the origin until OnRenderBefore gets called. However, since this only gets called when the camera sees an object, our shadow casters never get this feedback, and as such, they stay untransformed until they are visible to the camera! Keep in mind that the global bounding box gets updated constantly, so we don’t have to turn the camera to the origin just to get objects to be visible, but we do have to see them with the actual camera! Admittedly, this is very clever since it means objects outside the visible area never get their transforms updated, and this is completely legit if we don’t use shadows. However, when we do have shadows, we need to update models for all objects visible to the camera and to all shadow casting lights! The current solution is a bit hacky; it works but it can be done much nicer.
  • Point light shadows. As of right now, point lights cannot properly cast shadows. I should implement a cube map shadow renderer which utilizes geometry shading to render to a cube with one draw pass. This also means that point light shadows cannot be baked into the big local light shadow buffer, but instead need their own buffers.
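To illustrate the algorithm handler idea from the list above, here is a rough sketch of how a frame shader could dispatch to a named handler. None of this exists in Nebula yet; AlgorithmRegistry, AlgorithmHandler and TextureBindings are hypothetical names, and a real handler would of course render something instead of just receiving strings.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>

// Inputs declared with <Texture name="..." value="..."/>:
// symbolic name -> name of the bound texture resource.
using TextureBindings = std::map<std::string, std::string>;

// A handler receives its bound inputs and the name of its output target.
using AlgorithmHandler =
    std::function<void(const TextureBindings&, const std::string&)>;

// Hypothetical registry mapping the 'handler' attribute to a callable.
class AlgorithmRegistry
{
public:
    void Register(const std::string& name, AlgorithmHandler handler)
    {
        handlers[name] = std::move(handler);
    }

    // Returns false if the frame shader names a handler nobody registered.
    bool Run(const std::string& name,
             const TextureBindings& inputs,
             const std::string& output) const
    {
        auto it = handlers.find(name);
        if (it == handlers.end()) return false;
        it->second(inputs, output);
        return true;
    }

private:
    std::map<std::string, AlgorithmHandler> handlers;
};
```

With something like this, the AO quality could be changed at runtime by talking to the registered handler, instead of swapping the whole frame shader.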

Phew, quite a list! In the next update I’ll probably be posting more about AnyFX, since it will be the first thing I check off the list.

 

// Gustav

Water 2.0

I got a request some time ago to make a video showing the water. So I did!

http://vimeo.com/66406007

I also took the time to make some additions which we got as requests from the students. The first was to be able to have depth fog, so whenever something is really far away from the water plane, it gets fogged, so as to simulate really deep water.

I also added, per request, a foam effect. The foam works just like the fog: based on a distance and the per-pixel difference in depth, we apply either the foam texture or the fog color. The foam is shown in the video as the white outline around the house, which helps with getting a more realistic interaction between the water and other solid objects.

You can also see that the objects beneath the water become more opaque depending on how close to the water surface they are. This also gives it a more realistic effect, since it looks like the water thins out and thus obscures the underlying object less until it hits the water surface.

Currently, the water has two color settings, one for the water color and one for the fog color. The fog color is applied to any geometry beneath the water surface based on a distance factor. This is also visible in the video when you see the big island patch being submerged more and more: it turns purple because the fog color is purple, while the water surface remains blue.
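As a rough illustration of the depth-difference trick, here is how the foam and fog factors could be computed for one pixel. This is a CPU-side sketch written in C++ rather than the actual water shader, and the names (ComputeWaterBlend, foamDistance, fogDistance) and the linear falloff are assumptions made for the example.

```cpp
#include <algorithm>

struct WaterBlend
{
    float foam; // 1 right where geometry meets the surface, 0 at foamDistance and beyond
    float fog;  // 0 at the surface, 1 at fogDistance and beyond
};

// sceneDepth and waterDepth are linear view-space depths for the same pixel.
// foamDistance and fogDistance are hypothetical tuning parameters (> 0).
inline WaterBlend ComputeWaterBlend(float sceneDepth, float waterDepth,
                                    float foamDistance, float fogDistance)
{
    // How much water lies between the surface and the geometry behind it.
    const float thickness = std::max(sceneDepth - waterDepth, 0.0f);

    WaterBlend result;
    result.foam = 1.0f - std::min(thickness / foamDistance, 1.0f);
    result.fog = std::min(thickness / fogDistance, 1.0f);
    return result;
}
```

A small foam distance gives the thin white outline around objects, while a larger fog distance makes geometry fade gradually into the fog color the deeper it sits.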

Oh, and you can also see the fake reflections. I should really implement a way to get real reflections instead of having that screen-space cheat…

 

Current status

Since the game projects started, we haven’t quite had the time to work on the tech or to post updates here, but let me instead give you a brief overview of what’s going on.

Obviously, we’ve been finding bugs and fixing them, nothing too fancy or weird, except for one special thing. It so happens, on some computers, that we get thread deadlocks. We have yet to understand why this happens, since Nebula doesn’t have cross-depending threads in any way. This seems to happen whenever we start to animate characters; however, it doesn’t happen in an application that does nothing but animate characters. Here is where the fun comes in. I’ve set up a test environment with a computer identical to the ones getting this problem. I tested this by just creating a render application which only rendered characters. I thought it might be some kind of performance problem, so I just put tons of characters in the scene and pushed it down to 2-8 fps. It never happened. There seems to be something else amiss here, and even though we’ve found a solution, we’re still uncertain how good it is.

The solution was to, quite simply, remove the render thread. This doesn’t mean we remove the rendering, but just that we put the render system on the main thread. We hesitate to do this since there must be a reason for the render thread in the first place, be it for the PS3, or just to handle multi-core PCs better. One theory we have as to why this is a problem is that the window handle and the window event loop are actually on another thread than the actual application, something which caused random deadlocks before. However, in the case of the previous WinProc problem, the thread actually hung inside the procedure instead of somewhere else.

This got me thinking, maybe it is time to redesign the render system, not because it’s wrong or poorly implemented, but because we could benefit from another approach to render with multiple threads, instead of having a render system back end which is enormous and complex. Instead, we could make good use of modern graphics hardware to implement several render devices, and move the parallelism to only perform draw calls. Note that this will not be possible to utilize in older hardware, but seeing as we are 100% focused on the PC platform, this shouldn’t be a problem for us. This is similar to what Andre Weissflog has already implemented in the official Bigpoint branch of Nebula.

We might also want to rethink how models work. The system today with model hierarchies is really nice, and provides a very accurate 1:1 representation of the Maya content into Nebula. The only problem is its performance. If we have a complex model with a huge hierarchy, the update time per model can be murder for the cache, since it has to recursively traverse each model node and perform updates. Instead, it would be much neater to just have a list of model nodes, which can be processed in a much more data-driven manner. Having said that, the majority of the render system works nicely, is easily maintainable, and should be easy to implement into this new design idea.
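To show what processing a flat list of model nodes could look like, here is a small sketch that computes world transforms in one linear pass instead of recursing through the hierarchy. This is not Nebula code; ModelNode, Mat4 and FlattenTransforms are illustrative names, and the sketch assumes the nodes are sorted so that each parent comes before its children.

```cpp
#include <array>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>; // row-major 4x4 matrix

inline Mat4 Multiply(const Mat4& a, const Mat4& b)
{
    Mat4 r{}; // zero-initialized
    for (int row = 0; row < 4; ++row)
        for (int col = 0; col < 4; ++col)
            for (int k = 0; k < 4; ++k)
                r[row * 4 + col] += a[row * 4 + k] * b[k * 4 + col];
    return r;
}

inline Mat4 Translation(float x, float y, float z)
{
    return Mat4{1.0f, 0.0f, 0.0f, x,
                0.0f, 1.0f, 0.0f, y,
                0.0f, 0.0f, 1.0f, z,
                0.0f, 0.0f, 0.0f, 1.0f};
}

struct ModelNode
{
    int parent; // index of the parent node, -1 for a root
    Mat4 local; // transform relative to the parent
};

// Assumes every parent precedes its children in the list, so a single
// linear pass is enough and memory is touched in order.
inline std::vector<Mat4> FlattenTransforms(const std::vector<ModelNode>& nodes)
{
    std::vector<Mat4> world(nodes.size());
    for (std::size_t i = 0; i < nodes.size(); ++i)
    {
        const ModelNode& node = nodes[i];
        world[i] = node.parent < 0
            ? node.local
            : Multiply(world[static_cast<std::size_t>(node.parent)], node.local);
    }
    return world;
}
```

For static models the resulting world transforms could be baked straight into the mesh, which is the pre-multiplication idea, leaving no hierarchy to traverse at all during updates.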

We’ve also been considering redesigning the application system to be slightly less flexible but more data-driven. We’re a bit suspicious of the fact that game applications run so much slower than render applications. Some of this is expected, since we have lots of game mechanics to compute and handle, but the application system doesn’t quite scale well with lots of entities. Both redesign considerations are of course projects for the future, and will not be started any time soon.

What I am currently working on, instead of finding and fixing bugs in the tools our game groups and internship guys are using, is a way to give Nebula, and any other game engine for that matter, an effects system for shaders. Much like the Microsoft Effects system, effects provide a really clean and flexible way to create shader pipelines by combining shader programs and render states into a manageable construct which can be written in a file, compiled, read in your game engine and then used with a very simple interface. The only problem is that up to this point in time, nothing with the same level of functionality has been provided for OpenGL.

I give to you AnyFX. Well, I don’t really, because it’s not done yet, but I can at least give you the concept.

AnyFX strives to implement a language which looks like C, and thus also resembles HLSL and GLSL, but also handles shader linkage, render states, and variable handling. At the same time, I also wanted a simple way of porting shaders to use with AnyFX, as well as to maintain the same level of optimization and language-specific intrinsics without having to write an extremely complex compiler, since this is way above my level of knowledge and not doable within a reasonable time frame. AnyFX works by writing all wrapping code in the AnyFX language standard, while all code inside the function scope is native to the target language. To avoid further confusion, variables can be declared as either GLSL or HLSL types, although identical data types will internally be represented the same. AnyFX consists of two parts, a compiler and an API. An AnyFX file has to be compiled into a binary blob, during which it gets evaluated for cross shader linkage and goes through semantic and static analysis. The compiler then outputs target language code, which only covers variable declarations and function headers, nothing more, nothing less.

The API can load a binary compiled AnyFX file, at which point it loads all structures, such as render states, sampler states and programs. The back end is fed this information, and is implemented depending on the target platform. So for example, if I build an AnyFX file with profile glsl4, the back end will implement a GLSL4Effect, or if I build an AnyFX file with the hlsl5 profile, I get an HLSL5Effect. The front end can simply call Apply to apply the effect, meaning it will set the render state, and Commit, which basically just updates constant variable buffers if present. Just like it works currently in Nebula, we can simply integrate AnyFX to replace Effects11 and, more importantly, to get an actual effects interface for OpenGL.

This time you get no videos or pictures I’m afraid, because to be honest, there is nothing new and cool to show off.

Water

Nebula has been missing something for a very long time. When I started working with Nebula something like 3 years ago, we had simple UV-animated alpha planes which were supposed to look like water. And for the time being, they looked really good.

However, today a simple uv-animated alpha plane won’t quite cut it. Instead, we need something fancier, incorporating refraction, reflection and specular highlighting. I have been doing exactly this for the past day. The result is beautiful and realistic-ish water (let’s face it, water in real life is rather boring). Picture time!

Refraction


Reflection and specularity


 

However, I’ve been a bit lazy with the reflections in our implementation. The GPU Gems article from Nvidia shows that we should render the scene from below the water plane in order to get correct reflections. Instead, I simply just cheat and use the already lit and rendered image as a reflection. This makes the reflections completely wrong, but it still looks good…

Sometime in the future, I might write a new frame shader which cheaply renders the objects being reflected without all the fancy stuff like SSS, HBAO and multiple light shading, so as to give decent-looking geometry for reflections. For the time being, though, this serves nicely. I also have an alternative shader which renders the reflections using a pre-rendered environment map, which may look good when using water in small local areas where real-time reflections are easily overlooked.

The water is fully customizable, with reflection intensity, water color, deep water color and of course reflection map. The reflection map is supposed to be a pre-rendered cube map of the scene, so that reflections can be done without rendering everything twice. One can select whether to cheat, or to use an environment map.

I’ve also written a billboard rendering system, which basically just lets us render textures to billboards which is very useful for the level editor and other such tools. This is a crude representation of how it can look:

Lights represented as billboards


With actual icons, we can neatly show billboards instead of geometry to represent such things as spotlights, pointlights, the global light, and any other such entities which can’t, or shouldn’t, be represented with geometry.

Next thing on the list is terrain rendering, so stay tuned!

 

// Gustav

Foliage

Apart from fixing a couple of small problems, I’ve also implemented a fancy shader which lets us perform vertex displacement. This means we can get fancy-looking things such as trees and grass. It also works with shadows, picking and of course the god ray thing. It’s not really that big of a deal, every engine has it, but the point is that now we do too! Pretty picture incoming:

 

trees

 

What you can also do of course, is to apply this shader on basically anything static! As such, we can get funny looking effects like this:

 

flubberface

 

Which is of course terrifying.

We’ve found a funny bug which seems to relate to Windows/Qt/DirectX11. If we run the render device without any vertical synchronization applied, we get weird internal locks in Qt when we do anything. This seems to be because we run Qt on the main thread while the rendering, which in turn updates the window, runs in its own thread. The application doesn’t crash or anything, it’s just QRasterWindowSurface::flush() that freezes when trying to perform BitBlt. If we however enable vertical synchronization, Qt doesn’t lock and everything works nicely.

Crepuscular rays

New week, new assignments. This time, I’ve implemented crepuscular rays, or as the hipsters would say, ‘godrays’. As a side effect, I’ve also implemented a sun mechanism, which lets us render a big color-radiating sphere. This allows us to create pretty outdoor environmental effects. We previously lacked a proper sun, so the sun had to be a part of the skydome. This of course meant we couldn’t re-position the sun, or have fancy day-night cycles. Here’s a picture of the result:

godrays

The sun uses the color of the global light, and its position is also quite simply just the position where the global light should be. The only problem is that the global lights in Nebula have no position, only a direction, so I had to devise a method which always kept the sun at a distance from the camera where it would still be visible. The way I solved this was by just taking the camera position, and then quite simply moving the sun along the opposite of the global light direction times a scale factor which denotes how far away the sun should be. I also had to make it so that the sun was oriented towards the camera as well, but that’s trivial.
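The placement boils down to one line of vector math. Here is a small sketch of it, assuming the light direction points the way the light travels; Vec3, ComputeSunPosition and sunDistance are names I made up for the example, not the ones used in Nebula.

```cpp
#include <cmath>

struct Vec3
{
    float x, y, z;
};

// Assumes v is not the zero vector.
inline Vec3 Normalize(const Vec3& v)
{
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return Vec3{v.x / len, v.y / len, v.z / len};
}

// lightDirection is assumed to be the direction the light travels in,
// so the sun sits in the opposite direction, sunDistance units away
// from the camera. Since it is relative to the camera, the sun follows
// the camera around and never gets any closer.
inline Vec3 ComputeSunPosition(const Vec3& cameraPosition,
                               const Vec3& lightDirection,
                               float sunDistance)
{
    const Vec3 dir = Normalize(lightDirection);
    return Vec3{cameraPosition.x - dir.x * sunDistance,
                cameraPosition.y - dir.y * sunDistance,
                cameraPosition.z - dir.z * sunDistance};
}
```

The distance would have to be picked so that the sun stays inside the far clipping plane, otherwise it gets clipped away.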

That’s it for me!

 

// Gustav

One and one makes two!

So I’ve been recovering from the UI, fixing stuff like scissor rectangles and streamlining the setting of viewports and render targets. Since we need the render code to be as fast as humanly possible, I’ve been fiddling a bit trying to eliminate unnecessary state switches. Anyways…

We’ve had a small problem with using the physics for picking in the level editor. So I thought, hey! Why not add per-pixel picking by rendering IDs for each entity to a buffer? This gives us a very nice projected representation of our scene, so we could just pick the pixel at the mouse position and voila, we have our entity! The only thing is that the render system isn’t and shouldn’t be dependent on the application system. So we need to be able to pick a game entity by clicking in screen-space, but we only actually have access to graphics stuff when rendering. What I did was add a picking integer to the model entity, which is simply a generic identifier used for picking. That way the model entity is never directly associated with the game entity; the render system only ever sees the identifier.
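To make the ID-buffer idea a bit more concrete, here is a small C++ sketch. It assumes the picking integer is a 32-bit value packed into an RGBA8 target and read back on the CPU; `Rgba8`, `PickingBuffer` and the function names are invented for this example, and the actual Nebula buffer format may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One texel of a hypothetical RGBA8 picking target.
struct Rgba8 { std::uint8_t r, g, b, a; };

// Pack a 32-bit picking ID into four 8-bit channels.
Rgba8 EncodePickingId(std::uint32_t id)
{
    return { static_cast<std::uint8_t>(id & 0xFF),
             static_cast<std::uint8_t>((id >> 8) & 0xFF),
             static_cast<std::uint8_t>((id >> 16) & 0xFF),
             static_cast<std::uint8_t>((id >> 24) & 0xFF) };
}

// Inverse of EncodePickingId.
std::uint32_t DecodePickingId(Rgba8 c)
{
    return static_cast<std::uint32_t>(c.r)
         | (static_cast<std::uint32_t>(c.g) << 8)
         | (static_cast<std::uint32_t>(c.b) << 16)
         | (static_cast<std::uint32_t>(c.a) << 24);
}

// CPU-side copy of the picking target, row-major.
struct PickingBuffer
{
    int width;
    int height;
    std::vector<Rgba8> pixels;

    // Returns the picking ID under a screen-space position (e.g. the mouse).
    std::uint32_t IdAt(int x, int y) const
    {
        assert(x >= 0 && x < width && y >= 0 && y < height);
        return DecodePickingId(pixels[static_cast<std::size_t>(y) * width + x]);
    }
};
```

The application side would then keep its own table from picking ID to game entity, so the render system stays ignorant of what the number means.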

Note that we may want to use screen-space picking for other things than just the level editor, such as RTS selection, shooting in FPS or whatever. The only downside is that we must render everything again. We could just add this to our deferred rendering path, but we may also want to disable this feature to improve performance for games which don’t need pixel-perfect picking.

You may want to know what the title is all about. Well, I’ve also made it so that the content browser and level editor can communicate with each other. This means we can be in the content browser and edit stuff, save it and it will be immediately updated in the level editor. This is beautiful for many purposes, because an artist can sit in the content browser, fiddle around with their assets, and see how it would look in the level editor on the fly.

I’m trying to think of some image to put here just to show you how this works, but this week is highly functional, but not so much graphics related.

Ok, well, maybe that’s not entirely true…

While implementing per-pixel picking, I’ve also done some work regarding the limitation of window size. I’ve made it so that we can re-size the entire Nebula context, and make every render target and render plugin get updated with the new screen size. This means we can have a variable-size window and render space, without losing quality or getting weird-looking aliasing artifacts. It’s also important for the render modules to get information about screen re-sizing. The UI, for example, needs the new dimensions of the screen to properly scale and map mouse positions. The only downside to the re-sizing is that it of course costs a lot, because every single render target needs to be destroyed and recreated.

An alternative to re-sizing the render targets could be to always render to a big render target, and then simply change the camera matrix to just view a portion of the entire frame. This might be an alternative, but since we couldn’t change the resolution in real-time before, I figured we might want this feature anyways.

I also realized I made a slight boo-boo with the bloom post effect, so that has been resolved and looks much better (like Vaseline on glass).

If that wasn’t enough, I also took the time to restore the old FX system in Nebula. It worked, sorta, the only problem it had was that it didn’t really respect the render-application thread border in any way. So I thought I could fix it so that it wouldn’t randomly crash. So now we can spawn fire’n’forget effects like camera shakes, camera animation tracks and time-limited graphics entities. Sadly, I’m too lazy to show you any pictures…

 

// Gustav

UI

This might sound like the most boring title of a post ever created in the history of the human species. Fortunately, I’m not only going to talk about UI, but I will start off by telling you that I’ve spent the last two days integrating libRocket (http://librocket.com/) into Nebula. Sweet stuff, a really simple and easy-to-use library for creating good-looking dynamic UIs. That’s the UI part. Now comes the rage.

I can’t for the sake of my sanity understand what went through the heads of the developers of DirectX 10 and 11 when they thought it would be a good idea to have to manually link a specific vertex layout with a specific shader. Granted the same ‘signature’ can be reused in many other places, it’s still a painfully clumsy way of solving the problem. Now you may ask, what does this have to do with UI? Well, let me paint you the whole picture…

In Nebula, we load meshes dynamically (as I imagine every engine does), which means we don’t always know what vertex components lie in the mesh. Additionally, a mesh can be rendered with any given shader (as long as we’re not attempting something like skinning a mesh which has no blend weights). What this means is that for each mesh we need a vertex declaration (which is some internal object that keeps track of inputs to the vertex shader). The vertex declaration is created by taking the bytecode of the compiled shader (intuitive!) and pairing it with the vertex declaration structure. The following code might give you an idea…

 

ID3D11Device* device = D3D11RenderDevice::Instance()->GetDirect3DDevice();
HRESULT hr;

// gets the active shader, will only succeed if there is no pre-effect (that is, will only work for post-effects)
Ptr<CoreGraphics::ShaderInstance> instance = ShaderServer::Instance()->GetActiveShaderInstance();
Ptr<CoreGraphics::ShaderVariation> variation = instance->GetActiveVariation();
ID3DX11EffectTechnique* technique = variation->GetD3D11Techinque();
ID3DX11EffectPass* pass = technique->GetPassByIndex(0);

D3DX11_PASS_DESC desc;
pass->GetDesc(&desc);

// this must succeed!
hr = device->CreateInputLayout(decl, compIndex, desc.pIAInputSignature, desc.IAInputSignatureSize, &this->d3d11VertexDeclaration);
n_assert(SUCCEEDED(hr));

 

In return we get an ID3D11InputLayout, which we need when setting up the Input Assembler before we render. Doesn’t seem too bad, does it? Beautiful. It literally means that I, for each combination of a mesh and shader, must create a new vertex declaration if none exists, store it somewhere so that I can find it again (which is far from trivial), and then fetch it when I need it. The idea as far as I’ve understood it, which amazingly is also supported by a huge group of developers out there, is that this allows for a pre-render initialization of the input declaration, which is then supposed to ease the vertex-buffer-to-vertex-shader linkage. Well, OK, fine. I can buy the argument that we can achieve a high level of optimization if we do this, maybe. Although I seriously doubt the bottleneck in earlier versions of DirectX (like 9 for example) was that the linkage between a linear VBO and a static vertex program was too slow. I can of course be wrong.
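For reference, the "one layout per mesh and shader combination" bookkeeping could be sketched as a cache keyed on both signatures. This is not how Nebula does it (the actual solution follows below); `InputLayoutCache`, the string signatures and the `InputLayout` struct are hypothetical stand-ins so the example runs without Direct3D.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an ID3D11InputLayout pointer.
struct InputLayout { int id; };

// Caches one input layout per (vertex layout, shader input signature) pair.
class InputLayoutCache
{
public:
    // The factory is where the real CreateInputLayout call would live.
    using Factory = std::function<InputLayout(const std::string&, const std::string&)>;

    explicit InputLayoutCache(Factory factory) : create(std::move(factory))
    {
        assert(this->create && "factory must be callable");
    }

    // Returns the cached layout, creating it on first use.
    const InputLayout& Get(const std::string& vertexSig, const std::string& shaderSig)
    {
        const auto key = std::make_pair(vertexSig, shaderSig);
        auto it = this->cache.find(key);
        if (it == this->cache.end())
        {
            it = this->cache.emplace(key, this->create(vertexSig, shaderSig)).first;
        }
        return it->second;
    }

    std::size_t Size() const { return this->cache.size(); }

private:
    Factory create;
    std::map<std::pair<std::string, std::string>, InputLayout> cache;
};
```

The lookup still happens at every bind, so this only moves the cost around; it does make the dependency on the shader explicit instead of relying on whatever happens to be active.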

It shames me how I’ve solved this. I’m going to show you anyways however, just to give you a hernia.

 

this->d3d11DeviceContext->IASetInputLayout(vl->GetD3D11VertexDeclaration());

 

This gets called when we set a vertex layout in the RenderDevice. Good enough, but watch this…

 

ID3D11InputLayout*
D3D11VertexLayout::GetD3D11VertexDeclaration()
{
// ugly, but hey, we need it!
CreateVertexLayout();
n_assert(0 != this->d3d11VertexDeclaration);
return this->d3d11VertexDeclaration;
}

 

The CreateVertexLayout function is the first listing shown above. It also has a safeguard that prevents it from running if the Nebula vertex layout object already has a DirectX 11 vertex declaration set up. But this literally means that we ‘test’ if there is a vertex layout before rendering. EACH AND EVERY TIME. Since a vertex layout has NOTHING to do with the shader in any other aspect than input to the vertex shader, this feels like the most forced limitation and cumbersome solution I’ve come across. It also means that I have to make sure the correct shader is set in the shader server, just so that vertex declarations will always correctly match. Remember that this is a getter function which actually changes an internal variable. Ugh…

But wait, there is more! One might think that DirectX 11 would tell you something if you try to, in the Input Assembler, combine some random vertex declaration with an incompatible shader. It doesn’t. All you get is that the matching ‘segment’ of the vertex declaration works, and the rest is just random data. This is how we get back to the UI part again. You see, I had a shader for the UI which should take three components. Here is the shader:

 

//——————————————————————————
/**
*/
void
vsMain(float4 position : POSITION,
float2 uv : TEXCOORD,
float4 color : COLOR,
out float4 Position : SV_POSITION,
out float2 UV : TEXCOORD,
out float4 Color : COLOR)
{
Position = mul(position, ModelViewProjection);
Color = color;
UV = uv;
}

//——————————————————————————
/**
*/
void
psMain(in float4 Position : SV_POSITION,
in float2 UV : TEXCOORD,
in float4 Color : COLOR,
out float4 FinalColor : SV_TARGET0)
{
FinalColor = Texture.Sample(TextureSampler, UV) * Color;
}

 

Simple. Well, what happened was that I forgot to set the global active shader when rendering the UI. This meant that my vertex layout for my UI didn’t get to link to the actual ui shader. Instead it defaulted back to the static shader (a default which I’ve implemented to avoid crashes when moving from a skinned to a static mesh), which in turn meant that my color component was cut off. No warning, no nothing, just a silent error, and several hours of searching and pulling my hair just to find out that the underlying system is <insert another lovely word here>. This method could be used if one only had a hard-coded set of shaders, and always knew which shaders should be applied on each mesh, but for a bigger scale application with a content pipeline, this just hurts. Physically.
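A cheap debug-time guard against this kind of silent mismatch could look something like the following C++ sketch. It assumes we remember which input semantics a layout was built against and can list the inputs of the shader we are about to draw with; `FindMissingInputs` and `AssertLayoutCoversShader` are names invented for this example, not Direct3D or Nebula functions.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Returns every shader input semantic that the layout does not provide.
std::vector<std::string> FindMissingInputs(
    const std::vector<std::string>& layoutSemantics,
    const std::vector<std::string>& shaderInputs)
{
    std::vector<std::string> missing;
    for (const std::string& input : shaderInputs)
    {
        const bool found = std::find(layoutSemantics.begin(),
                                     layoutSemantics.end(),
                                     input) != layoutSemantics.end();
        if (!found)
        {
            missing.push_back(input);
        }
    }
    return missing;
}

// Debug-only check to run right before binding the layout.
void AssertLayoutCoversShader(
    const std::vector<std::string>& layoutSemantics,
    const std::vector<std::string>& shaderInputs)
{
    assert(FindMissingInputs(layoutSemantics, shaderInputs).empty());
}
```

With a layout built against a shader that only reads POSITION and TEXCOORD, checking it against the UI shader above would report COLOR as missing right away, instead of after several hours of hair pulling.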

Anyways, here is a picture of the result:

rocket

We should really consider moving on to OpenGL… The only problem is that the FX system is amazingly handy.

 

// Gustav

Epidermis

I got a tip that skin is a very important thing to render properly. THE method for rendering light passing through semi-solid materials such as skin, leaves etc. is called subsurface scattering, which is described here: http://en.wikipedia.org/wiki/Subsurface_scattering.

The only problem with the general and near-perfect method of rendering skin using subsurface scattering is that one needs to perform the light operations once for each light source. This means deferred rendering is out the window, which is bad! So, there is a method called SSSS, or Screen-Space Subsurface Scattering, which has seen use in engines such as Unreal Engine 3.

Our graphics artist, Samuel, thought it would be a very nice addition to Nebula if I were to add this algorithm in order to render skins properly. Here is the result:

facenosss

No Screen-Space Subsurface Scattering

facesss

With Screen-Space Subsurface Scattering

I accomplished this by adding a new MRT, which renders the absorption color, the scatter color and a binary mask which masks out the area where the SSS should take place. We need to mask out the rest because we are working in screen-space, and must therefore be very careful about exactly what we apply our algorithm to, so that ordinary static objects don’t get this effect.

The algorithm itself is simply a horizontal and vertical bloom, where the light is blurred instead of the color, depending on the depth of nearby pixels. This, combined with a Gaussian distribution which ‘favors’ reddish hues, gives the effect that skin appears more life-like, since it simulates light being spread under the surface of the skin.
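To give an idea of what one such blur pass does, here is a CPU-side C++ sketch of a single 1D depth-aware Gaussian pass over one row of light values. It is only an illustration under my own assumptions: the real thing runs as a pixel shader, the exponential depth falloff and the parameter names (`sigma`, `depthFalloff`) are choices made for this example, and favoring red would amount to running the red channel with a wider `sigma` than green and blue.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One 1D pass of a depth-aware Gaussian blur over a row of light values.
// Neighbours at a very different depth contribute less, so light does not
// bleed across depth discontinuities (e.g. from a face onto the background).
std::vector<float> BlurLightRow(const std::vector<float>& light,
                                const std::vector<float>& depth,
                                int radius, float sigma, float depthFalloff)
{
    assert(light.size() == depth.size());
    assert(radius >= 0 && sigma > 0.0f);

    const int n = static_cast<int>(light.size());
    std::vector<float> out(light.size());
    for (int i = 0; i < n; ++i)
    {
        float sum = 0.0f;
        float weightSum = 0.0f;
        for (int k = -radius; k <= radius; ++k)
        {
            // clamp at the borders of the row
            const int j = std::min(std::max(i + k, 0), n - 1);
            const float gauss = std::exp(-static_cast<float>(k * k) / (2.0f * sigma * sigma));
            const float depthWeight = std::exp(-depthFalloff * std::fabs(depth[j] - depth[i]));
            sum += light[j] * gauss * depthWeight;
            weightSum += gauss * depthWeight;
        }
        // weightSum is never zero: the centre tap always has weight 1
        out[i] = sum / weightSum;
    }
    return out;
}
```

Running the same function over columns afterwards gives the vertical pass, which is what makes the blur separable and cheap.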

The exact implementation in Nebula uses two passes. One pass is the standard boring old skinning pass, which calculates lighting, albedo, emissive and specularity. Then, we render all geometry which uses the SSS process, and while doing so we render out the absorption map and scatter map. We also render a small data buffer, which holds our variables, such as SSS width, SSS strength and the SSS correction factor, as well as a bit which tells us if a certain pixel should be SSS:ed or not. Then we apply our screen-space post effect which runs the actual algorithm, producing the image seen above. This means that we can have per-object settings for the SSS, while still rendering the result in screen-space. I also fixed a minor glitch with the original implementation, which gave artifacts when the edges of the screen cut a subsurfaced area. The fix was to use the mirror address mode for the sampler, which means that pixels along the border of the screen won’t suffer from wrapping samples, which in turn removes the artifact.
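The difference between the two address modes is easiest to see on a single texture coordinate. The following C++ sketch mimics what wrap and mirror addressing do to a coordinate that falls outside [0, 1]; `WrapCoord` and `MirrorCoord` are illustrative helpers written for this post, not sampler API calls.

```cpp
#include <cassert>
#include <cmath>

// Wrap addressing: the texture repeats, so a sample just outside one edge
// of the screen reads from the opposite edge.
float WrapCoord(float u)
{
    assert(std::isfinite(u));
    return u - std::floor(u);
}

// Mirror addressing: the texture is flipped at every integer boundary, so a
// sample just outside an edge reads from just inside that same edge.
float MirrorCoord(float u)
{
    assert(std::isfinite(u));
    const float t = std::fmod(std::fabs(u), 2.0f); // mirrored pattern has period 2
    return t <= 1.0f ? t : 2.0f - t;
}
```

A blur tap at u = 1.1 wraps around to 0.1, the far side of the screen, but mirrors to 0.9, which is right next to the pixel being blurred. That is why mirroring removes the artifact.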

The red rectangle shows an artifact which occurs when using a wrap address mode


I also applied the same technique to the HBAO shader, seeing as it previously suffered from the same problem with artifacts along the screen borders.

// Gustav