I was wrong

I just wanted to say that I was wrong about the hull shader inputs and outputs having to match. It turns out that I shot myself in the foot. The reason it didn’t work is that all variables meant to be shared globally, by which I mean things like the orientation, projection and view matrices, were stored in a cbuffer called Globals. If you then declare another variable that is not in a cbuffer, the shader compiler automatically generates a cbuffer for you and stores all those rogue variables in it. This cbuffer is called $Globals. Nebula currently handles only one single cbuffer, because variables are treated as if they were updated once per frame. If you design a program like Nody, where every shader needs variables that can be passed from the CPU to the GPU, then you have to put those variables in some sort of global scope, which then becomes the $Globals cbuffer. Also, seeing as the constant buffers are constantly changed because each object needs its own set of variables, there is no real reason to bundle them together.
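To make the $Globals behaviour concrete, here is a small C++ model of how the compiler sorts declarations into constant buffers. It is only a sketch: the Declaration struct and the groupIntoCBuffers function are hypothetical stand-ins, not part of Nebula or of the D3D reflection API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of a shader variable as a tool might see it.
// An empty cbuffer name means the variable was declared at global scope.
struct Declaration {
    std::string name;
    std::string cbuffer;
};

// Models how the HLSL compiler distributes variables over constant buffers:
// anything not declared inside an explicit cbuffer lands in "$Globals".
std::map<std::string, std::vector<std::string>>
groupIntoCBuffers(const std::vector<Declaration>& decls) {
    std::map<std::string, std::vector<std::string>> buffers;
    for (const Declaration& d : decls) {
        assert(!d.name.empty());
        std::string target = d.cbuffer.empty() ? "$Globals" : d.cbuffer;
        buffers[target].push_back(d.name);
    }
    return buffers;
}
```

Feeding it the matrices from Globals plus one stray variable yields two buffers, Globals and $Globals, and the second one is exactly what a renderer that only expects a single cbuffer would miss.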

Yes, one might argue that variables which are not used in one shader take up unnecessary space, and while that is true, you have complete control to design a shader which contains only the variables it requires. This basically takes that optimization out of my hands and places it in the hands of anyone using the program for their game engine.

But I wanted to say this to keep the record straight. I was wrong: there is no undocumented internal voodoo required to make the hull shader work, it was my mistake. On the bright side, this means that making very complex hull and domain shaders is now even easier than before, and as a result, I thought I’d recreate the trianglesubdivision hull shader to scale the amount of tessellation based on the distance to the camera. This means that the detail of the object is scaled dynamically as the camera approaches or retreats. It would also be really nice to try tessellating a surface using a real height map, instead of a diffuse map as I am doing now. But I’m probably going to focus on other stuff for a while.
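A distance-based factor for trianglesubdivision could look roughly like the following. This is a hypothetical sketch written in C++ for readability; the function name, the linear falloff and the near/far parameters are assumptions, and in an actual shader the same arithmetic would live in the patch constant function.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical distance-based tessellation factor: maxFactor at or inside
// nearDist, minFactor at or beyond farDist, linear interpolation in between.
float distanceTessFactor(float distance, float nearDist, float farDist,
                         float maxFactor, float minFactor) {
    assert(farDist > nearDist);
    float t = std::clamp((distance - nearDist) / (farDist - nearDist), 0.0f, 1.0f);
    return maxFactor + (minFactor - maxFactor) * t;
}
```

With near = 10, far = 50 and factors 16 and 1, an object 30 units away gets a factor of 8.5, halfway between full and minimum detail.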

Deferred lighting and tessellation

If you haven’t noticed yet, I’ve been trying not only to make hull and domain shaders work, but also to make them work in a practical sense, by which I mean having relevant nodes in Nody that allow a user to create a shader that uses tessellation without any hassle. This of course has to be interlaced with the deferred lighting, where the normals and depth have to be written using a tessellated surface. Sounds easy enough, but consider the fact that you need to separate the calculation of a vertex’s projected position from the calculation of its view-space position. Why? Well, if you want to perform displacement, it’s going to be rather hard if the vertex is already projected into the scene. Instead, you want it to be in model-space (rotated by the object rotation), seeing as the height map is mapped to the object in model-space, so that is where the displacement comes out correct. When we’ve done our displacement, we’d like to move the new vertices into model-view-projection space so they can be rendered correctly. Then, of course, we want to render to our depth-buffer using the view-space position, which means that we not only need to use our displaced version, but we need to multiply it with the view-matrix. Remember that every vertex is already multiplied with the model-matrix in the vertex-shader, so we only need to multiply the final position with the view-projection matrix and the view-space position with the view-matrix.
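The order of operations described above can be sketched on the CPU in C++. Everything here is an assumption for illustration: Vec4, Mat4, mul and displaceAndProject are hypothetical names, matrices are stored row-major and multiply column vectors like HLSL’s mul(M, v), and the sampled height is passed in as a plain float instead of being read from a height map.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;
using Mat4 = std::array<std::array<float, 4>, 4>;  // row-major, used as m * v

// Matrix times column vector, like HLSL's mul(M, v).
Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}

struct DisplacedVertex {
    Vec4 clipPosition;  // goes to SV_POSITION
    Vec4 viewPosition;  // used when writing depth in the normal-depth pass
};

// worldPos has already been multiplied by the model matrix in the vertex
// shader, so only view and view-projection are applied after displacement.
DisplacedVertex displaceAndProject(const Vec4& worldPos, const Vec4& worldNormal,
                                   float height, float scale,
                                   const Mat4& view, const Mat4& viewProjection) {
    assert(worldPos[3] == 1.0f && "expects a point with w = 1");
    Vec4 displaced = worldPos;
    for (int i = 0; i < 3; ++i)
        displaced[i] += worldNormal[i] * height * scale;  // push along the normal
    return { mul(viewProjection, displaced), mul(view, displaced) };
}
```

The point of the split is visible in the return value: both outputs start from the same displaced position, but only one of them goes through the projection.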

If you’re the observant type, you might wonder why on earth I’m not recalculating the normals to fit the displaced surface. A surface looking like this: _____ which turns into this: _/_ of course has new normals, if you want per-vertex lighting. We still want to use normal-mapping, but why? Well, tessellation is great in a lot of ways, but a normal map will still contain more information than our tessellated surface will. What this means is that a surface doesn’t need to change its normals, because the normals are saved in the normal-map. What we do need, however, is the correct TBN-matrix to transform the normals. But we don’t need to manipulate them either, because despite the fact that the surface goes from ____ to _/_, the normals sampled from a texture mapped to a surface looking like ____ will still appear to look like _/_. The conclusion of the confusion is that one doesn’t need to care much about the normals, because the normal-map provides the lighting properties otherwise used on a non-tessellated surface, which means the normal map will always fit the displaced surface if the artist made them match. This is the result:
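For reference, the TBN transform amounts to the following. This C++ sketch assumes a tangent-space normal map with components stored in [0, 1]; tangentToSurface and unpackNormal are hypothetical helpers, and T, B and N are assumed to be orthonormal and already in the space you light in.

```cpp
#include <array>
#include <cassert>

using Vec3 = std::array<float, 3>;

// Normal maps store components in [0, 1]; expand them to [-1, 1].
Vec3 unpackNormal(const Vec3& texel) {
    return { texel[0] * 2.0f - 1.0f, texel[1] * 2.0f - 1.0f, texel[2] * 2.0f - 1.0f };
}

// Multiplies a tangent-space normal by the TBN matrix whose columns are
// T, B and N: result = T * x + B * y + N * z.
Vec3 tangentToSurface(const Vec3& sampled, const Vec3& T, const Vec3& B, const Vec3& N) {
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = T[i] * sampled[0] + B[i] * sampled[1] + N[i] * sampled[2];
    return out;
}
```

An unperturbed texel (0.5, 0.5, 1) always maps to N, so whatever frame the tessellated surface provides, the sampled detail is expressed relative to it.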

What you see is Nebula rendering a normally flat surface using a global light, a point light, two spot lights, and a height map to displace the surface. I will admit that I might be babbling about the normal manipulation, because I can sometimes see small artifacts along the edges of the tessellated surfaces where there is a light source present. Whether this artifact is a result of the lighting, or of the fact that I’m ignorant enough to think I can get away with just displacing the position, remains to be seen. The result looks pretty good, and it’s more important to focus on making the rest of Nody easier to use than to obsess over small, rarely occurring light artifacts.

Nody still needs a project format so one can open up node networks, redesign them, change the settings around, and generate them. Nody also needs to be able to recursively traverse all projects and generate all the shaders, which is very useful when committing newly created shaders to SVN or some other version control server. I also need to write the mini-edition of Nody to go along with the level editor, which will be used to change shader variables for a model instance.

And when all that is done, Nody will need to be able to communicate with the Nebula runtime to change shaders, blend/depth/rasterizer-states, class linkage, frame shader passes, materials, render targets and multiple render targets without the need to restart either application.

Back to basics

When working with advanced rendering, new technology and complex APIs, it is easy to drift very far from thinking in terms of common algorithms such as sorting and tree traversal. One example is the way Nody traverses the tree, which has haunted me for the entire development of the project. It might seem simple on the surface, and it really is, but the logical solution is often not as simple as the practical one. Nody used to traverse the tree with the output node as the starting point, and then just use a breadth-first algorithm. This worked well until I started working on the hull and domain shaders, where one node could have several connections from forks spanning across the node network. The problem was that a node could be reached way before it should be, which in turn put the node’s source code into the shader source before it was supposed to be there. The image should suffice as an example.

Here you can see the displace-node, which is reached from both the normalize-node and the barycentricposition-node. Let’s say the traversal reaches it from the barycentricposition-node first, which effectively puts the displace-node’s code in the shader before the normalize part has been done. This will cause an error in the final result, which is not acceptable. To address this problem, the tree is traversed breadth-first, starting from the vertex node (not the output node), but a node can be traversed to if and only if all of its incoming connections have been handled. This ensures that all of a node’s dependencies are declared before the node itself, which means the node code will do what it is supposed to. You might also wonder why I changed the starting point from the output node to the vertex node, and that is because leaf nodes, such as the projectiontransform-node you can see there, have to be traversed as well. You might think “hey, that node has no effect on the end result, it’s unnecessary!”, which is exactly what I thought at first. But what if this node writes to SV_POSITION, but there is no effect node which uses the actual SV_POSITION as an input? How do you generalize a node that has only an input, for a value that ALWAYS has to be written? Instead, nodes like this will be treated specially, or at least their outputs will if they write to a system value. So this node, despite being a leaf node with no other relation or effect on the final picture, still uses its output as if it were connected. If one wants to use the SV_POSITION-marked variable, it still works to attach it and use it as normal, and in that case, Nody doesn’t have to treat the output variable in a special way.
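The traversal rule above is essentially Kahn’s topological sort. Below is a minimal C++ sketch of it, assuming a hypothetical NodeGraph where edges[i] lists the nodes that node i feeds; it is not Nody’s actual code. Every node with no incoming connections (the vertex node) starts in the queue, and a node enters the queue only when its last incoming connection has been processed.

```cpp
#include <cassert>
#include <queue>
#include <string>
#include <vector>

// Hypothetical node network: edges[i] lists the nodes that node i feeds.
struct NodeGraph {
    std::vector<std::string> names;
    std::vector<std::vector<int>> edges;
};

// Breadth-first traversal where a node is only visited once every one of its
// incoming connections has been handled. Returns the order in which node code
// should be emitted. Leaf nodes are visited as well.
std::vector<int> emissionOrder(const NodeGraph& g) {
    const int n = static_cast<int>(g.names.size());
    std::vector<int> pendingInputs(n, 0);
    for (const auto& targets : g.edges)
        for (int t : targets)
            ++pendingInputs[t];

    std::queue<int> ready;
    for (int i = 0; i < n; ++i)
        if (pendingInputs[i] == 0)
            ready.push(i);  // sources, e.g. the vertex node

    std::vector<int> order;
    while (!ready.empty()) {
        int node = ready.front();
        ready.pop();
        order.push_back(node);
        for (int t : g.edges[node])
            if (--pendingInputs[t] == 0)  // last dependency just handled
                ready.push(t);
    }
    assert(static_cast<int>(order.size()) == n && "node network contains a cycle");
    return order;
}
```

With the hypothetical graph vertex → {barycentricposition, normal}, normal → normalize, {barycentricposition, normalize} → displace, displace → projectiontransform, a plain breadth-first search would emit displace before normalize, while this version yields vertex, barycentricposition, normal, normalize, displace, projectiontransform. The "normal" node is made up for the example; note that the leaf projectiontransform node is still emitted.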


If you read my last post, you might already know that the hull shader requires the input and output structs to have their variables in the same order. It is OK for the output struct to be a subset of the input struct, but not vice versa. As a result, I thought a simple sorting algorithm would come in handy, seeing as the only thing I need to do is sort the variables based on what they are connected to. That was easy enough to implement using a simple insertion sort, because let’s face it, a faster sorting algorithm would give us nothing here. So that’s working now, which means that we can construct a working shader using hull and domain shaders with Nody. The above image is a piece of the entire node network that makes up a shader which writes normal-depth using a tessellated surface. The following picture shows the network in its entirety.
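Sorting the hull shader outputs by the position of the input they are connected to could be sketched like this in C++. sortBySignature is a hypothetical name, and the sketch assumes each output shares its name with an input, as the hull shader variables do.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Reorders outputs so they follow the order of the inputs they are connected
// to. Insertion sort is plenty for the handful of variables in a signature.
void sortBySignature(std::vector<std::string>& outputs,
                     const std::vector<std::string>& inputOrder) {
    std::unordered_map<std::string, std::size_t> rank;
    for (std::size_t i = 0; i < inputOrder.size(); ++i)
        rank[inputOrder[i]] = i;
    for (const std::string& o : outputs)
        assert(rank.count(o) && "every output must also be an input");

    for (std::size_t i = 1; i < outputs.size(); ++i) {
        std::string key = outputs[i];
        std::size_t j = i;
        // Shift larger-ranked entries one step right to make room for key.
        while (j > 0 && rank.at(outputs[j - 1]) > rank.at(key)) {
            outputs[j] = outputs[j - 1];
            --j;
        }
        outputs[j] = key;
    }
}
```

Dropping Tangent from the outputs is fine here, since the result only has to be an ordered subset of the inputs.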

Tessellation continued

So I’ve been busy trying to make the domain shader nodes work without any post-Nody modification. Thus far, I’ve managed to figure out that the variables have to come in the exact same order in the vertex shader, hull shader, and domain shader. What that means is that the vertex shader output struct has to look EXACTLY the same as the hull shader input, the hull shader output has to be identical to the hull shader input, and the domain shader input has to be identical to the hull shader output. If even one of these conditions is not met, the shader won’t render at all. The only thing that can vary is that the hull shader seems to be able to add single floats to the output struct. The reason for this might be that the hull shader is simply a pass-through stage, which means that it acts as an alias for the vertex shader. By that I mean that the data passed to the domain shader has to be in the same form as the hull shader received it, thus giving you the exact same result as if the domain shader had received the vertex shader data directly. There is also the possibility of having the output struct from the hull shader be a subset of the input struct, thus allowing you to crop data while doing the pass-through.

The solution will be to let Nody sort the inputs and outputs of the hull shader, which shouldn’t be too much work, seeing as each hull shader variable simply shares the same name in the input and the output. Then, when the domain shader is generated, its variables have to be sorted to match the output signature of the hull shader.
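The ordering constraint can also be checked before the shader is handed to the compiler. The C++ sketch below treats a signature as a list of semantic names and tests whether one is an ordered subset of the other; isOrderedSubset is hypothetical, and whether the runtime allows gaps or only a leading prefix is an assumption this sketch does not settle.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns true if every name in `output` also appears in `input`, in the same
// relative order. Identical lists are the special case where nothing is dropped.
bool isOrderedSubset(const std::vector<std::string>& output,
                     const std::vector<std::string>& input) {
    std::size_t j = 0;
    for (const std::string& semantic : output) {
        while (j < input.size() && input[j] != semantic)
            ++j;
        if (j == input.size())
            return false;  // missing, or appears earlier than allowed
        ++j;
    }
    return true;
}
```

Running this on the hull shader output against its input would flag a reordered struct as an error in the tool instead of as a silently blank render.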

I shouldn’t really complain, but why is this not documented? This is really important stuff we’re talking about, because if you don’t follow this seemingly invisible pattern, there is no way to predict the result. I can imagine that using structs would be a lot easier when writing shaders by hand, seeing as you’d only copy the structs from file 1 to file 2. I’m really looking forward to the day when you only need to define the outputs with the out-modifier and then simply set a value, leaving it to the compiler to figure out the structure of the output package. I should have found a general solution to this problem by the beginning of next week.


Tessellation and DirectX 11

I managed to get tessellation to work, which doesn’t mean I have anything cool to show, but it is working. You will just have to take my word for it. It turns out that you need to use a special primitive type for the shader to be able to treat the input data as control points. This seems unnecessary: if you send a triangle, it should be able to interpret the triangle as three control points, but nope, it doesn’t. The funny thing is that Nvidia GPUs let you run the shader like nothing is wrong, except for the fact that you won’t see a result, of course. That would naturally send you on a quest to fix the shader, but the actual problem is in your application, not your shader. ATI GPUs at least have the decency to tell you that something is fatally wrong by, quite simply, restarting the graphics driver. But enough bashing: it works, I’m happy, and when I get the time, I will implement a complete render pass using tessellated objects (by which I mean a normal-depth pass and a color pass).
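In Direct3D 11 the special primitive type is one of the patch-list topologies, for example D3D11_PRIMITIVE_TOPOLOGY_3_CONTROL_POINT_PATCHLIST for triangles, set with ID3D11DeviceContext::IASetPrimitiveTopology. The small C++ helper below only reproduces the numbering of those enum values (33 through 64 for 1 to 32 control points) so it compiles without the Windows SDK; patchListTopology is a hypothetical name.

```cpp
#include <cassert>

// Value of D3D11_PRIMITIVE_TOPOLOGY_1_CONTROL_POINT_PATCHLIST; the values for
// 2..32 control points follow consecutively up to 64.
constexpr int kFirstPatchListTopology = 33;

// Maps a control point count to the matching patch-list topology value.
int patchListTopology(int controlPointsPerPatch) {
    assert(controlPointsPerPatch >= 1 && controlPointsPerPatch <= 32);
    return kFirstPatchListTopology + controlPointsPerPatch - 1;
}
```

In real code you would pass D3D11_PRIMITIVE_TOPOLOGY_3_CONTROL_POINT_PATCHLIST directly instead of D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST when drawing tessellated meshes; keeping the triangle list topology is exactly the application-side mistake described above.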

While I’m already on the subject of hull shaders and domain shaders, I thought I’d share the design for how Nody implements these magical tools of power. The hull shader, very much like the vertex shader, is usually a program you don’t need to modify much, and by that I mean there is no real need to have different nodes for different features. The hull shader is also complex in the sense that it is essentially two threads, the main (control point) thread and the patch constant thread. Because of this, a hull shader would need to be split into two different nodes, one for the main thread and one for the patch constant thread, and since the program itself is supposed to be user-friendly to some extent, I came up with a different solution. The solution is to treat the hull shader very much like the vertex shader, using one node per specific task. For example, I have a node named ‘static’, which is basically used to pass the position, view-space position and UV coordinates to the pixel shader. Then I have another node that does the same and also skins the object. The hull shader works the same way: currently there is one node called trianglesubdivision, which takes information from the vertex node and performs the patch constant function and the main function described in the shader variation file. So if one wants to implement another hull shader, one should consider making another version of trianglesubdivision.

The domain shader, however, is oh so very different. This shader stage is, much like the pixel shader, divided into an arbitrary number of nodes, where each node performs some sort of action. The only hitch is that since the domain shader gets its patch in the format OutputPatch<DS_INPUT, POINT_COUNT> Patch, every node accessing a variable from the hull shader needs Patch[x] in front of the actual variable name. This is also a place where something can go wrong. Let’s say that POINT_COUNT is 3, and a domain shader node tries to access a variable at Patch[4], which isn’t going to work. I have accepted this as a minor flaw, but there are several obvious ways to work around it. First of all, domain shader variations are categorized by the kind of geometry they expect. The hull shader denotes that we are using triangles if you are using trianglesubdivision (obvious fact is obvious), and the domain nodes to go with it are of course the ones in the triangle category. If one chooses to pick from another category, well, then they’ll have to fend for themselves when they get an error telling them they are trying to grab more than they can handle (i.e. trying to get a point which is out of bounds). So for future expansions of this system, add a new hull shader and a suitable new domain shader category.
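Another possible workaround is to validate the control point index while generating the code, so that Patch[4] on a three-point patch is reported by Nody rather than by the shader compiler. This is a hypothetical C++ sketch; patchAccess is not an existing Nody function.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Builds the expression a domain shader node would use to read a hull shader
// output from OutputPatch<DS_INPUT, POINT_COUNT> Patch, rejecting indices
// outside the patch at generation time.
std::string patchAccess(int controlPoint, const std::string& variable, int pointCount) {
    if (controlPoint < 0 || controlPoint >= pointCount)
        throw std::out_of_range("control point " + std::to_string(controlPoint) +
                                " is outside a patch of " + std::to_string(pointCount) +
                                " points");
    return "Patch[" + std::to_string(controlPoint) + "]." + variable;
}
```

Since the hull shader node already knows POINT_COUNT, passing it along to the domain nodes would be enough to catch a node picked from the wrong category.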

So that’s that. I also want to squeeze in how to add new nodes to Nody using the .shv or shader node variation system. Basically, it’s like a scripting language, and it looks something like this:

float4 Position : POSITION

float4 WorldProjectedPosition : SV_POSITION

WorldProjectedPosition -> mul(ModelViewProjection, Position);

It’s basic, it’s simple, and it’s highly maintainable. If this variation is attached to a node, that node will automatically get the input Position and the output WorldProjectedPosition, and will perform the ModelViewProjection multiplication to put the position in the viewport. The -> operator is something new, which is not part of the HLSL or GLSL standard, and is used solely by Nody to let the user decide what sort of operator should be used. In the default case, the action of the node is Set, which means that WorldProjectedPosition will be set to mul(ModelViewProjection, Position). If the action is, for example, Add, then WorldProjectedPosition will have mul(ModelViewProjection, Position) added to itself. In this case, the choice of action is pretty obvious, seeing as doing anything but setting the value would produce a catastrophic result, but let’s say it’s a color addition, or subtraction, or what have you. I’m guessing you get the point. Oh, and by the way, the syntax is very much inspired by the OpenGL glBegin() and glEnd() style.
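The -> expansion can be sketched as a plain text substitution in C++. The Action enum and expandAction are hypothetical: Set and Add come from the description above, while Subtract and Multiply are assumptions about what other actions might look like.

```cpp
#include <cassert>
#include <string>

// Hypothetical node actions; only Set and Add are named in the post.
enum class Action { Set, Add, Subtract, Multiply };

// Replaces the Nody-only "->" operator with the HLSL assignment operator
// that matches the action chosen for the node.
std::string expandAction(const std::string& line, Action action) {
    static const char* const ops[] = { "=", "+=", "-=", "*=" };
    const std::string::size_type pos = line.find("->");
    if (pos == std::string::npos)
        return line;  // no Nody operator on this line
    return line.substr(0, pos) + ops[static_cast<int>(action)] + line.substr(pos + 2);
}
```

Applied to the source line above, Set produces "WorldProjectedPosition = mul(ModelViewProjection, Position);" and Add produces the same line with "+=". Since HLSL has no pointer member access, "->" is free to use as a marker.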

The tags that are currently available are: BEGININPUTS, BEGINOUTPUTS, BEGINSOURCE, BEGINEXTERNALS, BEGININCLUDES, BEGINTEXTURES, BEGINCONSTTEXTURES, BEGINSAMPLERS, BEGINGLOBALS, BEGINCONSTANTS and BEGINCLASSINSTANCES. You’ve already seen inputs, outputs and source, and I’m guessing you can figure out what includes, textures and samplers are. Globals are a way to denote variables that will be changeable by the runtime, and constants are variables that are not. Externals denote code that is supposed to be outside of the main function but not in a separate include file, for example a simple helper function. Const textures are just a way to make Nody avoid declaring a texture, which is useful if the texture is in an include file. This will still add the texture to the node as an input variable, but will avoid declaring it in the shader code. Class instances are used for dynamic linkage, which in turn lets us make small modifications to a shader program; this is useful for not bloating the shader system with tons of different shaders, and instead making small modifications to the existing ones.

World, this is Nody. Nody, this is world

Hello and welcome to the development thread of the gscept game engine and pipeline suite. Seeing as there is no point in stalling, I’m going to explain exactly what we are currently doing.

The core of the engine relies on the Nebula Device, developed by a company previously known as RadonLabs. They’ve developed a cross-platform and highly maintainable game engine, which supports DirectX 9, job dispatching and content management using a Maya plugin. The problem is that we here in Skellefteå have a pre-release of the engine, which basically means that we don’t have any of the cool pipeline tools. So, about 3 months ago, we started making our own pipeline for the engine, and one of these tools is Nody.

Nody is a graphical shader designer. It lets you create nodes of five different types: vertex, subdivision, displacement, geometry and effect nodes. Each node type corresponds to a segment in the rendering pipeline. The idea is to be able to combine different nodes together in order to graphically generate a pipeline pass. Nody is also equipped with two subtools, the frame shader editor and the material editor. The frame shader editor works as a tool to modify a frame shader, or render path.

The tool has functionality to add render targets, multiple render targets, and frame passes. A frame pass corresponds to a set of shaders, which consists of a vertex shader, a pixel shader, and an optional hull shader, domain shader and geometry shader. A frame pass can be used to render either a dynamic model, be it skinned or static, or a full-screen quad for post effects, and there are four specialized passes for text, GUI, simple shapes and lights. The point of having specialized frame passes is to be able to script passes that need special treatment. Nebula renders lights using deferred lighting, so all the lights are rendered in three passes: one for the global light, one for the spot lights and one for the point lights. This very special pass is handled by denoting it with a lights pass node, so you don’t need to attach a material (I will explain materials later) to every single light for it to render. A simple frame in Nebula renders normals and depth to a GBuffer, renders the lights using the GBuffer, and then renders a model using the light buffer as the light values.

Previously, Nebula relied on the effect system developed by Microsoft for DirectX, which basically allows you to set up entire render passes using a single file. The effects system allows you to set depth-stencil states, blend states and rasterizer states, as well as which shader goes where in the pass. The bad thing about effects is that Microsoft seems to be slowly moving away from them, and OpenGL doesn’t offer such a feature. Why would I even care about OpenGL? Well, this is how I see it: better to make it as general as possible while I’m already in the process of remaking it, rather than keeping an old library whose use is strongly discouraged because it’s not a part of the standard SDK.

The way Nebula rendered before was using effects and annotations, which is basically yet another way to script what technique should be used when rendering an object. It would start the normal-depth pass and then set the annotations correctly for different objects. So it would render all static objects with the annotation set to use normal-depth to render to the GBuffer, and then, in a later pass, use the exact same batching but just change the annotation to the one for the current pass. And seeing as we won’t be using effects anymore, this becomes a huge problem. Enter materials.

Materials are what you attach to an object in order to make it render. The frame shader denotes when a pass should take place, but the material denotes in what pass the object should be rendered. This way, an object can actually have several shaders, like normal-depth, shadow and color. Nody supplies an editor for this as well.
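The split between frame shader and material can be sketched as a lookup: the frame shader decides which pass runs, and each object’s material says which shader, if any, it uses in that pass. This C++ sketch is an assumption for illustration; Material, Object, drawListForPass and the pass and shader names are hypothetical and not Nebula’s API.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical material: the shader an object uses in each frame pass it
// takes part in (e.g. "NormalDepth", "Shadow", "Color").
struct Material {
    std::map<std::string, std::string> shaderPerPass;
};

struct Object {
    std::string name;
    const Material* material;
};

// For one frame pass, returns (object, shader) pairs for every object whose
// material has a shader for that pass; other objects are skipped.
std::vector<std::pair<std::string, std::string>>
drawListForPass(const std::string& pass, const std::vector<Object>& objects) {
    std::vector<std::pair<std::string, std::string>> draws;
    for (const Object& obj : objects) {
        assert(obj.material != nullptr);
        auto it = obj.material->shaderPerPass.find(pass);
        if (it != obj.material->shaderPerPass.end())
            draws.emplace_back(obj.name, it->second);
    }
    return draws;
}
```

With a rock material that has NormalDepth and Color shaders and a glass material with only a Color shader, the rock is drawn in both passes, the window only in the Color pass, and nothing is drawn in a Shadow pass.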

This picture shows Nody in its current state.

It shows a preview to the right (not yet implemented) as well as the main window, together with the frame shader tool and the material editor. The material editor and the frame shader tool are two detachable windows.

Currently, I’m working on making the displacement nodes work in a render. As you might have guessed, the subdivision nodes are used to build a hull shader, and the displacement nodes are used to build a domain shader, both of which are new features in DirectX 11. So yeah, Nebula has not only been rewritten to use ordinary shaders instead of effects, but it’s also been upgraded to support DirectX 11. The funny thing is that my work computer shows a blank screen when rendering, but my home computer halts the driver, just to let me know I’m doing something wrong. Isn’t that so nice!?