If I haven’t told you already, a material handles the shaders for an object. Since an object can be rendered with different shaders in different passes, there needs to be some way to describe which shaders, variations and class interfaces should be applied to an object. The solution is called a material. Materials can be created with Nody, using an editor that somewhat resembles the frame shader editor. Batch rendering still works because materials are attached in the .n3 files, just like shaders used to be. The only problem is that Nebula only handles one shader, the one previously defined in the .n3 file, and that is the ONLY shader attached to a model which can have dynamic shader variables. Keep in mind though that the previous shaders used the effect system, which meant that several shader programs, along with their shader variables, could be defined in one file.

Now that this is not an option anymore, we need to handle it ourselves. It might sound like a small problem, but keep in mind that the entire rendering engine is based on having a shader per model node, as well as per light and per frame pass. Now every model node needs a material instead, which means that shader variables need to know which shader they belong to. At first I thought this would be easy, so I made a very basic layer which could only modify shader variables if you already knew the shader name. I realized this would be very cumbersome and also very ugly, so I rewrote the materials in Nebula.

A material works like the Nebula shaders, which can be instantiated to allow for object-specific variables. A material can create a material instance, which has material variables, each of which knows the corresponding variable in all of the shaders, so that setting a material variable sets the variable for all shader instances. This solution made the code much cleaner, and it also let me fix a lot of memory leaks that I had struggled with before. I also tested the system with two models, one with a tessellated material and one without, and this is the result:
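In code, the fan-out could look roughly like this minimal sketch; the concepts (material instance, material variable) come from the post, but the class members and the float-only variable type are simplifications of mine, not Nebula’s real API:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a per-shader variable slot; the real Nebula
// shader instance classes are of course more involved than a single float.
struct ShaderVariableInstance {
    float value = 0.0f;
};

// A material variable knows the matching variable in every shader the
// material spans, so a single Set() updates all of them at once.
class MaterialVariable {
public:
    void AddShaderVariable(ShaderVariableInstance* var) { shaderVars.push_back(var); }
    void Set(float v) {
        for (ShaderVariableInstance* var : shaderVars) var->value = v;
    }
private:
    std::vector<ShaderVariableInstance*> shaderVars;
};

// A material instance owns one MaterialVariable per variable name.
class MaterialInstance {
public:
    MaterialVariable& Variable(const std::string& name) { return variables[name]; }
private:
    std::unordered_map<std::string, MaterialVariable> variables;
};
```

The point of the indirection is that the caller never needs to know which shaders a material spans; setting “MatDiffuse” once is enough, whether the material has one shader or five.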



I also figured out how the actual batching works. When every .n3 file is loaded, the visibility resolver checks whether an .n3 scene is visible. If it is, the resolver collects all visible scenes with a model containing a material, then finds the unique model nodes using a specific material, and finally gathers all instances, which are ultimately our models. This ensures that shaders and render states are set once per unique model, and lets us render the models without doing much more than setting instance-specific variables, such as transformations, per draw call. It could also be expanded to use instancing to render static models. With the new material subsystem in place, it shouldn’t be too much work to add materials to particles, animators and characters as well.
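The batching above boils down to three nested loops, which can be sketched like this; MaterialBatch, ModelNode and RenderVisible are hypothetical names for illustration, not the actual Nebula types:

```cpp
#include <string>
#include <vector>

// Hypothetical minimal types: a material groups nodes, a node groups instances.
struct ModelInstance { float transform[16]; };
struct ModelNode { std::vector<ModelInstance*> instances; };
struct MaterialBatch { std::string material; std::vector<ModelNode*> nodes; };

// Counts how often expensive state (shader + render state) is applied versus
// how many cheap per-instance updates happen, to show the batching payoff.
struct BatchStats { int stateApplies = 0; int instanceDraws = 0; };

BatchStats RenderVisible(const std::vector<MaterialBatch>& batches) {
    BatchStats stats;
    for (const MaterialBatch& batch : batches) {
        for (const ModelNode* node : batch.nodes) {
            // shader and render state are set once per unique node/material
            stats.stateApplies++;
            for (const ModelInstance* inst : node->instances) {
                // only instance-specific variables (e.g. transform) per draw
                (void)inst;
                stats.instanceDraws++;
            }
        }
    }
    return stats;
}
```

With three instances of the same node, state is applied once but three draws are issued, which is exactly why the outer grouping by material matters.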


So I’ve been looking into how characters are saved in the binary .n3 format. So far I’ve managed to write static meshes with materials and all that fancy stuff, but characters are a completely different story. A character contains information describing the skins, the joints, the animation resource and the variation resource. The idea is to have a character that can use a lot of different skins for the same skeleton; this way, one can reuse a single skeleton and a single animation table and only swap the skins. The character node can also contain an actual mesh, or several, seeing as the genius node-based content system simply handles nodes in order of features.

The nodes represent pieces of data, very much like the ones you find in Maya. A node implements a set of tags, which contain different data. For example, the StateNode looks for tags concerning the state of a model, such as its material (or in the old days, its shader), shader variables and textures. It also has the ability to pass a tag it doesn’t recognize further down the inheritance tree, where the base-level node is the TransformNode. The first problem I encountered was that the specialized nodes, such as the ParticleSystemNode and the CharacterSkinNode, both inherit from StateNode, which in turn was completely based on a system where an object could only have one shader. Seeing as materials represent an object with several shaders, the old StateNode had to be replaced, but at the same time remain compatible with the old system.

To address this issue, one can use the same pattern used for the DirectX rendering code, where there is a base class from which all classes inherit, and in the header of the used class, in this case StateNode, a series of defines determines which class StateNode should inherit from. For the old system, StateNode inherits the old StateNode, which has been renamed SimpleStateNode. For the new system, StateNode inherits from MaterialStateNode. This way, when compiled with materials, Nebula will use the material system, and when it’s not, it will revert to the old system. I should note that a model without a material, when loaded with a materials-enabled Nebula, will tell you what you did wrong, by crashing. The only bad thing is that the functions for applying a state and for getting, creating and checking the existence of a shader variable now need a shader instance, for both the SimpleStateNode and the MaterialStateNode. For the SimpleStateNode, the shader instance does absolutely nothing, which in my mind is bad. Keep in mind that the old system is just an intermediate system and can easily be converted to use materials with DirectX 9, but it still feels very wrong to have that extra pointer.
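A minimal sketch of the define trick; the class names come from the post, but the NEBULA_USE_MATERIALS macro and the typedef are assumptions of mine:

```cpp
#include <string>

// Old single-shader implementation (renamed from the original StateNode).
class SimpleStateNode {
public:
    virtual ~SimpleStateNode() {}
    virtual std::string System() const { return "simple"; }
};

// New multi-shader implementation built on materials.
class MaterialStateNode {
public:
    virtual ~MaterialStateNode() {}
    virtual std::string System() const { return "material"; }
};

// Flip this define (hypothetical name) to switch the whole engine over.
#define NEBULA_USE_MATERIALS 1

#if NEBULA_USE_MATERIALS
typedef MaterialStateNode StateNodeBase;
#else
typedef SimpleStateNode StateNodeBase;
#endif

// The rest of the engine only ever sees StateNode; which system it runs on
// is decided entirely at compile time by the define above.
class StateNode : public StateNodeBase {};
```

The upside is that no call site changes; the downside, as noted above, is that both base classes must expose the same interface, shader-instance pointer and all.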

Another solution is to branch it, by simply having a new set of nodes for shapes using materials, and for states using materials. This also means implementing a new set of nodes for the characters and the particles as well, seeing as they rely on the functionality of the state nodes. The beauty of this solution is that both systems can run simultaneously, with the choice depending on the content. It of course requires us to create different types of shape nodes when we export our models: the original system would require a ShapeNode, and the new system a MaterialShapeNode. The problem with creating the wrong node here is that the ShapeNode won’t recognize the “MNMT” (ModelNodeMaterial) tag, while the MaterialShapeNode will. Also, the ShapeNode will not understand why no shader is created by the SHDR tag, which won’t exist when using a MaterialShapeNode. Conversely, the MaterialShapeNode will not work with ShapeNode content, because no material is specified for a ShapeNode. So the result is much nicer code, but it will crash when used incorrectly.

I chose to go with the latter.

While investigating the nodes I found a node called AnimatorNode. For some reason, it was commented as “Legacy n2 crap!”, but I couldn’t find a replacement for it. The AnimatorNode handles keyframed variables, and just like our StateNode, it uses a single shader to do so. Research away!


The Finnish army knife

Qt! Qt is so extremely flexible that it almost makes me teary-eyed. I thought saving a project with all node positions, node links, node constant variable settings, link variable connections, effect settings, tessellation settings, geometry settings and variations would be extremely complex. In fact, a save file containing that information will probably take up less space and take less time to implement than this blog post. I’ve devised a very simple system, consisting of two singletons, a loader and a saver, and an interface called Savable. The idea is that every class that should be savable inherits the appropriately named Savable. Seeing as Savable is abstract, one has to implement its three functions: Save, Load and Reset. What they do is pretty obvious: Save writes the necessary data to a stream, Load does the inverse, and Reset resets a class to its default state.

A binary file will have all its information in order, because the saving follows a specific pattern. But in order to keep the flexibility of adding new classes to be saved, I’ve devised a rather forgiving system. It’s pretty simple: a class saves itself using its class name and its data, and the saver first writes the class name, then the data for that class. Only the top-level items need to be stored in this manner; the node scene, which is one of these top-level items, stores all the links and nodes internally, which means a load will read all the nodes and all the links. To avoid requiring the data to be saved in a fixed order, the loader maps class names to classes: you tell the loader which class you want to load and which class name should be mapped to it. Then, when the loader hits that class name in the file, it calls the correct class with the stream positioned at the start of that class’s data. This means that classes can be loaded from a save file even though the load order differs from the save order. The top-level items also have to be unique, either as singletons or as simple single instances; however, they can contain several objects of different classes.
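The order-independent loading can be sketched like this; ProjectLoader, Register and the one-token “data” format are simplifications of mine, not Nody’s actual file format:

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Sketch of an order-independent loader: top-level items register a class
// name up front, and the loader dispatches whenever it meets that name in
// the stream, regardless of the order things were saved in.
class ProjectLoader {
public:
    void Register(const std::string& className,
                  std::function<void(std::istream&)> loadFunc) {
        loaders[className] = loadFunc;
    }
    // The file alternates class names and class data; here data is one token.
    void Load(std::istream& stream) {
        std::string className;
        while (stream >> className) {
            auto it = loaders.find(className);
            if (it != loaders.end()) it->second(stream);
        }
    }
private:
    std::map<std::string, std::function<void(std::istream&)>> loaders;
};
```

Because dispatch is keyed on the class name found in the stream, adding a new savable class is just one more Register call; old save files without that class still load fine.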

The Savable class also implements a function called ClassName, which uses C++ RTTI to perform a class lookup, converts the result to a QString, and removes the class/struct/union part which type_info contains. This way, the ProjectLoader always knows how to save and load an object, if and only if it implements Savable.
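A sketch of that name cleanup; note that typeid(...).name() is implementation-defined, so stripping the "class "/"struct " prefix only matches what MSVC emits, while GCC/Clang return mangled names and would need abi::__cxa_demangle instead. The helper name is mine:

```cpp
#include <string>
#include <typeinfo>

// Strips the "class "/"struct "/"union " prefix that MSVC's type_info::name()
// prepends, turning e.g. "class NodeScene" into "NodeScene".
std::string StripTypePrefix(const std::string& rawName) {
    const std::string prefixes[] = { "class ", "struct ", "union " };
    for (const std::string& prefix : prefixes) {
        if (rawName.compare(0, prefix.size(), prefix) == 0)
            return rawName.substr(prefix.size());
    }
    return rawName;
}
```

Usage would be something like `StripTypePrefix(typeid(*this).name())` inside ClassName, converted to a QString afterwards.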

Fantastic! One can now not only create shaders completely dynamically, but also save them, open them, modify them, and save them again! I estimated at least half a month to get this working, but in the end it took more like two and a half days. What’s left for Nody now is Nody-Nebula communication, and the about page. If I have spare time left, I was thinking of adding a new type of node, the custom node. This type of node allows for real-time creation and saving of a node using Nody instead of some text editor, with syntax coloring as well. Perhaps the easiest place to begin is being able to modify the generated source code for the shader. One can argue about the flexibility of letting Nody do this versus simply having your favorite text editor at the ready. Projects can of course still be opened and compiled without the need to traverse the graph and/or save the project again.

I was wrong

I just wanted to say that I was wrong about the hull shader inputs and outputs that have to match. It turns out that I shot myself in the foot. The reason it didn’t work is that all variables that should be shared globally, by which I mean things like the orientation, projection and view matrices, were stored in a cbuffer called Globals. If you then declare another variable that is not in a cbuffer, the shader compiler automatically generates a cbuffer for you, storing all those rogue variables. This cbuffer is called $Globals. Nebula currently handles just one cbuffer, because variables are treated as if they were updated once per frame. If you design a program like Nody, where every shader should have variables that are interchangeable between the CPU and the GPU, then you have to put those variables in some sort of global scope, which then becomes the $Globals cbuffer. Also, seeing as the constant buffers are constantly changed because each object needs its own set of variables, there is no real reason to bundle them together.

Yes, one might argue that variables which are not used in one shader take up unnecessary space, and while that is true, there is complete control to design a shader which only contains the required variables. This basically takes that optimization out of my hands and places it in the hands of anyone using the program for their game engine.

But I wanted to set the record straight: I was wrong, there is no undocumented internal voodoo required to make the hull shader work; it was my mistake. On the bright side, this means that making very complex hull and domain shaders is now even easier than before, and as a result I thought I’d recreate the trianglesubdivision hull shader to scale the amount of tessellation based on distance to the camera. This means that the detail of an object is dynamically scaled when approaching or retreating. It would also be really nice to try tessellating a surface using a real height map, instead of using a diffuse map as I am now. But I’m probably going to focus on other stuff for a while.

Deferred lighting and tessellation

If you haven’t noticed yet, I’ve been trying to not only make hull and domain shaders work, but also make them work in a practical sense, by which I mean having relevant nodes in Nody that allow a user to create a shader that uses tessellation without any hassle. This of course has to be interlaced with the deferred lighting, where the normals and depth have to be written using a tessellated surface. Sounds easy enough, but consider the fact that you need to split the calculations for the position of a vertex and the view-space position. Why? Well, if you want to perform displacement, it’s going to be rather hard if the vertex is already projected into the scene. Instead, you want it in model space (rotated by the object rotation), seeing as the height map is mapped to the object in model space; that is what gives us a correct displacement. When we’ve done our displacement, we move the new vertices into model-view-projection space so they can be rendered correctly. We also want to render to our depth buffer using the view-space position, which means we not only need the displaced position, but also need to multiply it with the view matrix. Remember that every vertex is already multiplied with the model matrix in the vertex shader, so we only need to multiply the final position with the view-projection matrix and the view-space position with the view matrix.

If you’re the observant type, you might wonder why on earth I’m not adjusting the normals to fit the displaced surface. A surface looking like this: _____ which turns into this: _/_ of course has new normals, at least if you want per-vertex lighting. But we still want to use normal mapping, and why? Tessellation is great in a lot of ways, but a normal map will still contain more information than our tessellated surface. What this means is that the surface doesn’t need to change the normals, because the normals are stored in the normal map. What we do need, however, is the correct TBN matrix to transform the sampled normals. But we don’t need to manipulate them either, because despite the fact that the surface goes from ____ to _/_, the normals sampled from a texture mapped to a surface looking like ____ will still appear to look like _/_. The conclusion of the confusion is that one doesn’t need to care much about the normals, because the normal map provides the lighting detail otherwise used on a non-tessellated surface; in other words, the normal map will always fit the displaced surface if the artist made them match. This is the result:

What you see is Nebula rendering a normally flat surface using a global light, a point light and two spot lights, with a height map displacing the surface. I will admit that I might be babbling about the normal manipulation, because I can sometimes see small artifacts along the edges of the tessellated surfaces where a light source is present. Whether this artifact is a result of the lighting, or of me being ignorant enough to think I can get away with just displacing the position, remains to be seen. The result looks pretty good, and it’s more important to start focusing on making the rest of Nody easier to use, instead of obsessing over small, rarely occurring light artifacts.

Nody still needs a project format so one can open up node networks, redesign them, change the settings around, and generate them again. Also, Nody needs to be able to recursively traverse all projects and generate all the shaders, which is very useful when committing newly created shaders to SVN or some other version control server. I also need to write the mini-edition of Nody to go along with the level editor, which will be used to change shader variables for a model instance.

And when all that is done, Nody will need to be able to communicate with the Nebula runtime to change shaders, blend/depth/rasterizer-states, class linkage, frame shader passes, materials, render targets and multiple render targets without the need to restart either application.

Back to basics

When working with things such as advanced rendering, new technology and complex APIs, one can drift very far away from the mindset of the common algorithms, such as sorting and tree traversal. One example is the way Nody traverses the node graph, which is something that has haunted me for the entire development of the project. It might seem simple on the surface, and it really is, but the logical solution is often not as simple as the practical one. Nody used to traverse the graph with the output node as the starting point, using a breadth-first algorithm. This worked well until I started working on the hull and domain shaders, where one node could have several connections from forks spanning across the node network. The problem was that a node could be reached way before it should, which in turn put the node’s source code in the shader source before it was supposed to be there. The image should suffice as an example.

Here you can see the displace node, which is reached from both the normalize node and the barycentricposition node. Let’s say the traversal reaches it from the barycentricposition node first, which effectively puts the displace node’s code in the shader before the normalize part has been emitted. This will cause an error in the final result, which is not acceptable. To address this problem, the graph is traversed breadth-first, starting from the vertex node (not the output node), but a node may only be visited if and only if all its incoming connections have been handled. This ensures that a node has all its dependencies declared before itself, which means the node code will do what it is supposed to. You might wonder why I changed the starting point from the output node to the vertex node, and that is because leaf nodes, such as the projectiontransform node you can see there, have to be traversed as well. You might think “hey, that node has no effect on the end result, it’s unnecessary!”, which is very much what I thought at first. But what if this node writes to SV_POSITION, and there is no effect node which uses SV_POSITION as an input? How do you generalize a node that has only an input, for a value that ALWAYS has to be written to? Instead, nodes like this are treated specially, or at least their outputs are if they write to a system value. So this node, despite being a leaf node with no other relation or effect on the final picture, still uses its output as if it were connected. Keep in mind that only outputs to system values are treated as a special case; if one wants to use the SV_POSITION-marked variable, it still works to attach it and use it as normal, and in that case Nody doesn’t have to treat the output variable specially.
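The rule “visit a node only when all its incoming connections have been handled” is exactly a topological sort (Kahn’s algorithm). Here is a minimal sketch using integer node ids and adjacency lists, which is my simplification of Nody’s node graph:

```cpp
#include <map>
#include <queue>
#include <vector>

// Emits nodes in an order where every node appears only after all of its
// predecessors: breadth-first, gated on unhandled incoming connections.
std::vector<int> TraverseGraph(int startNode,
                               const std::map<int, std::vector<int>>& edges) {
    std::map<int, int> pending;                 // unhandled incoming links
    for (const auto& [from, tos] : edges)
        for (int to : tos) pending[to]++;

    std::vector<int> order;
    std::queue<int> ready;
    ready.push(startNode);
    while (!ready.empty()) {
        int node = ready.front(); ready.pop();
        order.push_back(node);                  // safe: all inputs declared
        auto it = edges.find(node);
        if (it == edges.end()) continue;
        for (int next : it->second)
            if (--pending[next] == 0) ready.push(next);
    }
    return order;
}
```

In a diamond like vertex → {normalize, barycentricposition} → displace, the displace node’s pending count only reaches zero after both branches are emitted, so its code always lands after theirs.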


If you read my last post, you might already know that the hull shader requires the input and output structs to have their variables in the same order. It is OK for the output struct to be a subset of the input struct, but not vice versa. As a result, I figured a simple sorting algorithm would come in handy, seeing as the only thing I need to do is sort the variables based on what they are connected to. Easy enough to implement using a simple insertion sort, because let’s face it, a faster sorting algorithm here would gain us nothing. So that’s working now, which means that we can construct a working shader using hull and domain shaders with Nody. The above image is a piece of the entire node network that makes up a shader that writes normal-depth using a tessellated surface. The following picture shows the network in its entirety.
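That sort can be sketched like this: reorder the output variables so they follow the same relative order as the input struct, using a plain insertion sort keyed on each variable’s position in the input list. SortToMatch and the string-based variable lists are my simplifications:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Reorders `outputs` so its variables appear in the same relative order as
// they do in `inputOrder`, keeping output a valid in-order subset of input.
void SortToMatch(std::vector<std::string>& outputs,
                 const std::vector<std::string>& inputOrder) {
    auto rank = [&](const std::string& name) {
        auto it = std::find(inputOrder.begin(), inputOrder.end(), name);
        return static_cast<int>(it - inputOrder.begin());
    };
    // classic insertion sort: perfectly fine, these lists hold a handful
    // of entries, so asymptotics are irrelevant here
    for (size_t i = 1; i < outputs.size(); ++i) {
        std::string key = outputs[i];
        size_t j = i;
        while (j > 0 && rank(outputs[j - 1]) > rank(key)) {
            outputs[j] = outputs[j - 1];
            --j;
        }
        outputs[j] = key;
    }
}
```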









Tessellation continued

So I’ve been busy trying to make the domain shader nodes work without any post-Nody modification. Thus far, I’ve managed to figure out that the variables have to come in the exact same order in the vertex shader, hull shader and domain shader. That means the vertex shader output struct has to look EXACTLY the same as the hull shader input, the hull shader output has to be identical to the hull shader input, and the domain shader input has to be identical to the hull shader output. If any one of these conditions is not met, the shader won’t render at all. The only thing that can vary is that the hull shader seems to be able to add single floats to the output struct. The reason might be that the hull shader is simply a pass-through stage, acting as an alias for the vertex shader. By that I mean that the data passed to the domain shader has to be in the same form it was received in, thus giving you the exact same result as if the domain shader directly received the vertex shader data. There is also the possibility of having the output struct from the hull shader be a subset of the input struct, allowing you to crop data while doing the pass-through.
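Those matching rules can be expressed as two small checks: a stage boundary is valid when the receiving struct lists the same variables in the same order, and an output may be an in-order subset of the input it crops from. The function names and the string-list representation of a struct layout are mine:

```cpp
#include <string>
#include <vector>

// A struct layout reduced to an ordered list of variable names.
typedef std::vector<std::string> StructLayout;

// Exact match: same variables, same order (e.g. VS output vs HS input).
bool Identical(const StructLayout& a, const StructLayout& b) {
    return a == b;
}

// Subset match: every variable of `subset` appears in `full`, in the same
// relative order, with no reordering allowed.
bool IsOrderedSubset(const StructLayout& subset, const StructLayout& full) {
    size_t pos = 0;
    for (const std::string& var : subset) {
        while (pos < full.size() && full[pos] != var) ++pos;
        if (pos == full.size()) return false;   // missing or reordered
        ++pos;
    }
    return true;
}
```

Running these checks at generation time would let Nody reject a broken stage boundary with an error message, instead of the shader silently refusing to render.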

The solution will be to let Nody sort the inputs and outputs of the hull shader, which shouldn’t be too much work seeing as the hull shader simply passes variables through, so each input shares its name with the corresponding output. Then, when the domain shader is done, its variables have to be sorted to match the output signature of the hull shader.

I shouldn’t really complain, but why is this not documented? This is really important stuff we’re talking about, because if you don’t follow this seemingly invisible pattern, there is no way to know the results. I can imagine that using structs when coding by hand makes this a lot easier, seeing as you’d only copy the structs from file 1 to file 2. I’m really looking forward to the day when you only need to define the outputs with the out modifier and simply set a value, leaving it to the compiler to figure out the structure of the output package. I should have a general solution to this problem by the beginning of next week.


Tessellation and DirectX 11

I managed to get the tessellation to work, which doesn’t mean I have anything cool to show, but it is working. You will just have to take my word for it. It turns out that you need to use a special primitive topology for the shader to treat the input data as control points. This seems unnecessary; if you send a triangle, it should be able to interpret it as three control points, but nope, it doesn’t. The funny thing is that Nvidia GPUs let you run the shader like nothing is wrong, except for the fact that you won’t see a result, of course. That would automatically send you on a quest to fix the shader, when the actual problem is in your application and not your shader. ATI GPUs at least have the decency to tell you that something is fatally wrong by, quite simply, restarting the graphics driver. But enough bashing: it works, I’m happy, and when I get the time, I will be implementing a complete render pass using tessellated objects (by which I mean a normal-depth pass and a color pass).

While I’m on the subject of hull and domain shaders, I thought I’d share the design for how Nody implements these magical tools of power. The hull shader, very much like the vertex shader, is usually a program which you don’t need to modify much, by which I mean there is no real need for different nodes for different features. The hull shader is also complex in the sense that it is essentially two programs, the main function and the patch constant function. Because of this, a hull shader would need to be split into two different nodes, one for the main function and one for the patch constant function, and since the program is supposed to be user-friendly to some extent, I thought up a different solution. The solution is to treat the hull shader very much like the vertex shader, using one node for a specific mission. For example, I have a node named ‘static’, which is basically used to pass the position, view-space position and UV coordinates to the pixel shader, and another one for doing this while skinning the object. The hull shader works the same way: currently there is one node called trianglesubdivision, which takes information from the vertex node and performs a constant patch function and a main function described in the shader variation file. So if one is to implement another hull shader, one should consider making another version of trianglesubdivision.

The domain shader, however, is oh so very different. This shader stage is much like the pixel shader, divided into an arbitrary number of nodes, where each node performs some sort of action. The only hitch is that since the domain shader gets a patch in the format OutputPatch&lt;DS_INPUT, POINT_COUNT&gt; Patch, every node accessing a variable from the hull shader needs a Patch[x] in front of the actual variable name. This is also a part where something can go wrong: let’s say that POINT_COUNT is 3 and a domain shader node tries to access a variable at Patch[4], which isn’t going to work. I have accepted this as a minor flaw, but there are several obvious ways to work around it. First of all, domain shader variations are categorized by what kind of geometry they expect. The hull shader denotes that we are using triangles if you are using trianglesubdivision (obvious fact is obvious), and the domain nodes to go with it are of course the ones in the triangle category. If one chooses to pick from another category, well, then they’ll have to suit themselves when they get an error telling them they are trying to grab more than they can handle (i.e. trying to get a point which is out of bounds). So for future expansions of this system: add a new hull shader, and a suitable new domain shader category.
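The category guard amounts to a bounds check over every Patch[x] reference a node makes against the control point count of the chosen hull shader category. DomainNode and CompatibleWithCategory are hypothetical names for illustration:

```cpp
#include <string>
#include <vector>

// A domain shader node, reduced to its name and every index x it uses in
// a Patch[x] reference.
struct DomainNode {
    std::string name;
    std::vector<int> patchIndicesUsed;
};

// A node fits a category only if every patch index stays inside the
// category's control point count (3 for the triangle category).
bool CompatibleWithCategory(const DomainNode& node, int pointCount) {
    for (int index : node.patchIndicesUsed)
        if (index < 0 || index >= pointCount) return false;
    return true;
}
```

Running this when a node is dropped into the graph would turn the “grabbing more than you can handle” error into an up-front rejection.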

So that’s that. I also want to squeeze in how to add new nodes to Nody using the .shv, or shader node variation, system. Basically, it’s like a scripting language, and it looks something like this:



float4 Position : POSITION

float4 WorldProjectedPosition : SV_POSITION

WorldProjectedPosition -> mul(ModelViewProjection, Position);



It’s basic, it’s simple, and it’s highly maintainable. If this variation is attached to a node, that node will automatically get the input Position and the output WorldProjectedPosition, and perform the ModelViewProjection multiplication to put the position in the viewport. The -> operator is something new, which is not part of the HLSL or GLSL standard; it is used solely by Nody to let the user decide what sort of operation should be applied. In the default case, the action of the node is Set, which means that WorldProjectedPosition will be set to mul(ModelViewProjection, Position). If the action is, for example, Add, then WorldProjectedPosition will have mul(ModelViewProjection, Position) added to itself. In this case the choice of action is pretty obvious, seeing as doing anything but setting the value would cause a catastrophic result, but imagine a color addition, or subtraction, or what have you. I’m guessing you get the point. Oh, and by the way, the syntax is very much inspired by the OpenGL glBegin() and glEnd() style.
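The action mechanism could be sketched as a tiny emitter that expands -> according to the node’s action. Set and Add come from the description above, while Subtract and the emitter itself are my illustrative additions:

```cpp
#include <string>

// The per-node action deciding how -> expands in the generated shader code.
enum class NodeAction { Set, Add, Subtract };

// Turns "target -> expression" plus an action into a line of shader code.
std::string EmitAssignment(const std::string& target,
                           const std::string& expression,
                           NodeAction action) {
    switch (action) {
        case NodeAction::Add:      return target + " += " + expression + ";";
        case NodeAction::Subtract: return target + " -= " + expression + ";";
        case NodeAction::Set:
        default:                   return target + " = " + expression + ";";
    }
}
```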

The tags currently available are: BEGININPUTS, BEGINOUTPUTS, BEGINSOURCE, BEGINEXTERNALS, BEGININCLUDES, BEGINTEXTURES, BEGINCONSTTEXTURES, BEGINSAMPLERS, BEGINGLOBALS, BEGINCONSTANTS and BEGINCLASSINSTANCES. You’ve already seen inputs, outputs and source, and I’m guessing you can figure out what includes, textures and samplers are. Globals are a way to denote variables that will be changeable by the runtime, and constants are variables that are not. Externals denote code that is supposed to live outside the main function but not be included from another file, for example some simple helper function. Const textures are a way to make Nody avoid defining a texture, which is useful if the texture is declared in an include file; this still adds the texture to the node as an input variable but avoids declaring it in the shader code. Class instances are used for dynamic linkage, which in turn allows us to make small modifications to a shader program, which is useful for avoiding bloating the shader system with tons of different shaders, instead making small modifications to the existing ones.

World, this is Nody. Nody, this is world

Hello and welcome to the development thread of the gscept game engine and pipeline suite. Seeing as there is no point in stalling, I’m going to explain exactly what we are currently doing.

The core of the engine relies on the Nebula Device, developed by a company previously known as RadonLabs. They’ve developed a cross-platform and highly maintainable game engine, which supports DirectX 9, job dispatching and content management using a Maya plugin. The problem is that we here in Skellefteå have a pre-release of the engine, which basically means that we don’t have any of the cool pipeline tools. So, about 3 months ago, we started making our own pipeline for the engine, and one of these tools is Nody.

Nody is a graphical shader designer. It lets you create nodes of five different types: vertex, subdivision, displacement, geometry and effect nodes. Each node type corresponds to a segment of the rendering pipeline. The idea is to combine different nodes in order to graphically generate a pipeline pass. Nody is also equipped with two subtools, the frame shader editor and the material editor. The frame shader editor is a tool to modify a frame shader, or render path.

The tool has functionality to add render targets, multiple render targets and frame passes. A frame pass corresponds to a set of shaders, which consists of a vertex shader, a pixel shader, and optional hull, domain and geometry shaders. A frame pass can be used to render a dynamic model, be it skinned or static, or a full-screen quad for post effects, and there are four specialized passes for text, GUI, simple shapes and lights. The point of having specialized frame passes is to be able to script passes that need special treatment. Nebula renders lights using deferred lighting, so all the lights are rendered in three passes: one for the global light, one for the spot lights and one for the point lights. The frame shader handles this very special pass by denoting it with a lights pass node, so you don’t need to attach a material (I will explain materials later) to every single light for them to render. A simple frame in Nebula renders normals and depth to a GBuffer, renders the lights using the GBuffer, and then renders the models, using the light buffer for the lighting values.

Previously, Nebula relied on the effect system developed by Microsoft for DirectX, which lets you set up entire render passes using a single file. The effect system allows you to set depth-stencil states, blend states and rasterizer states, as well as which shader goes where in the pass. The bad thing about effects is that Microsoft seems to be slowly moving away from them, and OpenGL doesn’t offer such a feature. Why would I even care about OpenGL? Well, this is how I see it: better to make it as general as possible while I’m already in the process of remaking it, rather than keeping to an old library whose use is strongly discouraged because it’s not part of the standard SDK.

The way Nebula rendered before was using effects and annotations, which is basically yet another way to script which technique should be used when rendering an object. It would start the normal-depth pass and set the annotations correctly for different objects. So it would render all static objects with the annotation set to use normal-depth to render to the GBuffer, and then, in a later pass, use the exact same batching but change the annotation for the current pass. Seeing as we won’t be using effects any more, this becomes a huge problem. Enter materials.

Materials are what you attach to an object in order to make it render. The frame shader denotes when a pass should take place, but the material denotes in what pass the object should be rendered. This way, an object can actually have several shaders, like normal-depth, shadow and color. Nody supplies an editor for this as well.

This picture shows Nody in its current state.

It shows a preview to the right (not yet implemented) as well as the main window, together with the frame shader tool and the material editor. The material editor and frame shader editor are two detachable windows.

Currently, I’m working on making the displacement nodes work in a render. As you might have guessed, the subdivision nodes are used to build a hull shader, and the displacement nodes are used to build a domain shader, both of which are new in DirectX 11. So yes, Nebula has been rewritten not only to use ordinary shaders instead of effects, but also to support DirectX 11. The funny thing is that my work computer shows a blank screen when rendering, but my home computer halts the driver, just to let me know I’m doing something wrong. Isn’t that nice!?
