Subroutines and suboptimal solutions
I managed to remove the render thread, getting rid of the client/server side object structure which was vital in order to keep the rendering in its own thread. What happened then was that we gained a significant boost in performance, probably because the overhead required for syncing was greater than the performance we gained from running the renderer in its own thread.
I’ve then been investigating how to extend AnyFX to support some of the stuff I left out in version 1.0, namely shader storage buffers (or RWBuffer in DirectX) and dynamically linked shader functions. The work is currently ongoing, however the syntax for the new buffers and dynamically linked functions is already in place. Behold!
// declare subroutine 'interface'
prototype vec4 colorMethod(vec2 UV);

// declare implementation of colorMethod which produces static color
subroutine (colorMethod) vec4 staticColorMethod(vec2 UV)
{
    return vec4(1);
}

// declare implementation of colorMethod which produces textured color
subroutine (colorMethod) vec4 texturedColorMethod(vec2 UV)
{
    return texture(SomeSampler, UV);
}

colorMethod dynamicColorMethodVariable;
The dynamicColorMethodVariable then works as a special variable, meaning there is no way to change it using the AnyFX API. The syntax for defining a program previously looked something like this:
program SomeProgram
{
    vs = SomeVertexShader();
}
However, the shader binding syntax now accepts arguments to the shader, in the form of subroutine bindings. For example:
program SomeProgram
{
    vs = SomeVertexShader(dynamicColorMethodVariable = texturedColorMethod);
}
This would bind the vertex shader SomeVertexShader and bind the dynamicColorMethodVariable subroutine variable to the implementation that uses a texture. This allows us to create programs which are just marginally different from other programs, and allows us to perform an ‘incremental’ change of the program state, compared to exchanging the whole program object each time we want some variation. The only problem is that Nebula currently has no way of knowing whether an incremental shader change is possible or not.
So here comes yet another interesting concept: what if we were to sort materials (which are already sorted based on batch) by variation? Consider the following illustration:
FlatGeometryLit -
|--- Static
|--- VertexColor
|--- Multilayered
|--- LightmappedLit
|--- Shiny
|--- Organic
|--- Animated
|--- Skinned
|--- Skin
|--- SkinnedShiny
|--- Foliage
This is the order in which they will be rendered, however they are all opaque geometry, so they might render in any order within this list. However, the change between let’s say Static, Shiny and Animated is actually not that much, just a couple of lines of shader code. There is no linkage difference between the shaders, and they can as such use the same shader, but with different subroutine sets! If we were to sort this list based on ‘change’, we would probably end up with something like this:
FlatGeometryLit -
|--- Static
|--- Shiny
|--- Animated
|--- Foliage
|--- Organic
|--- VertexColor
|--- Multilayered
|--- LightmappedLit
|--- Skinned
|--- Skin
|--- SkinnedShiny
This is because most of these shaders share the same number of vertex shader inputs, or pixel shader outputs. However, if we simply implement the shaders to have equal functions, then AnyFX could figure out which programs are duplicates of others, and then simply tell us which material should actually apply its program, and which materials are subdominant and thus only require an incremental update. What we will end up with is a sorted list of materials, where the first ‘unique’ material will be dominant, and the others will be incremental. The list would look like this:
FlatGeometryLit -
|--- Static -- dominant
|--- Shiny -- incremental
|--- Animated -- incremental
|--- Foliage -- incremental
|--- Organic -- incremental
|--- VertexColor -- dominant (introduces vertex colors in vertex layout, cannot be subroutined)
|--- Multilayered -- incremental
|--- LightmappedLit -- dominant (introduces secondary UV set in vertex layout, cannot be subroutined)
|--- Skinned -- dominant (introduces skin weights and joint indices in vertex layout, cannot be subroutined)
|--- SkinnedShiny -- incremental
|--- Skin -- incremental
As we can see here, every time we encounter an incremental material, we can simply perform a smaller update rather than set the entire shader program, which will probably spare us some performance if we have lots of variation in materials. This table only shows the base materials for a specific batch. However, the algorithm would sort all batches in this manner in order to make the entire pipeline reduce its API-heavy calls. This is probably not a performance issue right now, seeing as we have a rather small set of materials per batch type. However, consider a game with lots of custom-made shader derivatives. Currently, these derivatives would more or less have a copy of the shader code of some base shader, and then apply the program prior to each group of objects with that shader.
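The sorting idea above can be sketched on the application side in C++. This is only a minimal illustration, not how AnyFX or Nebula implement it: MaterialProgram, linkageKey and SortByVariation are hypothetical names, and linkageKey stands in for whatever AnyFX would use to decide that two programs only differ by their subroutine bindings.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical description of a material's program. 'linkageKey' is an
// assumed stand-in for "same vertex inputs and pixel outputs", i.e. programs
// with equal keys only differ by subroutine bindings.
struct MaterialProgram
{
    std::string name;
    std::uint32_t linkageKey;
};

enum class Update { Dominant, Incremental };

struct SortedMaterial
{
    std::string name;
    Update update;
};

// Groups materials by linkage key, keeping their relative order inside each
// group, and marks the first material of every group as dominant. All other
// materials in the group only need an incremental (subroutine) update.
std::vector<SortedMaterial> SortByVariation(std::vector<MaterialProgram> materials)
{
    std::stable_sort(materials.begin(), materials.end(),
        [](const MaterialProgram& a, const MaterialProgram& b)
        {
            return a.linkageKey < b.linkageKey;
        });

    std::vector<SortedMaterial> result;
    result.reserve(materials.size());
    for (std::size_t i = 0; i < materials.size(); ++i)
    {
        const bool firstInGroup =
            (i == 0) || (materials[i].linkageKey != materials[i - 1].linkageKey);
        result.push_back({ materials[i].name,
                           firstInGroup ? Update::Dominant : Update::Incremental });
    }
    return result;
}
```

When rendering, a dominant entry would apply its whole program, while an incremental entry would only rebind its subroutines on top of the program that is already active.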
The next thing to tackle on the list is getting shader storage blocks working. The syntax for these babies is also already defined, but is only implemented by stubs. The shader storage block counterpart in AnyFX is called varbuffer. As opposed to varblock, the varbuffer allows for application control of the internal buffer storage, meaning we can retrieve its handle and read/write data from it as we please. We can also attach the buffer to some other part of the pipeline, which requires information that resides outside the scope of AnyFX. Also, varbuffers support member arrays of undetermined size! As such, a varbuffer will have some way of allocating a buffer with a dimension set from the application side. Consider the following:
struct ObjectVariables
{
    float MatEmissiveIntensity;
    float MatSpecularIntensity;
    mat4 Model;
};

varbuffer SomeBuffer
{
    ObjectVariables vars[];
};
This creates a buffer which contains variables per object rendered. We can then from the AnyFX API tell the varbuffer to allocate a backend buffer with a size, which can then be used to for example perform bulk rendering with full per-object variation using glMultiDraw*. The only issue with this is that AnyFX usually handles variables as objects which one can retrieve and simply set, but in this case, a variable would be inside a struct of an array type, and is thus not something which is publicly available. We could solve the same problem using the already existing varblock syntax with just a set of arrays of variables with a fixed size. However, shader storage blocks (varbuffer) have a much bigger guaranteed minimum size, 16MB, compared to the 16KB guaranteed for uniform blocks (varblock), meaning we cannot have as much data per multi draw with varblocks as we can with varbuffers.
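To put rough numbers on that difference, here is a small C++ sketch. It assumes the buffer uses the std140 or std430 layout rules, under which the mat4 member is aligned to 16 bytes, so the CPU-side mirror of ObjectVariables needs 8 bytes of explicit padding and ends up at 80 bytes. The constant and function names are made up for the example; the 16KB and 16MB figures are the ones quoted above.

```cpp
#include <cstddef>

// CPU-side mirror of the ObjectVariables struct, assuming std140/std430
// layout: the two floats occupy bytes 0..7, the mat4 starts at byte 16.
struct alignas(16) ObjectVariables
{
    float MatEmissiveIntensity;
    float MatSpecularIntensity;
    float padding[2];   // explicit padding up to the mat4's 16-byte alignment
    float Model[16];    // 4x4 matrix, 64 bytes
};

static_assert(sizeof(ObjectVariables) == 80, "unexpected ObjectVariables layout");

// Guaranteed minimum block sizes as quoted in the text (hypothetical names).
constexpr std::size_t kMinUniformBlockSize = 16 * 1024;         // varblock, 16KB
constexpr std::size_t kMinStorageBlockSize = 16 * 1024 * 1024;  // varbuffer, 16MB

// Number of whole ObjectVariables entries that fit in a buffer of the given size.
constexpr std::size_t ObjectsPerBuffer(std::size_t bufferBytes)
{
    return bufferBytes / sizeof(ObjectVariables);
}
```

With these assumptions, ObjectsPerBuffer(kMinUniformBlockSize) gives 204 objects per multi draw for a varblock, while ObjectsPerBuffer(kMinStorageBlockSize) gives 209715 for a varbuffer. Actual hardware limits are often higher and should be queried at runtime.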
This is totally worth looking into, seeing as it would (probably) enable much faster execution of draw calls: we could pack more or less every single object in the scene that uses the same shader into one single glMultiDraw*. However, it will probably not work with the current approach of using a variable object to set a value. Instead, it will need some code which gathers up a bunch of objects and their variables, packs them into a buffer, and then renders everything. More on that when the subroutines are working!
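As a rough idea of what such gathering code could look like on the CPU side, here is a C++ sketch. RenderObject, PackedBatch and PackObjects are hypothetical names, not part of AnyFX or Nebula, and the struct layout again assumes std140/std430 rules. The sketch only builds the byte array; uploading it to the shader storage buffer and issuing the glMultiDraw* call are left out.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// CPU-side mirror of ObjectVariables (assumed std140/std430 layout, 80 bytes).
struct ObjectVariables
{
    float MatEmissiveIntensity;
    float MatSpecularIntensity;
    float padding[2];
    float Model[16];
};

// Hypothetical per-object record kept by the application.
struct RenderObject
{
    std::uint32_t shaderId;
    ObjectVariables vars;
};

// Tightly packed per-object data, plus the number of draws it covers.
struct PackedBatch
{
    std::vector<unsigned char> bytes;
    std::uint32_t drawCount = 0;
};

// Gathers every object that uses the given shader and appends its variables
// to one contiguous byte buffer, in the order the objects were submitted.
PackedBatch PackObjects(const std::vector<RenderObject>& objects, std::uint32_t shaderId)
{
    PackedBatch batch;
    for (const RenderObject& obj : objects)
    {
        if (obj.shaderId != shaderId)
            continue;
        const std::size_t offset = batch.bytes.size();
        batch.bytes.resize(offset + sizeof(ObjectVariables));
        std::memcpy(batch.bytes.data() + offset, &obj.vars, sizeof(ObjectVariables));
        ++batch.drawCount;
    }
    return batch;
}
```

The resulting bytes would then be copied into the buffer backing the varbuffer, and drawCount would be the number of draws submitted in the single multi draw call, with the shader indexing vars[] by draw.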
// Gustav