The transform of moving the render state stuff out of the model and into its own resource has been done, and it turns out that it works. It’s concept is similar to what the previous post touched upon, although with some slight modifications.
For example, my initial idea was to apply a surface material ONCE per shader, and then iterate each submesh and render them. However, it turns out that a submesh with the same surface material properties might actually need to use other shaders, for example, we might want to do instanced rendering, which picks a model matrix from an array using gl_InstanceID and occasionally non-instanced rendering, which uses the singular model matrix. One can consider that we always perform instanced rendering, i.e. that we always consider an array of per-object variables, although there is no semantic way in the shader language to denote (and contextually shouldn’t be) variables or uniform buffer blocks to have a per-draw call changing pattern. What we don’t want is to have to implement a version of each surface which uses instancing. An optional solution is to not use instancing for ordinary objects, but exclusively for particles, grass or other frequently updated (massively) repeated stuff.
Instead, it might be smarter to adapt the DX11 way which is that every single variable is a part of a buffer block, where variables declared outside of an explicit buffer block is just considered to be a part of the global block. This method will always require an entire block update, even if only one variable has changed, although using the new material system there is a flexible way around this.
Instead of working on a per-object basis, where each object will literally set their state for each draw, it’s more attractive to use shader storage blocks for variables shared, and guaranteed to be unique per object, such as model matrix, object id etc. Each surface material will provide a single uniform buffer block which is updated ONCE when the surface is created, and modified if a surface gets any constants changed (which is rare during runtime).
When rendering, we simply only need to apply the shader, bind the surface block, then per each object bind their unique blocks, and we’re done with the variable updates. Or are we?
A problem comes from the fact that we might want to animate some shader variable, for example the alpha blend value, which forces us to update an otherwise shared uniform buffer block. Unfortunately, this means we not only have to modify the block to use the new alpha value, but we also need to modify it back to its default state once we don’t want the change anymore.
So one idea is to have changeable values in a transient uniform buffer block (which uses persistent mapping for performance), have one or several material blocks (one for foliage stuff, one for water stuff, one for ordinary objects, etc.) and have a shader storage block for variables which are guaranteed to be changed on an object basis. Why shader storage block you might ask? Well, if we can buffer all of our transforms into one huge array, we can retain a global dictionary of their variables, which can then be accessed in the shader.
By doing this we can minimize the need to set the same transforms twice for the same object, for example when doing shadow mapping (which is three passes: global, spot, point) the default shading, and then perhaps some extra shaders. A shader storage block is flexible in this context, because it can hold a dynamic count of instances. This method basically approaches instancing, because with this method we could also utilize the MultiDraw calls, since we have prepared our state prior to rendering, and all sub meshes already have their unique stuff posted in the shader.
My concern with this is that we might have the case where not all sub meshes share the same surface material. This is the case if we have a composite mesh, where some part uses for example subsurface scattering, and the rest is glass, or cloth, or some other type of material. The issue is basically that we have a very small subset of cases where we win any performance, since we need this particular scenario: A mesh where all parts share the same surface material and shaders.
The issue related to clever drawing methods always fall back on the issue of addressing variables if you’re drawing more than one object. Storing all variables in per-object buffers is trivial, however doesn’t allow us to use any of the fancy APIs like MultiDraw. Storing all variables in global buffers is difficult, because we need to need to define a set of global buffers (with hard-coded names) which is explicitly updated on an index representing the object. This poses a problem if we have several parameter sets, for example PBR, Foliage, UV animation and then try to somehow use 3 global buffers, because we must also have another buffer which denotes the objects id into this buffer (unless we can use gl_DrawID).
I think the best way might be to do something like this:
Surface material blocks are retrieved by looking at the surface variables, extract their blocks and create a buffer for them which is persistently mapped. Objects which needs to modify singular values do so by writing to the persistently mapped buffer directly, which will make the value active at the next render. The issue with this is that if we perform some type of buffering method, we can only have as many buffer changes before we need a sync. Basically, if we have a triple buffering method, we need to wait every third change because the previous change might not have been drawn yet.
We have one transient buffer which is the one changing every frame (time, view matrix, wind direction, global light stuff), and finally a buffer per each submesh which contain its transforms and object ID. So to summarize:
- Surface material contains buffers for each buffer they are ‘touching’. Mapped for rare changes (triple buffered).
- Transient frame buffer like the one we have now, but with more of the common variables. Mapped for changes (triple buffered).
- Per-object transform buffer. Mapped for changes (triple buffered).
AnyFX should be made so that variables which lay outside a varblock (uniform buffer block/constant buffer) is just put in a global varblock with a reserved name. So AnyFX should probably turn over the responsibility of handling uniform buffers to the engine, and not manage it intrinsically. As such, a variable which lies in a block will just assert because setting a variable inside of a block will have no purpose or result. Instead we can set the buffer which should be used in the varblocks by sending an OpenGL buffer handle to the VarBlock implementation. This will also make it simpler to integrate the API with future APIs which forces explicit control (read Vulkan, DX11+) of buffers, but still wraps the shader part in a comfortable and flexible manner.
In essence, we should minimize the buffer changes, and we’re not restricted to the current system where we poke inside a gigantic transform buffer for each time we draw. The current method suffers from the issue that we have to sync every N’th object, and we update the same object several times for each time we draw, which might be 6 times per frame (!). The new method would sync every N’th frame instead, which should scale better when the same object gets rendered multiple times.