Why multidraw is inflexible.

I’ve been working more on the performance stuff in Nebula. In the current version, I added a feature to AnyFX which lets one decide whether a uniform buffer should be synchronized or not. Skipping synchronization may corrupt rendering, since we can stomp data that is still in transit, but it also gives an enormous boost in performance. One easy way to work around the stomping is to make the buffer big enough to hold at least one full frame of data before writing wraps around to the start. In the current state, we keep all per-object data in a uniform buffer holding 8192 sub-buffers (a massive amount). This lets us issue 8192 draws each frame without destroying the data, which should be sufficient. If one needs a synchronized buffer, one can just remove the ‘nosync’ qualifier from the varblock. Anyway.
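
To make the wrap-around idea concrete, here is a minimal sketch of how offsets into such an unsynchronized buffer could be handed out. SubBufferRing and Allocate are hypothetical names for illustration, not the actual AnyFX or Nebula API, and the sketch assumes every sub-buffer has the same size.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: hands out byte offsets into one large, unsynchronized
// uniform buffer that is split into equally sized sub-buffers.
class SubBufferRing
{
public:
    SubBufferRing(std::size_t subBufferSize, std::size_t numSubBuffers)
        : subBufferSize(subBufferSize), numSubBuffers(numSubBuffers), next(0)
    {
        assert(subBufferSize > 0 && numSubBuffers > 0);
    }

    // Returns the byte offset of the next sub-buffer to write to and advances
    // the cursor. It wraps around without ever waiting for the GPU, so the
    // ring must be large enough to cover every draw in a frame.
    std::size_t Allocate()
    {
        const std::size_t offset = next * subBufferSize;
        next = (next + 1) % numSubBuffers;
        return offset;
    }

private:
    std::size_t subBufferSize;
    std::size_t numSubBuffers;
    std::size_t next;
};
```

With 8192 sub-buffers, the 8193rd allocation in a row lands on offset 0 again, which is exactly where the stomping would start. In a real renderer the sub-buffer size would also have to be a multiple of the value reported for GL_UNIFORM_BUFFER_OFFSET_ALIGNMENT.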

In my quest I also checked out glMultiDraw*, which is nice in principle, but there are many things that bother me. To start with, glMultiDraw only really gives us a slightly more flexible form of instanced rendering. Yes, there is the gl_DrawIDARB GLSL shader variable, which lets us address variables in a shader based on which of the multiple draws is executing, and yes, this is very simple to implement. But! And this is one major but. It is very non-intuitive to put every per-object variable in arrays. While we can define unsized arrays and use shader storage buffers, which are awesome, it requires the engine to redefine, at its core, how it sets variables. For example, some variables have to be set indexed in an array, and some may be set as individual uniforms. Well, we can’t really have any individual per-object uniforms anymore, so everything has to be in arrays. This will cause some major confusion since both are syntactically allowed, yet their behavior is completely different. To illustrate, this is how most rendering pipelines work in my experience:

for each material
{
  ApplyMaterial()                         // apply material by setting shaders, tessellation patch size, subroutines
  ApplyMaterialConstants()                // shared variables like time, random value and whatnot
  for each visible model using material
  {
    ApplyModel()                          // set vertex buffer, index buffer
    for each visible model instance using material
    {
      ApplyObjectConstants()               // apply per-object variables, model transform, color etc
      Draw()
    }
  }
}

This allows us to apply some variables for each draw. Now, if we use the multi draw principle, we instead get this flow, which is better in terms of overhead:

for each material
{
  ApplyMaterial()                         // apply material by setting shaders, tessellation patch size, subroutines
  ApplyMaterialConstants()                // shared variables like time, random value and whatnot
  for each visible model using material
  {
    ApplyModel()                          // set vertex buffer, index buffer
    for each visible model instance using material
    {
      ApplyIndexedObjectConstants()       // apply per-object variables, model transform, color etc
      AccumulateDrawData()                // add draw data to buffer, i.e. base vertex, base index, number of instances etc.
    }
    MultiDraw()
  }
}
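
As a rough sketch of what AccumulateDrawData() could do on the CPU side, assuming the indirect variant glMultiDrawElementsIndirect is the one being used: DrawBatch and its methods are hypothetical names and not part of Nebula, while the command struct follows the layout OpenGL documents for indirect indexed draws.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Layout OpenGL expects for each command read by glMultiDrawElementsIndirect.
struct DrawElementsIndirectCommand
{
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // base index into the index buffer
    std::int32_t  baseVertex;     // value added to each fetched index
    std::uint32_t baseInstance;   // first instance id
};
static_assert(sizeof(DrawElementsIndirectCommand) == 20, "commands must be tightly packed");

// Hypothetical accumulator, filled in the inner loop and consumed by MultiDraw().
class DrawBatch
{
public:
    // Adds one draw and returns its position in the batch, which is the value
    // gl_DrawIDARB will have in the shader for that draw.
    std::uint32_t Accumulate(std::uint32_t indexCount, std::uint32_t firstIndex,
                             std::int32_t baseVertex, std::uint32_t instanceCount = 1)
    {
        assert(indexCount > 0);
        const DrawElementsIndirectCommand cmd = { indexCount, instanceCount, firstIndex, baseVertex, 0 };
        commands.push_back(cmd);
        return static_cast<std::uint32_t>(commands.size() - 1);
    }

    const std::vector<DrawElementsIndirectCommand>& Commands() const { return commands; }
    void Clear() { commands.clear(); }

private:
    std::vector<DrawElementsIndirectCommand> commands;
};
```

The index returned by Accumulate() is the slot that the per-object data for that draw has to be written to, which is why every per-object variable ends up needing an array. The accumulated commands would then be uploaded to the buffer bound to GL_DRAW_INDIRECT_BUFFER before the multi draw call is issued.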

Now, if we have a uniform which isn’t in an array (because a developer, graphics artist(!!!) or shader generation tool created a variable the ordinary way), then every draw in the batch only sees the value from the last call in the list. It might be especially difficult for a shader generation tool to know whether to make an ordinary uniform or to group it in a shader block. Even though this is a user error, it still makes more sense to use ordinary variables as before. The way one would implement this in GLSL to be failsafe would be like this:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
  PerObject perObjectData[];
};

We then set the variables in an indexed fashion. The problem here lies in the fact that the individual members will not be visible to the user as variables. Instead, we have to reflect this buffer in the engine code so that we have a matching buffer, and then update into that. Using the above code will not be as simple as setting the value of a variable, which is one of the major downsides. Another way of doing the same thing would be like this:

layout(std430) buffer PerObjectBuffer
{
  mat4 Model[];
  mat4 InvModel[];
  vec4 Diffuse[];
  int ObjectId[];
};

That is, if this were syntactically okay, which it unfortunately isn’t.

Either way, every per-object variable needs to be defined as an array to avoid undefined behavior. Furthermore, wrapping all variables in a struct inside a buffer, like in the first example, also removes the transparency of setting a value through a variable. Why, you may ask? Well, shader storage buffers can have many members, but only the last may be an unsized array. So we might expose buffer members as variables (just like with uniform blocks), but exposing a member which is a struct would basically only give us a member which is defined as a block of memory. Setting this block of memory from the engine requires a matching struct on the engine side which we send to the GPU, so we also have to serialize the struct types in order to reflect them on the engine side. To summarize, doing this:

variable->SetMatrix(this->modelTransform);

is much simpler to understand than

struct PerObject
{
  matrix44 Model;
  matrix44 InvModel;
  float4 Diffuse;
  int ObjectId;
} foo;
foo.Model = this->modelTransform;
variable->SetStruct(&foo);

Why? Well, what if we need to set more than one of the members (for example Model, then InvModel at some later stage)? There is no way to apply this to a generic case! What we could do is simulate a struct as a list of variables, by making this:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
   int MaterialType;
   PerObject data[];   
};

appear like this on the client end:

buffer PerObjectBuffer
{
   int MaterialType;
   mat4 Model[];
   mat4 InvModel[];
   vec4 Diffuse[];
   int ObjectId[];
};

By making the indexed variables here appear as if they were buffer members, we could use a clever device to offset them when we write to them (using SetVariableIndexed), while still having them exposed as ordinary variables. The only thing we have to consider is that the offset of a variable increases by the size of the struct for each index, on top of a constant offset for the other members. Confused? Well, let’s make an example. Let’s say we are going to set Diffuse for index 0. The offset is then int + mat4 + mat4, since we skip MaterialType and then Model and InvModel of the first element. If we set it for index 1, the offset is int + (mat4 + mat4 + vec4 + int) * 1 + mat4 + mat4. The generic formula is (non-indexable members size) + (indexable struct size) * index + (offset of the member within the struct). In this case, the non-indexable part is simply an int, while the indexable struct is mat4 + mat4 + vec4 + int. Note that these are logical sizes: std430 pads both the int and the struct up to 16-byte boundaries, so the real byte offsets have to include that padding.
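
The offset calculation can be sketched as follows, with the member offset inside the struct included and std430 padding taken into account. The constants are worked out by hand for the PerObjectBuffer layout above and IndexedOffset is a hypothetical helper; a real implementation would more likely query the offsets and the array stride through program reflection than hard-code them.

```cpp
#include <cassert>
#include <cstddef>

// Rounds value up to the next multiple of alignment.
constexpr std::size_t AlignUp(std::size_t value, std::size_t alignment)
{
    return (value + alignment - 1) / alignment * alignment;
}

// std430 sizes of the types used in PerObjectBuffer.
constexpr std::size_t kMat4Size = 64;
constexpr std::size_t kVec4Size = 16;
constexpr std::size_t kIntSize  = 4;

// The struct is aligned to its largest member alignment (16 bytes for mat4/vec4).
constexpr std::size_t kStructAlign = 16;

// Member offsets inside PerObject.
constexpr std::size_t kModelOffset    = 0;
constexpr std::size_t kInvModelOffset = kModelOffset + kMat4Size;    // 64
constexpr std::size_t kDiffuseOffset  = kInvModelOffset + kMat4Size; // 128
constexpr std::size_t kObjectIdOffset = kDiffuseOffset + kVec4Size;  // 144

// Array stride: the struct size rounded up to its alignment.
constexpr std::size_t kStride = AlignUp(kObjectIdOffset + kIntSize, kStructAlign);

// Start of data[]: the non-indexable members (MaterialType) plus padding.
constexpr std::size_t kArrayBase = AlignUp(kIntSize, kStructAlign);

// (non-indexable size) + (struct stride) * index + (member offset in struct)
constexpr std::size_t IndexedOffset(std::size_t memberOffset, std::size_t index)
{
    return kArrayBase + kStride * index + memberOffset;
}
```

A SetVariableIndexed call for Diffuse would then write 16 bytes at IndexedOffset(kDiffuseOffset, index), so each exposed variable only needs to remember its own member offset.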

This being said, OpenGL has a lot of features and is flexible and so on. The only problem is that it’s somewhat complicated to wrap if one wants to wrap all of the advanced features. It should also be mentioned that there is a significant difference between uniforms and buffers. Uniforms are unique, meaning there can be only one uniform per program with the same name. Buffer members are not uniforms, but rather structure members, meaning they are not unique. If we have this code for example:

struct PerObject
{
  mat4 Model;
  mat4 InvModel;
  vec4 Diffuse;
  int ObjectId;
};

layout(std430) buffer PerObjectBuffer
{
   int MaterialType;
   PerObject data[];   
};

layout(std430) buffer SomeOtherBuffer
{
   PerObject otherData[];   // renamed: members of blocks without an instance name share the global scope
};

Then PerObject will be used twice, meaning Model will appear twice in the program context. This completely ruins our previous attempts at making storage buffers transparent as variables. Conclusion: we need to change how the engine handles variables. It’s probably simpler to just use uniform buffers and have the members as arrays, since that is already implemented and won’t be weird like this. We can then implement shader storage buffers as simple structures which we can read from and write to when using, for example, compute shaders.
