Vulkan – Persistent descriptor sets

Vulkan allows us to bind shader resources like textures, images, storage buffers, uniform buffers and texel buffers in an incremental manner. For example, we can bind all view matrices in a single descriptor set (actually, just a single uniform buffer) and have it persist between several pipeline switches. However, it’s not super clear how descriptor sets are deemed compatible between pipelines.

NOTE: When I mention a shader later, I mean an AnyFX-style shader, meaning a single shader can contain several vertex/pixel/hull/domain/geometry/compute shader modules.

I could never get the descriptor sets to work perfectly, which would be to bind the frame-persistent descriptors once at the start of each frame and then not bind them again for the entire frame (or view). Currently, I bind my ‘shared’ descriptor sets after I start a render pass or bind a compute shader.

When binding a descriptor set, all descriptor sets currently bound with a set number lower than the one you are binding have to be compatible. So if we have set 0 bound and bind set 3, then for set 0 to stay bound, it has to be compatible with the pipeline. If we switch pipelines, the descriptor sets compatible between the pipelines will be retained, as long as they follow the previous rule. That is, if Pipeline A has sets 0, 1, 2, 3 bound and Pipeline B is bound, and only sets 0 and 1 are compatible, then sets 2 and 3 will be disturbed and need to be bound again.
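
To make the rule concrete, here is a minimal sketch at the API level, assuming both pipeline layouts were created with the same VkDescriptorSetLayout for set 0 (the function and handle names are illustrative, not the engine’s actual code):

#include <vulkan/vulkan.h>

// Illustrative: both pipeline layouts are assumed to use the same
// VkDescriptorSetLayout for set 0, so the shared set survives the switch.
void RecordTwoPipelines(VkCommandBuffer cmd,
	VkPipeline pipelineA, VkPipelineLayout layoutA, VkDescriptorSet perObjectSetA,
	VkPipeline pipelineB, VkPipelineLayout layoutB, VkDescriptorSet perObjectSetB,
	VkDescriptorSet sharedSet)
{
	vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineA);
	vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layoutA,
		0, 1, &sharedSet, 0, nullptr);        // set 0: frame-persistent resources
	vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layoutA,
		1, 1, &perObjectSetA, 0, nullptr);    // set 1: resources for pipeline A
	// ... draw with pipeline A ...

	vkCmdBindPipeline(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, pipelineB);
	// set 0 stays bound because the layouts are compatible for set 0;
	// only the disturbed, higher set has to be bound again
	vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layoutB,
		1, 1, &perObjectSetB, 0, nullptr);
	// ... draw with pipeline B ...
}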

Where do we find the greatest variation in shader variables? Well, clearly in each individual shader. For example, let’s pick the shader billboard.fx, which has a vec4 Color and a sampler2D AlbedoMap. In AnyFX, the Color variable would be a uniform, tucked away in a uniform buffer, and the AlbedoMap would be its own resource. In the Vulkan implementation, they would also be assigned a set number, and to avoid messing with lower sets, and thereby invalidating descriptor sets, this ‘default set’ has to be high enough that the other sets don’t go above it. However, since we can’t really know the shader developer’s intention for how sets are used, the compiler can be supplied a flag, /DEFAULTSET, which determines which set all default variables go into. This means that the engine and the shader developer can decide together where the descriptor set most likely to be incompatible should go.

I also got texture arrays and indexing to work properly, so now all textures are submitted as one huge array of descriptors, and whenever an object is rendered, all that is updated is the index into the array, which is supplied in a uniform buffer. This way, we can keep the number of descriptor sets down to a minimum of one per set number per shader resource. Allocating a new resource using a certain shader will expand the uniform buffer to accommodate the object-specific data.

First off is the naïve way:

Memory   | Memory   | Memory   | Memory   | Memory   | Memory   | Memory   | Memory
Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer
Object 1 | Object 2 | Object 3 | Object 4 | Object 5 | Object 6 | Object 7 | Object 8

Which was where I was a couple of days ago, and this forced me to use one descriptor set per shader state, since each shader state has its own buffer. The slightly less bad way of doing this is:

Memory
Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer   | Buffer
Object 1 | Object 2 | Object 3 | Object 4 | Object 5 | Object 6 | Object 7 | Object 8

Which reduces memory allocations, but still doesn’t help with keeping the descriptor set count low. The better way is to use a single buffer and sub-allocate it into per-object slots:

Memory
Buffer
Object 1 | Free | Free | Free | Free | Free | Free | Free

Allocating a new object just returns a free slot.

Memory
Buffer
Object 1 | Object 2 | Free | Free | Free | Free | Free | Free

If the memory backing is full, we expand the buffer size and allocate new memory.

Memory
Buffer
Object 1 | Object 2 | Object 3 | Object 4 | Object 5 | Object 6 | Object 7 | Object 8

Memory
Buffer
Object 1 | Object 2 | Object 3 | Object 4 | Object 5 | Object 6 | Object 7 | Object 8 | Object 9 | Free | Free | Free | Free | Free | Free | Free

As you can see, the buffer stays the same, meaning we can keep it bound in the descriptor set and just change its memory backing. The only thing the shader state needs to do now is submit the exact same descriptor set as all its sibling states, but provide its own offset into the buffer.

However, since we need to create a new buffer in Vulkan to bind new memory, we actually have to update the descriptor set when we expand, but this is only done when creating a shader state, which happens outside of the rendering loop anyway.
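
For reference, the descriptor update on expansion could look roughly like this sketch, which assumes the per-object varblock is bound as a dynamic uniform buffer so the per-object offset can be supplied at bind time (the names are illustrative):

#include <vulkan/vulkan.h>

// Illustrative: point the per-object set at the re-created, larger buffer.
void RebindGrownUniformBuffer(VkDevice device, VkDescriptorSet objectSet, VkBuffer newBuffer)
{
	VkDescriptorBufferInfo bufInfo = {};
	bufInfo.buffer = newBuffer;
	bufInfo.offset = 0;
	bufInfo.range  = VK_WHOLE_SIZE;   // the per-object offset is applied when binding

	VkWriteDescriptorSet write = {};
	write.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
	write.dstSet          = objectSet;
	write.dstBinding      = 0;        // binding holding the per-object varblock
	write.descriptorCount = 1;
	write.descriptorType  = VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC;
	write.pBufferInfo     = &bufInfo;
	vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);
}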

Textures are bound by the shader server: each time a texture is created, it registers with the shader server, which performs a descriptor set write. The texture descriptor set must be set index 0, so that it can be shared by all shaders.
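
A rough sketch of what such a registration write could look like; the names are illustrative, and the descriptor type is shown as a sampled image to match the separate texture arrays used further down (a combined image sampler would be written the same way):

#include <vulkan/vulkan.h>

// Illustrative: write a newly created texture's image view into one slot of
// the shared texture-array descriptor set (set 0). 'slot' is the index later
// handed to materials through a uniform buffer.
void RegisterTexture(VkDevice device, VkDescriptorSet textureArraySet, VkImageView view, uint32_t slot)
{
	VkDescriptorImageInfo imgInfo = {};
	imgInfo.imageView   = view;
	imgInfo.imageLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL;

	VkWriteDescriptorSet write = {};
	write.sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
	write.dstSet          = textureArraySet;
	write.dstBinding      = 0;                 // the binding holding the texture array
	write.dstArrayElement = slot;              // which array element to overwrite
	write.descriptorCount = 1;
	write.descriptorType  = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
	write.pImageInfo      = &imgInfo;
	vkUpdateDescriptorSets(device, 1, &write, 0, nullptr);
}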

Consider this shader:

group(1) varblock MaterialVariables
{
   ...
};
group(1) sampler2D MaterialSampler;

group(2) r32f image2D ReadImage;
group(2) image2D WriteImage;

group(3) varblock KernelVariables
{
   ...
};

Resulting in this layout on the engine side.

Shader
Descriptor set 1: Uniform buffer, Sampler
Descriptor set 2: Image, Image
Descriptor set 3: Uniform buffer

Creating a ‘state’ of this shader would only expand the uniform buffers in sets 1 and 3, while the sampler and the two images are bound directly to the shader’s descriptor set, meaning that any per-object texture switch would switch the texture for all objects. We obviously don’t want that, but we’re almost there. We can still create a state of this shader without binding our own uniform buffers, by simply expanding the uniform buffers in sets 1 and 3 to accommodate the per-object variables. To do the same for textures, we need to apply the texture array method mentioned before.

group(0) sampler2D AllMyTextures[2048];
group(1) varblock MaterialVariables
{
   uint MaterialTextureId;
   ...
};

group(2) r32f image2D ReadImage;
group(2) image2D WriteImage;

group(3) varblock KernelVariables
{
   ...
};

Which results in the following layout:

Shader
Descriptor set 0: Sampler array
Descriptor set 1: Uniform buffer
Descriptor set 2: Image, Image
Descriptor set 3: Uniform buffer

Now, texture selection is just a matter of uniform values: we supply a per-object value for the uniform buffer member MaterialTextureId. While this is trivial for samplers, it also leaves us asking for more. For example, how do we perform different sampling of textures when all samplers are bound in an array? Vulkan allows a texture to be bound with an immutable sampler in the descriptor set, so that’s one option, although we supply all our sampler information in AnyFX in the shader code by doing something like:

samplerstate MaterialSamplerState
{
   Samplers = { MaterialSampler };
   Filter = Anisotropic;
};

But we can’t anymore, because we don’t have MaterialSampler, and applying this sampler state to all textures in the entire engine might not be correct either. Luckily for us, the KHR_vulkan_glsl extension supplies us with the ability to decouple textures from samplers, and create the sampler in shader code. So I enabled AnyFX to create such a separate sampler object, although to do so one must omit the list of samplers. So the above code would be:

group(1) samplerstate MaterialSamplerState
{
   Filter = Anisotropic;
};

Which results in a separate sampler object, and the texture array declaration becomes:

group(0) texture2D AllMyTextures[2048];
...

And finally, sampling the texture is

vec4 Color = texture(sampler2D(AllMyTextures[MaterialTextureId], MaterialSamplerState), UV);

Instead of

vec4 Color = texture(AllMyTextures[MaterialTextureId], UV);

Which allows us to explicitly select, in the shader code, which sampler state to use, even though all our textures are submitted once per frame. I could also quite easily implement a list of combined image-samplers, allowing for example a graphics artist to supply a texture together with sampler information and have that updated directly into the descriptor set, while still being able to fetch the proper sampler from the array.

For the sake of completeness, here’s the final shader layout:

Shader
Descriptor set 0: Texture array
Descriptor set 1: Sampler state, Uniform buffer
Descriptor set 2: Image, Image
Descriptor set 3: Uniform buffer

So this proves we can use uniform buffers to select textures too, tying it all up with a bow. Neat. Except for images, and here’s why.

Images are not switched around and shuffled about like textures are, and for good reason. An image is used when a shader needs to perform read-writes on texels of the same resource, meaning that images are mostly used for random-access reads and writes, for post effects and the like, and are thus not as prone to change as, for example, individual objects. Instead, images are mostly consistent and can be bound during rendering engine setup. We could implement image arrays like we do texture arrays, but then we would have to consider the HUGE number of format combinations required to cover all cases.

Images can, like textures, be 2D, 2D multisample, 3D or Cube, just to mention the common types. There are obviously special cases like 2DArray, CubeArray and so forth, but array textures are not even used or supported in Nebula; I never saw the need for them. However, images also need a format qualifier if the image is to be used with imageLoad, meaning we would basically need a uniform array of all four ordinary types, with all permutations of formats. While possible, I deemed it a big no-no, and instead decided that since images are special-use resources for single-fire read-writes, a shader has to update the descriptor set each time it wants to change one, meaning it’s more efficient to, within the same shader, reuse the same variable and simply not perform a new binding. All in all, this shouldn’t become a problem.

What’s left to do is to enforce certain descriptor set layouts in the shader loader, so that no shader author accidentally uses a reserved descriptor set (like 0 for textures, 1 for the camera, 2 for lighting, 3 for instancing). If a shader does, it will manipulate a reserved descriptor set, which will cause it to become incompatible, and we can’t have that, since it will simply cause manually applied descriptors to stop being bound, resulting in unpredictable behavior. Another way of solving this issue is to change the group syntax in AnyFX into something more robust and easier to validate, like a structure-like syntax, for example:

group 0
{
   sampler2D Texture;
   varblock Block
   {
     vec4 Vector;
     uint Index;
   }
}

And then assert that no group is later declared with the same index. To handle stray variables declared outside of a group, the compiler simply generates the default group, and puts all strays in there.

The only issue I have with the above syntax is the annoying level of indirection before you actually get to the meat of the shader code. I think implementing an engine-side check is the way to go for now, but implementing groups as a structure like the above could be a valid idea, since we might want to have the same behavior in all rendering APIs. Consider this for OpenGL too, where we could guarantee that applying a group of uniforms and textures remains persistent if all shaders share the same declaration. Although in OpenGL, since we don’t have descriptor sets, we must simply ensure that the location values for the individual groups remain consistent.
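
A minimal sketch of what such an engine-side check could look like, assuming the shader loader produces reflection data listing which groups a shader declares; ReservedSets and ShaderSetReflection are hypothetical names:

#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// groups the engine reserves: 0 textures, 1 camera, 2 lighting, 3 instancing
static const std::set<uint32_t> ReservedSets = { 0, 1, 2, 3 };

// hypothetical reflection record produced by the shader loader
struct ShaderSetReflection
{
	uint32_t set;          // group index declared by the shader
	bool engineProvided;   // true if the declaration matches the engine's reserved layout
};

void ValidateShaderGroups(const std::vector<ShaderSetReflection>& sets)
{
	for (const ShaderSetReflection& s : sets)
	{
		// a shader may use a reserved group, but only through the engine's own
		// declaration; anything else would make the set layout incompatible
		assert((ReservedSets.count(s.set) == 0 || s.engineProvided) &&
			"shader redeclares a reserved descriptor set");
	}
}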

Vulkan – Shading ideas

So this is where the Vulkan renderer is right now.
[Screenshot: vulkan5]

What you see might be unimpressive, but once you get to this stage there isn’t too much left. As you can see, I can load textures, which are compressed (hopefully you can’t see that), and render several objects with different shaders and uniform values, like positions and textures.

This might look like it’s near completion, with just a couple of post effects left that might need to be redone (the mipmap reduction compute shader, for example), but you would be wrong.

In this example, the Vulkan renderer created a single descriptor set per object, which I thought was fine; I basically assumed that is what descriptor sets were for. I believed that descriptor sets would be like variable setups that you apply as a package, instead of individually selecting textures and uniform buffers. However, on my GPU, which is an AMD Fury Nano sporting a massive 4 GB of GPU memory (it doesn’t run Chrome, so it’s massive), I ran out of memory when reaching a meager 2000 objects. Out of GPU memory; I’ve never actually experienced that before.

So I decided to check how much memory I actually allocated, and while Vulkan supplies you with a nice set of callback functions to look this up, it doesn’t really do much for descriptor pools, and I had already narrowed the memory exhaustion down to happening when I create too many objects, so it cannot be a texture issue. Anyhow, in order to have per-object unique variables, each object allocates its own uniform buffer backing for the ‘global’ uniform buffer. Buffer memory never exceeds ~260 MB, so the problem is not there.

So the only conclusion I can draw is that the AMD driver allocates TONS of memory for the descriptor sets. So I did a bit of studying, and I decided to go with this solution for handling descriptor sets: Vulkan Fast Paths.

The TL;DR of the pdf is to put all textures into huge arrays, so I did:

#define MAX_2D_TEXTURES 4096
#define MAX_2D_MS_TEXTURES 64
#define MAX_CUBE_TEXTURES 128
#define MAX_3D_TEXTURES 128

group(TEXTURE_GROUP) texture2D 		Textures2D[MAX_2D_TEXTURES];
group(TEXTURE_GROUP) texture2DMS 	Textures2DMS[MAX_2D_MS_TEXTURES];
group(TEXTURE_GROUP) textureCube 	TexturesCube[MAX_CUBE_TEXTURES];
group(TEXTURE_GROUP) texture3D 		Textures3D[MAX_3D_TEXTURES];

And textures are fetched through:

group(TEXTURE_GROUP) shared varblock RenderTargetIndices
{
	// base render targets
	uint DepthBufferIdx;
	uint NormalBufferIdx;
	uint AlbedoBufferIdx;	
	uint SpecularBufferIdx;
	uint LightBufferIdx;
	
	// shadow buffers
	uint CSMShadowMapIdx;
	uint SpotLightShadowMapIdx;
};

Well, render targets are. At the ordinary shader level, textures are fetched by an index which is unique per object. I also took the liberty of implementing samplers which behave like uniforms: they are bound in the shader and can be combined with textures in GLSL as defined in the GL_KHR_vulkan_glsl section ‘Combining separate samplers and textures’. This allows us to assemble samplers and textures in the shader code, which is good when we have a texture array like the one above, where we can’t really assign a sampler per texture in the shader, because when writing the shaders we have absolutely no clue which texture goes where. It is much more flexible to assign a sampler state once we know what kind of texture we want. Let me give you an example.

The old way would be:

samplerstate GeometryTextureSampler
{
	Samplers = { SpecularMap, EmissiveMap, NormalMap, AlbedoMap, DisplacementMap, RoughnessMap, CavityMap };
	Filter = MinMagMipLinear;
	AddressU = Wrap;
	AddressV = Wrap;
};
...
vec4 diffColor = texture(AlbedoMap, UV) * MatAlbedoIntensity;
float roughness = texture(RoughnessMap, UV).r * MatRoughnessIntensity;
vec4 specColor = texture(SpecularMap, UV) * MatSpecularIntensity;
float cavity = texture(CavityMap, UV).r;

The new way is:

samplerstate GeometryTextureSampler 
{
	Filter = MinMagMipLinear;
	AddressU = Wrap;
	AddressV = Wrap;
};
...
vec4 diffColor = texture(sampler2D(AlbedoMap, GeometryTextureSampler), UV) * MatAlbedoIntensity;
float roughness = texture(sampler2D(RoughnessMap, GeometryTextureSampler), UV).r * MatRoughnessIntensity;
vec4 specColor = texture(sampler2D(SpecularMap, GeometryTextureSampler), UV) * MatSpecularIntensity;
float cavity = texture(sampler2D(CavityMap, GeometryTextureSampler), UV).r;

While the new way is only possible in GLSL through the GL_KHR_vulkan_glsl extension, this has been the default way in DirectX since version 10. This syntax also allows for a direct mapping of texture sampling between GLSL and HLSL if we want to use HLSL above shader model 3.0.

This method basically allows for all textures to be bound to a single descriptor set, and this descriptor set can then be applied to bind ALL textures at the same time. So when this texture library is submitted, we basically have access to all textures directly in the shader. Neat huh? It’s like bindless textures, and that is exactly what AMD mentions in the talk.

Then we come to uniform buffers. I read the Vulkan Memory Management article and all of a sudden it became completely clear to me. If we want to keep the number of descriptor sets down, we can’t have an individual buffer per object, because that requires either a descriptor set per object with the individual buffer bound to it, or syncing the rendering commands and updating the descriptor set being used.

So the solution is to use the same uniform buffer and expand its size per object. But if you follow the NVIDIA article, expanding the buffer for every single object is clearly not a good way to go. Instead, the uniform buffers implement a clever array allocation method, where we grow the total size by a set number of instances at a time and keep a list of used and free indices (which can be converted to offsets) into the buffer. Allocating when there are no free indices grows the buffer by the maximum of a set amount (8) and the number of instances requested. Allocating when there are free indices returns the offset calculated from such a free index, and allocating a range of values first attempts to fit the range within the free indices if there are enough of them, or allocates a new chunk if no such range can be found.
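
A rough sketch of that allocation scheme, assuming offsets are simply the index multiplied by the instance size padded to minUniformBufferOffsetAlignment; the class and member names are illustrative, range allocation is omitted, and the actual buffer growth is left out:

#include <cstdint>
#include <vector>

// Illustrative sketch of the index-based sub-allocation: offsets are
// index * aligned instance size, free slots are recycled, and the backing
// grows by at least 8 instances when it runs out.
struct UniformBufferAllocator
{
	static constexpr uint32_t GrowInstances = 8;

	uint32_t alignedInstanceSize = 256;   // instance size padded to minUniformBufferOffsetAlignment
	uint32_t capacity = 0;                // number of instances the backing currently holds
	std::vector<uint32_t> freeIndices;    // recycled slots

	void GrowBacking(uint32_t instances)
	{
		(void)instances;                  // re-create the VkBuffer/VkDeviceMemory with more space (omitted)
	}

	uint32_t AllocateInstance()
	{
		if (!freeIndices.empty())
		{
			uint32_t index = freeIndices.back();
			freeIndices.pop_back();
			return index * alignedInstanceSize;   // offset into the buffer
		}

		// no free slots: grow by GrowInstances and hand out the first new slot
		uint32_t index = capacity;
		GrowBacking(GrowInstances);
		capacity += GrowInstances;
		for (uint32_t i = index + 1; i < capacity; i++)
			freeIndices.push_back(i);
		return index * alignedInstanceSize;
	}

	void FreeInstance(uint32_t offset)
	{
		freeIndices.push_back(offset / alignedInstanceSize);
	}
};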

So basically, the Vulkan uniform buffer implementation uses a pool allocator to grow its size (it doesn’t shrink it though, which we might actually want to do). And because we are using GPU memory, we might want to avoid doubling the memory, but that is a problem for later. Each allocation returns the offset into the buffer, so that we can bind the descriptor with per-object offsets later, which means we retain the exact same descriptor set and only modify the offsets.

So to sum up:

  • Texture arrays with all textures bound at the same time, submitting the entire texture library (or libraries, for all 2D, 2DMS, Cube and 3D textures).
  • Uniform buffers are created per shader (resource level), and each instance allocates a chunk of memory in this buffer.
  • Offsets into the same buffer are used per object, so we can keep the same descriptor set but jump around in it, giving us per-object variables (see the sketch after this list).
  • Textures are sent as indices, and can thus also be chosen on a per-object basis.
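
Here is a sketch of the per-object binding this boils down to, assuming the per-object uniform buffer is bound with a dynamic offset handed out by an allocator like the one sketched earlier (names are illustrative):

#include <vulkan/vulkan.h>

// Illustrative: every object binds the same descriptor set; only the dynamic
// offset (returned by the allocator above) differs per object.
void DrawObject(VkCommandBuffer cmd, VkPipelineLayout layout, VkDescriptorSet objectSet,
	uint32_t objectSetIndex, uint32_t objectOffset, uint32_t indexCount)
{
	vkCmdBindDescriptorSets(cmd, VK_PIPELINE_BIND_POINT_GRAPHICS, layout,
		objectSetIndex, 1, &objectSet,
		1, &objectOffset);                  // per-object dynamic offset
	vkCmdDrawIndexed(cmd, indexCount, 1, 0, 0, 0);
}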

The only real issue with this method is read-write textures, also known as images in GLSL. Since image variables have to be declared with a format qualifier denoting how to read from the image, we can’t really bind them as above. However, images are not really on the same level of update frequency as textures; instead they are bound and switched on a per-shader basis, like with post effects, and are either statically assigned or can be predicted. For example, doing a horizontal + vertical blur pass requires the same image to be bound between both passes, and if we want to perform a format change, like in the HBAO shader, where we go from (ao, p) -> ao, we can just bind the same image to two different slots and thus avoid descriptor updates.
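
As a sketch of that last case, the same image view can be written into two storage-image bindings up front, so the pass that reads it through a differently qualified slot never has to touch the descriptor set again (the names and binding numbers are illustrative):

#include <vulkan/vulkan.h>

// Illustrative: bind the same image view to two storage-image bindings so a
// later pass can read it through a differently qualified slot without any
// further descriptor updates.
void BindImageToTwoSlots(VkDevice device, VkDescriptorSet hbaoSet, VkImageView hbaoView)
{
	VkDescriptorImageInfo imgInfo = {};
	imgInfo.imageView   = hbaoView;
	imgInfo.imageLayout = VK_IMAGE_LAYOUT_GENERAL;   // storage images are accessed in GENERAL layout

	VkWriteDescriptorSet writes[2] = {};
	for (uint32_t i = 0; i < 2; i++)
	{
		writes[i].sType           = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
		writes[i].dstSet          = hbaoSet;
		writes[i].dstBinding      = i;               // bindings 0 and 1 point at the same view
		writes[i].descriptorCount = 1;
		writes[i].descriptorType  = VK_DESCRIPTOR_TYPE_STORAGE_IMAGE;
		writes[i].pImageInfo      = &imgInfo;
	}
	vkUpdateDescriptorSets(device, 2, writes, 0, nullptr);
}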

Oh, I should also mention that all of this might soon be possible in OpenGL too, with the GL SPIR-V extension, which should give us the ability to use samplers as separate objects in OpenGL. Texture arrays already exist, and so do uniform buffers.