With the rewrite of the graphics system, there is an obvious need for a way to easily and consistently implement allocators. So what do we need for a DOD design?
Iteration 1 – Macros
Primarily, we need some class capable of holding an arbitrary number N of members. This is in itself non-intuitive, because an N-member template class could not possibly generate a variable name for each member. The other way would be to implement a series of macros which allows us to construct the class, but here's the issue. While creating the class itself is easy to do with a macro pattern like __BeginClass __AddMember __EndClass, there also has to be an allocator function that uses Ids to recycle slices into those arrays. So, we can do Begin/Add/End for the class declaration, but then we also need a Begin/Add/End pattern for the allocation function. Ugly:
__BeginStorage();
__AddStorage(VkBuffer, buffers);
__AddStorage(VkDeviceMemory, mems);
__AddStorage(Resources::ResourceId, layouts);
__AddStorage(Base::GpuResourceBase::Usage, usages);
__AddStorage(Base::GpuResourceBase::Access, access);
__AddStorage(Base::GpuResourceBase::Syncing, syncing);
__AddStorage(int, numVertices);
__AddStorage(int, vertexSize);
__AddStorage(int, mapcount);
__EndStorage();

__BeginAllocator();
__AddAllocator(buffers, nullptr);
__AddAllocator(mems, nullptr);
__AddAllocator(layouts, Ids::InvalidId24);
__AddAllocator(usages, Base::GpuResourceBase::UsageImmutable);
__AddAllocator(access, Base::GpuResourceBase::AccessRead);
__AddAllocator(syncing, Base::GpuResourceBase::SyncingCoherent);
__AddAllocator(numVertices, 0);
__AddAllocator(vertexSize, 0);
__AddAllocator(mapcount, 0);
__EndAllocator();
The good side is that we can declare default values for each slice. Still, the fact that we have to write the same list twice is not pretty, and neither are the macros underlying it. It's very easy to make a mistake, and even Visual Studio is really bad at helping with debugging macros. Another problem is that if we need a complex type with commas in it, the preprocessor will treat everything after a comma as a new argument, so:
__AddStorage(std::map<int, float>, mapping);
is going to be split into three arguments: "std::map<int", "float>" and "mapping", which breaks the macro.
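A standard preprocessor trick around this (not from the engine's code; ADD_STORAGE is a hypothetical stand-in for __AddStorage) is to typedef the comma-bearing type first, so the macro only ever sees a single token:

```cpp
#include <map>

// hypothetical macro standing in for __AddStorage: declares one member array
#define ADD_STORAGE(type, name) type name[8];

// typedef the comma-bearing type first, so the preprocessor sees one argument
typedef std::map<int, float> IntFloatMap;

struct Storage
{
    ADD_STORAGE(IntFloatMap, mapping) // compiles: the typedef hides the comma
};
```

Wrapping the type in an extra pair of parentheses is the other common workaround, but it requires the macro to strip them again.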
Iteration 2 – Generic programming method
While I am opposed to boost-like (or STL-style) generic programming, where simple things like strings are template types because it's cool, this problem really has no better solution. The behavior is simple: one id pool, N arrays of data, one allocation function which allocates a new slice in all N arrays, and some function which, given an id from the pool, can retrieve and deallocate data from all arrays simultaneously.
/// we need a thread-safe allocator since it will be used by both the memory and stream pool
typedef Ids::IdAllocatorSafe<
    RuntimeInfo,    // 0 runtime info (for binding)
    LoadInfo,       // 1 loading info (mostly used during the load/unload phase)
    MappingInfo     // 2 used when image is mapped to memory
> VkTextureAllocator;
RuntimeInfo, LoadInfo and MappingInfo are structs which denote components of a texture:
struct LoadInfo
{
    VkImage img;
    VkDeviceMemory mem;
    TextureBase::Dimensions dims;
    uint32_t mips;
    CoreGraphics::PixelFormat::Code format;
    Base::GpuResourceBase::Usage usage;
    Base::GpuResourceBase::Access access;
    Base::GpuResourceBase::Syncing syncing;
};
struct RuntimeInfo
{
    VkImageView view;
    TextureBase::Type type;
    uint32_t bind;
};
struct MappingInfo
{
    VkBuffer buf;
    VkDeviceMemory mem;
    VkImageCopy region;
    uint32_t mapCount;
};
The problem with this solution is that the members are not named, only numbered, so a Get requires a template integer argument selecting the member. However, it's implemented such that Get can resolve its return type for us, which is nice.
/// during the load-phase, we can safely get the structs
this->EnterGet();
VkTexture::RuntimeInfo& runtimeInfo = this->Get<0>(res);
VkTexture::LoadInfo& loadInfo = this->Get<1>(res);
this->LeaveGet();
For textures, we use the thread-safe method, since textures can be either files loaded on a thread or memory-loaded directly from memory. Thus, access requires either the Enter/Leave get pattern or GetSafe. We can also use GetUnsafe, but that is greatly discouraged because of the obvious syncing issue. Anyway, we can see in the above code that Get takes the number of the member in the allocator and automatically resolves the return type. As for the technical part, the way this is solved is a chain of generic programming types, unfolding the template arguments and generating an Array Append for each type.
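The thread-safe allocator itself is not shown in the post; as a rough illustration of the Enter/Leave pattern, here is a minimal sketch where EnterGet/LeaveGet simply bracket a mutex so several Gets can happen under one lock (all names and members here are assumptions, backed by std::vector rather than the engine's Util::Array):

```cpp
#include <cstdint>
#include <mutex>
#include <tuple>
#include <vector>

// hypothetical sketch of a thread-safe id allocator: EnterGet/LeaveGet bracket
// a critical section, GetSafe locks around a single access
template <class... TYPES>
class IdAllocatorSafeSketch
{
public:
    uint32_t Alloc()
    {
        std::lock_guard<std::mutex> guard(this->lock);
        // append one default-constructed slice to every member array
        std::apply([](auto&... arrays) { (arrays.emplace_back(), ...); }, this->objects);
        return this->size++;
    }

    void EnterGet() { this->lock.lock(); }
    void LeaveGet() { this->lock.unlock(); }

    template <int MEMBER>
    auto& Get(uint32_t index) // only valid between EnterGet and LeaveGet
    {
        return std::get<MEMBER>(this->objects)[index];
    }

    template <int MEMBER>
    auto GetSafe(uint32_t index) // locks for this single access
    {
        std::lock_guard<std::mutex> guard(this->lock);
        return std::get<MEMBER>(this->objects)[index];
    }

private:
    std::mutex lock;
    uint32_t size = 0;
    std::tuple<std::vector<TYPES>...> objects;
};
```

The point of Enter/Leave over GetSafe is amortization: one lock acquisition covers a whole batch of member reads during the load phase.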
template <typename C>
struct get_template_type;

/// get inner type of two types
template <template <typename> class C, typename T>
struct get_template_type<C<T>>
{
    using type = T;
};

/// get inner type of a constant ref outer type
template <template <typename> class C, typename T>
struct get_template_type<const C<T>&>
{
    using type = T;
};

/// helper typedef so that the above expression can be used like decltype
template <typename C>
using get_template_type_t = typename get_template_type<C>::type;

/// unpacks allocations for each member in a tuple
template <class... Ts, std::size_t... Is>
void alloc_for_each_in_tuple(const std::tuple<Ts...>& tuple, std::index_sequence<Is...>)
{
    using expander = int[];
    (void)expander
    {
        0, ((void)(const_cast<Ts&>(std::get<Is>(tuple)).Append(typename get_template_type<Ts>::type())), 0)...
    };
}

/// entry point for above expansion function
template <class... Ts>
void alloc_for_each_in_tuple(const std::tuple<Ts...>& tuple)
{
    alloc_for_each_in_tuple(tuple, std::make_index_sequence<sizeof...(Ts)>());
}

/// get type of contained element in Util::Array stored in std::tuple
template <int MEMBER, class... TYPES>
using tuple_array_t = get_template_type_t<std::tuple_element_t<MEMBER, std::tuple<Util::Array<TYPES>...>>>;
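As a sanity check, the same unfolding machinery works with any container that has an Append method. Here is a compilable sketch substituting a thin std::vector wrapper for the engine-specific Util::Array, with the int[] expander trick replaced by a C++17 fold expression (ArraySketch is my name, not the engine's):

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// thin stand-in for Util::Array, which exposes an Append method
template <typename T>
struct ArraySketch
{
    void Append(const T& t) { this->data.push_back(t); }
    std::size_t Size() const { return this->data.size(); }
    std::vector<T> data;
};

// extract T from any single-parameter template C<T>
template <typename C> struct get_template_type;
template <template <typename> class C, typename T>
struct get_template_type<C<T>> { using type = T; };

// append a default-constructed element to every array in the tuple
// (a C++17 fold expression replaces the original int[] expander trick)
template <class... Ts, std::size_t... Is>
void alloc_for_each_in_tuple(std::tuple<Ts...>& tuple, std::index_sequence<Is...>)
{
    (std::get<Is>(tuple).Append(typename get_template_type<Ts>::type()), ...);
}

template <class... Ts>
void alloc_for_each_in_tuple(std::tuple<Ts...>& tuple)
{
    alloc_for_each_in_tuple(tuple, std::make_index_sequence<sizeof...(Ts)>());
}
```

Each call grows every member array by exactly one default-constructed slice, which is precisely what the allocation function needs.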
The internet helped me greatly. The allocator can be created as such:
template <class... TYPES>
class IdAllocator
{
public:
    /// constructor
    IdAllocator(uint32_t maxid = 0xFFFFFFFF, uint32_t grow = 512) : pool(maxid, grow), size(0) {};
    /// destructor
    ~IdAllocator() {};

    /// allocate a new resource, and generate new entries if required
    Ids::Id32 AllocResource()
    {
        Ids::Id32 id = this->pool.Alloc();
        if (id >= this->size)
        {
            alloc_for_each_in_tuple(this->objects);
            this->size++;
        }
        return id;
    }

    /// recycle id
    void DeallocResource(const Ids::Id32 id)
    {
        this->pool.Dealloc(id);
    }

    /// get single item from id, template expansion might hurt
    template <int MEMBER>
    inline tuple_array_t<MEMBER, TYPES...>& Get(const Ids::Id32 index)
    {
        return std::get<MEMBER>(this->objects)[index];
    }

private:
    Ids::IdPool pool;
    uint32_t size;
    std::tuple<Util::Array<TYPES>...> objects;
};
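To see the id recycling in action, here is a self-contained sketch of the same design with hypothetical stand-ins for Ids::IdPool and Util::Array (a freed id goes on a free list and is handed out again before the arrays grow, so the old slice is reused rather than destroyed):

```cpp
#include <cstdint>
#include <tuple>
#include <vector>

// minimal stand-in for Ids::IdPool: recycles freed ids before growing
class IdPoolSketch
{
public:
    uint32_t Alloc()
    {
        if (!this->free.empty())
        {
            uint32_t id = this->free.back();
            this->free.pop_back();
            return id;
        }
        return this->next++;
    }
    void Dealloc(uint32_t id) { this->free.push_back(id); }
private:
    std::vector<uint32_t> free;
    uint32_t next = 0;
};

// simplified IdAllocator backed by std::vector instead of Util::Array
template <class... TYPES>
class IdAllocatorSketch
{
public:
    uint32_t AllocResource()
    {
        uint32_t id = this->pool.Alloc();
        if (id >= this->size) // new id: grow every member array by one slice
        {
            std::apply([](auto&... arrays) { (arrays.emplace_back(), ...); }, this->objects);
            this->size++;
        }
        return id;
    }

    void DeallocResource(uint32_t id) { this->pool.Dealloc(id); } // slice is recycled, not destroyed

    template <int MEMBER>
    auto& Get(uint32_t id) { return std::get<MEMBER>(this->objects)[id]; }

private:
    IdPoolSketch pool;
    uint32_t size = 0;
    std::tuple<std::vector<TYPES>...> objects;
};
```

Note that DeallocResource only returns the id to the pool; a recycled slice still holds its old values, which is why the macro version's per-member defaults (or an explicit reset on alloc) matter.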
The only real magic here is the fact that we use std::tuple to store the data, tuple_array_t to find out the type of a tuple member, and alloc_for_each_in_tuple to allocate a slice for each array. It's all compile time, and all generic, but not so generic as to become too hard to understand. Cheerio!
Now, the coolest thing by far is that it’s possible to chain these allocators, which makes it easy to adapt class hierarchies!
/// this member allocates shaders
Ids::IdAllocator<
    AnyFX::ShaderEffect*,                   // 0 effect
    SetupInfo,                              // 1 setup immutable values
    RuntimeInfo,                            // 2 runtime values
    VkShaderProgram::ProgramAllocator,      // 3 variations
    VkShaderState::ShaderStateAllocator     // 4 the shader states, sorted by shader
> shaderAlloc;
__ImplementResourceAllocator(shaderAlloc);
Here, VkShaderProgram::ProgramAllocator allocates all individual shader combinations, and VkShaderState::ShaderStateAllocator contains all the texture and uniform binds. They can obviously also have their own allocators, and so on, and so forth! And since they are now aligned as a single array under a single item of the parent type, which in this case is the shader allocator, they also appear linearly in memory. So, when we bind a shader and then swap its states, all of the states for that shader will be in line, which is great for cache locality!
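The chaining itself is nothing more than an allocator whose member type is another allocator. A toy illustration (all names here are hypothetical, with plain std::vector members standing in for the nested allocators):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// toy sketch of chained allocators: each shader slice owns its own program
// allocator, so all variations of one shader sit contiguously in memory
struct ProgramAllocatorSketch
{
    std::vector<uint64_t> pipelines; // one entry per shader variation
};

struct ShaderAllocatorSketch
{
    // parallel member arrays of the parent allocator
    std::vector<std::string> names;               // 0 setup info
    std::vector<ProgramAllocatorSketch> programs; // 1 nested allocator per shader

    uint32_t Alloc() // grow every member array by one slice
    {
        this->names.emplace_back();
        this->programs.emplace_back();
        return (uint32_t)(this->names.size() - 1);
    }
};
```

Everything belonging to shader id N lives at index N of each member array, and everything below it (programs, states) is packed in N's own nested arrays.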