General – Id allocator

With the rewrite of the graphics system, there is an obvious need for a way to easily and consistently implement allocators. So what do we need for a DOD design?

Iteration 1 – Macros

Primarily, we need some class which is capable of having an N number of members. This is in itself non-intuitive, because an N-member template class could not possibly generate variable names for each member. The other way would be to implement a series of macros which allows us to construct the class, but here’s the issue. While creating the class itself is easy to do with a macro, something like __BeginClass __AddMember __EndClass, there also has to be an allocator function that use Ids to recycle slices into those arrays. So, we can do Begin/Add/End for the class declaration, but then we also need a Begin/Add/End pattern for the allocation function. Ugly:

	__BeginStorage();
	__AddStorage(VkBuffer, buffers);
	__AddStorage(VkDeviceMemory, mems);
	__AddStorage(Resources::ResourceId, layouts);
	__AddStorage(Base::GpuResourceBase::Usage, usages);
	__AddStorage(Base::GpuResourceBase::Access, access);
	__AddStorage(Base::GpuResourceBase::Syncing, syncing);
	__AddStorage(int, numVertices);
	__AddStorage(int, vertexSize);
	__AddStorage(int, mapcount);
	__EndStorage();

	__BeginAllocator();
	__AddAllocator(buffers, nullptr);
	__AddAllocator(mems, nullptr);
	__AddAllocator(layouts, Ids::InvalidId24);
	__AddAllocator(usages, Base::GpuResourceBase::UsageImmutable);
	__AddAllocator(access, Base::GpuResourceBase::AccessRead);
	__AddAllocator(syncing, Base::GpuResourceBase::SyncingCoherent);
	__AddAllocator(numVertices, 0);
	__AddAllocator(vertexSize, 0);
	__AddAllocator(mapcount, 0);
	__EndAllocator();

Good side is that we can declare default values for each slice. Still, the fact we have to write the same thing twice is not pretty, and the macros underlying it are not pretty either. It’s very easy to make a mistake, and even Visual Studio is really bad at helping with debugging macros. Another problem is that if we need a complex type, with commas in it, the macro will think the next thing is a new argument, so:

__AddStorage(std::map<int, float>, mapping);

Is going to assume the first argument is “std::map“, and so on. So to circumvent it we first need to typedef the map. Annoying, ugly, and ultimately a work-around. This was iteration 1.

Iteration 2 – Generic programming method

While I am opposed to boost-like (or stl style) generic programming, where simple things like strings are template types because it’s cool, this problem really has no better way of solving. The behavior is simple, one id-pool, N arrays of data, one allocation function which allocates a new slice for all N arrays, some function which, using an id from the pool, can retrieve and deallocate data from all arrays simultaneously.

	/// we need a thread-safe allocator since it will be used by both the memory and stream pool
	typedef Ids::IdAllocatorSafe<
		RuntimeInfo,						// 0 runtime info (for binding)
		LoadInfo,							// 1 loading info (mostly used during the load/unload phase)
		MappingInfo,						// 2 used when image is mapped to memory
	> VkTextureAllocator;

RuntimeInfo, LoadInfo and MappingInfo are structs which denote components of a texture:

	struct LoadInfo
	{
		VkImage img;
		VkDeviceMemory mem;
		TextureBase::Dimensions dims;
		uint32_t mips;
		CoreGraphics::PixelFormat::Code format;
		Base::GpuResourceBase::Usage usage;
		Base::GpuResourceBase::Access access;
		Base::GpuResourceBase::Syncing syncing;
	};
	struct RuntimeInfo
	{
		VkImageView view;
		TextureBase::Type type;
		uint32_t bind;
	};
	struct MappingInfo
	{
		VkBuffer buf;
		VkDeviceMemory mem;
		VkImageCopy region;
		uint32_t mapCount;
	};

Problem with this solution is that the variables are not named, but are just numbered, so a Get requires a template integer argument for which member. However, it’s implemented such that Get can resolve its return type for us, which is nice.

	/// during the load-phase, we can safetly get the structs
	this->EnterGet();
	VkTexture::RuntimeInfo& runtimeInfo = this->Get<0>(res);
	VkTexture::LoadInfo& loadInfo = this->Get<1>(res);
	this->LeaveGet();

For textures, we are using a thread-safe method, since textures can be either files loaded in a thread, or memory-loaded directly from memory. Thus, it requires either the Enter/Leave get pattern, or GetSafe. We can also use GetUnsafe, but it’s greatly discouraged because of the obvious syncing issue. Anyway, we can see in the above code that Get takes the number of the member in the allocator, and automatically resolve the return type. For the technical part, the way this is solved is by a long line of generic programming types, unfolding the template arguments and generating an Array Append for each type.

template <typename C>
struct get_template_type;

/// get inner type of two types
template <template <typename > class C, typename T>
struct get_template_type<C<T>>
{
	using type = T;
};

/// get inner type of a constant ref outer type
template <template <typename > class C, typename T>
struct get_template_type<const C<T>&>
{
	using type = T;
};

/// helper typedef so that the above expression can be used like decltype
template <typename C>
using get_template_type_t = typename get_template_type<C>::type;

/// unpacks allocations for each member in a tuble
template<class...Ts, std::size_t...Is>
void alloc_for_each_in_tuple(const std::tuple<Ts...> & tuple, std::index_sequence<Is...>)
{
	using expander = int[];
	(void)expander
	{
		0, 
		((void)(const_cast<Ts&>(std::get<Is>(tuple)).Append(get_template_type<Ts>::type())), 
		0)...
	};
}

/// entry point for above expansion function
template<class...Ts>
void alloc_for_each_in_tuple(const std::tuple<Ts...> & tuple)
{
	alloc_for_each_in_tuple(tuple, std::make_index_sequence<sizeof...(Ts)>());
}

/// get type of contained element in Util::Array stored in std::tuple
template <int MEMBER, class ... TYPES>
using tuple_array_t = get_template_type_t<std::tuple_element_t<MEMBER, std::tuple<Util::Array<TYPES>...>>>;

The internet helped me greatly. The allocator can be created as such:

template <class ... TYPES>
class IdAllocator
{
public:
	/// constructor
	IdAllocator(uint32_t maxid = 0xFFFFFFFF, uint32_t grow = 512) : pool(maxid, grow), size(0) {};
	/// destructor
	~IdAllocator() {};

	/// allocate a new resource, and generate new entries if required
	Ids::Id32 AllocResource()
	{
		Ids::Id32 id = this->pool.Alloc();
		if (id >= this->size)
		{
			alloc_for_each_in_tuple(this->objects);
			this->size++;
		}
		return id;
	}

	/// recycle id
	void DeallocResource(const Ids::Id32 id) { this->pool.Dealloc(id); }

	/// get single item from id, template expansion might hurt
	template <int MEMBER>
	inline tuple_array_t<MEMBER, TYPES...>&
	Get(const Ids::Id32 index)
	{
		return std::get<MEMBER>(this->objects)[index];
	}
private:

	Ids::IdPool pool;
	uint32_t size;
	std::tuple<Util::Array<TYPES>...> objects;
};

The only real magic here is the fact that we use std::tuple to store the data, tuple_array_t to find out the type of a tuple member, and alloc_for_each_in_tuple to allocate a slice for each array. It’s all compile time, and all generic, but not generic enough as to be too hard to understand. Cheerio!

Now, the coolest thing by far is that it’s possible to chain these allocators, which makes it easy to adapt class hierarchies!

	/// this member allocates shaders
	Ids::IdAllocator<
		AnyFX::ShaderEffect*,						//0 effect
		SetupInfo,									//1 setup immutable values
		RuntimeInfo,								//2 runtime values
		VkShaderProgram::ProgramAllocator,			//3 variations
		VkShaderState::ShaderStateAllocator			//4 the shader states, sorted by shader
	> shaderAlloc;
	__ImplementResourceAllocator(shaderAlloc);	

Here, VkShaderProgram::ProgramAllocator allocates all individual shader combinations, and VkShaderState::ShaderStateAllocator contains all the texture and uniform binds. They can obviously also have their own allocators, and so on, and so forth! And since they are now also aligned as a single array under a single item of the parent type, which in this case is the shader allocator, they also appear linearly in memory. So, when we bind a shader, and then swap its states, all of the states for that shader will be in line, which is great for cache consistency!

Graphics – New design philosophy

DOD vs OOP

Data oriented design has been the new thing ever since it was rediscovered, and for good reason. The funny part is that in practice it is a regression in technology, back to the good old days of C, although the motivations may be different. So here is the main difference between OOP and DOD:

With OOP, an object is a singular instance of its data and methods. As OOP classes get more members, its size increases, and with it the stride between consecutive elements. In addition, an OOP solution has a tendency to allocate an instance of an object when it is required. OOP is somewhat intuitive to many modern programmers, because it attempts to explain the code in clear-text.

The DOD way is very different. It’s still okay to have classes and members, although care should be taken as to how those members are used. For example, if some code only requires members A and B, then it’s bad for the cache if there are members C and D between each element. So how do we still use an object-like mentality? Say we have a class A, and should use members a, b, and c. Instead of treating A as individual objects, we have a new class AHub, which is the manager of all the A instances. The AHub contains a, b and c as individual arrays. So how do we identify individual A objects? Well, they become an index into those arrays, and since those arrays are uniform in length, each index becomes a slice. Now it’s fine if for example a is another class or struct. There are many benefits to a design of this nature:

1. Objects become integers. This is nice because there is no need to include anything to handle an integer, and the implementation can easily be obfuscated in a shared library.
2. No need to keep track of pointers and their usage. When an ID is released nothing is really deleted, instead the ID is just recycled. However there are ways check if an ID is valid.
3. Ownership of objects is super-clear. The hub classes will indiscriminately be the ONLY class responsible for creating and releasing IDs.

An example of how different the code can be, here is an example:

// OOP method
Texture* tex = new Texture();
tex->SetWidth(...);
tex->SetHeight(...);
tex->SetPixelFormat(...);
tex->Setup();
tex->Bind();

// DOD method
Id tex = Graphics::CreateTexture(width, height, format);
Graphics::BindTexture(tex);

Now one of the things which may be slightly less comfortable is where you put all those functions?! Since all you are playing with are IDs, there are no member functions to call. So where is the function interface? Well, the hub of course! And since the hubs are interfaces to your objects, the hubs themselves are also responsible for the operations. And since the hubs are interfaces, they can just as well be singleton instances. And because you can make the hubs singletons, it also means you can have functions declared in the class namespace which belong to no class at all. So in the above example, we have the row:

Id tex = Graphics::CreateTexture(width, height, format);

, but since textures are managed in some singleton, what Graphics::CreateTexture is really doing is this:

namespace Graphics
{
Id CreateTexture(int width, int height, PixelFormat format)
{
    return TextureHub::CreateTexture(width, height, format);
}
} // namespace Graphics

Now the benefits is that all functions can go into the same namespace, Graphics, in this case, and the programmer does not need to keep track of whatever the the hub is called.

Resources

In Nebula, textures are treated as resources, and they go through a different system to be created, however the process is exactly the same. Textures are managed by a ResourcePool, like all resources, and the ResourcePools are also responsible for implementing the behavior of those resources. With this new system, smart pointers are not really needed that much, but one of the few cases where they are still in play is for those pools. The resources have a main hub, called the ResourceManager, and it contains a list of pools (which are also responsible for loading and saving). There are two families of pools, stream pool sand memory pools. Stream pools can act asynchronously, and fetches its data from some URI, for example a file. Memory pools are always immediate, and take their information from data already in memory.

Textures for example, can be either a file, like a .dds asset, or it can be a buffer mapped and loaded by some other system, like LibRocket. Memory pools have a specific set of functions to create a resource, and they are ReserveResource which creates a new empty resource and returns the Id, and UpdateResource which takes a pointer to some update structure which is then used to update the data.

The way a resource is created is through a call to the ResourceManager, which gets formatted like so:

Resources::ResourceId id = ResourceManager::Instance()->ReserveResource(reuse_name, tag, MemoryVertexBufferPool::RTTI);
struct VboUpdateInfo info = {...};
ResourceManager::Instance()->UpdateResource(id, &info);

reuse_name is a global resource Id which ensures that consecutive calls to ReserveResource will return the same Id. tag is a global tag, which will delete all resources under the same tag if DiscardByTag is called on that pool. The last argument is the type of pool which is supposed to reserve this resource. In order to make this easier for the programmer, we can create a function within the CoreGraphics namespace as such:

namespace CoreGraphics
{
Resources::ResourceId 
CreateVertexBuffer(reuse_name, tag, numVerts, vertexComponents, dataPtr, dataPtrSize)
{
    Resources::ResourceId id = ResourceManager::Instance()->ReserveResource(reuse_name, tag, MemoryVertexBufferPool::RTTI);
    struct VboUpdateInfo info = {...};
    ResourceManager::Instance()->UpdateResource(id, &info);
}
} // namespace CoreGraphics

ReserveResource has to go through and find the MemoryVertexBufferPool first, so we can eliminate that too by just saving the pointer to the MemoryVertexBufferPool somewhere, perhaps in the same header. This is completely safe since the list of pools must initialized first, so their indices are already fixed.

Now all we have to do to get our functions is to include the CoreGraphics header, and we are all set! No need to know about nasty class names, everything is just in there, like a nice simple facade. Extending it is super easy, just declare the same namespace in some other file, and add new functions! Since we are always dealing with singletons and static hubs, none of this should be too complicated. It’s back to functions again! Now we can chose to have those functions declared in some header for each use, for example all texture-related functions could be in the texture.h header, or they could be exposed in a single include. Haven’t decided yet.

One of the big benefits is that while it’s quite complicated to expose a class to for example a scripting interface, exposing a header with simple functions is very simple. And since everything is handles, the user never has to know about the implementation, and is only exposed to the functions which they might want.

Handles

So I mentioned everything is returned as a handle, but a handle can contain much more information than just an integer. The resources is one such example, it contains the following:

1. First (leftmost) 32 bits is the unique id of the resource instance within the loader.
2. Next 24 bits is the resource id as specified by reuse_name for memory pools, or the path to the file for stream pools.
3. Last 8 bits is the id of the pool itself within the ResourceManager. This allows an Id to be immediately recognized as belonging to a certain pool, and the pool can be retrieved directly if required.

This system will keep an intrinsic track of the usage count, since the amount of times a resource is used is indicated by the unique resource instance number, and once all of those ids are returned, the resource itself is safe to discard.

All the graphics side objects are also handles. If we for example want to bind a vertex buffer to slot 0 with offset 0, we do this:

CoreGraphics::BindVertexBuffer(id, 0, 0);

Super simple, and that function will fetch the required information from id, and send it to the render device. While this all looks good here, there is still tons of work left to do in order to convert everything.