Texture compression

My focus this past week has been to improve the usability of Nody. The most obvious thing I came up with was the ability to preview textures in the actual node window, so as to allow the user to see how different texture will look with the given shader. There was two major problems with this. First, is it wiser to use the work folder (containing a very broad mixture of texture formats) or use the export folder? Second, how do I deal with the formats I’m faced with?

Turns out Qt has this covered, for every single file format than TGA and PSD, which are the currently used work-formats in Nebula. I started off with basically copy-pasting a TGA loader done by the Qt crew. This version is not available in the standard Qt package, but requires Qt3D, so I thought I’d rather not use more modules but instead just copy the code. Then I tried figuring out how PSD works, and while it’s pretty straightforward, it’s still very hard to figure out exactly how the bytes are laid out.

In the PSD and PSB specification, it says the PSD file consists of 4 sections, header, color modes, layer and mask information, and image data. I want the header (for sizes, bit depths and such) and the raw image data. Also, PSD files are RLE-compressed, a loss-less compression method which results in a very small size if there is little to no variation in the source. In case you are to lazy to google, it works by having each scan-line (row in unfancy terms) be split into segments, where first you have one byte describing how many of the same pixel will appear in row, followed by the data for those pixels, depending on pixel depth. So for examle, I could have 15 pixels in row which are pure red, and with a R8 G8 B8 image that would give me something like: 15 255 0 0.

PSD supports up to 56 channels, so each pixel could potentially be 56 bytes for the lowest byte per pixel, which is a lot of information. The specification also stated that each row begins with the total byte count for that row, followed by the RLE-compressed data. What’s strange about that is the need to state how many bytes you have, because you already know from each RLE-package how much you are going to receive. If I know I have 3 channels and 8-bit coloring, I can also be sure how many bytes will follow each pixel repeat counter.

Halfway through this though, I got some advice that I maybe shouldn’t focus on handling every single work format there was, but instead reading the exported format, DDS.

DDS means DirectDraw Surface, and is nothing more than a container for an actual texture. The DDS header, specified here: http://msdn.microsoft.com/en-us/library/windows/desktop/bb943982(v=vs.85).aspx has all the information necessary to reach the actual image data. This is the part where stuff gets trixy. DDS can contain either raw textures, which are easily read by simply reading them byte for byte. DDS can also contain compressed formats, such as DXT1, DXT3 or DXT5. If one likes to read specifications on how these compression algorithms work, one might want to visit: http://msdn.microsoft.com/en-us/library/windows/desktop/bb694531(v=vs.85).aspx#. The compression algorithms described are called block compressions. A block consists of 16 texels, which are somehow averaged into a data structure, which takes less room than the original raw data, but is still without minimum loss. If one is too lazy to read, I can give you a fast introduction how to decode these files.

First we have DXT1. DXT1 saves two colors, both of which are the two extreme colors of a block. Then, an integer is saved, which contains all the codes for all the 16 texels. Each texel only needs two bits (remember BITS) to describe what color they should use. The colors, let’s call them colorOne and colorTwo, which are read from the image, can create two averages, colorThree and colorFour. Together, they can be indexed by using the decimal values 0, 1, 2, 3, or binary values 00, 01, 10, 11. Remember how every texel has two bits? Well, these two bits are just what you need to count to 3. So each texel has its own two indices masked out, and then matched to a list of colors, so the appropriate texel can retrieve the appropriate color. I solved this by using an array, which saves the colors in order, so addressing the array with index 00 would give me colorOne, and 01 colorTwo etc. Getting the indices from the actual int simply required a bit shift to the appropriate bit, and then an & comparison with 11 to get their value. I almost forgot that if colorOne is bigger than colorTwo, one has to set colorFour to 0, and make colorThree a linear interpolation of colorOne and colorTwo, instead of a bilinear. This is to accommodate for alpha, and as you can see, DXT1 only supports binary alpha.

Then we have DXT3. DXT3 has the exact same color layout as DXT1. What differs is that DXT3 can handle non-binary alpha. The structure starts with a set of 8 bytes containing the alpha values of all indices. Once that is read, the rest is cake. When we have the color index for our texel, all we need to do is to fetch our two alpha-bits at that texel, and multiply the second bit with 256 to create a short. Slam dunk done.

DXT5 has an even more complex way of handling alpha. Instead of just simply having an alpha value per index, DXT5 stores alpha as a palette, much like the colors, and then simply use interpolation to find the wanted alpha value. So the structure starts with two alpha bytes, our extreme values. Then there are 6 alpha bytes. Why 6 you may ask? Well, remember how 4 bits could index our colors in a very nice way? Alpha needs 3 bits (as I said before, BITS again) per texel to be used as an index. So expand 4 bytes with half  and you get 6 bytes. The fancy thing about DXT5, is that alpha has to be calculated just like we calculated colors, but now we have more data for alpha values than we have colors. So if we have a finer grain of alpha values, that means our indices must be bigger, so instead of just having 00, 01, 10 and 11, we have 000, 001, 010, 011, 100, 101, 110, 111. These values are then used to sample the alpha. The tricky part with this though, is that there is no structure which holds 6 bytes, so we have to divide the data into one int and one short. The real problem comes when we need to get our indices, even though the indices actually border from the int to the short. The easiest way to do this is to simply count, we start at the short, that one ends after 16 bits, which would give us that when we are at the 15th bit, we need to take two bits from the int, this is our transition zone. This sample code explains it in detail.

 

int alphaCodeIndex = 3*(4*i+j);

int alphaCode;

if (alphaCodeIndex <= 12)
{
alphaCode = (alphaCode2 >> alphaCodeIndex) & 0x07;
}
else if (alphaCodeIndex == 15)
{
alphaCode = (alphaCode2 >> 15) | ((alphaCode1 << 1) & 0x06);
}
else // alphaCodeIndex >= 18 && alphaCodeIndex <= 45
{
alphaCode = (alphaCode1 >> (alphaCodeIndex – 16)) & 0x07;
}

 

alphaCodeIndex is a running iterator, which for every row i increases by j so that we always have a three bit increment. When we go past the size of the short, we need to remove the size of the short from the index, seeing as we need to ‘start over’ when we are going to bit shift the int. I can’t take any credit for this clever solution, seeing as I found it at: http://www.glassechidna.com.au/2009/devblogs/s3tc-dxt1dxt5-texture-decompression/.

When alphaCode is retrieved, it is used to get an indexed alpha value, and the compression is done. This is the result in Nody:

 

The image shows us one bump map (to the left) compressed with DXT5 and one diffuse map (to the right) compressed with DXT1 being rendered in real-time in Nody. The next step is to actually see the texture being applied whenever this happens. I couldn’t be more excited!