I’ve been working hard on an OpenGL renderer. The main reason is that we want to be able to move away from DirectX and Windows, for oh so many reasons. However, one of the major problems is the handling of shaders. With DirectX, we can use the quite flexible FX framework, which lets us encapsulate shaders, render states and variables into a very neat package. From a content management perspective, this is extremely convenient since render states can be implemented as a separate subsystem. That is why I’ve been developing the FX framework I’ve been talking about.
Well, it works, and as a result we now have a functioning OpenGL renderer. The only downside is that it’s much slower than the DirectX renderer. I’m currently investigating whether this is driver related, shader related or just stupidly implemented in Nebula. In every other respect the two renderers are identical, meaning we’ve crossed one of the biggest thresholds on the way to a working version on Linux.
These are the results:
It also just dawned on me that the “Toggle Benchmark with F3!” is not showing on the DirectX 11 version. Gotta look into that…
Anyway, this seriously got me thinking about what was actually eating the time. I made some improvements in the AnyFX API and reduced the number of GL calls from 63k per frame down to 16k, but it made no difference in FPS. What you see here is approximately 2500 models which are drawn individually (no instancing). The DirectX renderer only updates its constant buffers and renders, using the Effects11 framework as the backend for shader handling. The OpenGL renderer does the exact same thing: it only updates vital uniforms and uniform buffers and then renders. To find out what was taking so much time, I used apitrace. What I observed was that each draw call took about 20 microseconds, which multiplied by 2500 draw calls gives 0.05 seconds per frame, or 20 FPS. The calls that update the buffers take only a fraction of that time. However, the GPU waits an ENORMOUS amount of time before even starting the rendering process, as can be seen in this picture.
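To make the arithmetic above easy to replay with other numbers, here is a minimal sketch in C++. The function names are made up for illustration and are not part of Nebula or AnyFX, and the model assumes every draw call costs the same and that nothing else in the frame takes any time, which is obviously a simplification.

```cpp
#include <cassert>

// Hypothetical helpers, not part of Nebula or AnyFX.
// Assumption: every draw call has the same CPU-side cost and
// draw-call submission is the only cost in the frame.

// Time spent on one frame, in seconds.
double frameSeconds(int drawCalls, double microsecondsPerCall) {
    assert(drawCalls >= 0 && microsecondsPerCall >= 0.0);
    return drawCalls * microsecondsPerCall * 1e-6;
}

// Upper bound on the frame rate under the same assumption.
double framesPerSecond(int drawCalls, double microsecondsPerCall) {
    const double seconds = frameSeconds(drawCalls, microsecondsPerCall);
    assert(seconds > 0.0);
    return 1.0 / seconds;
}

// Per-call budget, in microseconds, needed to reach a target frame rate.
double microsecondsBudgetPerCall(int drawCalls, double targetFps) {
    assert(drawCalls > 0 && targetFps > 0.0);
    return 1e6 / (targetFps * drawCalls);
}
```

With the numbers from the trace, `framesPerSecond(2500, 20.0)` comes out at about 20. Turning it around, `microsecondsBudgetPerCall(2500, 60.0)` says each of the 2500 draws would have to cost under roughly 6.7 microseconds to reach 60 FPS.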
The blue lines describe the CPU load, where the width of each line or section shows how long the call takes. We can see that each call costs some CPU time. However, we can also see that we start issuing render calls way earlier than the GPU actually starts getting any work done. It is also easy to see that the GPU isn’t busy with anything else, so there is no apparent reason (as far as I can tell) why it doesn’t start immediately.
Crazy. We’ll see if the performance is as low on Linux as it is on Windows. Keep in mind that this was done on an ATI card. On the Nvidia card I have access to, I got ~40 FPS, but even so, the performance fluctuated between ~20 FPS and ~40 FPS, seemingly at random. Weird. It might have something to do with SLI, but I don’t know enough to say.