Hello everyone.
I hope you’re doing well today.
Today, I wanted to speak about the rendering engine for Insanity Engine, and how it draws those sweet, sweet tiles.
The Renderer
Insanity Engine has a renderer interface that lets each platform handle its rendering separately: every platform supplies its own implementation that adheres to the interface. For example, the drawLine function on Windows is handled by Direct2D, but on Linux it will be handled by OpenGL.
Here is the renderer’s interface:
public:
virtual void clear( IEColorF color ) = 0;
virtual void begin( ) = 0;
virtual void end( ) = 0;
virtual void drawLine( Util::Location a, Util::Location b, IEColorF color, strokeOpts_t *opts = nullptr ) = 0;
virtual void drawLine( Util::Location a, Util::Location b, float cR, float cG, float cB, float cA = 1.0f,
strokeOpts_t *opts = nullptr ) = 0;
virtual void drawLine( float aX, float aY, float bX, float bY, IEColorF color, strokeOpts_t *opts = nullptr ) = 0;
virtual void drawLine( float aX, float aY, float bX, float bY, float cR, float cG, float cB, float cA = 1.0f,
strokeOpts_t *opts = nullptr ) = 0;
virtual void fillRect( Util::IERectangleF rect, IEColorF color, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillRect( Util::Location topLeft, Util::Location bottomRight, IEColorF color,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillRect( float aX, float aY, float bX, float bY, IEColorF color,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillRect( Util::Location topLeft, Util::Location bottomRight, float cR, float cG, float cB,
float cA = 1.0f, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillRect( float aX, float aY, float bX, float bY, float cR, float cG, float cB, float cA = 1.0f,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void drawRect( Util::IERectangleF rect, IEColorF color, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawRect( Util::Location topLeft, Util::Location bottomRight, IEColorF color,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawRect( float aX, float aY, float bX, float bY, IEColorF color,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawRect( Util::Location topLeft, Util::Location bottomRight, float cR, float cG, float cB,
float cA = 1.0f, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawRect( float aX, float aY, float bX, float bY, float cR, float cG, float cB, float cA = 1.0f,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void fillEllipse( Util::IERectangleF bbox, IEColorF color, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillEllipse( Util::Location topLeft, Util::IESize size, IEColorF color,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillEllipse( float aX, float aY, float width, float height, IEColorF color,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillEllipse( Util::Location topLeft, Util::IESize size, float cR, float cG, float cB,
float cA = 1.0f, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillEllipse( float aX, float aY, float width, float height, float cR, float cG, float cB,
float cA = 1.0f, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void drawEllipse( Util::IERectangleF bbox, IEColorF color, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawEllipse( Util::Location topLeft, Util::IESize size, IEColorF color,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawEllipse( float aX, float aY, float width, float height, IEColorF color,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawEllipse( Util::Location topLeft, Util::IESize size, float cR, float cG, float cB,
float cA = 1.0f, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawEllipse( float aX, float aY, float width, float height, float cR, float cG, float cB,
float cA = 1.0f, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawPoly( Util::Location *points, unsigned __int32 numPoints, IEColorF color,
strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void drawPoly( Util::Location *points, unsigned __int32 numPoints, float cR, float cG, float cB,
float cA = 1.0f, strokeOpts_t *strokeOpts = nullptr ) = 0;
virtual void fillPoly( Util::Location *points, unsigned __int32 numPoints, IEColorF color,
fillOpts_t *fillOpts = nullptr ) = 0;
virtual void fillPoly( Util::Location *points, unsigned __int32 numPoints, float cR, float cG, float cB,
float cA = 1.0f, fillOpts_t *fillOpts = nullptr ) = 0;
virtual void drawImage( Util::Location wher, const ImageBase &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawImage( Util::IERectangleF wher, const ImageBase &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawImage( Util::IERectangleF dstRect, Util::IERectangleF srcRect, const ImageBase &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawImage( Util::Location wher, Image &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawImage( Util::IERectangleF wher, Image &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawImage( Util::IERectangleF dstRect, Util::IERectangleF srcRect, Image &image, imageOpts_t *opts = 0 ) = 0;
virtual void drawText( Util::Location wher, const wchar_t *str, size_t strLen, Text::IEFont &font,
IEColorF color ) = 0;
virtual void drawText( Util::IERectangleF wher, const wchar_t *str, size_t strLen, Text::IEFont &font,
IEColorF color ) = 0;
virtual void drawText( Util::Location wher, const wchar_t *str, size_t strLen, Text::IEFont &font,
IEColorF color, unsigned __int64 layoutKey ) = 0;
virtual void drawText( Util::IERectangleF wher, const wchar_t *str, size_t strLen, Text::IEFont &font,
IEColorF color, unsigned __int64 layoutKey ) = 0;
Each of these functions is a pure virtual, which means that every implementation must define all of them. For example, here is the drawLine function from Direct2DRenderer.cpp:
void Direct2DRenderer::drawLine( Location a, Location b, IEColorF color, strokeOpts_t *opts ) {
    float strokeWidth = 1;
    if ( !drawReady ) {
        throw renderer_not_ready( "The renderer was not ready!" );
    }
    if ( opts ) {
        strokeWidth = opts->width;
    }
    reuseBrush->SetColor( color );
    rt->DrawLine( a, b, reuseBrush, strokeWidth, 0 );
}
This allows the Graphics class not to worry about how the actual draw calls are executed. Instead of having super messy code to support each platform, we simply have the renderer worry about it.
One thing that makes it a little annoying is that we have to have our own copy of each rendering primitive. For example, I had to implement image data myself, along with fonts, locations, rectangles, and colors (in both integer and float formats). Honestly, I think it is worth the peace of mind of knowing that you can simply call renderer->drawLine without worrying about how the platform will handle your variables.
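To show the idea in miniature, here is a rough sketch of calling code that only knows about an interface. Everything in it (Point, LineRenderer, RecordingRenderer, drawGrid) is a made-up stand-in, not the engine's real types; the "backend" just records calls instead of talking to Direct2D or OpenGL.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine's own location type.
struct Point {
    float x;
    float y;
};

// A cut-down interface with a single pure virtual.
class LineRenderer {
public:
    virtual ~LineRenderer( ) = default;
    virtual void drawLine( Point a, Point b ) = 0;
};

// Pretend platform backend: it stores each line instead of drawing it.
class RecordingRenderer : public LineRenderer {
public:
    void drawLine( Point a, Point b ) override { lines.push_back( std::make_pair( a, b ) ); }
    std::vector<std::pair<Point, Point>> lines;
};

// Calling code depends only on the interface, never on the backend.
void drawGrid( LineRenderer &renderer, int cells, float cellSize ) {
    const float extent = cells * cellSize;
    for ( int i = 0; i <= cells; ++i ) {
        const float p = i * cellSize;
        renderer.drawLine( { p, 0.0f }, { p, extent } ); // vertical line
        renderer.drawLine( { 0.0f, p }, { extent, p } ); // horizontal line
    }
}
```

Swapping RecordingRenderer for a different subclass would not require touching drawGrid, which is the whole point of the split.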
Images
I wanted to take a quick detour to talk about how images are stored in Insanity Engine. Of course, we need the image data itself, but how can we actually store it?
The answer is pretty standard: we store data in 32-bits-per-pixel ARGB format. That is, each of the four channels is an 8-bit unsigned integer with values from 0 to 255. We just pack these channels next to each other to get our 32-bit pixel data. We store these pixels in one really long array, and store the width and height of the image along with it.
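As a rough illustration of that layout, here is a minimal sketch. The names (packArgb, alphaOf, redOf, PixelBuffer) are hypothetical, and it assumes alpha sits in the most significant byte and that the array is row-major; the engine may order things differently.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not the engine's API.
// Assumed layout: 0xAARRGGBB, one 8-bit channel per byte.
constexpr std::uint32_t packArgb( std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b ) {
    return ( std::uint32_t( a ) << 24 ) | ( std::uint32_t( r ) << 16 ) |
           ( std::uint32_t( g ) << 8 ) | std::uint32_t( b );
}

constexpr std::uint8_t alphaOf( std::uint32_t px ) { return std::uint8_t( px >> 24 ); }
constexpr std::uint8_t redOf( std::uint32_t px ) { return std::uint8_t( ( px >> 16 ) & 0xFF ); }

// The "really long array" plus the width and height stored next to it.
struct PixelBuffer {
    std::uint32_t width;
    std::uint32_t height;
    std::vector<std::uint32_t> pixels; // assumed row-major, width * height entries

    PixelBuffer( std::uint32_t w, std::uint32_t h )
        : width( w ), height( h ), pixels( std::size_t( w ) * h, 0u ) {}

    // Pixel (x, y) lives at index y * width + x.
    std::uint32_t &at( std::uint32_t x, std::uint32_t y ) {
        return pixels[std::size_t( y ) * width + x];
    }
};
```

With this layout, an opaque blue pixel is packArgb( 255, 0, 0, 255 ), which is 0xFF0000FF.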
That being said, it’s not useful to renderers to have the data in memory – you need to send it to the GPU. That’s why we actually have two image data classes – ImageBase and Image.
ImageBase stores the image data in the main memory, and is entirely platform agnostic. Image, however, is the platform-dependent version of ImageBase.
When we modify the image’s data, we call image_was_modified, which will release the internal GPU bitmap and recreate it with the new data. This isn’t particularly fast, but such is the nature of offloading data.
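Here is a minimal sketch of how that split could look. Only the name image_was_modified comes from the engine; CpuImage, FakeGpuBitmap, and PlatformImage are invented for illustration, and the "GPU bitmap" is just a copy of the pixels so the example can run anywhere.

```cpp
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for a platform bitmap. A real backend would hold a GPU resource here.
struct FakeGpuBitmap {
    std::vector<std::uint32_t> uploaded; // copy of the pixels "sent to the GPU"
};

// Platform-agnostic pixel storage, loosely modeled on the ImageBase idea.
struct CpuImage {
    std::uint32_t width = 0;
    std::uint32_t height = 0;
    std::vector<std::uint32_t> pixels;
};

// Platform-dependent wrapper, loosely modeled on the Image idea.
class PlatformImage {
public:
    explicit PlatformImage( CpuImage data ) : cpu( std::move( data ) ) { upload( ); }

    CpuImage &data( ) { return cpu; }
    const FakeGpuBitmap &bitmap( ) const { return *gpu; }
    int uploads( ) const { return uploadCount; }

    // Call after editing data(): release the stale bitmap and recreate it.
    void image_was_modified( ) {
        gpu.reset( );
        upload( );
    }

private:
    void upload( ) {
        gpu = std::make_unique<FakeGpuBitmap>( FakeGpuBitmap{ cpu.pixels } );
        ++uploadCount;
    }

    CpuImage cpu;
    std::unique_ptr<FakeGpuBitmap> gpu;
    int uploadCount = 0;
};
```

Note that the bitmap keeps showing the old pixels until image_was_modified is called, so forgetting the call is an easy mistake to make with a design like this.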
Level Renderers
Alright, now that we know how images work, we can get into the bread and butter of the game’s world rendering: the Level Renderer. The simplest renderer is the SingleTileLevelRenderer.
The Single Tile Level Renderer works by drawing each tile each frame. This ends up taking forever. On my setup, it takes 3 ms to draw the level. That doesn’t sound like a lot, but considering that drawing the level is taking over 80% of the frame’s time, it’s not great.
I tried a few optimizations. I thought that perhaps drawing tiles in order of ID might help with some cache optimizations, but it ended up being slower. I tried optimizing the code that runs on the CPU, but no dice. So, I figured we should try something else:
The Batching Level Renderer
This level renderer works by drawing the entire level to an image, and then simply drawing that image to the screen. It’s really fast, too, only taking around 100 µs (0.1 ms) to draw the level. That worked fine when the level was 141 × 359, but with my experiments in the previous post, the level is 8192 × 8192 now. The size of each tile is 16 × 16. Tiles are rendered at 4x scale, but the intermediary images are 1:1, so that’s not a factor in this. There are 4 channels per pixel, at one byte each.
Multiplying all of that together, we find that the image is:
8192 × 8192 × 16 × 16 × 4 = 68,719,476,736 bytes of memory (64 GiB)
Needless to say, 64 GiB is much higher than my memory target of 256 MiB or less. We need a new solution.
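The same arithmetic as a small sketch, in case you want to plug in other level sizes. The function name levelImageBytes is made up for this example; the numbers are the ones from above.

```cpp
#include <cstdint>

constexpr std::uint64_t MiB = 1024ull * 1024ull;
constexpr std::uint64_t GiB = 1024ull * MiB;

// Bytes needed for a single 1:1 image that covers the whole level.
constexpr std::uint64_t levelImageBytes( std::uint64_t tilesWide, std::uint64_t tilesHigh,
                                         std::uint64_t tilePixels, std::uint64_t bytesPerPixel ) {
    return ( tilesWide * tilePixels ) * ( tilesHigh * tilePixels ) * bytesPerPixel;
}

// The new 8192 x 8192 level: exactly 64 GiB.
static_assert( levelImageBytes( 8192, 8192, 16, 4 ) == 64 * GiB, "full level is 64 GiB" );

// The old 141 x 359 level: 51,833,856 bytes (roughly 49 MiB), well under the 256 MiB target.
static_assert( levelImageBytes( 141, 359, 16, 4 ) == 51833856ull, "old level size" );
static_assert( levelImageBytes( 141, 359, 16, 4 ) < 256 * MiB, "old level fits the target" );
```

That second figure is why the batching renderer was fine for the old level and hopeless for the new one.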
The Semi-Batching Level Renderer
So if drawing the tiles to intermediary images and then drawing those instead is fast, and drawing the entire level at once is memory-prohibitive, what if we drew small chunks of tiles to intermediaries and then drew those?
I went with batches of 32×32 tiles originally, which used about 1 MiB per intermediary image. However, the performance wasn’t great, costing around 670 µs to draw the level. Also, the memory usage was terrible! We’ll get into why just a bit later.
I tuned the batch size and tried 16×16, 12×12, and eventually 8×8.
Bingo! The performance is great, at about 256 µs on my system. The memory usage is great too, at only 64 KiB per intermediary tile. Well, slight issue: they’re never freed. Let’s look at what we have.
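For reference, here is a quick sketch of where the per-batch numbers come from, and of why never freeing anything is a problem. The helper names batchBytes and batchesPerAxis are hypothetical; the tile size and bytes per pixel are the figures given earlier.

```cpp
#include <cstdint>

constexpr std::uint64_t kTilePixels = 16;   // tiles are 16 x 16 pixels
constexpr std::uint64_t kBytesPerPixel = 4; // 4 channels, one byte each

// Bytes used by one square intermediary image covering batchTiles x batchTiles tiles.
constexpr std::uint64_t batchBytes( std::uint64_t batchTiles ) {
    return ( batchTiles * kTilePixels ) * ( batchTiles * kTilePixels ) * kBytesPerPixel;
}

// Number of batches needed along one axis of the level, rounded up.
constexpr std::uint64_t batchesPerAxis( std::uint64_t levelTiles, std::uint64_t batchTiles ) {
    return ( levelTiles + batchTiles - 1 ) / batchTiles;
}

static_assert( batchBytes( 32 ) == 1024 * 1024, "32x32 tiles: 1 MiB" );
static_assert( batchBytes( 16 ) == 256 * 1024, "16x16 tiles: 256 KiB" );
static_assert( batchBytes( 12 ) == 144 * 1024, "12x12 tiles: 144 KiB" );
static_assert( batchBytes( 8 ) == 64 * 1024, "8x8 tiles: 64 KiB" );

// If no batch were ever freed, walking the whole 8192 x 8192 level would
// eventually keep every batch in memory, which adds up to the full 64 GiB again.
static_assert( batchesPerAxis( 8192, 8 ) * batchesPerAxis( 8192, 8 ) * batchBytes( 8 ) ==
                   64ull * 1024 * 1024 * 1024,
               "all 8x8 batches together are 64 GiB" );
```

So small batches only help if something throws away the ones that are no longer needed.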
In this image, let’s assume this is the entire screen. We can see zones (9,8), (9,9), (10,8) and (10,9) clearly. However, zones (8,8), (8,7), (9,7), (9,8), (10,8), (11,8), (10,10), (11,10), (10,10), (9,10), (8,10), and (8,9) are visible nearby as well. So, we make sure that all 17 tiles are rendered and in memory. Any other tiles in memory can be freed, so we just destroy their images. This keeps memory usage low.
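A minimal sketch of that keep-nearby, free-the-rest policy might look like this. It is not the engine's code: ZoneCache, ZoneImage, and the one-zone margin are all assumptions made for the example, the "rendering" is a blank image, and the zone counts it produces will not necessarily match the ones above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

using ZoneCoord = std::pair<int, int>; // (zoneX, zoneY)

struct ZoneImage {
    std::vector<std::uint32_t> pixels; // 1:1 intermediary image for one zone
};

class ZoneCache {
public:
    ZoneCache( int zoneTiles, int tilePixels, int zonesWide, int zonesHigh )
        : zoneTiles_( zoneTiles ), tilePixels_( tilePixels ), zonesWide_( zonesWide ), zonesHigh_( zonesHigh ) {}

    // The viewport is given in tile coordinates, inclusive on both ends.
    // Assumes non-negative tile coordinates; margin is counted in whole zones.
    void update( int firstTileX, int firstTileY, int lastTileX, int lastTileY, int margin = 1 ) {
        const int minX = std::max( 0, firstTileX / zoneTiles_ - margin );
        const int minY = std::max( 0, firstTileY / zoneTiles_ - margin );
        const int maxX = std::min( zonesWide_ - 1, lastTileX / zoneTiles_ + margin );
        const int maxY = std::min( zonesHigh_ - 1, lastTileY / zoneTiles_ + margin );

        // Free every zone that fell outside the wanted range.
        for ( auto it = zones_.begin( ); it != zones_.end( ); ) {
            const int x = it->first.first;
            const int y = it->first.second;
            if ( x < minX || x > maxX || y < minY || y > maxY ) {
                it = zones_.erase( it );
            } else {
                ++it;
            }
        }

        // Render any wanted zone that is not in memory yet.
        for ( int y = minY; y <= maxY; ++y ) {
            for ( int x = minX; x <= maxX; ++x ) {
                if ( zones_.find( ZoneCoord( x, y ) ) == zones_.end( ) ) {
                    zones_.emplace( ZoneCoord( x, y ), renderZone( x, y ) );
                }
            }
        }
    }

    bool contains( int x, int y ) const { return zones_.count( ZoneCoord( x, y ) ) != 0; }
    std::size_t size( ) const { return zones_.size( ); }

    // Total bytes currently held by intermediary images.
    std::size_t bytes( ) const {
        std::size_t total = 0;
        for ( const auto &entry : zones_ ) {
            total += entry.second.pixels.size( ) * sizeof( std::uint32_t );
        }
        return total;
    }

private:
    // Placeholder: a real renderer would draw the zone's tiles into the image.
    ZoneImage renderZone( int, int ) const {
        const std::size_t side = std::size_t( zoneTiles_ ) * std::size_t( tilePixels_ );
        return ZoneImage{ std::vector<std::uint32_t>( side * side, 0u ) };
    }

    int zoneTiles_;
    int tilePixels_;
    int zonesWide_;
    int zonesHigh_;
    std::map<ZoneCoord, ZoneImage> zones_;
};
```

Calling update once per frame with the current viewport keeps the cache bounded by the viewport size rather than by the level size, at the cost of re-rendering a zone if the camera comes back to it.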
Overall, our usage of ImageBase is only 1.915 MiB now, which includes the tileset and sprites. There are 30 intermediary tiles in the worst case, which comes out to 1.875 MiB spent on level data. That’s not only a hell of a lot better than 64 GiB, it’s about 34,952.53 times less. The average case is 24 intermediary tiles, or 1.5 MiB, which is about 43,690.67 times better than 64 GiB.
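If you want to check those ratios yourself, here is a tiny sketch. The function names residentMiB and savingsFactor are invented for the example, and it assumes 64 KiB per intermediary tile as above.

```cpp
constexpr double KiB = 1024.0;
constexpr double MiB = 1024.0 * KiB;
constexpr double GiB = 1024.0 * MiB;

// Memory held by tileCount intermediary tiles, in MiB.
constexpr double residentMiB( int tileCount, double tileKiB = 64.0 ) {
    return tileCount * tileKiB * KiB / MiB;
}

// How many times smaller that is than the single 64 GiB level image.
constexpr double savingsFactor( int tileCount, double tileKiB = 64.0 ) {
    return ( 64.0 * GiB ) / ( tileCount * tileKiB * KiB );
}

static_assert( residentMiB( 30 ) == 1.875, "worst case: 30 tiles" );
static_assert( residentMiB( 24 ) == 1.5, "average case: 24 tiles" );
```

savingsFactor( 30 ) works out to roughly 34,952.53 and savingsFactor( 24 ) to roughly 43,690.67.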
So, that’s what I use in the engine! I could probably save more memory by lowering the tile’s size, but I think that this is fine. I tried it out with 4×4 tiles, and the memory usage went to about 1.1MiB and performance worsened slightly to 0.4ms to draw the level.
Alright, now I’m sick and tiled of talking about level data! No more!! Next time, I make the guarantee that tiles will not be discussed!