The State of the Game – Tile Data (Devblog #1)

Hello everyone.

I hope you’re having a great day.

Today, I wanted to show the progress I currently have on the game. Now, I’ll be honest, the game was not originally going to be about Eseria. It was instead going to be about – well, maybe I’ll keep that in my pocket for later. I just mean to say that there is going to be some progress in the game and engine before I started this blog or project.

Me from the future: I was originally hoping to write the entire set of what I have created for the game in this article, but that isn’t going to happen. I didn’t realize how complicated level data is.

Let’s take a look at what we have to begin with!

There’s a significant amount of debug information at the moment, and the UI is not implemented yet, but we have a tile-based level system already working.

The Tile System

Internally, there is a 2D vector of tile IDs:

std::vector<std::vector<tilefullid_t>> tileData;

This makes up our main level data. Each tile is 16×16 in size, which is rendered at 64×64 on the screen. The tile’s ID simply determines which tile is located at that area.

Data Formats & Storage

This is read in from an iemap file, which is the InsanityEngine format for storing tilemap data. It is currently stored in ASCII text format, and is WOEFULLY inefficient at storing data. There is already work on making it into a binary/zip format.

InsanityEngineTileMap
Version 1
Width 97
Height 145
Tileset BasegameStandard

MapName Test1
Data
    000E 0003 0000 ...
End Data

If you would like to download the example tile map, you can do so here. Unfortunately, you will need to rename it from .txt if you want to use it with InsanityEngine. You’ll see that the file is very large for what it describes.

Test1.iemap_Download tilemap

Instead of saving full IDs, it only saves tile IDs. This is because in a tilefullid_t, the upper two bytes are used to store the Mod ID. Since the tilemap is stored in a mod folder, we already know its ID. Storing it along with the data causes a staggering additional loss in efficiency.

I thought for a long time about how I could tackle the efficiency problem for tilemaps. Remember that I am trying (if possible) to keep the game under 125MB of disk space. This small test map is already 70KB of space, which is alarmingly high considering that I want the game to be relatively large.

Zip Files

Oh, the zip file. A classic of data storage, and it turns out, a really good option for compressing level data that comprises of 100% “000E” and “0003”. Compressing the level data using standard Windows Zip files reduced it from 71,163B to 932B. That’s a reduction of about 98.7%. 7-Zip using LZMA2 with Ultra compression got 565B, An additional 39.4% reduction from the zip file.

I did a comparison between the major zip algorithms, and it is perhaps well that I have as I found something I hadn’t known before.

It turns out, bzip has the best compression ratio! Here’s a comparison of compression data.

As I expected, uncompressed archive formats like Tar and Wim do not improve the file ratio at all, and in fact make it larger overall. Reading up on bzip2, it looks like since it doesn’t support archiving (you can only compress single files), it can compress the files much more. I assume this is in part due to not storing path / filename information, which seems to be true looking at the actual content of the gzip archive and bzip2 archive. However, I am not an expert in zip algorithms, so take my guesses with a grain of salt.

Playing with the two settings I really have (compression speed and dictionary size) results in a wild 7 byte larger file at the fastest compression setting.

Well, now I guess it’s time to figure out how to graft BZip2 into the engine.

Performance

Let’s try to see how fast these things are. I raised the map’s size to the largest currently possible value (8192 × 8192). This takes over a full minute to write the file, which is why I’m looking into binary formats. The file in ASCII format is 320 MB (335,585,396 Bytes). Loading this map takes over 10 minutes on my Ryzen 9 7900X. I had to know what was taking so long.

/// <summary>
/// Reads map data from a file
/// </summary>
/// <param name="dat">The string to read from</param>
void TileMap::ReadMapData( sstr &dat ) {
    dat.erase( dat.begin( ), dat.begin( ) + 5 );
    tileData.resize( height );

    for ( int y = 0; y < height; y++ ) {
        tileData[ y ].resize( width );

        for ( int x = 0; x < width; x++ ) {
            tilefullid_t d = 0;

            dat.erase( dat.begin( ), dat.begin( ) + dat.find_first_not_of( L" " ) );
            swscanf_s( dat.data( ), L"%4x", &d );
            dat.erase( dat.begin( ), dat.begin( ) + 4 );

            tileData[ y ][ x ] = ( tileid_t ) d;
        }
        if ( dat.size( ) < 5 )
            break;

        dat.erase( dat.begin( ), dat.begin( ) + dat.find( L"\n" ) + 1 );
    }
}

This was the code to load the tilemap in ASCII format. “sstr” is an alias for std::wstring. This code is extremely slow for large maps, as it has O(n³) complexity. We loop over y, x, and find_first_not_of has n complexity. This is extremely slow, but our main loss of speed actually comes from swscanf_s. It turns out, providing a long data string to it causes a lot of performance loss. Instead, since we know we’re always looking for a 4-digit hexadecimal number, we can optimize this by copying it to a buffer first.

000E 000E … 000E\n000E …
(What dat.data() looks like)

I also took this time to remove all of the string code and instead consume the buffer using a const wchar_t*. This saves a huge amount of time. When I paused the code after 10 minutes last time, it was at (87, 0). Now, it reads in “””only””” 32 seconds. This is still extremely bad, but it’s certainly better than the previous average case of 7,713,662 minutes to load the map. In Release, it loads in only 5 seconds.

Here’s the updated code for ReadMapData.

    /// <summary>
    /// Reads map data from a file
    /// </summary>
    /// <param name="dat">The string to read from</param>
    void TileMap::ReadMapData( sstr &dat ) {
        tileData.resize( height );

        const wchar_t *data = dat.data( ) + 5;
        const wchar_t *dataInit = dat.data( ) + 5;
        size_t dataLen = dat.size( );

        wchar_t *thisId = new wchar_t[ 5 ];
        thisId[ 4 ] = 0;
        for ( int y = 0; y < height; y++ ) {
            tileData[ y ].resize( width );

            for ( int x = 0; x < width; x++ ) {
                tilefullid_t d = 0;

                int offset = scanForSpace( data, 64, L' ', true );
                data = data + offset;

                memset( thisId, 0, 4 * sizeof( wchar_t ) );
                memcpy( thisId, data, 4 * sizeof( wchar_t ) );

                swscanf_s( thisId, L"%4x", &d );
                data += 4;

                tileData[ y ][ x ] = ( tileid_t ) d;
            }
            // Remaining data is less than 5 bytes.
            if ( ( ( data - dataInit ) - dataLen ) < 5 ) {
                break;
            }
            int nextOffset = scanForSpace( data, 64, L'\n' ) + 1;
            data += nextOffset;
        }
    }

And the helper code for scanForSpace:

    static int scanForSpace( const wchar_t *string, int limit, wchar_t searchFor = L' ', bool notMode = false ) {
        for ( int i = 0; i < limit; i++ ) {
            if ( notMode ) {
                if ( *( string + i ) != searchFor ) {
                    return i;
                }
            }
            else {
                if ( *( string + i ) == searchFor ) {
                    return i;
                }
            }
        }
        return -1;
    }

Ultimately, I could probably optimize this code more to get it to be much faster, but I’m not super worried about it as I am going to switch to binary files for map data anyway, and since we won’t have to translate from text to numbers, it should be faster (since I’m loading tile data directly into the vector instead of read/convert/write).

For those concerned about the state of dat in ReadMapData, it turns out the data is discarded immediately after the map data is read.

…

Binary Files

Well, I may have gotten carried away. I implemented binary level loading. The filesizes are about 13% of the original file type’s size. This makes sense, since each of the tiles are now using 2 bytes instead of 5. Also, there’s no leading space for the line. Here’s the format for Version 1 of the IETilemapBin format. There are no line breaks, they’re just here for clarity.

IETMBIN &lt;version>
&lt;width>&lt;height>
&lt;len of tileSet>&lt;tileSet>
&lt;len of mapName>&lt;mapName>
&lt;level data until EOF, left to right, top to bottom>

Version must be 1, and all numerical values are unsigned 16-bit integers. On both Debug and Release, the level data is loaded in 0.25s for an 8192 × 8192 map. I could show you the entire thing, but I’ll just show you the most important bit; the main tile-reading loop.

//Free the previous buffer
delete[ ] buf;
//Reallocate to new size
buf = new char[ xWidth * 2 ];

tileData.resize( xHeight );
for ( int y = 0; y < xHeight; y++ ) {
    tileData.at( y ).resize( xWidth );
    memset( buf, 0, xWidth * 2 );
    ifs.read( buf, xWidth * 2 );

    unsigned __int16 *bufId = reinterpret_cast< unsigned __int16 * >( buf );

    for ( int x = 0; x < xWidth; x++ ) {
        unsigned __int16 tileId = *( bufId + x );
        tileData[ y ][ x ] = tileId;
    }
}

delete[ ] buf;

I read the file’s contents into the buf array, which is sized at xWidth * 2. xWidth is just the reported width of the tilemap. We then read this array as if it stored integers instead of characters. This way, we’re loading data directly from the file.

Regional Loading

Ultimately, the level is stored as one giant map, which isn’t great for my arbitrary and almost certainly unnecessary 256MB memory limit. 8192 × 8192 × 2 = 134,217,728 (128MB) of memory. That means that if the entire map was loaded at once, we’d be half out of memory.

Well, what if we didn’t load the entire map at once? I know, what a crazy thing to suggest, but there’s a cool feature of InsanityEngine that is planned that helps with this greatly. So, since the game is implemented as a mod, it essentially “pastes” its map at 0,0 in the world. Mods can “paste over” the basegame’s content, allowing them to extend or replace parts of the map. (How to handle multiple mods overwriting the same area has not yet been solved.)

So, since we support writing multiple maps to the global level data, we can simply implement the basegame’s map as a set of “region” maps, which can be loaded and unloaded as needed. This decision will bite me in the ass on the next game (where interactions must still be handled in unloaded regions), but we live in the now.

Variability

You may have noticed on the screenshot you saw 30 minutes ago (sorry about all the data structures rants) that the background and brick tiles look like they’ve been strewn about at random. You’d actually be correct!

These tiles will be different each time you load the game. That’s because tiles actually have a range of variants. If I switch to the secret-not-really-that-secret level editor view, you can see all of the details for each tile.

The text is pretty hard to read, but you can see some interesting data just at a glance. Any tile that has a green top and left edge is a tile that was not changed from its base. Every “red” tile is actually a variant of its original tile.

for ( int y = 0; y < height; y++ ) {
    for ( int x = 0; x < width; x++ ) {
        tileid_t us = tileData[ y ][ x ];

        attr_unlikely
            if ( !tiles->tiles.contains( us ) ) {
                tileData[ y ][ x ] = 0;
                continue;
            }

        auto tile = tiles->tiles[ us ];
        int  range = tile->getVariantRange( );
        tileData[ y ][ x ] = us + ( range != 0 ? ( rand( ) % range ) : 0 );
    }
}

This is how the tile data is processed after reading the file. If the tile doesn’t exist in the tileset, it gets zeroed to prevent errors later on. The latter half of the function gets the number of variants for each tile, and then adds a random number between 0 and the number of variants (exclusive) to add to the Tile ID. This is why variants need to be stored next to each other.

When you save the tilemap, it actually reverts all of the variant selection, and replaces those tiles with the base tile. This actually is one of the main reasons the tilemap is so compressible at the moment – only a few IDs are really used.

The Tileset

A tileset is just an image atlas and various flags & data for each tile. For example, if we look at Tile 3, we see that it has variants, should be solid for collision, and is opaque. Variants are stored to the right.

Here’s what that looks like in the Tileset definition file:

    #Brick common
    Tile 0003 
        Collision Solid
        Opaque yes
        Location tile 3, 0
        Variants 0004 - 000D
        VariantFlow right

I guess while I’m here, I can show you the actually-secret tile editor. If you go into Menu 3, you can go to the Tile Editor, where you can actually edit the tileset in-game.

The menu is not flashy by any means, but it works. Click any tile, and you can draw over it. The level will update to the new version of the tileset automatically.

Alright, I’m going to cut it off now. I’ve been writing this for a while now, and it’s probably getting too long to read anyway. Next time, we’ll take a quick look at rendering. It’s actually somehow less complicated than tile data.

If you have any questions, send me an email at eseria_dev at the website you’re on. Or join my Discord.

Anyway, thanks for reading what is essentially my stream-of-conscience about tiles. There’s a lot to implement, but we’ve got a lot done, too!