Tile compression

From Nesdev wiki
Jump to: navigation, search

Tile compression refers to techniques that allow fitting more graphics data into a smaller space. Programs using CHR ROM cannot use compressed tiles, as their tile data must be stored in the PPU's native format. But programs using CHR RAM can process tile data while copying it from PRG ROM to CHR RAM, and this processing allows storing more tiles in the same space.

Run-length encoding

Run-length encoding transforms runs of identical bytes into a shorter sequence of bytes that specifies the length of the run.

In NES tile data, byte-level run-length encoding works well when a row of 8 pixels in a tile is identical to the row above it. It also works well for nametable data because a horizontal run of blank tiles becomes a single tile.

Pixel-level run-length encoding is much slower but can achieve impressive results within a tile.

There are several different RLE data formats.


The PCX image format became popular on PC.

Value Meaning
00-BF Write this byte to the output.
C0-FF Read another byte, and write it to the output n - 192 times.


The PackBits format was invented by Apple for MacPaint. It is also used in TIFF files and a few homebrew releases by Damian Yerrick.

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80 No operation
81-FF Read another byte, and write it to the output 257 - n times.

Konami RLE

This format is used in Blades of Steel, the U.S. version of Contra, and the Japanese version of Simon's Quest. It can be decoded and encoded with the Python program GraveyardDuck. Compression ratio is more or less identical to PackBits.

Value Meaning
00-80 Read another byte, and write it to the output n times.
81-FE Copy n - 128 bytes from input to output.
FF End of compressed data


The BIOS of the Game Boy Advance and Nintendo DS contains a decompressor for an RLE format very similar to PackBits and Konami. As described in GBATEK, it has a 4-byte size header followed by this:

Value Meaning
00-7F Copy n + 1 bytes from input to output.
80-FF Read one byte from input and write it to output n - 125 times.


This codec was conceived by Damian Yerrick as an alternative to PackBits for the Action 53 multicart, and a Python packer and 6502 unpacker are included in the Action 53 menu distribution. Unlike freeform RLE formats such as Konami and PackBits, PB53 operates on 16-byte units, making it easy to divide the decompressed data into fixed-size packets to be sent to the PPU during vblank while rendering is turned on. Like LZSS, PB53 uses unary coding on the lengths of literal runs to save on overhead from switching between literal and run modes. This means that like LZSS, it has a worst case expansion of 12.5%, but it works fairly well on real tile data and OK on nametable data, which have shorter runs than the high-resolution files for which PackBits was designed. It also has a special mode to accommodate the layout of Shiru's NROM games LAN Master, Lawn Mower, and Chase, which have many identical tiles between the two pattern tables to allow tile animation.

Each tile consists of several 8-byte planes, two planes in the NES implementation. For the first plane in a tile:

Value Meaning
00-7F Copy one byte from input to output. Then, for each bit from 6 to 0, if the bit is 1, repeat the previous byte; otherwise, copy another byte from the input. This is somewhat similar to how control bytes are formatted in LZSS.
80 Write eight $00 bytes.
81 Write eight $FF bytes.
82 Copy one tile (16 bytes) starting one tile back. (Used for a repeated tile, such as the unused tiles in many games.)
83 Copy one tile starting one segment back, usually 4096 bytes. (Used for pattern tables that share tiles, as seen in several Shiru games. The decoder switches between two instances one segment apart, each with its own input stream and output buffer.)
84 Write sixteen $00 bytes. (Solid color 0)
85 Write eight $FF bytes then eight $00 bytes. (Solid color 1)
86 Write eight $00 bytes then eight $FF bytes. (Solid color 2)
87 Write sixteen $FF bytes. (Solid color 3)

For other planes:

Value Meaning
00-81 Same as first plane
82 Copy previous 8 bytes. (Used for 1-bit tiles with colors 0 and 3.)
83 Copy previous 8 bytes, bit-inverted. (Used for 1-bit tiles with colors 1 and 2.)


This RLE variant was used by Joel Yliluoma in the Simon's Quest retranslation project. It is very efficient when compressing nametables, which often contain redundancy in more complex forms than simple runs of repeating bytes. Examples include brick walls, which repeat two tiles, and complex graphics that is formed from an ascending series of successive tiles. For bitmap compression, it is slightly inferior to simpler RLE methods.

Value Meaning
00–3F LIT: Copy (n+1) bytes from input to output backwards
40 END: End of stream
41–7F SEQ: Read next byte b. Put b, (n−0x3F) times; add 1 to b after each iteration
80–9F DBL: Read next byte b1, and next byte b2. Put b1, (n−0x7D) times; swap b2 and b1 after each iteration
A0–FF RUN: Read byte b. Put b, (0x101−n) times.

A compressor for this scheme can be found at http://bisqwit.iki.fi/src/nes-ppu_rleinc_compress.php.txt (PHP), and a decompressor at http://bisqwit.iki.fi/src/nes-ppu_rleinc_v2b.inc (CA65).

JRoatch made PBJ, which adds the PB8 mode from PB53 and a common-byte mode to a modified RLEINC.

Bit-based RLE

Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream, that contains integers of different bit widths.

When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, a simple reference RLE algorithm (PackBits) gets a 12% reduction in data size. However, if the algorithm is modified as indicated below, a 21% reduction is achieved. For comparison, the graphics specialized algorithm in PB53 achieves 25% for that data set. Tokumaru compression manages to reduce the data by 41%.

Bit sequence Meaning
0000 End of stream.
0nnn Copy the next n×8 bits, i.e. n bytes, to the output.
1nnn Read the next 8 bits, and output this byte n+2 times.

A possible reason why bit-based compression methods are not popular on the NES is that bit-shifts are cumbersome and slow with the 6502 CPU, as it can only shift one bit at a time. The above algorithm is still relatively simple to implement, as it operates on units of 4 bits for the input and full bytes for the output. Coincidentally, it also produced the best compression out of all bit-based RLE algorithms that were brute-force-tested for that dataset.

NES Stripe Image RLE

A RLE format mostly used by Nintendo for use in their Arcade ports as well as their Mario games, Also used in some homebrew games!

Format: dest, len, data, [dest, len, data, ]* end

dest: PPU destination address (2 bytes, big endian), to be written to $2006

len: Length (Byte) of PPU Buffer Data:

Length Value Meaning
00-3F Literal to right: Copy n + 1 bytes to video memory addresses increasing by 1
40-7F Run to right: Copy one byte n - 63 times to video memory addresses increasing by 1
80-BF Literal down: Copy n - 127 bytes to video memory addresses increasing by 32
C0-FF Run down: Copy one byte n - 191 times to video memory addresses increasing by 32

data: PPU Data to display

end: End of Data marker. Early games use $00; later games (particularly those with CHR RAM) may use any value with bit 7 set ($80-$FF) to allow writing to the first 16 tiles of video memory.

See also Popslide, a video memory update buffer framework using this format

SNES Stripe Image RLE

Same RLE format used by Nintendo as above, but for SNES.

Format: dest, len, data, end

dest: PPU Destination: $2116 and $2117

len: Length (Word) of PPU Buffer Data:

Length Value Meaning
0000-3FFF Copy n + 1 bytes to PPU buffer
4000-7FFF Copy n + 1 bytes to PPU buffer, with RLE
8000-BFFF Copy n + 1 bytes to PPU buffer, increment 32 bytes
C000-FFFF Copy n + 1 bytes to PPU buffer, with RLE, increment 32 bytes

data: PPU Data to display in 2-byte increments (or word increments)

end: Unlike the NES version, the end byte is $FF or $FFFF.

Pixel based


This is a Markov chain (predictive) algorithm that packs predictions in varying number of bits. It works on packets measured in whole tiles, and compresses mostly pixel by pixel, making it slow. Explained on forum.


Tokumaru discovered an improvement to the way the frequency tables are changed in Codemasters algorithm, and released the compressor and decompressor as open source. And open-source rewrite of the encoder and decoder with slightly better performance can be downloaded at: http://bisqwit.iki.fi/source/tokumaru.html

The compressed data begins with a byte that tells how many tiles it encodes. 256 is maximum. Technically this byte can be ignored, if you already know how many tiles you are going to decode.

After the byte, any number of blocks follows. Each block begins with a color description table. This table tells how to encode transitions between colors in tiles belonging to this block.

There are four elements in this table, from 3 to 0, for color n. Each element begins with a two-bit integer ncolors[n], that tells how many different colors that may follow a pixel of this particular color. After the number of colors, comes a pivot color a that is encoded as follows:

Value Applicable when Meaning
nothing ncolors[n]=0 No pivot color
1 bit: 0 ncolors[n]>0 Pivot color a is 1 if n < 1, 0 otherwise.
2 bits: 10 ncolors[n]>0 Pivot color a is 2 if n < 2, 1 otherwise.
2 bits: 11 ncolors[n]>0 Pivot color a is 3 if n < 3, 2 otherwise.

The table of transition colors is then calculated using n, a, and the number of colors ncolors[n]. First, two other colors b and c are calculated. They are the first color indexes that are neither n nor a. E.g. if a=2 and n=1, b and c become 0 and 3 respectively.

When ncolors[n] is Table of possible transition colors next[n] is
0 []
1 [a]
2 [b,c]
3 [a,b,c]

For compression purposes, ideally ncolors[n] should be chosen to be the numbers of distinct colors that actually can follow the color n, based on measuring the tile data, and, if ncolors[n]=3, the pivot color a should be the color that is transitioned into most often from color n. This transition in tile data will be encoded in two bits, while the other transitions are encoded in three bits. For other values of ncolors[n], the choice of pivot color is mandated by the actual possible colors.

After the color description table, comes tile data encoding 16 bytes, or 8×8 pixels. Each tile is comprised of eight rows of pixels. Each pixel row begins with 1 bit, that tells whether the row is to be copied. If the bit is set, the previously decoded row is duplicated, and no other data is encoded for this pixel row. At the start of the block, the "previously decoded row" is assumed to contain zero bytes. The previous row is not reset between different tiles, unless a new block begins. If the bit was clear, eight pixels follow for x coordinates 0 to 7. Each pixel is encoded as follows, depending on the color c of the previous pixel:

Value Applicable when Meaning
2 bits x = 0 The color of the first pixel c is encoded as a 2-bit integer.
nothing ncolors[c]=0 c does not change, and nothing is encoded.
1 bit: 1 ncolors[c]>0 c does not change from previous pixel.
1 bit: 0 ncolors[c]=1 c becomes next[c][0].
2 bits: 00 ncolors[c]>1 c becomes next[c][0].
2 bits: 01 ncolors[c]=2 c becomes next[c][1].
3 bits: 010 ncolors[c]=3 c becomes next[c][1].
3 bits: 011 ncolors[c]=3 c becomes next[c][2].

After each full tile, if there are still remaining tiles to be decoded, comes one bit that tells what comes next. Value 1 means a new block start, with a new color description table. Value 0 means that more tile data will follow.

Common byte

Oracle common byte

This codec, used in The Legend of Zelda: Oracle of Seasons and The Legend of Zelda: Oracle of Ages according to Dwedit, is roughly comparable to RLE in complexity. For each 16-byte block, the compressor determines the most common byte in that block. The compressed data for each block starts with a 2-byte mask, then the most common byte, then other bytes in order that aren't the most common.

To decode a block: First read the two-byte mask. If the entire mask is zero, set the bit corresponding to the first byte to true. Then read the most common byte. For each bit in the mask, if the bit is 1, write the most common byte to output; otherwise, copy one byte.

Maximum expansion is 12.5% for any block that has 16 different bytes in it: two bytes of mask and 16 bytes of data.


A lot of games on platforms after the NES use LZ77 family compression methods such as LZSS, which generalizes run-length encoding to allow copying data from several bytes ago, not just the previous byte. Few NES games use LZ77 because the NES's limited work RAM and limited access to video memory make it difficult to resolve back references. (Fewer still use LZW or any other LZ78-based method because of patents that subsisted through the NES, Super NES, and Nintendo 64 eras.)

In LZSS, the mask contains 8 commands, each either to copy a literal byte or to copy a back reference. determines whether the next 8 things are literal bytes or back references. Once all commands have been processed, read another mask.

Read a mask byte from input.
For each bit in the mask byte:
If the bit is 0, this is a literal:
Copy one byte from input to output.
Otherwise, this is a back-reference:
Read and decode a length and distance from input. These will be positive integers.
Copy length bytes from the previous output, distance bytes before now, to output.

LZSS flavors vary mainly in how they encode lengths and distances.


The BIOS of the Game Boy Advance and Nintendo DS has a built-in decompressor for a straightforward LZSS flavor with 3- to 18-byte references into the previous 4096 bytes of output. The data format is described in Martin Korth's GBATEK.

Caution: In data intended for decompression directly to the GBA or DS video memory, the second byte of a 16-bit word cannot refer to the first byte of the same word. So the encoder must not write a run with distance 1 that starts at an odd offset.

Oracle LZ

This flavor of LZSS is used in The Legend of Zelda: Oracle games for Game Boy Color according to Dwedit.

An entire compressed block can be compressed with one of two subtypes of Oracle LZ: short word mode and long word mode. Short words use references of 2 to 8 bytes into the previous 32 bytes of output, and long words use references of 3 to 33 bytes into the previous 2048 bytes. (A length value of 0 means read an additional byte and use that as the length.) Only short words would work very well on NES.

Pokémon LZ

This compression scheme is used in the first-generation Pokémon games on the Game Boy. It is used for bitmap compression.

A byte n is read and split into two parts: code = n >> 5, and c = n & 0x1F. Byte 0xFF marks the end of the stream. Otherwise the code is interpreted as follows:

code Meaning
0 Copy the next c + 1 bytes to the output.
1 Read another byte, and write it to the output c + 1 times.
2 Read another byte b1 and byte b2, and write byte b1 to the output c + 1 times, swapping b1 and b2 after each write.
3 Write a zero byte (0x00) to the output c + 1 times.
4 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output.
5 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, reversing the bits in each byte.
6 Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, decrementing the read position after each write (i.e. reverse the data).
7 Read another byte b. Change code = (c >> 2), and change c = (c & 3) × 256 + b. Re-interpret code according to this table.

Chrono Trigger LZ

This compression scheme is used in Square‘s Chrono Trigger for the SNES for compression of graphics and various data.

The data consists of packets. Each packet begins with a 16-bit offset, which gives the packet ending offset relative to the beginning of compressed data. At that offset, there is always a control byte. The first control byte sets the size of offsets: If the byte is < 0xC0, then offsetsbits=12. Else offsetbits=11. Interpreting the offsetbits is done only once. The higher-order two bits in the control bytes of all other packets are ignored. The counter is assigned a default value of 8.

The decompression loop goes as follows:

  1. If the packet end has been reached, a control byte is read, and counter is assigned the low 6 bits of that byte (i.e. counter = byte & 0x3F). If the counter is zero, the decompression is complete and ends there. If the counter was nonzero, a new 16-bit packet end offset is read.
  2. If the end of the packet has not yet been reached, a mask byte is read.
    • If the mask byte is zero, then next eight bytes are copied verbatim to the output. The counter is not affected.
    • If the mask byte was nonzero, its each bit is read, beginning from the lowest-order bit. The number of bits to be read is determined by counter (which is in range 1—63, i.e. it can be both smaller and greater than eight).
      • If the bit is zero, a single byte is copied verbatim to the output.
      • If the bit is nonzero, a 16-bit number is read from the input. offset becomes the lowest-order offsetbits from that number, and length is the rest of the bits plus 3. The decompressor copies length bytes from offset bytes behind to the output.
      • After all bits have been read, the counter is reset to the default value of 8, and the decompression loop continues.

External links