What is run-length encoding?

Run-length encoding is a simple lossless compression scheme that replaces a run of identical consecutive symbols with a single count and one copy of the symbol. AAAB becomes 3A1B.

Does RLE always make data smaller?

No. RLE only helps when the data has long runs of repeated symbols, such as simple images or whitespace. For random or already-varied text it can make the output longer, since every single character becomes a count plus the character.

How does this tool format the encoding?

It uses count-then-character pairs with no separators, for example AAABBC encodes to 3A2B1C. Every run, including runs of length one, is written with an explicit count so decoding is unambiguous.

Can the decoder handle multi-digit counts?

Yes. A count like 12 is read fully before the following character, so 12X expands to twelve X characters. The decoder reads all leading digits as the count for the next symbol.

What happens with digits in the original text?

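Because counts are numeric, digits in the source can be ambiguous to decode. Encoding still works: each run of a digit gets its own count, so a lone 3 is written as 13. Round-tripping is the weak point, because a digit symbol sits directly next to its count with no separator; for example, the input 12 and a run of 111 twos both encode to 1112. The tool always writes every run as count+character, but test the round trip before relying on it for digit-heavy data.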

What is the Run-Length Encoding (RLE) tool?

Run-length encode text by replacing consecutive repeated characters with a count followed by the character, and decode count-character pairs back to the original string. Handles digits, symbols and Unicode. It runs free in your browser on Gera Tools, with nothing uploaded.

Run-Length Encoding (RLE)

Name: Run-Length Encoding (RLE)
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

What this tool does

Run-length encoding (RLE) is one of the oldest and simplest lossless compression methods. It replaces a “run” of identical consecutive characters with a single count followed by the character. This tool both encodes text into RLE form and decodes RLE back to the original.

How it works

Encoding scans the string left to right, counting how many times the current character repeats consecutively. When the character changes (or the string ends), it writes the count followed by that character, then resets the counter. Every run — even a run of length one — is written with an explicit count, so AAABBC becomes 3A2B1C. This explicit-count rule keeps decoding unambiguous.
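
The scan described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual source: the function name rle_encode is hypothetical, and the empty-string behaviour (returning an empty string) is an assumption.

```python
def rle_encode(text: str) -> str:
    """Encode text as count-then-character pairs, e.g. AAABBC -> 3A2B1C."""
    if not text:
        return ""  # assumption: empty input encodes to empty output

    pieces = []
    current = text[0]  # the character whose run is being counted
    count = 1
    for ch in text[1:]:
        if ch == current:
            count += 1
        else:
            # The character changed: write the finished run, then reset.
            pieces.append(f"{count}{current}")
            current = ch
            count = 1
    # The string ended: write the final run.
    pieces.append(f"{count}{current}")
    return "".join(pieces)


print(rle_encode("AAABBC"))  # 3A2B1C
```

Python strings iterate by Unicode code point, so this sketch counts symbols the same way the tool is described as doing.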

Decoding reads a sequence of count-then-character pairs. It accumulates all leading digits into a number, then repeats the next character that many times. So 12X expands to twelve Xs, and 3A2B1C expands back to AAABBC.
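
A matching decoder might look like the following sketch. The name rle_decode and the choice to raise ValueError on malformed input are assumptions made for illustration; the tool may handle bad input differently.

```python
ASCII_DIGITS = "0123456789"


def rle_decode(encoded: str) -> str:
    """Expand count-then-character pairs, e.g. 3A2B1C -> AAABBC."""
    pieces = []
    count_digits = ""  # digits collected so far for the current count
    for ch in encoded:
        if ch in ASCII_DIGITS:
            # Keep accumulating: every consecutive digit belongs to the count.
            count_digits += ch
        else:
            if not count_digits:
                raise ValueError(f"symbol {ch!r} has no count in front of it")
            pieces.append(ch * int(count_digits))
            count_digits = ""
    if count_digits:
        raise ValueError("encoded text ends with a count but no symbol")
    return "".join(pieces)


print(rle_decode("3A2B1C"))    # AAABBC
print(len(rle_decode("12X")))  # 12
```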

Example

Input:   WWWWWWWWWWWWBWWWWWWWWWWWWBBB
Encoded: 12W1B12W3B

The 28-character input compresses to 10 characters, a clear win because the data is highly repetitive.
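
The lengths in this example can be checked directly. The snippet below is a quick sketch that uses Python's itertools.groupby as a stand-in encoder; it is not the tool's own code.

```python
from itertools import groupby

# 12 W, 1 B, 12 W, 3 B: the input from the example above.
source = "W" * 12 + "B" + "W" * 12 + "BBB"

# groupby yields one (symbol, run) pair per run of identical characters.
encoded = "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(source))

print(encoded)                     # 12W1B12W3B
print(len(source), len(encoded))   # 28 10
```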

When RLE helps (and when it doesn’t)

RLE is a highly situational compression scheme. It was used in early fax machines, in the PCX and BMP image formats, and in early game sprite compression: all contexts where pixel data has long horizontal runs of identical colour. Knowing when it wins and when it loses saves a lot of frustration:

RLE helps when:

Data has long unbroken runs: simple bitmap images (black-and-white scans, line art), whitespace-heavy text, repeated tokens in generated data.
The encoding is part of a larger system and pre-processes data before a dictionary-based compressor such as LZ77.

RLE hurts when:

Data is varied or random. Every isolated character costs two characters in the output (a count of 1 plus the character itself), so ABCDEF encodes to 1A1B1C1D1E1F, exactly double the size.
English prose usually grows under naive RLE, because most of its runs have length one.
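
To make the trade-off concrete, the sketch below compares input and output lengths for a repetitive string, a varied string, and a short piece of prose. The helper rle_encode is a hypothetical stand-in built on itertools.groupby, and the sample strings are chosen only for illustration.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    # One "count + symbol" pair per run of identical characters.
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


samples = ["A" * 40, "ABCDEF", "hello world"]
for sample in samples:
    encoded = rle_encode(sample)
    print(f"{len(sample):>3} -> {len(encoded):>3}  {encoded}")
```

The 40-character run shrinks to 3 characters (40A), ABCDEF grows from 6 to 12, and hello world grows from 11 to 20 because only the double l forms a run longer than one.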

Edge cases worth knowing

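Digits in the source text: because counts are numeric, a digit like 3 in your original data can look like part of a count when decoded. This tool always writes a count for every run, so a lone 3 becomes 13 in the encoded output, meaning “one of ‘3’”. The explicit count does not remove the ambiguity on its own, though: the two-character input 12 and a run of 111 twos both encode to 1112, and a decoder that reads all consecutive digits as the count will choose the second reading. Check the round trip on digit-heavy input, and bear this in mind if you are manually editing encoded text.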
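
How digits behave depends on the encoded format rather than on any one implementation. The check below is a hypothetical sketch, not this tool's code: it applies the count-then-character rule with no separators and shows two different inputs that produce the same encoded text.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    # Count-then-character pairs with no separators, as described above.
    return "".join(f"{len(list(run))}{symbol}" for symbol, run in groupby(text))


two_chars = rle_encode("12")        # one '1', then one '2'
long_run = rle_encode("2" * 111)    # a run of 111 twos

print(two_chars)              # 1112
print(long_run)               # 1112
print(two_chars == long_run)  # True
```

Formats that need to carry digits safely usually add something extra, such as a separator or an escape character; whether this tool does so is not stated here.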

Multi-digit counts: a run of 100 identical characters encodes to 100X, a single three-digit count followed by the symbol. The decoder reads all consecutive digit characters as the count before reading the next non-digit as the symbol. Make sure any manual edits respect this: inserting a space between count and character would break decoding, because the space would be taken as the symbol and the character after it would be left without a count.
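
The count-reading rule can be expressed as a small parser. This sketch assumes ASCII digits only and uses a hypothetical helper named split_pairs; it illustrates the rule and is not the tool's implementation.

```python
import re

# One or more ASCII digits (the count), then exactly one non-digit (the symbol).
PAIR = re.compile(r"([0-9]+)([^0-9])")


def split_pairs(encoded: str) -> list:
    """Split encoded text into (count, symbol) tuples, or raise ValueError."""
    pairs = []
    pos = 0
    while pos < len(encoded):
        match = PAIR.match(encoded, pos)
        if match is None:
            raise ValueError(f"expected a count then a symbol at index {pos}")
        pairs.append((int(match.group(1)), match.group(2)))
        pos = match.end()
    return pairs


print(split_pairs("100X"))  # [(100, 'X')]

try:
    split_pairs("100 X")  # the space becomes the symbol, leaving X uncounted
except ValueError as error:
    print("rejected:", error)
```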

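Unicode: the tool counts by Unicode code point, so an emoji that is a single code point, such as 😊, is treated as one symbol, not as multiple bytes. Emoji built from several code points (for example with a skin-tone modifier or a zero-width joiner) and accented letters written with combining marks count as several symbols. Results may therefore differ from byte-level RLE implementations.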
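
The difference between code points and bytes is easy to see in Python, whose strings are sequences of code points. The variable names below are illustrative only.

```python
import unicodedata

smiley = "\U0001F60A"              # 😊, a single code point
thumbs = "\U0001F44D\U0001F3FD"    # thumbs up + skin-tone modifier
decomposed = "e\u0301"             # e followed by a combining acute accent
precomposed = unicodedata.normalize("NFC", decomposed)  # single code point é

print(len(smiley), len(smiley.encode("utf-8")))  # 1 4
print(len(thumbs))                               # 2
print(len(decomposed), len(precomposed))         # 2 1
```

A code-point-based encoder sees the smiley as one symbol, while a byte-level encoder working on UTF-8 sees four bytes.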

Notes

The tool counts by Unicode code point, so an emoji or accented character that is a single code point is treated as one symbol.
For correct round-tripping, keep the count-character pairing intact when editing encoded text by hand.
RLE is lossless: decoding the output recovers the original input character for character, subject to the digit caveat under Edge cases.