The Unicode escape encoder and decoder turns any text into ASCII-safe escape sequences and converts those sequences back into readable characters. Escaping is a widely used way to embed accented letters, symbols and emoji in source code or data files that must stay in plain ASCII.
How it works
Each character has a Unicode code point. Encoding writes that code point in hexadecimal inside a backslash escape. For characters in the Basic Multilingual Plane (up to U+FFFF) the result is \uXXXX with four hex digits. For characters above U+FFFF — most emoji and many historic scripts — there are two choices. The JavaScript and Java form splits the code point into a UTF-16 surrogate pair, written as two \uXXXX escapes. The Python and C form uses a single \UXXXXXXXX escape with eight hex digits.
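The encoding rule can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function names escape_char and encode and the style values "utf16" and "python" are hypothetical, and the sketch assumes that only non-ASCII characters are escaped.

```python
def escape_char(ch: str, style: str = "utf16") -> str:
    # style="utf16":  surrogate pair above U+FFFF (JavaScript and Java form)
    # style="python": single eight-digit escape above U+FFFF (Python and C form)
    cp = ord(ch)
    if cp <= 0xFFFF:
        # Basic Multilingual Plane: four hex digits in either style.
        return f"\\u{cp:04X}"
    if style == "python":
        return f"\\U{cp:08X}"
    # Split the code point into a UTF-16 surrogate pair.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return f"\\u{high:04X}\\u{low:04X}"


def encode(text: str, style: str = "utf16") -> str:
    # Assumption: ASCII characters pass through, everything else is escaped.
    return "".join(ch if ord(ch) < 0x80 else escape_char(ch, style) for ch in text)
```

With these definitions, encode("caf\u00E9") gives caf\u00E9 as literal ASCII text, and a character above U+FFFF becomes either two \uXXXX escapes or one \UXXXXXXXX escape depending on the style argument.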
Decoding scans the text for \u and \U markers, reads the hex digits, and rebuilds the character. When it sees a high surrogate (U+D800–U+DBFF) immediately followed by a low surrogate (U+DC00–U+DFFF), it combines them into the correct astral character. Anything that is not a valid escape is passed through unchanged.
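The decoding pass described above might look like the following Python sketch. It is an illustration under stated assumptions rather than the tool's real code: the name decode is hypothetical, lone surrogates and out-of-range values are assumed to be left as written, and escaped backslashes (such as a doubled backslash before u) are not treated specially.

```python
import re

# Alternatives, tried in order:
#   1. a high surrogate escape immediately followed by a low surrogate escape
#   2. \U followed by exactly eight hex digits
#   3. \u followed by exactly four hex digits
_ESCAPE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\U([0-9a-fA-F]{8})"
    r"|\\u([0-9a-fA-F]{4})"
)


def _replace(match: re.Match) -> str:
    high, low, long_form, short_form = match.groups()
    if high is not None:
        # Combine the surrogate pair into one code point above U+FFFF.
        cp = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
        return chr(cp)
    cp = int(long_form if long_form is not None else short_form, 16)
    # Assumption: lone surrogates and values past U+10FFFF stay as literal text.
    if cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        return match.group(0)
    return chr(cp)


def decode(text: str) -> str:
    # Text that matches no alternative (for example a two-digit \u12) is untouched.
    return _ESCAPE.sub(_replace, text)
```

Putting the surrogate-pair alternative first in the pattern means a valid pair is always consumed as a unit, so the two halves are never decoded separately.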
Example
The smiley emoji 😀 is code point U+1F600. Encoded as a UTF-16 surrogate pair it becomes:
\uD83D\uDE00
Encoded in the Python and C style it becomes the single escape:
\U0001F600
Both decode back to the same emoji. The letter é (U+00E9) is simply \u00E9 in every style.
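One way to cross-check this example is Python's standard json module, which by default escapes non-ASCII characters in the surrogate-pair style. Note that json writes the hex digits in lowercase; both cases denote the same code units.

```python
import json

smiley = "\U0001F600"  # Python-style escape for U+1F600

# json.dumps escapes non-ASCII text by default (ensure_ascii=True) and
# writes characters above U+FFFF as a UTF-16 surrogate pair.
as_json = json.dumps(smiley)
print(as_json)  # "\ud83d\ude00"

# json.loads recombines the pair into the same single character.
assert json.loads(as_json) == smiley
assert len(smiley) == 1 and ord(smiley) == 0x1F600

# A character inside the Basic Multilingual Plane needs only one escape.
e_acute = json.dumps("\u00E9")
print(e_acute)  # "\u00e9"
```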
Tips
- Use the surrogate-pair style for string literals in JavaScript and Java; use \UXXXXXXXX for Python source.
- The “escape every character” option is useful when a pipeline only accepts 7-bit ASCII.
- Decoding is forgiving: incomplete sequences such as \u12 (only two digits) are left as literal text rather than guessed.
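As a rough sketch of what escaping every character could mean in the surrogate-pair style, the text can be encoded to UTF-16 and each 16-bit code unit written as one escape. The function name escape_all is hypothetical, and the tool's own option may behave differently in its details.

```python
def escape_all(text: str) -> str:
    # Big-endian UTF-16 gives one code unit per two bytes; characters above
    # U+FFFF are emitted by the codec as a surrogate pair (two units).
    data = text.encode("utf-16-be")
    units = (int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    return "".join(f"\\u{unit:04X}" for unit in units)
```

Here escape_all("Hi!") yields \u0048\u0069\u0021, so even plain ASCII letters are written as escapes and the output contains only 7-bit characters.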