What is a \uXXXX escape?

It is a way to write a Unicode character using its code point in hexadecimal. \u00E9 represents the letter e with an acute accent, and \u03A9 represents the Greek capital omega. JavaScript, Java and JSON all use this form.
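
As a small illustration, here is the same idea in Python, chosen only for the example because its string literals accept the same \uXXXX form. The helper name to_escape is hypothetical and not part of the tool.

```python
# \u00E9 and the literal character denote the same one-character string.
accented = "\u00E9"
omega = "\u03A9"

assert accented == "é"
assert omega == "Ω"


def to_escape(ch):
    # Hypothetical helper: format a single BMP character's code point
    # as a backslash, "u", and four uppercase hex digits.
    return "\\u{:04X}".format(ord(ch))


print(to_escape("é"))  # \u00E9
print(to_escape("Ω"))  # \u03A9
```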

How are emoji and other astral characters handled?

Characters above U+FFFF need either two \uXXXX escapes forming a UTF-16 surrogate pair, or a single \UXXXXXXXX escape. This tool offers both styles, and the decoder reassembles surrogate pairs automatically.

What is the difference between the encoding styles?

The \uXXXX style matches JavaScript and Java and uses surrogate pairs for emoji. The \UXXXXXXXX style matches Python and C. The "Escape every character" option forces all text into escapes, which is handy for fully ASCII-safe output.

Can it decode mixed text and escapes?

Yes. The decoder leaves ordinary characters untouched and only converts valid \uXXXX and \UXXXXXXXX sequences, so you can paste a mostly normal string that contains a few escapes.

Is my text sent to a server?

No. The conversion runs entirely in your browser with JavaScript and nothing is sent to a server.

What is the Unicode \u Escape Encoder & Decoder?

Free Unicode escape tool: convert characters to JavaScript and Java style \uXXXX escapes (with surrogate pairs) or Python and C style \UXXXXXXXX escapes, and decode them back. It runs free in your browser on Gera Tools, with nothing uploaded.

Unicode \u Escape Encoder & Decoder

Name: Unicode \u Escape Encoder & Decoder
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The Unicode escape encoder and decoder turns any text into ASCII-safe \uXXXX or \UXXXXXXXX escape sequences, and converts those sequences back into readable characters. This is the standard technique for embedding accented letters, symbols, emoji, and non-Latin scripts in source code, JSON, configuration files, or any data format that must remain 7-bit ASCII.

When you need Unicode escapes

Unicode escapes solve a practical problem: many toolchains, configuration formats, and network protocols guarantee safe handling of ASCII but may mangle, strip, or misinterpret non-ASCII bytes depending on encoding declarations, locale settings, or pipeline quirks. By escaping to pure ASCII, you sidestep encoding issues entirely and produce output that any ASCII-aware tool can store, transmit, or diff without corruption.

Common situations:

Writing string literals in JavaScript, Java, or JSON that contain non-Latin characters
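Embedding characters in XML attribute values that application code decodes later (XML parsers themselves do not interpret \u escapes; XML's own form is a numeric character reference such as &#xE9;)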
Committing source files that must survive ASCII-only version control or code review tools
Preparing strings for environments where the encoding is unknown or not guaranteed

The encoding styles explained

\uXXXX (JavaScript, Java, JSON): Four hex digits, covering U+0000–U+FFFF directly. Characters above U+FFFF (most emoji, some historic scripts) require two \uXXXX escapes forming a UTF-16 surrogate pair. JavaScript and Java string parsers accept both a single escape and a surrogate pair. The JSON specification defines only \uXXXX (not \U), so surrogate pairs are the JSON-compliant way to encode emoji.

\UXXXXXXXX (Python, C): Eight hex digits covering the full Unicode range. A single escape handles any code point, including those above U+FFFF. Python 3 string literals support both \uXXXX (BMP only) and \UXXXXXXXX (any code point).

Escape every character: Forces every character — even basic ASCII letters — into escape sequences. Useful when the output must be verifiably 7-bit ASCII with no exceptions, for example in certain protocol fields or legacy system imports.

How encoding works

Each character’s code point is formatted in hexadecimal:

For BMP characters (U+0000–U+FFFF): write \u followed by the four-digit hex code point.
For supplementary characters (U+10000–U+10FFFF) in JavaScript/Java mode: compute the surrogate pair. Subtract 0x10000, split the result into a 10-bit high part and 10-bit low part, add 0xD800 to get the high surrogate and 0xDC00 to get the low surrogate, then write both as \uXXXX.
For supplementary characters in Python/C mode: write \U followed by the eight-digit hex code point.
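
The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: the function name encode, the style values "js" and "py", and the escape_all flag are all assumptions made for the example, as is the choice to pass plain ASCII through untouched by default.

```python
def encode(text, style="js", escape_all=False):
    # style="js": surrogate pairs for supplementary characters.
    # style="py": one eight-digit escape for supplementary characters.
    # escape_all: assumed to mirror the "Escape every character" option.
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80 and not escape_all:
            out.append(ch)                      # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append("\\u{:04X}".format(cp))  # BMP: four hex digits
        elif style == "py":
            out.append("\\U{:08X}".format(cp))  # eight hex digits
        else:
            v = cp - 0x10000                    # 20-bit value
            high = 0xD800 + (v >> 10)           # top 10 bits
            low = 0xDC00 + (v & 0x3FF)          # bottom 10 bits
            out.append("\\u{:04X}\\u{:04X}".format(high, low))
    return "".join(out)


print(encode("é 😀"))                   # \u00E9 \uD83D\uDE00
print(encode("é 😀", style="py"))       # \u00E9 \U0001F600
print(encode("Hi", escape_all=True))    # \u0048\u0069
```

Note that this sketch does not escape a literal backslash in the input, so text that already contains something resembling an escape would not round-trip exactly.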

How decoding works

The decoder scans left to right for \u and \U markers. It reads the following four or eight hex digits and reconstructs the character. When it encounters a high surrogate (U+D800–U+DBFF) immediately followed by a low surrogate (U+DC00–U+DFFF), it combines them using the inverse surrogate formula to produce the correct supplementary character. Sequences that are incomplete or malformed are passed through unchanged rather than guessed.
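
One way to express this scan is a single regular expression whose alternatives are tried in order. The Python sketch below is an illustration under assumptions, not the tool's code: the names decode, _ESCAPE and _replace are hypothetical, and treating a \UXXXXXXXX value above U+10FFFF as malformed is a choice made for the example.

```python
import re

# Alternatives are tried in order: a high+low surrogate pair first, then a
# single \uXXXX, then \UXXXXXXXX. Text that matches none of them is not touched.
_ESCAPE = re.compile(
    r"\\u([dD][89abAB][0-9a-fA-F]{2})\\u([dD][c-fC-F][0-9a-fA-F]{2})"
    r"|\\u([0-9a-fA-F]{4})"
    r"|\\U([0-9a-fA-F]{8})"
)


def _replace(match):
    high, low, bmp, full = match.groups()
    if high is not None:
        # Inverse surrogate formula.
        cp = 0x10000 + ((int(high, 16) - 0xD800) << 10) + (int(low, 16) - 0xDC00)
        return chr(cp)
    cp = int(bmp if bmp is not None else full, 16)
    if 0xD800 <= cp <= 0xDFFF or cp > 0x10FFFF:
        return match.group(0)  # lone surrogate or out of range: leave as-is
    return chr(cp)


def decode(text):
    return _ESCAPE.sub(_replace, text)


print(decode(r"caf\u00E9 \uD83D\uDE00"))  # café 😀
print(decode(r"lone \uD83D stays"))       # lone \uD83D stays
```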

Examples

Character	Code point	JavaScript/Java	Python/C
é	U+00E9	`\u00E9`	`\u00E9`
€	U+20AC	`\u20AC`	`\u20AC`
😀	U+1F600	`\uD83D\uDE00` (surrogate pair)	`\U0001F600`
漢	U+6F22	`\u6F22`	`\u6F22`

Note how 😀 requires two \u escapes in JavaScript but only one \U in Python.

Tips for common workflows

JSON strings: always use \uXXXX with surrogate pairs for emoji — the JSON spec does not define \U.
Java .properties files: use \uXXXX for any non-ASCII character; java.util.Properties.load(InputStream) reads the file as ISO-8859-1 and decodes \uXXXX escapes.
Python source: use \UXXXXXXXX for supplementary characters; \u only reaches U+FFFF.
Decoding only part of a string: the tool leaves non-escape text untouched, so you can safely decode strings that mix escaped and unescaped characters.
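
The JSON tip can be checked with Python's standard json module, used here only as a convenient JSON implementation. With its default ensure_ascii=True setting, json.dumps emits \uXXXX escapes (in lowercase hex) and a surrogate pair for the emoji, and json.loads reassembles the pair.

```python
import json

text = "caf\u00E9 \U0001F600"

# Non-ASCII characters are escaped by default; the emoji becomes a
# surrogate pair because JSON has no eight-digit escape.
encoded = json.dumps(text)
print(encoded)  # "caf\u00e9 \ud83d\ude00"

# Parsing turns the pair back into a single code point.
assert json.loads(encoded) == text
assert len(json.loads('"\\ud83d\\ude00"')) == 1
```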

The surrogate-pair maths, worked

Encoding an astral character to a UTF-16 surrogate pair follows a fixed formula. Take 😀 (U+1F600):

cp      = 0x1F600
cp'     = cp - 0x10000        = 0x0F600
high    = 0xD800 + (cp' >> 10)  = 0xD800 + 0x3D = 0xD83D
low     = 0xDC00 + (cp' & 0x3FF) = 0xDC00 + 0x200 = 0xDE00
result  = \uD83D\uDE00

Decoding inverts it: cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00). A lone high surrogate with no following low surrogate is invalid UTF-16 — this tool leaves such malformed input unchanged rather than guessing, which is the safe behaviour when round-tripping data you do not control.
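
The forward and inverse formulas can be written as a pair of small Python functions to confirm that they round-trip. The names to_surrogates and from_surrogates are hypothetical, and raising ValueError on out-of-range input is a choice made for this sketch rather than a description of the tool.

```python
def to_surrogates(cp):
    # Only supplementary code points (U+10000 to U+10FFFF) have a pair.
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points have surrogate pairs")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def from_surrogates(high, low):
    # Reject anything that is not a high surrogate followed by a low one.
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogates(0x1F600)
print("{:04X} {:04X}".format(high, low))   # D83D DE00
print(hex(from_surrogates(high, low)))     # 0x1f600
```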

Which escape a language actually accepts

Target	`\uXXXX` (BMP)	Surrogate pair for astral	`\UXXXXXXXX`
JSON	Yes	Yes (required)	No
JavaScript / TypeScript	Yes	Yes	No (use `\u{1F600}` instead)
Java	Yes	Yes	No
Python 3	Yes	No (decodes as two chars)	Yes
C / C++	Yes (`\u`)	No	Yes (`\U`)

Note that modern JavaScript also accepts the brace form \u{1F600}, which is neither of the two classic styles but is the cleanest option in ES2015+ source.
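
The Python 3 row of the table is easy to verify directly. The sketch below shows that a surrogate pair written as two \u escapes stays two code points in a Python string, and illustrates one possible way to merge such a pair afterwards using the standard "surrogatepass" error handler; it is an illustration, not something the tool itself does.

```python
# Two \u escapes for the surrogate halves stay two separate code points.
pair = "\ud83d\ude00"
single = "\U0001F600"

assert len(pair) == 2
assert len(single) == 1
assert pair != single

# Round-tripping through UTF-16 with "surrogatepass" lets the encoder write
# the lone halves as code units, which the decoder then reads as one pair.
merged = pair.encode("utf-16", "surrogatepass").decode("utf-16")
assert merged == single
```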

Sources

Unicode Consortium — UTF-16 and surrogate pairs (Core Specification, §3.9) — the encoding forms and D800–DFFF surrogate range.
ECMA-404 — The JSON Data Interchange Standard — why JSON defines only \uXXXX.
