What is the difference between UTF-16 LE and BE?

Both encode the same 16-bit code units, but little-endian stores the low byte first and big-endian stores the high byte first. For example, U+0041 (A) is 41 00 in LE and 00 41 in BE. Windows and most x86 software default to LE.
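The byte-order difference can be reproduced with a DataView, whose setUint16 method takes a little-endian flag. This is only an illustrative sketch; the helper names codeUnitBytes and hex2 are made up for this example and are not part of the viewer.

```javascript
// Write one 16-bit code unit into a 2-byte buffer in the requested byte order.
function codeUnitBytes(unit, littleEndian) {
  const view = new DataView(new ArrayBuffer(2));
  view.setUint16(0, unit, littleEndian); // third argument selects the byte order
  return [view.getUint8(0), view.getUint8(1)];
}

// Format a byte as two zero-padded uppercase hex digits.
const hex2 = (b) => b.toString(16).toUpperCase().padStart(2, "0");

const leA = codeUnitBytes(0x0041, true).map(hex2).join(" ");  // "41 00"
const beA = codeUnitBytes(0x0041, false).map(hex2).join(" "); // "00 41"
console.log(leA, "|", beA);
```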

How are emoji and rare characters encoded?

Characters above U+FFFF cannot fit in one 16-bit code unit, so UTF-16 splits them into a surrogate pair: a high surrogate in D800-DBFF followed by a low surrogate in DC00-DFFF. Each surrogate becomes two hex bytes, so such characters take four bytes.
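The split into a surrogate pair is simple arithmetic: subtract 0x10000, then put the top 10 bits into the high surrogate and the bottom 10 bits into the low surrogate. A minimal sketch, assuming a hypothetical helper named toSurrogatePair:

```javascript
// Compute the UTF-16 surrogate pair for a code point above U+FFFF.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10ffff) {
    throw new RangeError("expected a code point in U+10000..U+10FFFF");
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits, lands in D800-DBFF
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits, lands in DC00-DFFF
  return [high, low];
}

// Rocket emoji, U+1F680
const rocketPair = toSurrogatePair(0x1f680);
console.log(rocketPair.map((u) => u.toString(16).toUpperCase())); // [ 'D83D', 'DE80' ]
```

JavaScript performs the same split internally, so String.fromCodePoint(0x1F680) yields a string whose two code units match this pair.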

Does this add a byte order mark (BOM)?

No. The viewer shows only the encoded text bytes. A real UTF-16 file often starts with FF FE (LE) or FE FF (BE) as a BOM, but that is metadata you would prepend separately.

Why does my character count differ from the byte count?

Each basic (BMP) character is two bytes, but astral characters such as emoji use four bytes because of surrogate pairs. So the byte count is always exactly twice the code-unit count, and it can be more than twice the visible character count.
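The three counts can be compared directly in JavaScript. This sketch uses code points as a stand-in for visible characters, which is an approximation: combining marks and joined emoji sequences can span several code points. The helper name utf16Counts is made up for this example.

```javascript
// Compare code points, UTF-16 code units, and UTF-16 bytes for a string.
function utf16Counts(text) {
  const codeUnits = text.length;              // String.length counts UTF-16 code units
  const codePoints = Array.from(text).length; // string iteration walks whole code points
  return { codePoints, codeUnits, bytes: codeUnits * 2 };
}

const counts = utf16Counts("Hi\u{1F680}"); // "Hi" followed by the rocket emoji
console.log(counts); // { codePoints: 3, codeUnits: 4, bytes: 8 }
```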

Is the conversion lossless?

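Yes. UTF-16 can represent every Unicode character, and the viewer emits exactly two bytes per code unit, so the hex output round-trips back to the original string exactly, regardless of language or symbol, as long as it is decoded with the same byte order.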
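To check the round trip yourself, the hex can be turned back into a string by pairing bytes into code units. A minimal sketch, assuming space-separated two-digit hex as input; utf16HexToString is a hypothetical helper, not a function the viewer exposes.

```javascript
// Rebuild a string from space-separated UTF-16 hex bytes in the given byte order.
function utf16HexToString(hex, littleEndian) {
  const trimmed = hex.trim();
  if (trimmed === "") return "";
  const bytes = trimmed.split(/\s+/).map((h) => parseInt(h, 16));
  if (bytes.length % 2 !== 0) {
    throw new RangeError("UTF-16 needs an even number of bytes");
  }
  let out = "";
  for (let i = 0; i < bytes.length; i += 2) {
    const unit = littleEndian
      ? bytes[i] | (bytes[i + 1] << 8)  // low byte first
      : (bytes[i] << 8) | bytes[i + 1]; // high byte first
    out += String.fromCharCode(unit);   // one code unit at a time keeps surrogate pairs intact
  }
  return out;
}

console.log(utf16HexToString("48 00 69 00", true)); // Hi
```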

What is the UTF-16 Hex Viewer?

The UTF-16 Hex Viewer converts any Unicode text to its UTF-16 little-endian and big-endian hexadecimal byte streams, expanding code units and surrogate pairs so you can inspect exactly how text is stored. It runs free and entirely in your browser on Gera Tools, with nothing uploaded.

UTF-16 Hex Viewer — Gera Tools

Name: UTF-16 Hex Viewer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

UTF-16 stores text as 16-bit code units. This viewer turns any string into its raw UTF-16 byte stream in hexadecimal, in either little-endian or big-endian order, so you can see precisely how the text would sit in memory or in a file.

How it works

JavaScript strings are already sequences of UTF-16 code units, so charCodeAt(i) returns one 16-bit code unit (0 to 65535) for each index i. For each code unit the tool splits it into a high byte and a low byte:

high = (unit >> 8) & 0xFF
low  =  unit       & 0xFF
LE   = low  high      (low byte first)
BE   = high low       (high byte first)

Characters above U+FFFF are already represented in the string as a surrogate pair of two code units, so they naturally expand to four bytes without any extra handling. Each byte is printed as two zero-padded hex digits.
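Put together, the steps above can be sketched as a single function. This is an illustration of the described approach, not the viewer's actual source; the name toUtf16Hex and the space-separated uppercase output are assumptions.

```javascript
// Convert a string to its UTF-16 byte stream as space-separated hex.
function toUtf16Hex(text, littleEndian) {
  const out = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i); // one 16-bit code unit; surrogates arrive one at a time
    const high = (unit >> 8) & 0xff;
    const low = unit & 0xff;
    const pair = littleEndian ? [low, high] : [high, low];
    for (const b of pair) {
      out.push(b.toString(16).toUpperCase().padStart(2, "0")); // two zero-padded hex digits
    }
  }
  return out.join(" ");
}

console.log(toUtf16Hex("Hi", true));  // 48 00 69 00
console.log(toUtf16Hex("Hi", false)); // 00 48 00 69
```

No branch for surrogates is needed: the loop runs over code units, so an astral character simply contributes two iterations and four bytes.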

Example

The text Hi is two code units, 0048 0069. In little-endian that is 48 00 69 00; in big-endian it is 00 48 00 69. An emoji such as the rocket (U+1F680) is a single character but two code units, the surrogate pair D83D DE80, and therefore four bytes: 3D D8 80 DE in little-endian and D8 3D DE 80 in big-endian. If you need the bytes for a file, remember that a UTF-16 file usually leads with a BOM (FF FE for LE, FE FF for BE), which this tool deliberately omits so you see only the text.

When to use this tool

UTF-16 hex inspection is useful in several practical situations:

Debugging file parsers. Windows wide-character APIs and a number of binary file formats store strings as UTF-16, and Java and .NET strings are sequences of UTF-16 code units in memory. If a parser produces garbled output or reads strings as two-byte noise, comparing the file's raw bytes against the LE and BE hex for the expected text quickly isolates whether the endianness is wrong.
Network protocol analysis. Some binary protocols send strings as length-prefixed UTF-16 sequences. Seeing the exact bytes helps you craft test payloads or verify a decoder is stripping the BOM correctly.
Understanding surrogate pairs. The supplementary (astral) planes, above U+FFFF, are where many emoji and rare CJK extension characters live. Pasting an emoji here and switching to big-endian view makes the D800–DFFF surrogate structure visible in a way that abstract Unicode charts do not.
Cross-platform byte-order bugs. A file written as big-endian UTF-16 (by some SPARC or PowerPC environments, or by Java's UTF-16 charset, which encodes big-endian by default) and read on an x86 system without endianness handling will misparse every character. Comparing the file's bytes against this viewer's LE and BE output lets you confirm which byte order the file actually uses.
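The byte-order bug is easy to reproduce: decode the same bytes with both orders and compare. A minimal sketch, assuming a hypothetical helper named decodeUtf16Bytes that ignores any trailing odd byte:

```javascript
// Decode raw bytes as UTF-16 using an explicit byte order.
function decodeUtf16Bytes(bytes, littleEndian) {
  const view = new DataView(Uint8Array.from(bytes).buffer);
  let out = "";
  for (let i = 0; i + 1 < bytes.length; i += 2) {
    out += String.fromCharCode(view.getUint16(i, littleEndian));
  }
  return out;
}

const hiLE = [0x48, 0x00, 0x69, 0x00];       // "Hi" encoded as UTF-16 LE
const right = decodeUtf16Bytes(hiLE, true);  // "Hi"
const wrong = decodeUtf16Bytes(hiLE, false); // U+4800 U+6900, two unrelated characters

console.log(right);
console.log(Array.from(wrong, (c) => "U+" + c.charCodeAt(0).toString(16).toUpperCase()));
```

Every code unit has its two bytes swapped, which is why text read with the wrong byte order usually shows up as unrelated CJK-range characters rather than as a few isolated errors.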

LE vs BE: which to choose

Little-endian is the usual choice on modern desktops and servers, because x86/x64 hardware is little-endian and Windows APIs and .NET use UTF-16 in LE order. If you are building something for a Windows API (such as writing a wide string with WriteFile), or for .NET, LE is almost certainly correct. Big-endian is used by some network protocols, by Java's UTF-16 charset when it encodes without an explicit byte order, and by legacy SPARC or MIPS environments. When in doubt, look for the BOM in your source file: FF FE means LE, FE FF means BE.
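Checking for the BOM takes only a two-byte comparison. A small sketch, assuming the first bytes of the file are already available as an array or Uint8Array; detectUtf16Bom is a made-up name.

```javascript
// Return "LE", "BE", or null based on the first two bytes.
// Note: FF FE 00 00 is also the UTF-32 LE BOM, which this simple check does not distinguish.
function detectUtf16Bom(bytes) {
  if (bytes.length >= 2) {
    if (bytes[0] === 0xff && bytes[1] === 0xfe) return "LE";
    if (bytes[0] === 0xfe && bytes[1] === 0xff) return "BE";
  }
  return null; // no BOM: the byte order has to come from elsewhere
}

console.log(detectUtf16Bom([0xff, 0xfe, 0x48, 0x00])); // LE
console.log(detectUtf16Bom([0xfe, 0xff, 0x00, 0x48])); // BE
console.log(detectUtf16Bom([0x48, 0x00]));             // null
```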

The BOM and why this tool omits it

A real UTF-16 text file often starts with a two-byte Byte Order Mark before the encoded text. The BOM tells a reader which byte order to use: FF FE for LE, FE FF for BE. This viewer omits it by design, showing only the bytes for the text you type. If you are writing a UTF-16 file for a consumer that requires a BOM, prepend these two bytes manually to the output.
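Prepending the BOM is a matter of putting two extra bytes in front of the text bytes. A minimal sketch, assuming the text bytes have already been parsed from the hex output into numbers; withBom is a hypothetical helper name.

```javascript
// Return a new Uint8Array with the UTF-16 BOM for the chosen byte order in front.
function withBom(textBytes, littleEndian) {
  const bom = littleEndian ? [0xff, 0xfe] : [0xfe, 0xff];
  const out = new Uint8Array(bom.length + textBytes.length);
  out.set(bom, 0);
  out.set(textBytes, bom.length);
  return out;
}

// "Hi" in UTF-16 LE, now with FF FE in front.
const hiFileBytes = withBom([0x48, 0x00, 0x69, 0x00], true);
console.log(Array.from(hiFileBytes, (b) => b.toString(16).toUpperCase().padStart(2, "0")).join(" "));
// FF FE 48 00 69 00
```

The BOM must match the byte order of the text that follows it; pairing FF FE with big-endian text would make a reader misparse every character.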