Unicode NFC Normaliser

Compose Unicode sequences to NFC canonical form


Unicode lets the same visible character be represented by more than one sequence of code points. NFC (Normalization Form C) maps these canonically equivalent sequences to a single composed form, which is the normalisation form generally recommended for storing and transmitting text on the web. This tool applies NFC and shows you the code points before and after so you can see exactly what changed.

How it works

NFC is defined as a canonical decomposition followed by a canonical composition:

1. Canonical Decompose  é  ->  e + ◌́   (U+0065 U+0301)
2. Canonical Compose     e + ◌́  ->  é   (U+00E9)

The decomposition step fully decomposes the text and sorts combining marks into canonical order; the composition step then merges combining marks back onto their base characters wherever a suitable precomposed code point exists. The result is usually, though not always, the shortest canonically equivalent string, because a small number of precomposed characters are excluded from composition and stay decomposed. The tool relies on the engine’s native String.prototype.normalize("NFC"), which implements the full Unicode normalisation algorithm.
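As a rough illustration of the before-and-after view, the sketch below normalises a decomposed é and lists the code points on each side. The codePoints helper is a hypothetical name made up for this example, not necessarily how the tool itself is written; only String.prototype.normalize, Array.from and codePointAt are standard.

```javascript
// Hypothetical helper (not part of the tool): format each code point as "U+XXXX".
// Array.from iterates a string by code point, so surrogate pairs stay intact.
function codePoints(text) {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const decomposed = "e\u0301";                 // e + combining acute accent
const composed = decomposed.normalize("NFC"); // precomposed é

console.log(codePoints(decomposed).join(" ")); // U+0065 U+0301
console.log(codePoints(composed).join(" "));   // U+00E9
```

Counting code points with Array.from rather than the string's length property keeps the numbers correct for characters outside the Basic Multilingual Plane.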

Notes and example

If you paste e followed by a combining acute accent (U+0065 U+0301), the output is the single character é (U+00E9) and the code point count drops by one. Text that is already in NFC is returned unchanged. Use NFC before storing user input or comparing identifiers so that canonically equivalent names always match. For the opposite operation see the NFD decomposer.
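A minimal sketch of the identifier comparison described above, assuming both values are ordinary JavaScript strings. The sameIdentifier function and the sample names are hypothetical and exist only for this example.

```javascript
// Hypothetical helper: treat two strings as equal if their NFC forms match.
function sameIdentifier(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const typed = "Zoe\u0308";  // "Zoë" typed as e + combining diaeresis
const stored = "Zo\u00EB";  // "Zoë" stored with the precomposed ë

console.log(typed === stored);              // false
console.log(sameIdentifier(typed, stored)); // true
```

In practice it is usually simpler to normalise once when the value is stored, so that later comparisons can use plain equality.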
