Unicode Normalization Forms

NFC, NFD, NFKC, NFKD explained with composition/decomposition rules and use cases.

Reference for Unicode normalization forms with canonical vs compatibility equivalence and programming guidance. Paste text and see live NFC, NFD, NFKC and NFKD output with length and code-point changes.

What is the difference between NFC and NFD?

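Both forms represent exactly the same characters; they are canonically equivalent and differ only in how those characters are encoded. NFD decomposes each precomposed character, such as an accented letter, into a base letter followed by separate combining marks in a fixed order. NFC performs the same decomposition and then composes the pieces back into single precomposed characters where one exists. NFC is usually shorter and is the form generally recommended for storage and the web.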
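As a small illustration of the difference, the sketch below prints the code points of "é" in each form. The helper name codePoints is made up for this example; normalize is the standard JavaScript string method.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(text) {
  // Array.from iterates by code point, not by UTF-16 code unit.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const composed = "\u00E9";    // "é" as one precomposed code point
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT

console.log(codePoints(composed.normalize("NFD")));   // [ 'U+0065', 'U+0301' ]
console.log(codePoints(decomposed.normalize("NFC"))); // [ 'U+00E9' ]
```

Either input ends up in the same shape once a form is chosen, which is the whole point of normalizing.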

Make equal-looking text actually equal

Unicode often lets you write the same character more than one way. The letter “é” can be a single precomposed code point, or an “e” followed by a separate combining accent. Visually identical, byte-for-byte different. Normalization rewrites text into one of four standard forms (two canonical, two compatibility) so that equivalent strings become identical, which is essential for comparison, search, sorting and security. This tool shows all four forms (NFC, NFD, NFKC and NFKD) for any text you paste.
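To see why this matters for comparison, here is a minimal sketch. The function name canonicallyEqual is an assumption for illustration; it simply normalizes both inputs to NFC before comparing.

```javascript
// Hypothetical helper: treat two strings as equal if their NFC forms match.
function canonicallyEqual(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

const precomposed = "caf\u00E9"; // "café" with a single-code-point é
const combining = "cafe\u0301";  // "café" with e + combining acute accent

console.log(precomposed === combining);                // false
console.log(canonicallyEqual(precomposed, combining)); // true
```

Any of the four forms would work here as long as both sides use the same one; NFC is chosen only because it is the usual default.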

How it works

Normalization runs in two stages: decompose, then optionally re-compose.

  • NFD (Canonical Decomposition) breaks composed characters into base + combining marks, in a defined order.
  • NFC (Canonical Composition) decomposes canonically, then recombines into precomposed characters where one exists. This is usually the shortest canonical form and the one the W3C recommends for web content.
  • NFKD (Compatibility Decomposition) decomposes using compatibility mappings too, so the ligature ﬁ (U+FB01) becomes f + i and the full-width Ａ (U+FF21) becomes a plain A. This is lossy.
  • NFKC (Compatibility Composition) applies the compatibility decomposition, then recomposes canonically, so accents are recombined but ligatures and width variants stay folded. It is commonly used for matching and identifiers.
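The difference between the canonical and compatibility forms listed above can be sketched as follows. The sample strings are chosen for this example and written with escapes so the code points are unambiguous.

```javascript
const ligature = "\uFB01";   // LATIN SMALL LIGATURE FI
const fullWidthA = "\uFF21"; // FULLWIDTH LATIN CAPITAL LETTER A

// Canonical forms leave compatibility characters untouched.
console.log(ligature.normalize("NFC") === ligature); // true
console.log(ligature.normalize("NFD") === ligature); // true

// Compatibility forms fold them to their plain equivalents.
console.log(ligature.normalize("NFKD"));   // "fi"
console.log(fullWidthA.normalize("NFKC")); // "A"

// Lossy: once folded, no normalization form brings the ligature back.
const folded = ligature.normalize("NFKC");
console.log(folded.normalize("NFC") === ligature); // false

// The K forms still decompose or compose accents as D and C do.
const word = "\uFB01anc\u00E9"; // "fiancé" spelled with the ligature
console.log(word.normalize("NFKD").length); // 7  (f i a n c e + accent)
console.log(word.normalize("NFKC").length); // 6  (f i a n c é)
```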

Browsers expose this directly through JavaScript's String.prototype.normalize, for example "text".normalize("NFC"), which is exactly what this tool calls.
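The tool's own source is not reproduced here, but a rough sketch of the same idea might look like this. The names FORMS and describeForms are hypothetical; the only real API used is normalize.

```javascript
const FORMS = ["NFC", "NFD", "NFKC", "NFKD"];

// Hypothetical helper: normalize the input to every form and report sizes.
function describeForms(text) {
  return FORMS.map((form) => {
    const output = text.normalize(form);
    return {
      form,
      output,
      codeUnits: output.length,              // UTF-16 code units
      codePoints: Array.from(output).length, // Unicode code points
      changed: output !== text,
    };
  });
}

// Sample input: "Å" (U+00C5) followed by the "fi" ligature (U+FB01).
for (const row of describeForms("\u00C5\uFB01")) {
  console.log(row.form, row.codePoints, row.changed);
}
```

Called with no argument, normalize() defaults to NFC, and passing a form name other than these four throws a RangeError.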

Tips and notes

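Default to NFC for anything you store or display: it is the form the W3C recommends for web content and what most keyboards and input methods already produce. Filesystems are less consistent (some store names exactly as given, and some Apple filesystems have historically used a decomposed form), so normalize filenames yourself before comparing them. Use a K form only when you want lookalikes to fold together, such as in search indexes, deduplication, or username uniqueness checks; remember that it discards formatting distinctions and cannot be reversed. Always normalize both sides before comparing strings, never just one. Watch out: changing form can change the code-point count without changing the visible text, so never assume .length (which in JavaScript counts UTF-16 code units) is stable across normalization.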
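One way to put these tips together is sketched below, assuming a username uniqueness check. The helper usernameKey is hypothetical, and toLowerCase() is only a simple stand-in for proper Unicode case folding.

```javascript
// Hypothetical helper: build a comparison key for uniqueness checks.
// NFKC folds compatibility lookalikes; toLowerCase() approximates case folding.
function usernameKey(name) {
  return name.normalize("NFKC").toLowerCase();
}

// Normalize both sides: the stored names and the incoming attempt.
const taken = new Set(["alice"].map(usernameKey));
const attempt = "\uFF21lice"; // "Alice" typed with a full-width A

console.log(taken.has(usernameKey(attempt))); // true, so the name is rejected

// For storage and display, NFC keeps what the user typed.
console.log(attempt.normalize("NFC") === attempt); // true

// Same visible text, different .length before and after normalization.
const accent = "e\u0301";
console.log(accent.length, accent.normalize("NFC").length); // 2 1
```

Keeping the folded key separate from the stored NFC value means the lossy K form is never what users see.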