What does NFC normalisation do?

NFC stands for Normalization Form C, or Canonical Composition. It first canonically decomposes the text, then recomposes combining marks with their base letters wherever a precomposed character exists, producing a compact, canonically equivalent representation.

Why would two visually identical strings not be equal?

The letter e-acute can be stored as one precomposed code point (U+00E9) or as a base e plus a combining accent (U+0065 U+0301). They look the same but are different byte sequences. NFC normalises both to the precomposed form so comparisons succeed.
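As a minimal sketch of that comparison in JavaScript, using the standard String.prototype.normalize (the helper name equalUnderNfc is made up for illustration):

```javascript
// Two encodings of the same visible letter "é".
const precomposed = "\u00E9";  // one code point: LATIN SMALL LETTER E WITH ACUTE
const decomposed = "e\u0301";  // base "e" + COMBINING ACUTE ACCENT

// Hypothetical helper: compare two strings after normalising both to NFC.
function equalUnderNfc(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(precomposed === decomposed);             // false
console.log(equalUnderNfc(precomposed, decomposed)); // true
```

Normalising both sides, rather than only one, keeps the check correct whichever form each string arrived in.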

When should I use NFC?

NFC is the recommended default for storing and transmitting text on the web. The W3C recommends NFC for HTML, and most databases, identifiers, and APIs expect it because it is compact and stable.

Does NFC change the meaning of my text?

No. NFC changes only the underlying code point sequence, not the visible characters. The result is canonically equivalent, so it renders identically while using the standard composed form.

Is my text uploaded anywhere?

No. Normalisation uses the browser's built-in Unicode engine via String.prototype.normalize, so everything happens locally and nothing is sent to a server.

What is the Unicode NFC Normaliser?

Normalise text to Unicode NFC (Canonical Composition), merging base letters and combining marks into precomposed characters. Compare code points before and after. It runs free in your browser on Gera Tools, with nothing uploaded.

Unicode NFC Normaliser

Name: Unicode NFC Normaliser
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Unicode lets the same visible character be encoded in more than one way. NFC (Normalization Form C) collapses these alternatives into a single canonical composed form, the one the W3C recommends and most databases expect for storage and transmission. This tool applies NFC and shows the code points before and after so you can see exactly what changed and why.

The problem NFC solves

Consider the letter é. Unicode provides two ways to represent it:

Precomposed: a single code point U+00E9 (LATIN SMALL LETTER E WITH ACUTE)
Decomposed: a base letter e (U+0065) followed by a combining acute accent (U+0301)

Both render identically in any properly implemented text renderer. But they are different byte sequences. This means:

A database query such as WHERE name = 'café' may not match if the stored form and the query form differ
A URL containing an accented letter may resolve differently depending on which form is used
A password check, username lookup, or file deduplication may fail silently

NFC is the standard fix: it canonically decomposes text first (to put everything in a consistent base form), then recomposes combining sequences into precomposed characters wherever Unicode defines a precomposed form that is not excluded from composition. The result is canonically equivalent to the input and usually the most compact such form.

How NFC works step by step

1. Canonical Decompose:  é  →  e + ◌́  (U+0065, U+0301)
2. Canonical Reorder:    ensure combining marks are in canonical order
3. Canonical Compose:    e + ◌́  →  é  (U+00E9)

Decomposing and reordering before composing handles cases where text arrives already partially composed or where combining marks are in an unexpected order. The output is deterministic: the same input always produces the same NFC output.
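The steps can be observed separately in JavaScript, as a rough sketch: normalize("NFD") performs the decomposition and reordering, and normalize("NFC") adds the composition step. The q example with two dots is chosen here for illustration because q has no precomposed forms, so only the reordering is visible.

```javascript
// Steps 1 and 2: NFD canonically decomposes and reorders.
const nfdForm = "\u00E9".normalize("NFD");     // "e" + U+0301

// Reordering: dot above (U+0307, combining class 230) was entered
// before dot below (U+0323, combining class 220).
const unordered = "q\u0307\u0323";
const reordered = unordered.normalize("NFD");  // "q" + U+0323 + U+0307

// Step 3: NFC composes whatever has a precomposed form.
const recomposed = nfdForm.normalize("NFC");   // U+00E9
const qNfc = unordered.normalize("NFC");       // "q" + U+0323 + U+0307, order fixed only

// Deterministic: normalising NFC output again changes nothing.
console.log(recomposed.normalize("NFC") === recomposed); // true
```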

This tool uses the browser’s native String.prototype.normalize("NFC"), which implements the full Unicode normalisation algorithm from the Unicode standard.
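A before-and-after code point view can be approximated in a few lines. This is only a sketch, not the tool's actual source; the codePoints helper is a hypothetical name.

```javascript
// Hypothetical helper: list a string's code points in U+XXXX notation.
function codePoints(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].map(
    (ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

const before = "cafe\u0301";           // "café" with a decomposed é
const after = before.normalize("NFC");

console.log(codePoints(before).join(" ")); // U+0063 U+0061 U+0066 U+0065 U+0301
console.log(codePoints(after).join(" "));  // U+0063 U+0061 U+0066 U+00E9
```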

What changes and what does not

Changes under NFC:

Decomposed accented letters merge into their precomposed equivalents (if one exists)
Combining sequences are reordered into the canonical combining order
Code point count often decreases (a base letter and its combining mark become one code point when a precomposed form exists)

Does not change under NFC:

Compatibility variants like full-width letters, ligatures, superscripts — use NFKC for those
The visible appearance of the text
The semantic meaning
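The difference between NFC and NFKC on compatibility characters can be checked directly. The three sample characters below are picked for illustration:

```javascript
const samples = [
  "\uFB01", // LATIN SMALL LIGATURE FI
  "\uFF21", // FULLWIDTH LATIN CAPITAL LETTER A
  "\u00B2", // SUPERSCRIPT TWO
];

// NFC leaves compatibility characters untouched.
const unchangedByNfc = samples.every((ch) => ch.normalize("NFC") === ch);

// NFKC folds them to their plain equivalents.
const foldedByNfkc = samples.map((ch) => ch.normalize("NFKC"));

console.log(unchangedByNfc);         // true
console.log(foldedByNfkc.join(" ")); // fi A 2
```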

When to apply NFC

Before storing user input in a database: ensures the same name always occupies the same index entry regardless of how the user typed it
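Before comparing strings: NFC(a) === NFC(b) is a reliable equality check for canonically equivalent text, such as the two encodings of é (it does not unify look-alike letters from different scripts or compatibility variants)
Before hashing passwords or usernames: avoids the scenario where the same password passes on one device but fails on another whose input method produces a different normalisation form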
In HTML content: the W3C recommends NFC for HTML5 documents
For API input validation: normalise before checking length constraints, since NFD can make a string appear longer than its visible character count
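As a sketch of normalising at the boundary, the snippet below keys a store by the NFC form and applies the length check after normalising. The in-memory Map, the function names, and the limit of 4 are all assumptions made for illustration, not part of any real API.

```javascript
// Hypothetical in-memory user store keyed by NFC-normalised username.
const users = new Map();
const MAX_NAME_LENGTH = 4; // illustrative limit, counted in code points

function normaliseName(raw) {
  const name = raw.normalize("NFC");
  // Count code points after normalising, so "e" + accent counts once.
  if ([...name].length > MAX_NAME_LENGTH) {
    throw new RangeError("name too long");
  }
  return name;
}

function register(raw, profile) {
  users.set(normaliseName(raw), profile);
}

function lookup(raw) {
  return users.get(normaliseName(raw));
}

register("cafe\u0301", { id: 1 }); // decomposed input: 5 code points raw, 4 after NFC
const found = lookup("caf\u00E9"); // precomposed query finds the same entry
console.log(found.id);             // 1
```

Because every read and write goes through the same normalising function, the stored form and the query form cannot drift apart.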

Example

Paste e followed by U+0301 (combining acute accent), which you might pick up from a macOS file name or a text processor that decomposes on paste. The NFC output is the single character é (U+00E9). The code point count drops from 2 to 1, and a comparison against a precomposed value stored in a database now succeeds.

Text already in NFC form (which covers most everyday text on modern systems) passes through unchanged. All normalisation happens in your browser — nothing is uploaded.