What is the difference between NFC and NFD?

NFC stores each accented Vietnamese letter as a single precomposed code point, while NFD stores a base letter followed by separate combining accent marks. Both display the same when rendered correctly, but they are different byte sequences that compare and search differently.
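
As a minimal sketch of that difference, the snippet below builds the letter ế both ways and compares them. The variable names are illustrative only, and the strings use escape sequences so that the file's own encoding cannot change the result.

```javascript
// "ế" written two ways.
const precomposed = "\u1EBF";       // NFC: one code point
const decomposed = "e\u0302\u0301"; // NFD: e + combining circumflex + combining acute

// They render the same but are not equal as strings.
console.log(precomposed === decomposed);                  // false
console.log(precomposed.normalize("NFD") === decomposed); // true
console.log(decomposed.normalize("NFC") === precomposed); // true

// The UTF-8 byte sequences differ as well.
const utf8 = new TextEncoder();
console.log(utf8.encode(precomposed).length); // 3
console.log(utf8.encode(decomposed).length);  // 5
```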

Why does my Vietnamese text show floating accents?

That usually means the text is in NFD decomposed form and the font or renderer is not stacking the combining marks properly. Converting to NFC merges each letter and accent into one code point, which almost always fixes the display.

Why do the code-point counts differ between forms?

In NFD each accented letter expands into two or three code points, so the total is higher than NFC even though the visible text is identical. Comparing the counts is a quick way to tell which form you have.
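
One way to check the counts yourself is sketched below. `countCodePoints` is a hypothetical helper name, not part of any library; it counts code points rather than UTF-16 code units by iterating the string.

```javascript
// Count Unicode code points (string iteration yields one item per code point).
function countCodePoints(text) {
  return Array.from(text).length;
}

const sample = "Ti\u1EBFng Vi\u1EC7t"; // "Tiếng Việt", written in NFC

console.log(countCodePoints(sample.normalize("NFC"))); // 10
console.log(countCodePoints(sample.normalize("NFD"))); // 14
```

If the count of your text equals its NFC count and is lower than its NFD count, the text is already precomposed.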

Which form should I store data in?

NFC is the safest default for databases, filenames, and web pages, because most software and search engines expect precomposed text. Use NFD only when a specific system, such as some macOS file APIs, requires decomposed form.

Is my text sent to a server?

No. Detection and conversion use your browser's built-in Unicode normalization locally. Nothing you paste is uploaded, so it is safe for private content.

Vietnamese Precomposed vs Decomposed Unicode Normalizer

This tool diagnoses and fixes Unicode encoding problems in Vietnamese text. Vietnamese diacritics can be stored two ways — precomposed (NFC) or decomposed (NFD) — and a mismatch causes accents to float over letters, search to miss results, and strings that look identical to compare as unequal.

How it works

The tool normalizes your input both ways using your browser’s built-in String.prototype.normalize() function and compares each result against the original:

NFC (precomposed): each accented letter is a single code point. For example, ế is stored as one code point (U+1EBF).
NFD (decomposed): the base letter plus separate combining marks. ế becomes three code points: e (U+0065) + combining circumflex accent (U+0302) + combining acute accent (U+0301).

By comparing code-point counts and the presence of combining marks, the tool identifies whether your text is NFC, NFD, mixed (some characters in each form), or plain ASCII with no diacritics.
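
The classification could be sketched roughly as follows. This is an assumption about one reasonable approach, not the tool's actual source: `detectForm` is a hypothetical name, and it compares the input against both normalized results instead of counting code points.

```javascript
// Hypothetical classifier: returns "plain", "nfc", "nfd", or "mixed".
function detectForm(text) {
  const nfc = text.normalize("NFC");
  const nfd = text.normalize("NFD");
  // If both forms are identical, nothing in the text composes or decomposes.
  if (nfc === nfd) return "plain";
  if (text === nfc) return "nfc";
  if (text === nfd) return "nfd";
  // Some characters are precomposed and others decomposed.
  return "mixed";
}

console.log(detectForm("Viet"));                       // "plain"
console.log(detectForm("Vi\u1EC7t"));                  // "nfc"
console.log(detectForm("Vie\u0323\u0302t"));           // "nfd"
console.log(detectForm("Ti\u1EBFng Vie\u0323\u0302t")); // "mixed"
```

Note that "plain" here covers any text with no decomposable characters, which is slightly broader than strict ASCII.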

Why this matters for Vietnamese specifically

Vietnamese has more distinct diacritical combinations than almost any other Latin-script language. A vowel can carry one of five tone marks (grave, acute, hook above, tilde, dot below) together with a vowel modifier (circumflex, breve, horn), giving characters like ộ, ớ, and ặ that each decompose into a base letter plus two combining marks in NFD. A single paragraph of Vietnamese text may contain dozens of these multi-mark characters, making encoding mismatches highly likely when text moves between systems.
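
To see what a multi-mark character looks like after decomposition, a small sketch such as the following lists the NFD code points of a character. `describeCodePoints` is a hypothetical helper written for this illustration.

```javascript
// List the code points of a string after NFD decomposition, as "U+XXXX" labels.
function describeCodePoints(text) {
  return Array.from(text.normalize("NFD"), (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

console.log(describeCodePoints("\u1ED9")); // ộ: o + dot below + circumflex
console.log(describeCodePoints("\u1EDB")); // ớ: o + horn + acute
console.log(describeCodePoints("\u1EB7")); // ặ: a + dot below + breve
```

The order of the marks follows Unicode's canonical ordering, which is why the dot below comes before the circumflex or breve.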

Example

The string Tiếng Việt looks identical in both forms on screen but encodes differently:

NFC: 10 visible characters → 10 code points
NFD: 10 visible characters → 14 code points (ế and ệ each expand from one code point to three)

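A database query such as WHERE name = 'Tiếng Việt' can return zero results when the stored string is NFC but the query parameter is NFD, even though both display the same, because many databases and collations compare the underlying code points rather than the rendered text. Normalize to NFC before storing and searching to prevent this class of bug.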
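
The same idea can be sketched in application code. The example below uses an in-memory Map as a stand-in for a database table; `rows`, `toKey`, `saveName`, and `findName` are hypothetical names, and a real system would apply the normalization in whatever layer writes and queries the data.

```javascript
// In-memory stand-in for a table keyed by name.
const rows = new Map();

// Normalize every key to NFC on the way in and on the way out.
function toKey(name) {
  return name.normalize("NFC");
}

function saveName(name, value) {
  rows.set(toKey(name), value);
}

function findName(name) {
  return rows.get(toKey(name));
}

const nfcName = "Ti\u1EBFng Vi\u1EC7t";      // "Tiếng Việt" in NFC
const nfdName = nfcName.normalize("NFD");    // same text in NFD

saveName(nfcName, { id: 1 });

console.log(rows.has(nfdName));  // false: a raw NFD key does not match the stored NFC key
console.log(findName(nfdName));  // { id: 1 }: normalizing the lookup key finds the row
```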

When you see floating accents

Floating accents, meaning tone marks that appear disconnected from their vowels, are the most visible symptom of NFD text shown in a font or renderer that does not stack combining marks correctly. Converting to NFC almost always fixes this: copy the NFC output into your document or database and the accents will seat correctly.
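
A rough sketch of that fix follows, assuming a JavaScript engine that supports Unicode property escapes in regular expressions. `fixFloatingAccents` is a hypothetical helper; besides converting to NFC, it reports whether any combining marks are still present afterwards, which would point to a mark that has no precomposed form.

```javascript
// Matches any Unicode combining mark.
const COMBINING_MARK = /\p{M}/u;

// Convert to NFC and report whether any combining marks survived.
function fixFloatingAccents(text) {
  const fixed = text.normalize("NFC");
  return { fixed, marksRemaining: COMBINING_MARK.test(fixed) };
}

const result = fixFloatingAccents("Vie\u0323\u0302t"); // "Việt" in NFD
console.log(result.fixed === "Vi\u1EC7t"); // true
console.log(result.marksRemaining);        // false
```

Standard Vietnamese letters all have precomposed forms, so `marksRemaining` should normally be false for Vietnamese text after conversion.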

Which form to use

Use case	Recommended form
Database storage	NFC
Web page text	NFC
Filenames (most systems)	NFC
macOS HFS+ filenames	NFD (macOS normalises to NFD)
String comparison / search	NFC (normalise both sides)
System that explicitly requires decomposed input	NFD

NFC is the safe default for the vast majority of uses. Nothing is sent to a server; detection and conversion run entirely in your browser.