Vietnamese Precomposed vs Decomposed Unicode Normalizer

Normalize Vietnamese text between NFC precomposed and NFD decomposed forms

Detect whether Vietnamese text is encoded as NFC precomposed or NFD decomposed Unicode, then convert between them to fix floating-accent display bugs and inconsistent search. Shows code-point counts for each form. Runs in your browser.

What is the difference between NFC and NFD?

NFC stores each accented Vietnamese letter as a single precomposed code point, while NFD stores a base letter followed by separate combining accent marks. Both look the same when rendered correctly, but they are different code-point sequences (and therefore different bytes), so a plain string comparison or substring search treats them as different text.
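To make the difference concrete, here is a minimal Python sketch using the standard `unicodedata` module. The tool itself runs in the browser, where JavaScript's `String.prototype.normalize` does the same job; Python is used here only for illustration, and the `code_points` helper is a hypothetical display function. It prints the code points of ế in each form and shows that the two strings compare unequal and differ in UTF-8 length.

```python
import unicodedata


def code_points(s: str) -> list[str]:
    """Hypothetical display helper: format each code point of s as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in s]


precomposed = unicodedata.normalize("NFC", "ế")  # one precomposed code point
decomposed = unicodedata.normalize("NFD", "ế")   # base letter + two combining marks

print(code_points(precomposed))  # ['U+1EBF']
print(code_points(decomposed))   # ['U+0065', 'U+0302', 'U+0301']
print(precomposed == decomposed)  # False
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 3 5
```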

This tool diagnoses and fixes Unicode normalization problems in Vietnamese text. Vietnamese diacritics can be stored in two ways, and mixing them causes accents to float beside their letters, searches to miss results, and identical-looking strings to compare as unequal.
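The search failure can be reproduced in a few lines. This Python sketch assumes the stored text arrived in NFD while the query was typed in NFC; `nfc_contains` is a hypothetical name for the usual fix of normalizing both sides to the same form before searching.

```python
import unicodedata

# Assumption: the stored text arrived in NFD, but the user typed the query in NFC.
document = unicodedata.normalize("NFD", "Tôi học tiếng Việt")
query = unicodedata.normalize("NFC", "Việt")

# ệ is U+1EC7 in the query but e + U+0323 + U+0302 in the document, so no match.
print(query in document)  # False


def nfc_contains(haystack: str, needle: str) -> bool:
    """Hypothetical helper: substring search after normalizing both sides to NFC."""
    return unicodedata.normalize("NFC", needle) in unicodedata.normalize("NFC", haystack)


print(nfc_contains(document, query))  # True
```

Normalizing once when text is written, so that everything stored is already NFC, avoids repeating this work on every query.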

How it works

The tool normalizes your input to both NFC and NFD and compares each result with the original to detect its current form. NFC (precomposed) packs each accented letter into a single code point, such as ế at U+1EBF. NFD (decomposed) stores a plain e (U+0065) followed by a combining circumflex (U+0302) and a combining acute accent (U+0301). Input that already equals its NFC form is NFC, input that equals its NFD form is NFD, and input that equals neither mixes the two; counting code points and combining marks in each version makes the difference visible. The tool then offers both converted forms to copy.
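The detection logic can be sketched in Python with `unicodedata.normalize`. This is an illustration under assumptions, not the tool's actual source: `detect_form`, `stats`, and `report` are hypothetical names, and the extra `"both"` result for text that neither normalization changes (plain ASCII, for example) is a choice made here. Combining marks are counted as characters in Unicode general category M, which covers the Vietnamese tone and vowel marks.

```python
import unicodedata


def detect_form(text: str) -> str:
    """Classify text as 'NFC', 'NFD', 'mixed', or 'both' (unchanged by either form)."""
    nfc = unicodedata.normalize("NFC", text)
    nfd = unicodedata.normalize("NFD", text)
    if text == nfc and text == nfd:
        return "both"  # e.g. plain ASCII: nothing to compose or decompose
    if text == nfc:
        return "NFC"
    if text == nfd:
        return "NFD"
    return "mixed"  # some letters precomposed, others decomposed


def stats(s: str) -> dict:
    """Count code points and combining marks (Unicode category M*) in s."""
    return {
        "code_points": len(s),
        "combining_marks": sum(1 for ch in s if unicodedata.category(ch).startswith("M")),
    }


def report(text: str) -> dict:
    """Detected form plus counts for the input and both converted versions."""
    return {
        "detected": detect_form(text),
        "input": stats(text),
        "nfc": stats(unicodedata.normalize("NFC", text)),
        "nfd": stats(unicodedata.normalize("NFD", text)),
    }


# Usage: "Tiếng " precomposed followed by "Việt" decomposed.
mixed = unicodedata.normalize("NFC", "Tiếng ") + unicodedata.normalize("NFD", "Việt")
print(report(mixed))
```

Comparing against the normalized output, rather than scanning for specific accented letters, keeps the check independent of any hard-coded list of Vietnamese characters.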

Example and notes

The string Tiếng Việt looks identical in both forms, but it is 10 code points in NFC and 14 in NFD, because ế and ệ each split into a base letter plus two combining marks. If you paste text and the detected form is NFD or mixed, convert it to NFC for storage and display, since that is the form databases, URLs, and most fonts expect. Reach for NFD only when a specific system demands decomposed input.
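The code-point counts for Tiếng Việt can be checked with a short Python sketch using the standard `unicodedata` module. `normalize_for_storage` is a hypothetical helper illustrating the normalize-to-NFC-before-saving advice, not part of any particular database's API; `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

phrase = "Tiếng Việt"
nfc = unicodedata.normalize("NFC", phrase)
nfd = unicodedata.normalize("NFD", phrase)

print(len(nfc), len(nfd))  # 10 14

# ệ decomposes to e + combining dot below (U+0323) + combining circumflex (U+0302);
# canonical ordering places the dot below (class 220) before the circumflex (class 230).
print([f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", "ệ")])  # ['U+0065', 'U+0323', 'U+0302']


def normalize_for_storage(text: str) -> str:
    """Hypothetical helper: convert NFC, NFD, or mixed input to NFC before saving it."""
    return unicodedata.normalize("NFC", text)


print(unicodedata.is_normalized("NFC", normalize_for_storage(nfd)))  # True
```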