Why does ộ count as one character?

A reader sees ộ as a single letter, so it should count as one grapheme. Internally it may be stored as the letter o followed by a combining circumflex and a combining dot below, which is three code points, or as the single precomposed code point U+1ED9. Either way, grapheme counting groups what is stored into the one character a person perceives.
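To make the three code points concrete, here is a minimal Python sketch that lists them for the decomposed spelling. It uses only the standard `unicodedata` module; the variable names are illustrative and are not taken from this tool.

```python
import unicodedata

# Decomposed spelling of the letter: base "o" followed by two combining marks.
decomposed = "o\u0302\u0323"

# Look up the official Unicode name of each code point.
code_point_names = [unicodedata.name(ch) for ch in decomposed]

for ch, name in zip(decomposed, code_point_names):
    print(f"U+{ord(ch):04X} {name}")
# U+006F LATIN SMALL LETTER O
# U+0302 COMBINING CIRCUMFLEX ACCENT
# U+0323 COMBINING DOT BELOW
```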

What is the difference between graphemes, code points, and bytes?

Graphemes are the characters a person sees. Code points are the individual Unicode values that make up each grapheme. Bytes are the storage those code points occupy once encoded as UTF-8. For Vietnamese text all three counts can differ because of stacked diacritics.

Does Vietnamese always use stacked diacritics?

Not at the storage level. Vietnamese text can be stored two ways. Precomposed (NFC) text uses a single code point for each accented letter, while decomposed (NFD) text spells each one out as a base letter plus combining marks. Either way, this tool counts the visible grapheme as one character.
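The two storage forms can be compared with a short Python sketch. It assumes only the standard library's `unicodedata.normalize`, and the sample word is written with escapes so the result does not depend on how this page itself is encoded.

```python
import unicodedata

word = "m\u1ed9t"  # "một" written with the precomposed letter U+1ED9

nfc = unicodedata.normalize("NFC", word)  # precomposed form
nfd = unicodedata.normalize("NFD", word)  # base letters plus combining marks

print(len(nfc), len(nfc.encode("utf-8")))  # 3 5
print(len(nfd), len(nfd.encode("utf-8")))  # 5 7
```

Both strings display identically, yet the decomposed one has two more code points and two more UTF-8 bytes.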

Why do bytes matter for SMS or databases?

SMS, varchar columns, and API limits are frequently measured in bytes or code units rather than visible characters. A short-looking Vietnamese string can exceed a byte limit because each accented letter takes two or three UTF-8 bytes when precomposed, and more when decomposed. The byte total shows how close you are to such a limit.
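A byte-limit check along these lines might look like the following Python sketch. The function names and the 10-byte limit are hypothetical, chosen only for illustration; real limits depend on the system you are writing to.

```python
def utf8_bytes(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def utf16_code_units(text: str) -> int:
    """Number of 16-bit code units, for limits expressed in code units."""
    return len(text.encode("utf-16-le")) // 2


def fits_byte_limit(text: str, max_bytes: int) -> bool:
    """True if the UTF-8 encoding fits within max_bytes (hypothetical limit)."""
    return utf8_bytes(text) <= max_bytes


sample = "Ti\u1ebfng Vi\u1ec7t"  # "Tiếng Việt", precomposed

print(len(sample))                  # 10 code points
print(utf8_bytes(sample))           # 14 bytes
print(utf16_code_units(sample))     # 10 code units
print(fits_byte_limit(sample, 10))  # False
```

The string looks ten characters long but needs fourteen bytes, so it would not fit a field limited to ten bytes.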

Does it count spaces and punctuation?

Yes. Spaces, punctuation, and line breaks are all graphemes and are included in the character count. The tool also reports a separate count with whitespace removed so you can see the non-space length when you need it.
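The two totals can be approximated in Python as sketched below. This is not the tool's own code. It assumes that normalizing to NFC leaves one code point per visible character, which holds for standard Vietnamese letters but not for every script, and `char_counts` is an illustrative name.

```python
import unicodedata


def char_counts(text: str) -> tuple[int, int]:
    """Return (count including whitespace, count excluding whitespace).

    Assumption: after NFC normalization each visible character is one code
    point, which is true for standard Vietnamese letters.
    """
    nfc = unicodedata.normalize("NFC", text)
    with_whitespace = len(nfc)
    without_whitespace = sum(1 for ch in nfc if not ch.isspace())
    return with_whitespace, without_whitespace


text = "Xin ch\u00e0o,\nb\u1ea1n!"  # "Xin chào," then a line break, then "bạn!"
print(char_counts(text))  # (14, 12)
```

The space and the line break account for the difference of two between the totals.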

Vietnamese Character Counter

Vietnamese stacks tone marks and vowel modifiers onto base letters, producing characters like ộ, ự, and ằ. A naive length check can count one of these as several characters, which is wrong for anything a human reads. This counter measures user-perceived characters — graphemes — alongside code points and bytes.

How it works

The tool uses Unicode grapheme segmentation to group each visible character, including a base letter plus all its combining marks, into one unit:

ộ  = o + ◌̂ (circumflex) + ◌̣ (below dot)  → 3 code points → 1 grapheme

It reports three figures: graphemes (what a reader counts), code points (individual Unicode values), and UTF-8 bytes (storage size). For Vietnamese these often differ, especially when text is stored in decomposed (NFD) form rather than precomposed (NFC).
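The grouping and the three figures can be sketched in Python as follows. This is a deliberate simplification rather than full Unicode grapheme segmentation (UAX #29): it only attaches combining marks to the preceding base character, which is enough for Vietnamese letters but not for emoji sequences or many other scripts. The names `split_graphemes` and `measure` are illustrative.

```python
import unicodedata


def split_graphemes(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplified stand-in for UAX #29 segmentation: a code point with a
    non-zero canonical combining class is attached to the previous cluster.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch   # combining mark: extend the current cluster
        else:
            clusters.append(ch)  # base character: start a new cluster
    return clusters


def measure(text: str) -> dict[str, int]:
    """Report the three figures: graphemes, code points, UTF-8 bytes."""
    return {
        "graphemes": len(split_graphemes(text)),
        "code_points": len(text),
        "utf8_bytes": len(text.encode("utf-8")),
    }


decomposed = "mo\u0302\u0323t"  # "một" stored as base letters plus marks
print(measure(decomposed))
# {'graphemes': 3, 'code_points': 5, 'utf8_bytes': 7}
```

For production use, a library that implements the full segmentation rules is the safer choice.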

Example and tips

The word một (“one”) is three graphemes (m, ộ, t) even when it is stored in decomposed form as five code points (m, o, two combining marks, t). Use the grapheme count for word-length and display purposes, the byte count for size-limited fields such as SMS payloads and database varchar columns, and the code-point count when debugging encoding. If two seemingly identical strings count differently, one is probably NFC and the other NFD. Normalize both to the same form before comparing.
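Normalizing before comparing can be done with Python's standard `unicodedata` module, as in this minimal sketch. The helper name `same_text` is hypothetical, and NFC is chosen here only as an example target form; NFD would work equally well as long as both sides use it.

```python
import unicodedata


def same_text(a: str, b: str) -> bool:
    """Compare two strings after bringing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "m\u1ed9t"        # one code point for the accented letter
decomposed = "mo\u0302\u0323t"  # base letter plus two combining marks

print(precomposed == decomposed)           # False
print(same_text(precomposed, decomposed))  # True
```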