Why does Thai have three different counts?

In Thai, vowel signs and tone marks attach above, below, or around a base consonant without taking their own horizontal space. So one visible character can be several Unicode code points, and each Thai code point takes three UTF-8 bytes (UTF-8 in general uses one to four bytes per code point, with ASCII at one).

What is a grapheme cluster?

A grapheme cluster is one user-perceived character — a base consonant together with its combining vowels and tone marks. The tool groups combining marks (above-vowels, below-vowels, tone marks) with the preceding base to give the visible-character count.
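The grouping rule described above can be sketched in a few lines of JavaScript. This is an illustration, not the tool's actual source: the function name splitClusters and the exact set of marks (the nonspacing marks of the Thai block, U+0E31, U+0E34 to U+0E3A, and U+0E47 to U+0E4E) are assumptions made for the example.

```javascript
// Assumption: "combining marks" means the nonspacing marks of the Thai block:
// U+0E31, U+0E34-U+0E3A (above and below vowels), U+0E47-U+0E4E (tone marks and signs).
const THAI_MARK = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper: split text into clusters by attaching each mark
// to the base character that precedes it.
function splitClusters(text) {
  const clusters = [];
  for (const ch of text) {                  // for...of iterates by code point
    if (THAI_MARK.test(ch) && clusters.length > 0) {
      clusters[clusters.length - 1] += ch;  // mark joins the preceding base
    } else {
      clusters.push(ch);                    // anything else starts a new cluster
    }
  }
  return clusters;
}

// Base consonant + above-vowel + tone mark: one visible character.
console.log(splitClusters("\u0E17\u0E35\u0E48").length); // 1

// Consonant + tone mark + sara am (U+0E33): sara am stays its own cluster here.
console.log(splitClusters("\u0E19\u0E49\u0E33").length); // 2
```

A mark that appears with no preceding base is kept as its own cluster in this sketch, so malformed input still gets counted rather than dropped.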

Which count should a character limit use?

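It depends on the platform. Byte-limited database columns and storage quotas count UTF-8 bytes; UI fields often count visible clusters; some APIs count raw code points. SMS is a special case: Thai messages are typically sent as UCS-2, where each Thai code point occupies one 16-bit unit, so the code point figure is the closest match there. Check your target system, then read the matching figure here.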

How many bytes is a Thai letter?

Thai characters live in the Thai block (U+0E00 to U+0E7F), and every code point in that range encodes to three UTF-8 bytes. So a single visible Thai syllable made of a base consonant, a vowel mark, and a tone mark is nine UTF-8 bytes.
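The nine-byte figure can be checked directly with TextEncoder. The snippet below is a small sketch; the sample syllable (tho thahan, sara ii, mai ek) and the variable names are chosen only for illustration.

```javascript
const encoder = new TextEncoder();

// One visible syllable: base U+0E17, above-vowel U+0E35, tone mark U+0E48.
const syllable = "\u0E17\u0E35\u0E48";

// Print the UTF-8 size of each code point separately.
for (const ch of syllable) {
  const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
  console.log(`U+${hex}: ${encoder.encode(ch).length} bytes`); // 3 bytes each
}

// Three code points at three bytes each.
const totalBytes = encoder.encode(syllable).length;
console.log(totalBytes); // 9
```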

Is my text sent to a server?

No. All three counts are computed in your browser using built-in Unicode handling, so nothing is sent to any server.

Thai Character Counter

Thai writing stacks marks: vowels and tone signs sit above, below, or around a base consonant rather than beside it. That makes “how long is this text?” ambiguous, because the counts of visible characters, Unicode code points, and stored bytes can all differ. This free tool reports all three at once.

How it works

The tool computes three figures from your text:

Code points — the true number of Unicode scalar values, obtained by iterating with the string iterator (which respects surrogate pairs).
UTF-8 bytes — the storage size, measured with TextEncoder. Thai characters in the U+0E00 block each take three bytes.
Grapheme clusters — user-perceived characters, formed by attaching Thai combining marks (above-vowels like ◌ิ, below-vowels like ◌ุ, and tone marks like ◌่) to the preceding base consonant.

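Because a single base plus a stacked vowel plus a tone mark is three code points but one visible cluster, the cluster count is never higher than the code point count, which in turn is never higher than the byte count. For Thai text that contains marks, the cluster count is the lowest of the three.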
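Putting the three figures together, a minimal sketch might look like the following. It follows the steps listed above (string iterator, TextEncoder, mark grouping) but is not the tool's actual code: the function name countThai, the returned field names, and the mark ranges are assumptions for the example.

```javascript
// Assumption: marks are the nonspacing marks of the Thai block
// (U+0E31, U+0E34-U+0E3A, U+0E47-U+0E4E).
const THAI_MARK = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper returning all three counts for a string.
function countThai(text) {
  // Spreading a string iterates by code point, so surrogate pairs count once.
  const codePoints = [...text].length;

  // Storage size once the text is encoded as UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;

  // A new cluster starts at every character that is not a mark
  // (or at a mark that has no base before it).
  let clusters = 0;
  for (const ch of text) {
    if (!THAI_MARK.test(ch) || clusters === 0) clusters += 1;
  }

  return { clusters, codePoints, utf8Bytes };
}

// "sawatdi": U+0E2A U+0E27 U+0E31 U+0E2A U+0E14 U+0E35
console.log(countThai("\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35"));
// { clusters: 4, codePoints: 6, utf8Bytes: 18 }
```

In the sample word two of the six code points are marks, so four clusters remain, and every code point contributes three bytes.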

Tips and notes

Choose the figure that matches your constraint: bytes for byte-limited database columns and storage quotas, clusters for what the reader actually sees, and code points for raw API length checks and for SMS, which typically sends Thai as UCS-2 with one 16-bit unit per Thai code point. Thai also commonly uses sara am (◌ำ), which is written after the consonant, and the leading vowels เ แ โ ใ ไ, which are written and stored before the consonant even though they are pronounced after it. Each of these remains its own cluster in this tool. Everything runs locally in your browser.