Thai Character Counter

Count Thai characters, bytes, and user-perceived clusters separately

Free Thai character counter: reports Unicode code points, UTF-8 bytes, and user-perceived grapheme clusters for Thai text, where vowels and tone marks stack on consonants. Everything runs in your browser.

Why does Thai have three different counts?

In Thai, vowel signs and tone marks attach above, below, or around a base consonant without taking their own horizontal space. So one visible character can be several Unicode code points, and each Thai code point takes three UTF-8 bytes (UTF-8 in general uses one to four bytes per code point).

As a result, “how long is this text?” is ambiguous for Thai: the number of visible characters, the number of Unicode code points, and the number of stored bytes usually all differ. This free tool reports all three at once.

How it works

The tool computes three figures from your text:

  • Code points: the number of Unicode code points, obtained by iterating with the string iterator, which yields each surrogate pair as a single code point.
  • UTF-8 bytes: the storage size, measured with TextEncoder. Every character in the Thai block (U+0E00 to U+0E7F) takes three bytes.
  • Grapheme clusters: user-perceived characters, formed by attaching Thai combining marks (above-vowels like ◌ิ, below-vowels like ◌ุ, and tone marks like ◌่) to the preceding base consonant.
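The three figures above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name countThai and the exact set of combining marks in THAI_COMBINING are assumptions made for the example (the set used here is the Thai nonspacing marks U+0E31, U+0E34 to U+0E3A, and U+0E47 to U+0E4E).

```javascript
// Assumed mark set for this sketch: Thai nonspacing marks
// (mai han-akat, above/below vowels, phinthu, maitaikhu, tone marks, etc.).
const THAI_COMBINING = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper: returns the three counts described above.
function countThai(text) {
  // The string iterator yields whole code points, so a surrogate pair counts once.
  const codePoints = [...text];

  // TextEncoder always encodes to UTF-8.
  const utf8Bytes = new TextEncoder().encode(text).length;

  // A new cluster starts at every code point that is not a combining mark.
  // A mark at the very start has no base to attach to, so it starts a cluster too.
  let clusters = 0;
  codePoints.forEach((ch, i) => {
    if (i === 0 || !THAI_COMBINING.test(ch)) clusters += 1;
  });

  return { codePoints: codePoints.length, utf8Bytes, clusters };
}

// "สวัสดี" written with escapes: so sua, wo waen + mai han-akat, so sua, do dek + sara ii
console.log(countThai("\u0E2A\u0E27\u0E31\u0E2A\u0E14\u0E35"));
// { codePoints: 6, utf8Bytes: 18, clusters: 4 }
```

Counting cluster starts rather than building the clusters keeps the sketch short; a version that needs the cluster text itself would collect the code points into an array instead.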

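Because a single base plus a stacked vowel plus a tone mark is three code points (nine UTF-8 bytes) but one visible cluster, the cluster count is the lowest of the three for typical Thai text. It can never exceed the code point count, which in turn can never exceed the byte count.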
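The arithmetic for one such cluster can be checked directly. The sketch below takes ที่ (tho thahan, sara ii, mai ek) as an illustrative example chosen for this page and lists each code point with its UTF-8 size; the variable names are hypothetical.

```javascript
// One visible cluster: tho thahan (U+0E17) + sara ii (U+0E35) + mai ek (U+0E48).
const cluster = "\u0E17\u0E35\u0E48";
const encoder = new TextEncoder();

// One row per code point: its U+ label and how many UTF-8 bytes it occupies.
const breakdown = [...cluster].map((ch) => ({
  codePoint: "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"),
  utf8Bytes: encoder.encode(ch).length,
}));

const totalBytes = breakdown.reduce((sum, row) => sum + row.utf8Bytes, 0);

console.log(breakdown.map((row) => row.codePoint).join(" ")); // U+0E17 U+0E35 U+0E48
console.log(breakdown.length, totalBytes); // 3 9
```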

Tips and notes

Choose the figure that matches your constraint: bytes for SMS segments and byte-limited database columns, clusters for what the reader actually sees, and code points for raw API length checks. Thai also commonly uses sara am (◌ำ), which is written after the consonant, and the leading vowels เ แ โ ใ ไ, which are written before the consonant but pronounced after it. Each of these counts as its own cluster in this tool. Everything runs locally in your browser.
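The treatment of sara am and the leading vowels can be illustrated with a small sketch. It assumes the same attach-to-preceding-base rule described under "How it works"; the helper name splitClusters and the mark set are hypothetical, and sara am (U+0E33) and the leading vowels (U+0E40 to U+0E44) are left out of the set on purpose so that they stay separate.

```javascript
// Assumed set of stacking marks: U+0E31, U+0E34-U+0E3A, U+0E47-U+0E4E.
// Sara am (U+0E33) and the leading vowels (U+0E40-U+0E44) are deliberately absent.
const STACKING_MARK = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper: glue each stacking mark onto the cluster before it.
function splitClusters(text) {
  const clusters = [];
  for (const ch of text) {
    if (STACKING_MARK.test(ch) && clusters.length > 0) {
      clusters[clusters.length - 1] += ch;
    } else {
      clusters.push(ch);
    }
  }
  return clusters;
}

// น้ำ: no nu + mai tho form one cluster, sara am is a second one.
const nam = splitClusters("\u0E19\u0E49\u0E33");
// เก: the leading vowel sara e and ko kai are one cluster each.
const ke = splitClusters("\u0E40\u0E01");

console.log(nam.length, ke.length); // 2 2
```

A segmenter that follows Unicode UAX #29, such as Intl.Segmenter, classifies sara am as a spacing mark and may therefore attach it to the preceding cluster, so its counts can differ from the rule sketched here.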