Thai writing stacks marks: vowels and tone signs sit above, below, or around a base consonant rather than beside it. That makes “how long is this text?” ambiguous, because the counts of visible characters, Unicode code points, and stored bytes can all differ. This free tool reports all three at once.
How it works
The tool computes three figures from your text:
- Code points: the number of Unicode code points, obtained by iterating with the string iterator (which treats a surrogate pair as a single code point).
- UTF-8 bytes: the storage size, measured with TextEncoder. Thai characters in the Thai block (U+0E00–U+0E7F) each take three bytes.
- Grapheme clusters: user-perceived characters, formed by attaching Thai combining marks (above-vowels like ◌ิ, below-vowels like ◌ุ, and tone marks like ◌่) to the preceding base consonant.
A single base consonant with a stacked vowel and a tone mark is three code points (nine UTF-8 bytes) but one visible cluster, so for Thai text the cluster count is usually the lowest of the three figures.
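The three figures described above can be sketched in a few lines of JavaScript. This is a minimal illustration, not the tool's actual source: the function name `measureThai` and the exact set of code points treated as combining marks (`THAI_MARK`) are assumptions made for the example.

```javascript
// Assumed set of Thai combining marks: mai han-akat (U+0E31), the above and
// below vowels plus phinthu (U+0E34..U+0E3A), and the tone marks and other
// signs in U+0E47..U+0E4E. The real tool may use a different set.
const THAI_MARK = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper returning the three counts for a string.
function measureThai(text) {
  // Spreading a string uses the string iterator, which yields whole code
  // points, so a surrogate pair counts once.
  const codePoints = [...text];

  // TextEncoder always encodes to UTF-8.
  const bytes = new TextEncoder().encode(text).length;

  // A combining mark joins the cluster opened by the character before it.
  // Anything else, or a mark with nothing before it, opens a new cluster.
  let clusters = 0;
  for (const cp of codePoints) {
    if (!(THAI_MARK.test(cp) && clusters > 0)) {
      clusters += 1;
    }
  }

  return { codePoints: codePoints.length, bytes, clusters };
}

// "ที่" is tho thahan (U+0E17) + sara ii (U+0E35) + mai ek (U+0E48).
console.log(measureThai("\u0E17\u0E35\u0E48"));
// → { codePoints: 3, bytes: 9, clusters: 1 }
```

The cluster rule here is deliberately simple: it only attaches marks to the preceding character and does not implement full Unicode text segmentation.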
Tips and notes
Choose the figure that matches your constraint: bytes for SMS segments and byte-limited database columns, clusters for what the reader actually sees, and code points for raw API length checks. Thai also commonly uses sara am (◌ำ) and the leading vowels เ แ โ ใ ไ; the leading vowels are written before the consonant but logically follow it. Each of these remains its own cluster. Everything runs locally in your browser.
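To see how sara am and the leading vowels end up as their own clusters, the attach-to-preceding rule can be sketched as a splitter that returns the clusters themselves. As before, this is an illustration under assumptions: `splitClusters` and `THAI_COMBINING` are hypothetical names, and the mark set is a guess at what the tool treats as combining.

```javascript
// Assumed Thai combining marks. Sara am (U+0E33) and the leading vowels
// (U+0E40..U+0E44) are intentionally not in this set.
const THAI_COMBINING = /^[\u0E31\u0E34-\u0E3A\u0E47-\u0E4E]$/;

// Hypothetical helper: split text into clusters by appending each
// combining mark to the cluster that precedes it.
function splitClusters(text) {
  const clusters = [];
  for (const cp of text) {
    if (THAI_COMBINING.test(cp) && clusters.length > 0) {
      clusters[clusters.length - 1] += cp;
    } else {
      clusters.push(cp);
    }
  }
  return clusters;
}

// "น้ำ": no nu (U+0E19) + mai tho (U+0E49) + sara am (U+0E33)
console.log(splitClusters("\u0E19\u0E49\u0E33").length);       // → 2

// "เก่า": sara e (U+0E40) + ko kai (U+0E01) + mai ek (U+0E48) + sara aa (U+0E32)
console.log(splitClusters("\u0E40\u0E01\u0E48\u0E32").length); // → 3
```

In the first word the tone mark attaches to the consonant while sara am stands alone; in the second, the leading vowel opens its own cluster before the consonant. A segmenter that follows the Unicode extended grapheme cluster rules, such as `Intl.Segmenter`, may group sara am with the preceding consonant, so its counts can differ from this simplified rule.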