Bengali (Bangla) is an abugida where a single perceived character can be several Unicode code points: a base consonant plus a vowel sign, or a conjunct built with the hasanta. Counting code points therefore overcounts what a reader sees. This tool reports the number of grapheme clusters (perceived characters) and of raw code points, plus the text's length in UTF-8 bytes.
How it works
Grapheme clusters are counted with the browser’s Unicode segmentation
(Intl.Segmenter) when available, with a hasanta-aware fallback otherwise. The
fallback groups characters like this:
- A base letter or independent vowel starts a new cluster.
- Vowel signs (kar, U+09BE–U+09CC), the hasanta (্, U+09CD), nukta,
anusvara, visarga, and chandrabindu all attach to the current cluster.
- After a hasanta, the next consonant joins the same cluster as a conjunct.
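The grouping rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's own source: the names `segmentBengaliFallback`, `isAttachingMark`, `isConsonant`, and `countGraphemes` are hypothetical, and treating U+0995–U+09B9 and U+09DC–U+09DF as the "consonant" set is an assumption.

```typescript
// Sketch of a hasanta-aware grapheme fallback for Bengali text.
// Hypothetical names; the mark and consonant sets are assumptions.

const HASANTA = 0x09cd;

// Marks that attach to the current cluster, per the rules above.
function isAttachingMark(cp: number): boolean {
  return (
    (cp >= 0x0981 && cp <= 0x0983) || // chandrabindu, anusvara, visarga
    cp === 0x09bc || // nukta
    (cp >= 0x09be && cp <= 0x09cc) || // vowel signs (kar)
    cp === HASANTA
  );
}

// Assumed consonant ranges: ক..হ plus ড়, ঢ়, য়.
function isConsonant(cp: number): boolean {
  return (cp >= 0x0995 && cp <= 0x09b9) || (cp >= 0x09dc && cp <= 0x09df);
}

function segmentBengaliFallback(text: string): string[] {
  const clusters: string[] = [];
  let prev = -1; // previous code point; -1 means none yet
  for (const ch of text) {
    // for...of walks code points, not UTF-16 code units
    const cp = ch.codePointAt(0)!;
    const joins =
      clusters.length > 0 &&
      (isAttachingMark(cp) || (prev === HASANTA && isConsonant(cp)));
    if (joins) {
      clusters[clusters.length - 1] += ch; // extend the current cluster
    } else {
      clusters.push(ch); // a base letter, vowel, or anything else starts a new one
    }
    prev = cp;
  }
  return clusters;
}

// Prefer the engine's Unicode segmentation; use the fallback only when missing.
function countGraphemes(text: string): number {
  if (typeof Intl !== "undefined" && typeof Intl.Segmenter === "function") {
    const segmenter = new Intl.Segmenter("bn", { granularity: "grapheme" });
    return Array.from(segmenter.segment(text)).length;
  }
  return segmentBengaliFallback(text).length;
}
```

For ক্ষুদ্র, `segmentBengaliFallback` returns `["ক্ষু", "দ্র"]`. Checking `Intl.Segmenter` first keeps the hand-written rules as a safety net only; engines that implement Unicode 15.1 or later also keep conjuncts such as ক্ষ together natively.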
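The byte count is the length of the text encoded as UTF-8, which is its actual size when stored or transmitted in that encoding. Every code point in the Bengali block (U+0980–U+09FF) takes three bytes in UTF-8, while ASCII characters such as spaces take one.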
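Both raw measures can be computed with standard web APIs. A minimal sketch, assuming a modern browser or Node.js where `TextEncoder` is available; the function names are illustrative, not taken from the tool.

```typescript
// Code points: iterating a string (Array.from / for...of) yields code
// points, whereas .length counts UTF-16 code units. The two agree for
// Bengali, which lies entirely in the Basic Multilingual Plane.
function countCodePoints(text: string): number {
  return Array.from(text).length;
}

// UTF-8 bytes: TextEncoder always encodes to UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(countCodePoints("ক্ষুদ্র"), utf8ByteLength("ক্ষুদ্র")); // 7 21
console.log(utf8ByteLength("a ক")); // 5 (1 + 1 + 3)
```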
Example
The conjunct ক্ষ is written with three code points (ক, the hasanta ্, and
ষ) but is one grapheme cluster. A word like ক্ষুদ্র therefore has seven code
points (ক, ্, ষ, ু, দ, ্, র) and takes 21 bytes in UTF-8, yet a reader sees
just two characters, ক্ষু and দ্র. This tool reports that reader's count.
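To inspect the hidden structure of the example, a small helper can list each code point in U+XXXX notation. This is an illustrative sketch; `describeCodePoints` is a hypothetical name, not part of the tool.

```typescript
// Returns each code point of `text` as "U+XXXX" (at least four hex digits).
function describeCodePoints(text: string): string[] {
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

console.log(describeCodePoints("ক্ষ").join(" "));
// U+0995 U+09CD U+09B7
console.log(describeCodePoints("ক্ষুদ্র").join(" "));
// U+0995 U+09CD U+09B7 U+09C1 U+09A6 U+09CD U+09B0
```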
Notes
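- Use the grapheme count for character limits in posts and headlines, the
code-point count for low-level processing, and the byte count for byte-based
storage limits, such as a VARCHAR column sized in bytes.
- For SMS planning, note that Bengali text is sent in UCS-2 at two bytes per
code point, so the code-point count (not the UTF-8 byte count) is what to
check against the 70-character single-message limit.
- The text in the box is never altered; the tool only measures it.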
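Applying a character limit by grapheme count also means cutting text on cluster boundaries, so a truncated post never ends in a detached vowel sign or a dangling hasanta. A minimal sketch, assuming `Intl.Segmenter` is available; `truncateGraphemes` is a hypothetical helper, not a feature of this tool. Engines implementing Unicode 15.1 or later also keep conjuncts such as ক্ষ whole; older engines may break after the hasanta.

```typescript
// Keep at most `limit` grapheme clusters of `text`.
function truncateGraphemes(text: string, limit: number): string {
  const segmenter = new Intl.Segmenter("bn", { granularity: "grapheme" });
  let out = "";
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    if (count >= limit) break; // stop on a cluster boundary
    out += segment;
    count++;
  }
  return out;
}

console.log(truncateGraphemes("কাকি", 1)); // কা
console.log("কাকি".slice(0, 1)); // ক (the vowel sign া is cut off)
```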