Bengali Character Counter

Count Bengali grapheme clusters vs raw Unicode code points

Count Bengali (Bangla) text by user-perceived grapheme clusters as well as raw Unicode code points and UTF-8 bytes. A conjunct joined by the hasanta, along with any dependent vowel sign attached to it, is counted as one character rather than several. Runs in your browser.

What is a grapheme cluster?

A grapheme cluster is one user-perceived character. In Bengali a base letter plus its attached vowel sign, hasanta, or conjunct consonants form a single cluster even though they are several Unicode code points. This count matches what a reader would call one character.

Bengali (Bangla) is an abugida where a single perceived character can be several Unicode code points: a base consonant plus a vowel sign, or a conjunct built with the hasanta. Counting code points overcounts what a reader sees. This tool reports both the grapheme clusters (perceived characters) and the raw code points, plus UTF-8 bytes.

How it works

Grapheme clusters are counted with the browser’s Unicode segmentation (Intl.Segmenter) when available, with a hasanta-aware fallback otherwise. The fallback groups characters like this:

  • A base letter or independent vowel starts a new cluster.
  • Vowel signs (kar, U+09BE–U+09CC), the hasanta (্, U+09CD), nukta, anusvara, visarga, and chandrabindu all attach to the current cluster.
  • After a hasanta, the next consonant joins the same cluster as a conjunct.
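
The three rules above can be sketched in a few lines. This is a minimal illustration in Python, not the tool's actual source: the names bengali_clusters, ATTACHING, and is_consonant are hypothetical, and the consonant ranges are an assumption based on the Unicode Bengali block.

```python
HASANTA = "\u09CD"

# Marks that attach to the current cluster, following the list above:
# vowel signs U+09BE..U+09CC, hasanta, nukta, anusvara, visarga, chandrabindu.
ATTACHING = {chr(cp) for cp in range(0x09BE, 0x09CD)} | {
    HASANTA, "\u09BC", "\u0982", "\u0983", "\u0981",
}


def is_consonant(ch: str) -> bool:
    # Assumption: consonants are U+0995..U+09B9 plus the letters
    # U+09DC, U+09DD, and U+09DF (rra, rha, yya).
    return "\u0995" <= ch <= "\u09B9" or ch in "\u09DC\u09DD\u09DF"


def bengali_clusters(text: str) -> list[str]:
    """Group code points into hasanta-aware clusters."""
    clusters: list[str] = []
    for ch in text:
        if clusters and ch in ATTACHING:
            # Rule 2: signs and marks attach to the current cluster.
            clusters[-1] += ch
        elif clusters and clusters[-1].endswith(HASANTA) and is_consonant(ch):
            # Rule 3: a consonant right after a hasanta joins as a conjunct.
            clusters[-1] += ch
        else:
            # Rule 1: anything else starts a new cluster.
            clusters.append(ch)
    return clusters


word = "\u0995\u09CD\u09B7\u09C1\u09A6\u09CD\u09B0"  # ক্ষুদ্র
print(len(bengali_clusters(word)))  # 2
```

Anything outside the listed sets, including spaces, digits, and Latin letters, falls through to the first rule and counts as its own cluster.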

Bytes are computed as the UTF-8 length, so you see the real on-the-wire size. Every code point in the Bengali block (U+0980–U+09FF) occupies three bytes in UTF-8, while ASCII spaces, digits, and punctuation occupy one.
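
As a rough illustration of the byte count, the Python sketch below encodes a string as UTF-8 and measures the result. The helper name utf8_length is hypothetical; in Python, len() on a string already gives the code-point count.

```python
def utf8_length(text: str) -> int:
    """Return the number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


conjunct = "\u0995\u09CD\u09B7"  # ক্ষ: three Bengali code points
print(len(conjunct))          # 3 code points
print(utf8_length(conjunct))  # 9 bytes, three per code point
```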

Example

The conjunct ক্ষ is written with three code points (ক, the hasanta ্, and ষ) but is one grapheme cluster. Likewise, the word ক্ষুদ্র has seven code points but reads as just two characters, ক্ষু and দ্র, and this tool reports the reader’s count.
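
To see the individual code points behind a conjunct, a short Python sketch using the standard unicodedata module can list them by name. The helper describe is hypothetical; note that Unicode names the hasanta "BENGALI SIGN VIRAMA".

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List each code point in the text as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNNAMED')}" for ch in text]


for line in describe("\u0995\u09CD\u09B7"):  # ক্ষ
    print(line)
# U+0995 BENGALI LETTER KA
# U+09CD BENGALI SIGN VIRAMA
# U+09B7 BENGALI LETTER SSA
```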

Notes

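  • Use the grapheme count for character limits in posts and headlines, the code-point count for low-level processing, and the byte count for byte-based storage limits such as VARCHAR columns sized in bytes. For SMS planning, note that Bengali messages are sent as UCS-2, so segments are measured in 16-bit units (which the code-point count matches for Bengali) rather than UTF-8 bytes.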
  • The text in the box is never altered — the tool only measures it.