What is a grapheme cluster?

A grapheme cluster is one user-perceived character. In Bengali a base letter plus its attached vowel sign, hasanta, or conjunct consonants form a single cluster even though they are several Unicode code points. This count matches what a reader would call one character.

Why is the code-point count higher than the character count?

A conjunct such as ক্ষ is three code points — ক, the hasanta (্), and ষ — but one grapheme cluster. Vowel signs (kar), anusvara, visarga, and chandrabindu are also separate code points that combine with a base letter, so code points always meet or exceed grapheme clusters.

What is the hasanta and why does it matter?

The hasanta (virama, ্, U+09CD) suppresses a consonant's inherent vowel and binds it to the next consonant to form a conjunct (যুক্তাক্ষর). Because it joins two letters into one cluster, ignoring it would overcount Bengali characters.

Does it report bytes too?

Yes. It shows the UTF-8 byte length, which matters for database column limits, SMS, and APIs. Most Bengali code points take three bytes in UTF-8, so the byte count is typically about three times the code-point count.

Does the tool change my text?

No. It only measures. Nothing is uploaded — all counting happens locally in your browser, and the text in the box is never modified.

Bengali Character Counter

Email me this result

Get this tool's output sent to your inbox, plus one useful tool a week. No spam, unsubscribe any time.

Bengali (Bangla) is an abugida where a single perceived character can be several Unicode code points: a base consonant plus a vowel sign, or a conjunct built with the hasanta. Counting code points overcounts what a reader sees. This tool reports both the grapheme clusters (perceived characters) and the raw code points, plus UTF-8 bytes.

How it works

Grapheme clusters are counted with the browser’s Unicode segmentation (Intl.Segmenter) when available, with a hasanta-aware fallback otherwise. The fallback groups characters like this:

A base letter or independent vowel starts a new cluster.
Vowel signs (kar, U+09BE–U+09CC), the hasanta (্, U+09CD), nukta, anusvara, visarga, and chandrabindu all attach to the current cluster.
After a hasanta, the next consonant joins the same cluster as a conjunct.

Bytes are computed as the UTF-8 length, so you see the real on-the-wire size. Most Bengali code points occupy three bytes in UTF-8.

Example

The conjunct ক্ষ is written with three code points — ক, the hasanta ্, and ষ — but is one grapheme cluster. So a word like ক্ষুদ্র reads as far fewer characters than its code-point count suggests, and this tool reports the reader’s count.

Notes

Use the grapheme count for character limits in posts and headlines, the code-point count for low-level processing, and the byte count for VARCHAR limits and SMS planning.
The text in the box is never altered — the tool only measures it.