Arabic Character Counter

Count Arabic characters with optional diacritic (harakat) exclusion

Count Arabic text characters and UTF-8 bytes, with an option to strip tashkeel (harakat) and tatweel (kashida) before counting so diacritics and stretching don't inflate your totals. Runs in your browser.

What is tashkeel and why exclude it?

Tashkeel (harakat) are the small vowel and pronunciation marks placed above or below Arabic letters, such as fatha, damma, kasra, shadda, and sukun. They are separate Unicode combining characters, so vocalised text has more characters than the same text unvocalised. Excluding them counts the base letters only.

Because tashkeel marks are separate Unicode combining characters, the same sentence has a larger character count when it is fully vocalised than when it is written plain. This counter lets you count both ways and also reports UTF-8 bytes, which is the figure that byte-based limits (for example byte-sized database columns or payload caps) are measured against.

How it works

Characters are counted with JavaScript’s Unicode-aware string handling. When you enable Exclude tashkeel, the tool removes these code points before counting:

  • Harakat U+064B–U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun.
  • Superscript alef U+0670 and the Quranic annotation marks U+06D6–U+06ED.
  • Tatweel / kashida U+0640, the elongation character used for justification and stylistic stretching; it adds length but no letter.
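
The removal step listed above can be sketched in a few lines of JavaScript. This is a minimal illustration rather than the tool's actual source: the names stripTashkeel, countCodePoints, and countArabic are hypothetical, and counting by code point is an assumption about what "Unicode-aware" means here.

```javascript
// Hypothetical helpers; only the code point ranges come from the list above.
// U+064B-U+0652 harakat, U+0670 superscript alef,
// U+06D6-U+06ED Quranic annotation marks, U+0640 tatweel.
const TASHKEEL_AND_TATWEEL = /[\u064B-\u0652\u0670\u06D6-\u06ED\u0640]/g;

function stripTashkeel(text) {
  // Returns a new string; the input is left unchanged.
  return text.replace(TASHKEEL_AND_TATWEEL, "");
}

function countCodePoints(text) {
  // Spreading a string iterates by code point, not by UTF-16 code unit.
  return [...text].length;
}

function countArabic(text, excludeTashkeel) {
  return countCodePoints(excludeTashkeel ? stripTashkeel(text) : text);
}
```

Because stripTashkeel returns a copy, the original input stays available for display while only the count changes.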

Bytes are computed as the UTF-8 length using TextEncoder, so you see the real on-the-wire size. Most Arabic letters occupy two bytes in UTF-8, so a line of Arabic is typically about twice as many bytes as characters.
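
As a rough sketch of the byte calculation, assuming a hypothetical helper named utf8ByteLength, TextEncoder can be used like this:

```javascript
// Hypothetical helper; TextEncoder always encodes to UTF-8.
function utf8ByteLength(text) {
  return new TextEncoder().encode(text).length;
}

// Four Arabic letters (seen, lam, alef, meem) take two bytes each.
const salam = "\u0633\u0644\u0627\u0645";
const salamBytes = utf8ByteLength(salam); // 8

// ASCII characters such as spaces and digits take one byte each,
// which is why the ratio is "about" two rather than exactly two.
const mixedBytes = utf8ByteLength(salam + " 1"); // 10
```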

Example

The vocalised word:

مُحَمَّدٌ

contains the four base letters م ح م د plus five marks: damma, fatha, shadda, fatha, and dammatan. With tashkeel included the count is nine characters; with Exclude tashkeel enabled it counts the four base letters only. The byte count reflects UTF-8 encoding either way, at two bytes per letter or mark.
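
The figures for this word can be checked with a short script. It is a sketch under two assumptions: the word is spelled out as escapes in one plausible mark order (shadda before fatha on the second meem), and characters are counted as code points. The page does not say whether the tool reports bytes for the original or the stripped text, so both are computed.

```javascript
// The example word written as escapes so the code points are unambiguous.
const word = "\u0645\u064F\u062D\u064E\u0645\u0651\u064E\u062F\u064C";

// Same ranges as in "How it works": harakat, superscript alef,
// Quranic annotation marks, and tatweel.
const stripped = word.replace(/[\u064B-\u0652\u0670\u06D6-\u06ED\u0640]/g, "");

const withMarks = [...word].length;        // 9 code points
const lettersOnly = [...stripped].length;  // 4 base letters
const bytesOriginal = new TextEncoder().encode(word).length;     // 18
const bytesStripped = new TextEncoder().encode(stripped).length; // 8

console.log({ withMarks, lettersOnly, bytesOriginal, bytesStripped });
```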

Notes

  • Stripping never alters the text in the box — it only changes the count.
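  • Use the byte count for limits defined in UTF-8 bytes, such as byte-sized VARCHAR columns, and the diacritic-free character count for word-processing length checks.
  • Check the unit before relying on either number: SMS gateways typically send Arabic as UCS-2, where a single segment holds 70 characters, and NVARCHAR lengths are usually counted in UTF-16 units rather than UTF-8 bytes.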