Urdu Character Counter

Count Nastaliq Urdu characters with separate diacritic handling

Count the characters in Urdu Nastaliq text, with diacritics counted separately and Urdu-specific letters (such as ٹ ڈ ڑ ں ہ) distinguished from code points shared with Arabic. A precise, keyless counter that runs in the browser.

Why count diacritics separately?

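Diacritics (aerab) such as zabar, zer, and pesh are combining marks that sit on a base letter and do not occupy their own visible cell. Each one is still a separate code point, so a plain character count includes them. Counting them apart gives the visible letter count, which is what a reader sees and what any limit defined in letters cares about.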
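As a quick illustration of why a plain count overstates the visible length, here is a minimal Python sketch (the tool itself runs in the browser; Python is used here only for brevity). The variable name `word` is illustrative. It takes the letter be with a zabar and shows that the mark is a separate code point with the Unicode category Mn (non-spacing mark).

```python
import unicodedata

# Beh followed by zabar (fatha): drawn as a single marked letter.
word = "\u0628\u064E"

# A plain length counts both the base letter and the combining mark.
print(len(word))

# Show each code point with its Unicode general category and name.
for ch in word:
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The base letter reports category Lo (other letter) and the zabar reports Mn, which is the property that lets a counter tell the two apart.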

Urdu is written in the Nastaliq style of the Arabic script but adds its own letters and layers optional diacritics on top of base letters. A naive character count mixes all of these together. This counter breaks the text down into the categories that actually matter.

How it works

The text is iterated one Unicode code point at a time. Each code point is sorted into a bucket:

  • Combining diacritics (harakat / aerab) in ranges like U+064B–U+065F and the superscript alef U+0670 are counted as diacritics.
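  • A curated set of letters (ٹ ڈ ڑ ں ہ ھ ے گ پ چ ژ) is counted as Urdu-specific. These are letters Urdu uses that standard Arabic does not; some of them, such as گ پ چ ژ, are also used in Persian.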
  • Any other Arabic-block letter (U+0600–U+06FF) is counted as shared.

The headline figure, characters excluding diacritics, is the total number of code points minus the diacritics. This is the count that corresponds to the visible base characters.
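The bucketing described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the counter's actual source: the names `classify` and `count_urdu` are hypothetical, the diacritic ranges are only the two named in the list, every remaining code point in U+0600–U+06FF is treated as shared (digits and punctuation included, a simplification), and an extra "other" bucket for spaces and non-Arabic characters is an assumption.

```python
from collections import Counter

# Combining marks treated as diacritics: the ranges named in the text.
DIACRITIC_RANGES = [(0x064B, 0x065F), (0x0670, 0x0670)]

# The curated Urdu-specific letters, written as escapes to avoid ambiguity.
URDU_SPECIFIC = {
    "\u0679",  # tteh
    "\u0688",  # ddal
    "\u0691",  # rreh
    "\u06BA",  # noon ghunna
    "\u06C1",  # heh goal (gol he)
    "\u06BE",  # heh doachashmee (do-chashmi he)
    "\u06D2",  # yeh barree
    "\u06AF",  # gaf
    "\u067E",  # peh
    "\u0686",  # tcheh
    "\u0698",  # jeh
}


def classify(ch: str) -> str:
    """Sort one code point into a bucket; diacritics are checked first."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in DIACRITIC_RANGES):
        return "diacritic"
    if ch in URDU_SPECIFIC:
        return "urdu_specific"
    if 0x0600 <= cp <= 0x06FF:
        return "shared"
    return "other"  # assumption: spaces, Latin text, and so on


def count_urdu(text: str) -> dict:
    """Count code points per bucket and derive the headline figure."""
    # Iterating a Python str yields one Unicode code point at a time.
    counts = Counter(classify(ch) for ch in text)
    total = len(text)
    return {
        "total": total,
        "diacritics": counts["diacritic"],
        "urdu_specific": counts["urdu_specific"],
        "shared": counts["shared"],
        "other": counts["other"],
        "excluding_diacritics": total - counts["diacritic"],
    }


# Sample phrase: alef reh dal waw, space, tteh heh-doachashmee yeh kaf,
# space, heh-goal yeh-barree.
SAMPLE = "\u0627\u0631\u062F\u0648 \u0679\u06BE\u06CC\u06A9 \u06C1\u06D2"
print(count_urdu(SAMPLE))
```

With these assumptions the sample phrase has 12 code points: 4 Urdu-specific letters, 6 shared letters, 2 spaces in the "other" bucket, and no diacritics, so the headline figure is also 12.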

Example and notes

For اردو ٹھیک ہے the counter reports the visible letters separately from any aerab you add, and flags ٹ, ھ, ہ, and ے as Urdu-specific. Note that the gol he (ہ) and do-chashmi he (ھ) are distinct code points with different roles (ہ is the consonant h, while ھ marks aspiration of the preceding consonant), so both are treated as Urdu-specific. If you are checking a length limit, first confirm what the system measures. A limit defined in visible letters matches the diacritic-excluded count, but SMS segments and many username limits count every code unit or byte, diacritics included.
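Which count applies to a given limit depends on how the target system measures length. The sketch below compares the measures most likely to matter; the function name `length_measures` is hypothetical, and the diacritic test uses only the ranges named earlier. Unicode SMS, for example, is encoded as UCS-2 and counts 16-bit units, so every diacritic uses up one unit of the segment.

```python
def length_measures(text: str) -> dict:
    """Return several common ways of measuring the length of a string."""

    def is_diacritic(ch: str) -> bool:
        cp = ord(ch)
        return 0x064B <= cp <= 0x065F or cp == 0x0670

    return {
        # One per Unicode code point, diacritics included.
        "code_points": len(text),
        # 16-bit units, the measure used by Unicode SMS and JavaScript strings.
        "utf16_code_units": len(text.encode("utf-16-le")) // 2,
        # Bytes, the measure used by many storage and database limits.
        "utf8_bytes": len(text.encode("utf-8")),
        # The counter's headline figure.
        "excluding_diacritics": sum(1 for ch in text if not is_diacritic(ch)),
    }


# Alef with pesh, then reh, dal, waw: four visible letters and one diacritic.
marked = "\u0627\u064F\u0631\u062F\u0648"
print(length_measures(marked))
```

For this word the code point and UTF-16 counts are both 5, the UTF-8 size is 10 bytes, and the diacritic-excluded count is 4.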