Urdu Word Counter

Accurate word count for Urdu Nastaliq prose and poetry

Count words in Urdu Nastaliq text accurately by treating a compound whose parts are separated by a zero-width non-joiner as a single word rather than breaking it at that character. Also counts sentences and characters. Keyless browser tool.

How does it handle compound words?

Urdu compound words like کتاب‌خانہ (library) are written with a zero-width non-joiner between the parts, which keeps the letters on either side from joining without adding a visible space. The counter removes that non-joiner before splitting, so the compound counts as one word, not two.

A generic word counter can give wrong results for Urdu because of one quirk: the parts of a compound word are separated by an invisible zero-width non-joiner rather than a space. This counter understands that convention and counts the way a reader actually sees the words.

How it works

Before counting, every zero-width non-joiner (U+200C) is stripped from the text, so a compound such as کتاب‌خانہ collapses to a single token instead of splitting at the non-joiner. The cleaned text is then split on a class of separators:

whitespace  +  ۔ ، ؛ ؟  +  . , ; : ! ? ( )  +  quotes

Empty tokens from consecutive separators are discarded. Sentences are counted by splitting on the Urdu full stop ۔ plus question and exclamation marks. Character counts are reported with and without whitespace.
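
The steps above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual source: the function name count_urdu, the exact set of quote characters, and the choice to count characters after the non-joiner has been stripped are all assumptions.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# Separator class described above. The quote characters are an assumption.
URDU_PUNCT = "\u06d4\u060c\u061b\u061f"    # Urdu full stop, comma, semicolon, question mark
LATIN_PUNCT = ".,;:!?()"
QUOTES = "\"'\u201c\u201d\u2018\u2019"      # straight and curly quotes (assumed set)

WORD_SPLIT = re.compile(
    r"[\s" + re.escape(URDU_PUNCT + LATIN_PUNCT + QUOTES) + r"]+"
)

# Sentence terminators: Urdu full stop plus question and exclamation marks.
SENTENCE_SPLIT = re.compile("[" + re.escape("\u06d4\u061f?!") + "]+")


def count_urdu(text: str) -> dict:
    """Count words, sentences, and characters in Urdu text (hypothetical helper)."""
    # 1. Strip every ZWNJ so a compound collapses to one token.
    cleaned = text.replace(ZWNJ, "")
    # 2. Split on the separator class and discard empty tokens.
    words = [tok for tok in WORD_SPLIT.split(cleaned) if tok]
    # 3. Split on sentence terminators and ignore blank remainders.
    sentences = [s for s in SENTENCE_SPLIT.split(cleaned) if s.strip()]
    # 4. Character counts, taken on the cleaned text (an assumption).
    return {
        "words": len(words),
        "sentences": len(sentences),
        "chars": len(cleaned),
        "chars_no_whitespace": sum(1 for ch in cleaned if not ch.isspace()),
    }


if __name__ == "__main__":
    sample = "یہ ایک کتاب" + ZWNJ + "خانہ ہے۔ آپ کیسے ہیں؟"
    result = count_urdu(sample)
    print(result["words"], result["sentences"])  # prints: 7 2
```

Building the separator class with re.escape keeps the punctuation lists readable and avoids hand-escaping characters such as the full stop and parentheses.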

Example and notes

The line یہ ایک کتاب‌خانہ ہے۔ آپ کیسے ہیں؟ counts as seven words and two sentences: the zero-width non-joiner (ZWNJ) inside کتاب‌خانہ does not inflate the total. Note that the zero-width non-joiner is invisible, so two pieces of text that look identical can have different naive word counts; this tool normalises that difference away.
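
To make the difference concrete, the sketch below compares a naive tokenizer with the same tokenizer run after the non-joiner is stripped. The naive counter here (a regular expression matching runs of word characters) is only an assumed stand-in for a generic word counter, and both function names are hypothetical.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# The example line, with the ZWNJ written explicitly so it is visible in source.
line = "یہ ایک کتاب" + ZWNJ + "خانہ ہے۔ آپ کیسے ہیں؟"


def naive_count(text: str) -> int:
    # Runs of word characters; the ZWNJ is not a word character,
    # so the compound is split into two tokens.
    return len(re.findall(r"\w+", text))


def zwnj_aware_count(text: str) -> int:
    # Strip the ZWNJ first so the compound stays in one piece.
    return naive_count(text.replace(ZWNJ, ""))


if __name__ == "__main__":
    print(naive_count(line))       # prints: 8
    print(zwnj_aware_count(line))  # prints: 7
```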