How does it decide what counts as a word?

After tatweel and tashkeel are removed, a word is a run of Arabic or Latin letters and digits bounded by whitespace or punctuation; a hyphen or apostrophe between two such characters does not split the word. Prefixed Arabic particles such as wa (و), fa (ف), bi (ب), and li (ل) are written joined to the following word with no space, so they stay part of that single word.

What is tatweel (kashida) and why remove it?

Tatweel (U+0640, kashida) is a stretching character that elongates the connecting line between letters for justification. It carries no meaning, so the counter removes it before counting. A stretched word is then neither miscounted nor padded.

Are vowel marks (tashkeel) counted as separate words?

No. Tashkeel are combining marks attached to letters, never standalone words. The counter strips them before splitting so vocalised and unvocalised versions of the same text return the same word count.

Does it count characters too?

Yes. The tool reports total characters, characters excluding spaces, words, and sentences, so you can check both word and character limits for posts or abstracts.

How are sentences counted?

Sentences are counted by runs of terminal punctuation: the Arabic question mark ؟, the full stop ., the marks ! and ?, and the ellipsis …. Consecutive terminators collapse to one boundary.

Arabic Word Counter — Gera Tools

Counting words in Arabic has a few wrinkles that a naive space split gets wrong. Arabic text often contains tatweel (kashida) stretching characters inserted purely for justification, and tashkeel vowel marks attached to letters. Neither should affect the word count. Arabic also writes short particles such as و (and), ف (so), ب (with), and ل (for) joined directly to the next word, so they belong to that single word. This counter handles all of that and reports accurate word, character, and sentence totals.

How it works

The algorithm normalises the text, then splits it:

Tatweel U+0640 and all tashkeel marks (U+064B–U+0652, U+0670, U+06D6–U+06ED) are removed first, so stretching and vowel marks never change the count.
A word is then a maximal run of [\p{L}\p{N}] (Arabic or Latin letters and digits) with an internal hyphen or apostrophe allowed between two such characters.
Prefixed particles like wa/fa/bi/li are written with no space before the stem, so they are naturally counted inside the same word rather than as a boundary.
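
The steps above can be sketched in Python. This is a minimal illustration, not the tool's actual source: the names normalise and count_words are hypothetical, and because Python's built-in re module does not support \p{L} or \p{N}, the class [^\W_] is used as a close approximation of [\p{L}\p{N}].

```python
import re

# Tatweel (U+0640) plus the tashkeel ranges listed above.
STRIP_RE = re.compile(r"[\u0640\u064B-\u0652\u0670\u06D6-\u06ED]")

# A run of letters/digits, optionally continued by a hyphen or apostrophe
# that sits between two such runs. [^\W_] stands in for [\p{L}\p{N}]
# (an approximation chosen for this sketch).
WORD_RE = re.compile(r"[^\W_]+(?:['\-][^\W_]+)*")


def normalise(text: str) -> str:
    """Remove tatweel and tashkeel so they cannot affect the count."""
    return STRIP_RE.sub("", text)


def count_words(text: str) -> int:
    """Count words in the normalised text."""
    return len(WORD_RE.findall(normalise(text)))


if __name__ == "__main__":
    print(count_words("ذهب الطالب إلى المدرسة… وكتب الدرس؟"))  # 6
```

No special handling is needed for joined particles: a word such as وكتب contains no space or punctuation, so the pattern matches it as a single run.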

Characters are counted two ways: every character including spaces, and the length with whitespace removed. Sentences are counted by collapsing runs of terminal punctuation — the Arabic question mark ؟, plus ., !, ?, … — into single boundaries.
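
A rough Python sketch of the character and sentence counts follows. The function names are hypothetical, and two details are assumptions the text above does not settle: characters are counted on the input as given (as Unicode code points), and trailing text with no terminator is not counted as a sentence.

```python
import re

# One or more terminators in a row form a single sentence boundary:
# . ! ? the Arabic question mark (U+061F) and the ellipsis (U+2026).
TERMINATOR_RUN_RE = re.compile(r"[.!?\u061F\u2026]+")


def count_characters(text: str) -> tuple:
    """Return (total characters, characters with all whitespace removed)."""
    total = len(text)
    without_spaces = len("".join(text.split()))
    return total, without_spaces


def count_sentences(text: str) -> int:
    """Count runs of terminal punctuation; '?!' or '...' counts once."""
    return len(TERMINATOR_RUN_RE.findall(text))


if __name__ == "__main__":
    sample = "ذهب الطالب إلى المدرسة… وكتب الدرس؟"
    print(count_characters(sample))
    print(count_sentences(sample))  # 2
```

Matching runs rather than single marks is what makes "?!" or "..." count as one boundary. A full stop inside a number such as 3.14 would still be counted by this sketch.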

Example

The text:

ذهب الطالب إلى المدرسة… وكتب الدرس؟

contains the words ذهب, الطالب, إلى, المدرسة, وكتب, الدرس — six words. The joined particle in وكتب (wa + kataba) is one word, and any tatweel or tashkeel would have been removed before counting.

Notes

Because tashkeel is stripped first, the vocalised and unvocalised versions of the same sentence return the same word count.
Mixed Arabic-English text and Latin product names are counted correctly because Latin letters and digits are word characters too.
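
Both notes can be checked with a short Python sketch. The helper name words and the sample strings are illustrative assumptions, and [^\W_] again approximates [\p{L}\p{N}] because Python's built-in re module has no \p{...} classes.

```python
import re

# Tatweel and tashkeel, using the ranges given under "How it works".
STRIP_RE = re.compile(r"[\u0640\u064B-\u0652\u0670\u06D6-\u06ED]")
WORD_RE = re.compile(r"[^\W_]+(?:['\-][^\W_]+)*")


def words(text: str) -> list:
    """Strip tatweel and tashkeel, then return the word tokens."""
    return WORD_RE.findall(STRIP_RE.sub("", text))


plain = "ذهب الطالب إلى المدرسة"
vocalised = "ذَهَبَ الطَّالِبُ إِلَى المَدْرَسَةِ"
stretched = "ذهـــب الطالـــب إلى المدرســـة"
mixed = "أستخدم Gera Tools كل يوم"

if __name__ == "__main__":
    print(words(vocalised) == words(plain))  # vocalised text yields the same tokens
    print(words(stretched) == words(plain))  # stretched text yields the same tokens
    print(words(mixed))                      # Arabic and Latin tokens side by side
```

Because normalisation happens before tokenising, the comparison is on the tokens themselves, not only on their number.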
