Arabic Word Frequency Counter

Count word frequencies in Arabic text with optional root grouping

Tallies word frequencies in Arabic text, optionally normalising diacritics and letter variants, and can group inflected forms by a shared three-letter root approximation. Ranks words by count. Runs in your browser.

How are Arabic words counted?

The tool splits the text on spaces, line breaks, and Arabic and Latin punctuation, then tallies identical surface forms. With normalisation on, it first strips short-vowel diacritics and unifies letter variants so that the same word written slightly differently is counted together.
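
The splitting and tallying step can be sketched in Python as follows. This is a minimal illustration, not the tool's actual code: the `PUNCTUATION` set and the `count_words` helper are assumptions, since the exact separator list is not given here.

```python
import re
from collections import Counter

# Assumed separator set: whitespace plus common Arabic and Latin punctuation.
# The tool's exact punctuation list is not stated, so this is illustrative.
PUNCTUATION = "،؛؟.,;:!?()[]{}«»\"'-"
SEPARATORS = re.compile("[\\s" + re.escape(PUNCTUATION) + "]+")

def count_words(text: str) -> Counter:
    """Split on separators and tally identical surface forms."""
    tokens = [t for t in SEPARATORS.split(text) if t]
    return Counter(tokens)

counts = count_words("كتب الولد، كتب الدرس.")
for word, n in counts.most_common():  # ranked by count, highest first
    print(word, n)
```

`Counter.most_common()` returns the words ranked by count, which mirrors the ranked output described above.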

Arabic frequency analysis is complicated by optional vowel marks and by several interchangeable letter forms, so the same word can appear under more than one spelling. Normalisation folds those spellings together before counting, and root grouping goes a step further by clustering inflected forms under an approximate three-letter root.

How it works

Tokens are split on whitespace and on Arabic and Latin punctuation. When normalisation is enabled the tool applies two passes before counting:

diacritics → remove harakat U+064B–U+0652 (tanwin, short vowels, shadda,
             sukun), superscript alef U+0670, and tatweel U+0640
letters    → أ إ آ ٱ → ا   ;   ى → ي   ;   ة → ه   ;   ؤ ئ → ء base
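
The two passes listed above could look roughly like this in Python. It is a sketch under assumptions: the names `STRIP`, `LETTER_MAP` and `normalise` are hypothetical, and mapping ؤ and ئ to a bare hamza (ء) is one reading of the table's last entry, not a confirmed behaviour of the tool.

```python
import re

# Pass 1: harakat U+064B-U+0652 (tanwin, short vowels, shadda, sukun),
# superscript alef U+0670, and tatweel U+0640.
STRIP = re.compile("[\u064B-\u0652\u0670\u0640]")

# Pass 2: unify letter variants.
LETTER_MAP = str.maketrans({
    "\u0623": "\u0627",  # أ → ا
    "\u0625": "\u0627",  # إ → ا
    "\u0622": "\u0627",  # آ → ا
    "\u0671": "\u0627",  # ٱ → ا
    "\u0649": "\u064A",  # ى → ي
    "\u0629": "\u0647",  # ة → ه
    "\u0624": "\u0621",  # ؤ → ء (assumed reading of the table)
    "\u0626": "\u0621",  # ئ → ء (assumed reading of the table)
})

def normalise(word: str) -> str:
    """Strip diacritics and tatweel, then map letter variants."""
    return STRIP.sub("", word).translate(LETTER_MAP)

print(normalise("كَتَبَ"))   # vowelled form of كتب
print(normalise("أحمد"))    # hamza-on-alef variant
print(normalise("مدرسة"))   # ta marbuta variant
```

Applying `normalise` to each token before counting makes vowelled and unvowelled spellings of a word share one tally.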

For root grouping, the tool strips frequent clitics (the definite article, prepositional and conjunction prefixes, and plural or possessive suffixes) and reduces what remains toward three consonants, which approximates the Arabic triliteral root.
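
A deliberately crude sketch of this kind of root approximation is shown below. The `PREFIXES` and `SUFFIXES` lists, the choice to drop long-vowel letters, and the helper names `approx_root` and `group_by_root` are all assumptions made for illustration. The tool's real rules may differ, and a heuristic this simple will misgroup some words.

```python
from collections import Counter, defaultdict

# Hypothetical clitic lists, longest first so "وال" is tried before "ال".
PREFIXES = ["وال", "بال", "فال", "لل", "ال", "و", "ف", "ب", "ل"]
SUFFIXES = ["ات", "ون", "ين", "ها", "هم", "نا"]
WEAK = set("اوي")  # long-vowel letters dropped when reducing to three

def approx_root(word: str) -> str:
    stem = word
    # Strip at most one prefix and one suffix, keeping at least 3 letters.
    for p in PREFIXES:
        if stem.startswith(p) and len(stem) - len(p) >= 3:
            stem = stem[len(p):]
            break
    for s in SUFFIXES:
        if stem.endswith(s) and len(stem) - len(s) >= 3:
            stem = stem[:-len(s)]
            break
    # Drop weak letters left to right while more than three letters remain.
    chars = list(stem)
    i = 0
    while len(chars) > 3 and i < len(chars):
        if chars[i] in WEAK:
            del chars[i]
        else:
            i += 1
    return "".join(chars)

def group_by_root(words):
    """Map each approximate root to a Counter of its surface forms."""
    groups = defaultdict(Counter)
    for w in words:
        groups[approx_root(w)][w] += 1
    return groups

groups = group_by_root(["كتب", "الكتاب", "كاتب", "كتب"])
for root, forms in groups.items():
    print(root, sum(forms.values()), dict(forms))
```

Keeping a per-root `Counter` of surface forms lets the grouped view still show which spellings contributed to each root's total.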

Example and tips

In a sentence repeating the verb كتب (“he wrote”) alongside الكتاب (“the book”) and كاتب (“writer”), surface-form counting keeps them separate, while root grouping clusters them under an approximate كتب root, revealing that the passage is dominated by the writing theme. Turn normalisation on for messy or mixed-source text; turn it off when you need exact spelling-by-spelling counts.