Hebrew Word Frequency Counter

Count word frequencies in Hebrew text with optional shoresh (root) grouping

Tallies word frequencies in Hebrew text, optionally stripping niqqud vowel points and normalising final-letter forms, and can group words by an approximate shoresh (three-letter root). Ranks by count. Runs in your browser.

How are Hebrew words counted?

The text is split on spaces, line breaks, and punctuation such as the maqaf, geresh, and gershayim, then identical surface forms are tallied. With normalisation enabled, niqqud is removed and final letters are folded so the same word counts together regardless of pointing.
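As a rough illustration of the splitting and tallying described above, here is a minimal Python sketch. The `count_words` helper and the exact separator set are assumptions made for the example (whitespace, common ASCII punctuation, maqaf U+05BE, geresh U+05F3, gershayim U+05F4); the tool's own list may differ.

```python
import re
from collections import Counter

# Assumed separators: whitespace, common ASCII punctuation, and the Hebrew
# marks maqaf (U+05BE), geresh (U+05F3), and gershayim (U+05F4).
SEPARATORS = re.compile(r"""[\s\u05BE\u05F3\u05F4.,;:!?"'()\[\]-]+""")

def count_words(text: str) -> Counter:
    """Split text on the assumed separators and tally identical surface forms."""
    tokens = [t for t in SEPARATORS.split(text) if t]  # drop empty pieces
    return Counter(tokens)

# The maqaf (written here as an escape) splits the last two words apart.
counts = count_words("שלום עולם, שלום בית\u05BEספר")
for word, n in counts.most_common():  # highest count first
    print(word, n)
```

`Counter.most_common()` gives the ranking by count; ties keep the order in which the words first appeared.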

Hebrew frequency analysis has two wrinkles: optional niqqud vowel points and the five letters that change shape at the end of a word. This tool counts how often each word appears, optionally folding those variants, and can cluster inflected forms by an approximate shoresh (three-letter root).

How it works

Tokens are split on whitespace and Hebrew and Latin punctuation. When normalisation is enabled the tool runs two passes before counting:

niqqud  → remove vowel points and cantillation marks (U+0591–U+05C7)
finals  → ך→כ  ם→מ  ן→נ  ף→פ  ץ→צ
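The two passes can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's code: `normalise` is a hypothetical name, and the sketch is applied to tokens that have already been split, because the quoted range also contains the maqaf (U+05BE), which would otherwise disappear before it could act as a separator.

```python
import re

# Pass 1: vowel points and cantillation marks, using the range quoted above.
NIQQUD = re.compile(r"[\u0591-\u05C7]")

# Pass 2: each final-form letter mapped to its regular form.
FINALS = str.maketrans({
    "ך": "כ",
    "ם": "מ",
    "ן": "נ",
    "ף": "פ",
    "ץ": "צ",
})

def normalise(token: str) -> str:
    """Remove niqqud, then fold final letters, for one already-split token."""
    return NIQQUD.sub("", token).translate(FINALS)

# "shalom" with shin dot, qamats, and holam, written as escapes for clarity.
pointed = "\u05E9\u05C1\u05B8\u05DC\u05D5\u05B9\u05DD"
print(normalise(pointed))  # prints שלומ
```

After both passes the pointed and unpointed spellings of a word produce the same key, so they share one count.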

For shoresh grouping it strips frequent prefixes — the definite article ה and the clitic letters ב כ ל מ ש ו — and common suffixes, then keeps up to three core letters as an approximation of the triliteral root that underlies most Hebrew vocabulary.
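One possible shape for such a heuristic is sketched below in Python. Everything beyond the prefix letters named above is an assumption: the `approximate_root` name, the two sample suffixes, the rule that nothing is stripped once only three letters remain, and in particular the dropping of medial ו and י, which the description does not mention and which is added here only so that forms like כותב reduce to three letters.

```python
from collections import Counter

PREFIXES = "הבכלמשו"      # definite article plus the clitic letters
SUFFIXES = ("ים", "ות")   # assumed sample of common suffixes
VOWEL_LETTERS = "וי"      # assumption: medial vav/yod act as vowels

def approximate_root(word: str) -> str:
    """Reduce an unpointed word to at most three letters (rough heuristic)."""
    # Strip one assumed suffix if at least three letters would remain.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Drop vav/yod after the first letter while more than three letters remain.
    letters = list(word)
    i = 1
    while len(letters) > 3 and i < len(letters):
        if letters[i] in VOWEL_LETTERS:
            del letters[i]
        else:
            i += 1
    word = "".join(letters)
    # Strip leading prefix letters while more than three letters remain.
    while len(word) > 3 and word[0] in PREFIXES:
        word = word[1:]
    return word[:3]

words = ["כתב", "מכתב", "כותב", "הכותבים"]
groups = Counter(approximate_root(w) for w in words)
print(groups)
```

The length guard matters because many roots themselves begin with a prefix letter, כתב among them; without it the leading כ would be stripped as if it were a clitic. A heuristic like this will still mis-group some words, which is why the grouping is described as approximate.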

Example and tips

Words built on the root כתב (“write”), such as כתב, מכתב (“letter”), and כותב (“writer”), are counted separately by surface form, but shoresh grouping clusters them under an approximate כתב root, exposing the writing theme of the passage. Strip niqqud when working with pointed liturgical or learner text so it matches everyday unpointed spelling. For modern prose, which is normally unpointed, the stripping option can simply stay on as the default.