Hebrew frequency analysis has two wrinkles: optional niqqud vowel points and the five letters that change shape at the end of a word. This tool counts how often each word appears, optionally folding those variants, and can cluster inflected forms by an approximate shoresh (three-letter root).
How it works
Text is split into tokens on whitespace and on Hebrew and Latin punctuation. When normalisation is enabled, the tool runs two passes over each token before counting:
niqqud → remove vowel points and cantillation marks (U+0591–U+05C7)
finals → ך→כ ם→מ ן→נ ף→פ ץ→צ
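The two passes can be sketched in Python roughly as follows. This is a minimal illustration, not the tool's actual code: the names tokenize, normalise and count_words and the exact punctuation set are assumptions made for the example, and only the niqqud range and the five final-letter pairs come from the description above.

```python
import re
from collections import Counter

# Vowel points and cantillation marks, using the range quoted above.
NIQQUD = re.compile("[\u0591-\u05C7]")

# Final kaf, mem, nun, pe, tsadi mapped to their regular forms.
FINALS = str.maketrans(
    "\u05DA\u05DD\u05DF\u05E3\u05E5",
    "\u05DB\u05DE\u05E0\u05E4\u05E6",
)

# Assumed punctuation set: common Latin marks plus maqaf, paseq, sof pasuq.
PUNCT = ".,;:!?\"'()[]-" + "\u05BE\u05C0\u05C3"
SPLIT = re.compile(r"[\s" + re.escape(PUNCT) + r"]+")


def tokenize(text):
    """Split on whitespace and punctuation, dropping empty pieces."""
    return [t for t in SPLIT.split(text) if t]


def normalise(token, strip_niqqud=True, fold_finals=True):
    """Apply the niqqud pass, then the finals pass, each optional."""
    if strip_niqqud:
        token = NIQQUD.sub("", token)
    if fold_finals:
        token = token.translate(FINALS)
    return token


def count_words(text, normalise_tokens=True):
    """Count tokens, optionally folding niqqud and final-letter variants."""
    tokens = tokenize(text)
    if normalise_tokens:
        tokens = [normalise(t) for t in tokens]
    # A token made only of marks becomes empty after stripping; skip it.
    return Counter(t for t in tokens if t)
```

Tokenising before normalising matters in this sketch: the maqaf (U+05BE) lies inside the stripped range, so removing marks first would glue maqaf-joined words together.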
For shoresh grouping it strips frequent prefixes — the definite article ה and
the clitic letters ב כ ל מ ש ו — and common suffixes, then keeps up to three
core letters as an approximation of the triliteral root that underlies most
Hebrew vocabulary.
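One way such a heuristic might look is sketched below. It is an illustration under stated assumptions rather than the tool's implementation: the suffix list, the rule that ו and י are skipped as likely vowel letters, and the guard that stops stripping once fewer than three consonants would remain are all choices made for this example, as are the names approx_shoresh and group_by_shoresh. Input is assumed to be already normalised (niqqud removed, finals folded).

```python
from collections import Counter, defaultdict

# Prefix letters listed above: the article he plus bet, kaf, lamed, mem, shin, vav.
PREFIXES = set("\u05D4\u05D1\u05DB\u05DC\u05DE\u05E9\u05D5")

# Assumed suffix list, longest first. Finals are already folded, so the
# plural ending yod + final mem appears here as yod + regular mem.
SUFFIXES = ("\u05D9\u05DE", "\u05D5\u05EA", "\u05D4")

# Assumption: vav and yod are treated as vowel letters and skipped.
VOWEL_LETTERS = set("\u05D5\u05D9")


def core(word):
    """Letters of word that are not vav or yod."""
    return "".join(c for c in word if c not in VOWEL_LETTERS)


def approx_shoresh(word):
    """Return up to three core letters as a rough stand-in for the root."""
    # Strip at most one suffix, only if three consonants would remain.
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(core(stem)) >= 3:
            word = stem
            break
    # Strip prefix letters while three consonants would still remain.
    while word[:1] in PREFIXES and len(core(word[1:])) >= 3:
        word = word[1:]
    return core(word)[:3]


def group_by_shoresh(words):
    """Map each approximate root to a Counter of its surface forms."""
    groups = defaultdict(Counter)
    for word in words:
        groups[approx_shoresh(word)][word] += 1
    return groups
```

The consonant-count guard keeps a root-initial כ or מ from being mistaken for a clitic in short words. The cost is that roots which themselves contain ו or י come out shorter than three letters, so the result should be read as a clustering key rather than a true root.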
Example and tips
Words built on the root כתב (“write”) — such as כתב, מכתב (“letter”), and
כותב (“writer”) — are counted separately by surface form, but shoresh grouping
clusters them under an approximate כתב root, exposing the writing theme of the
passage. Enable niqqud stripping when working with pointed liturgical or learner
text so that it matches everyday unpointed spelling; the option can stay on by
default for modern prose, which is normally unpointed already.