How are Hebrew words counted?

The text is split on spaces, line breaks, and punctuation such as the maqaf, geresh, and gershayim, then identical surface forms are tallied. With normalisation enabled, niqqud is removed and final letters are folded so the same word counts together regardless of pointing.

What is niqqud and why strip it?

Niqqud are the small vowel and pronunciation points placed below, above, or inside Hebrew letters. Modern text is usually unpointed, so a pointed and an unpointed spelling of the same word should count as one. Stripping niqqud merges them.

What does final-letter normalisation do?

Five Hebrew letters take a different sofit (final) shape at the end of a word: kaf, mem, nun, pe, and tsadi. Normalisation converts these final forms to their standard forms so a letter is counted the same wherever it appears.

How reliable is the shoresh grouping?

True shoresh extraction needs a morphological analyser. This tool approximates it by stripping common prefixes (the definite article he and the clitic letters bet, kaf, lamed, mem, shin, and vav) and suffixes, then reducing toward three root letters. It clusters related words but is not a full analyser.

Does the text leave my browser?

No. Tokenising, niqqud stripping, normalisation, and counting all run locally in your browser. Nothing is uploaded.

What is the Hebrew Word Frequency Counter?

The Hebrew Word Frequency Counter tallies word frequencies in Hebrew text and ranks them by count. It can optionally strip niqqud vowel points, normalise final-letter forms, and group words by an approximate shoresh (three-letter root). It runs free in your browser on Gera Tools, with nothing uploaded.

Hebrew Word Frequency Counter

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

Hebrew frequency analysis has two wrinkles: optional niqqud vowel points and the five letters that change shape at the end of a word. This tool counts how often each word appears, optionally folding those variants, and can cluster inflected forms by an approximate shoresh (three-letter root).

How it works

The text is split into tokens on whitespace and on Hebrew and Latin punctuation. When normalisation is enabled, the tool runs two passes before counting:

niqqud  → remove vowel points and cantillation marks (U+0591–U+05C7)
finals  → ך→כ  ם→מ  ן→נ  ף→פ  ץ→צ
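
The two passes can be sketched in a few lines. This is a minimal Python illustration, not the tool's own source (which runs in the browser). The names count_words, TOKEN_RE, MARK_RE, and FINALS are hypothetical, and treating every character that is not a Hebrew or Latin letter as a separator is an assumption.

```python
import re
from collections import Counter

# Hebrew combining marks inside U+0591–U+05C7. The punctuation characters in
# that range (maqaf, paseq, sof pasuq, nun hafukha) are left out on purpose
# so that they act as separators.
MARKS = "\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7"
TOKEN_RE = re.compile("[A-Za-z\u05D0-\u05EA" + MARKS + "]+")
MARK_RE = re.compile("[" + MARKS + "]")
FINALS = str.maketrans("ךםןףץ", "כמנפצ")


def count_words(text, normalise=True):
    """Return (word, count) pairs, most frequent first."""
    counts = Counter()
    for token in TOKEN_RE.findall(text):
        if normalise:
            token = MARK_RE.sub("", token)   # niqqud pass
            token = token.translate(FINALS)  # finals pass
        if token:  # a token made only of marks becomes empty
            counts[token] += 1
    return counts.most_common()


# "shalom" pointed, "shalom" unpointed, then "olam" and "shalom" joined by a maqaf
pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"
sample = " ".join([pointed, "שלום,", "עולם" + "\u05BE" + "שלום"])
print(count_words(sample))
```

With normalisation on, the sample yields שלומ three times and עולמ once; the keys carry the folded spelling. With normalise=False, the pointed and unpointed spellings of the first word are counted separately.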

For shoresh grouping it strips frequent prefixes — the definite article ה and the clitic letters ב כ ל מ ש ו — and common suffixes, then keeps up to three core letters as an approximation of the triliteral root that underlies most Hebrew vocabulary.

The five sofit (final) letters

Hebrew has five letters that take a distinct “final” form when they appear at the end of a word:

Normal form	Final form	Letter name
כ	ך	Kaf
מ	ם	Mem
נ	ן	Nun
פ	ף	Pe
צ	ץ	Tsadi

In Unicode, the two forms of each pair are separate code points (for example, Mem is U+05DE and Final Mem is U+05DD). Without final-letter normalisation, the same letter is tallied as two different characters depending on its position, and a stem that ends in a final form, such as מלך, does not match the same stem before a suffix, as in מלכים, which splits counts that belong together. Enabling normalisation folds all five pairs before tallying.
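
To make the code-point difference concrete, the short Python sketch below prints the code points of each pair and shows a stem comparison before and after folding. The names PAIRS, FOLD, and fold_finals are hypothetical and chosen for illustration only.

```python
# Standard and final form of each of the five letters
PAIRS = {
    "Kaf": ("כ", "ך"),
    "Mem": ("מ", "ם"),
    "Nun": ("נ", "ן"),
    "Pe": ("פ", "ף"),
    "Tsadi": ("צ", "ץ"),
}

# Translation table mapping each final form to its standard form
FOLD = str.maketrans({final: standard for standard, final in PAIRS.values()})


def fold_finals(word):
    """Replace every final-form letter with its standard form."""
    return word.translate(FOLD)


for name, (standard, final) in PAIRS.items():
    print(f"{name}: standard U+{ord(standard):04X}, final U+{ord(final):04X}")

# "melekh" (king) is a prefix of "melakhim" (kings) only after folding
print("מלכים".startswith("מלך"))
print(fold_finals("מלכים").startswith(fold_finals("מלך")))
```

In every pair the final form sits one code point below the standard form, so a plain string comparison treats them as unrelated characters until they are folded.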

What niqqud stripping does

Modern printed Hebrew and most digital text are unpointed. Liturgical texts, children’s books, and learner materials carry niqqud vowel points. When you run frequency analysis on pointed text without stripping niqqud, a single word can appear as multiple tokens (e.g., one fully pointed, one partially pointed) that the counter sees as different strings. Stripping niqqud removes all combining marks in the U+0591–U+05C7 range before counting, merging these variants.
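
The U+0591–U+05C7 range also contains a few punctuation characters, such as the maqaf, that are not combining marks. The Python sketch below assumes the stripping step filters by Unicode category so that those characters survive; whether the tool does exactly this is not stated here, and the name strip_niqqud is hypothetical.

```python
import unicodedata

NIQQUD_START, NIQQUD_END = 0x0591, 0x05C7


def strip_niqqud(text):
    """Drop combining marks (category Mn) in U+0591–U+05C7; keep everything else."""
    return "".join(
        ch for ch in text
        if not (NIQQUD_START <= ord(ch) <= NIQQUD_END
                and unicodedata.category(ch) == "Mn")
    )


# Code points in the range that are punctuation rather than combining marks
kept = [
    f"U+{cp:04X} {unicodedata.name(chr(cp))}"
    for cp in range(NIQQUD_START, NIQQUD_END + 1)
    if unicodedata.category(chr(cp)) != "Mn"
]
print(kept)

pointed = "\u05E9\u05B8\u05C1\u05DC\u05D5\u05B9\u05DD"  # "shalom" with niqqud
unpointed = "\u05E9\u05DC\u05D5\u05DD"                  # "shalom" without
print(strip_niqqud(pointed) == unpointed)
```

The list printed first names the four punctuation characters in the range (maqaf, paseq, sof pasuq, nun hafukha). Keeping them matters if the maqaf is later used as a word separator.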

The shoresh (root) approximation

Most Hebrew content words derive from a triliteral root — a three-letter base from which related words are built by inserting vowel patterns and adding affixes. For example, the root כ-ת-ב (write) underlies:

כתב — wrote (verb)
מכתב — letter (noun)
כותב — writer / writing (participle)
כתיבה — writing (gerund)
נכתב — was written (passive)

Shoresh grouping strips the most common prefixes (ה, ו, ב, כ, ל, מ, ש) and suffixes, then reduces the stem to three letters. This is an approximation — a proper morphological analyser using a full lexicon would be more accurate — but it meaningfully clusters related forms and reveals the thematic distribution of a text.
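
A rough Python sketch of such a heuristic follows. It is not the tool's algorithm: the suffix list, the order of the steps, and the extra step that drops vav and yod as vowel letters are all assumptions made for illustration, and approx_shoresh is a hypothetical name.

```python
FINALS = str.maketrans("ךםןףץ", "כמנפצ")
PREFIXES = "הובכלמש"          # definite article and clitic letters
SUFFIXES = ("ימ", "ות", "ה")  # assumed list, written with folded finals
VOWEL_LETTERS = "וי"          # assumed: vav and yod often mark vowels


def approx_shoresh(word):
    """Reduce an unpointed word to at most three letters (heuristic only)."""
    stem = word.translate(FINALS)
    # 1. Strip one suffix, longest first, if at least three letters remain
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            stem = stem[: -len(suffix)]
            break
    # 2. Drop vav/yod after the first letter while more than three letters remain
    chars = list(stem)
    i = 1
    while len(chars) > 3 and i < len(chars):
        if chars[i] in VOWEL_LETTERS:
            del chars[i]
        else:
            i += 1
    stem = "".join(chars)
    # 3. Strip prefix letters while more than three letters remain
    while len(stem) > 3 and stem[0] in PREFIXES:
        stem = stem[1:]
    # 4. Keep at most three letters
    return stem[:3]


for word in ["כתב", "מכתב", "כותב", "כתיבה", "נכתב"]:
    print(word, "->", approx_shoresh(word))
```

Under these assumptions the first four forms reduce to כתב, while נכתב comes out as נכת because nun is not in the prefix list. That miss is typical of what a lexicon-based analyser would catch and a letter-stripping heuristic does not.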

Practical uses

Vocabulary analysis: Find the highest-frequency words in a learner text to prioritise which vocabulary to teach or study.
Concordance building: Identify all occurrences and frequency of key words in a biblical passage, sermon, or literary text.
Authorship and style: Compare word distributions across texts or authors.
Translation prep: Know which terms appear most often before starting a translation so you can decide on consistent renderings early.

Strip niqqud when working with pointed liturgical or learner text so it matches everyday unpointed spelling; keep niqqud if your study specifically concerns vowel patterns and pointing variants.