Hebrew Word Counter

Accurate word count for right-to-left Hebrew text

Count words in Hebrew text with right-to-left handling and an optional mode that treats attached one-letter prefixes (ו ה ב כ ל מ ש) as separate grammatical words. Separates Hebrew and Latin tokens. Free and private.

How are words counted in Hebrew?

By default a word is a whitespace-separated token, the same convention used by most word processors. Leading and trailing punctuation is stripped before counting, so quotation marks, the maqaf, and sentence punctuation do not create phantom words.
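The default count described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names strip_punctuation and count_words are hypothetical, and the sketch assumes that "punctuation" means any character whose Unicode general category starts with P, which covers quotation marks, parentheses, dashes, the maqaf, and sentence marks.

```python
import unicodedata


def strip_punctuation(token: str) -> str:
    """Remove leading and trailing punctuation from a single token."""
    start, end = 0, len(token)
    # Assumption: a character is punctuation if its Unicode category starts with "P".
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that are non-empty after stripping."""
    stripped = (strip_punctuation(token) for token in text.split())
    return sum(1 for token in stripped if token)
```

In this sketch a dash standing alone between two words is stripped to an empty string and is not counted, while a maqaf inside a compound such as בית־ספר stays in place, because only whitespace separates tokens.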

This counter handles right-to-left input, strips leading and trailing punctuation before counting, and offers a grammar-aware mode that counts Hebrew’s attached one-letter prefixes as separate words.

How it works

Text is split on runs of whitespace into tokens, and each token has its leading and trailing punctuation removed (quotation marks, parentheses, the maqaf ־, dashes, and sentence marks); a token left empty after stripping is not counted. A token is classified as Hebrew if it contains any character in the Hebrew Unicode block (U+0590 to U+05FF); otherwise it counts as a Latin or other-script word. When the prefix mode is on, any Hebrew word that begins with one of the inseparable particles (ו ה ב כ ל מ ש) and is long enough to have a stem is counted as carrying one extra grammatical word:

words = tokens.length
if countPrefixes: words += (Hebrew tokens beginning with ו/ה/ב/כ/ל/מ/ש that are long enough to have a stem)
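The classification and prefix rules above could look roughly like the following Python sketch. It is an illustration under stated assumptions rather than the counter's real implementation: the function names are hypothetical, punctuation is taken to be any character in a Unicode P category, and MIN_LETTERS = 3 is an assumed reading of "long enough to have a stem", since the exact threshold is not given here. The sketch checks only the first letter, so a word whose root happens to begin with one of the seven letters is counted as well.

```python
import re
import unicodedata

HEBREW_CHAR = re.compile(r"[\u0590-\u05FF]")  # any character in the Hebrew block
PREFIXES = set("והבכלמש")  # the seven one-letter particles
MIN_LETTERS = 3  # assumed threshold: the prefix plus at least a two-letter stem


def strip_punctuation(token: str) -> str:
    """Remove leading and trailing punctuation (Unicode P* categories)."""
    while token and unicodedata.category(token[0]).startswith("P"):
        token = token[1:]
    while token and unicodedata.category(token[-1]).startswith("P"):
        token = token[:-1]
    return token


def has_prefix(token: str) -> bool:
    """First-letter heuristic: starts with a particle and is long enough."""
    # Count only Hebrew letters (alef to tav) so vowel points do not add length.
    letters = [ch for ch in token if "\u05D0" <= ch <= "\u05EA"]
    return len(letters) >= MIN_LETTERS and token[0] in PREFIXES


def count(text: str, count_prefixes: bool = False) -> dict:
    """Return Hebrew, other-script, and total word counts for the text."""
    tokens = [t for t in map(strip_punctuation, text.split()) if t]
    hebrew = [t for t in tokens if HEBREW_CHAR.search(t)]
    words = len(tokens)
    if count_prefixes:
        # Each qualifying Hebrew token carries one extra grammatical word.
        words += sum(1 for t in hebrew if has_prefix(t))
    return {"hebrew": len(hebrew), "other": len(tokens) - len(hebrew), "words": words}
```

For the mixed input "קראתי את README בבית" with count_prefixes=True, this sketch reports three Hebrew tokens, one other-script token, and five words: four tokens plus one for the ב attached to בבית.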

Example and notes

The phrase הספר בבית is two whitespace tokens. Grammatically it is four words (“the” + “book” + “in” + “house”) because ה and ב are attached particles. With the prefix mode on, each of the two tokens adds one attached word and the count rises to four; with it off, you get the literal two-token count that matches a standard word processor. Use the plain count for length limits and the prefix-aware count when a teacher or editor counts grammatical words.