This counter counts the words in Hebrew text: it handles right-to-left input, strips surrounding punctuation from each word, and offers a grammar-aware mode that counts Hebrew’s attached one-letter prefixes as separate words.
How it works
Text is split on runs of whitespace into tokens, and each token has its leading
and trailing punctuation removed (quotation marks, parentheses, the maqaf ־,
dashes, and sentence marks). A token is classified as Hebrew if it contains any
character in the Hebrew Unicode block (U+0590 to U+05FF); otherwise it counts
as a Latin or other-script word. When the prefix mode is on, any Hebrew word that
begins with one of the inseparable particles — ו ה ב כ ל מ ש — and is long
enough to have a stem is counted as carrying an extra grammatical word:
words = tokens.length
if countPrefixes: words += (Hebrew tokens beginning with ו/ה/ב/כ/ל/מ/ש that are long enough to have a stem)
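The steps above can be sketched in Python as follows. This is a minimal illustration, not the counter's actual source: the names `count_words`, `strip_punctuation`, and `MIN_LETTERS_WITH_STEM` are hypothetical, the three-letter threshold for "long enough to have a stem" is an assumption, and so is the choice to skip tokens that consist only of punctuation.

```python
import unicodedata

PREFIX_LETTERS = frozenset("והבכלמש")  # the inseparable particles listed above
MIN_LETTERS_WITH_STEM = 3  # assumption: one prefix letter plus a two-letter stem


def strip_punctuation(token: str) -> str:
    """Drop leading and trailing punctuation (any Unicode category P*).

    This covers quotation marks, parentheses, dashes, sentence marks,
    and the maqaf (U+05BE), which Unicode classifies as dash punctuation.
    """
    start, end = 0, len(token)
    while start < end and unicodedata.category(token[start]).startswith("P"):
        start += 1
    while end > start and unicodedata.category(token[end - 1]).startswith("P"):
        end -= 1
    return token[start:end]


def is_hebrew(word: str) -> bool:
    """True if any character lies in the Hebrew block, U+0590 to U+05FF."""
    return any("\u0590" <= ch <= "\u05FF" for ch in word)


def has_counted_prefix(word: str) -> bool:
    """True if a Hebrew word starts with a prefix letter and has room for a stem."""
    # Count only letters (U+05D0 to U+05EA) so vowel points do not add length.
    letters = sum(1 for ch in word if "\u05D0" <= ch <= "\u05EA")
    return (
        is_hebrew(word)
        and word[0] in PREFIX_LETTERS
        and letters >= MIN_LETTERS_WITH_STEM
    )


def count_words(text: str, count_prefixes: bool = False) -> int:
    words = 0
    for raw in text.split():  # split on runs of whitespace
        token = strip_punctuation(raw)
        if not token:
            continue  # assumption: a punctuation-only token is not a word
        words += 1
        if count_prefixes and has_counted_prefix(token):
            words += 1  # the attached particle counts as an extra word
    return words
```

For instance, `count_words("(שלום) world!")` returns 2 under these assumptions: one Hebrew word and one Latin word, each with its surrounding punctuation removed before classification.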
Example and notes
The phrase הספר בבית is two whitespace tokens. Grammatically it is four words —
“the” + “book” + “in” + “house” — because ה and ב are attached particles.
With the prefix mode on, the count rises to four, one extra word for each attached
particle; with it off, you get the literal two-token count that a standard word
processor would report.
Use the plain count for length limits and the prefix-aware count when a teacher
or editor counts grammatical words.
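To make the arithmetic in the example concrete, the short Python sketch below lists how many words each token of the phrase contributes in prefix mode. The function name `breakdown` and the three-character minimum length are illustrative assumptions, and the sketch expects input that is already free of punctuation.

```python
PREFIX_LETTERS = set("והבכלמש")


def breakdown(phrase: str) -> list[tuple[str, int]]:
    """Pair each whitespace token with the words it contributes in prefix mode."""
    rows = []
    for token in phrase.split():
        # Assumption: three or more characters means "long enough to have a stem".
        extra = 1 if token[0] in PREFIX_LETTERS and len(token) >= 3 else 0
        rows.append((token, 1 + extra))
    return rows


rows = breakdown("הספר בבית")
plain_count = len(rows)                      # 2 whitespace tokens
prefix_count = sum(n for _, n in rows)       # 2 + 2 = 4 grammatical words
```

A check on the first letter alone cannot tell an attached particle from a root letter: `breakdown("בית ספר")` credits בית ("house") with two words even though its ב belongs to the stem. That is one more reason to use the plain count wherever an exact, reproducible number matters.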