This tool converts vowelled or Biblical Hebrew into plain consonantal Hebrew by removing the nikud vowel points, with an option to also strip the te’amim cantillation accents. The result matches the unvowelled text used in most modern Hebrew writing.
How it works
Nikud and te’amim are combining marks layered on top of the consonant letters.
The tool matches them by Unicode code point and deletes them, leaving the base
letters untouched. Nikud occupy U+05B0 to U+05BC plus the shin dot
(U+05C1), sin dot (U+05C2), and qamats qatan (U+05C7). Cantillation accents
occupy U+0591 to U+05AF, and the tool groups meteg (U+05BD), rafe (U+05BF), and
the verse separators with them.
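The ranges listed here map naturally onto regular-expression character classes. The sketch below shows one way the two patterns might be written. The names NIKUD_RE and CANT_RE appear in this description, but the definitions are assumptions, as is the reading of "verse separators" as paseq (U+05C0) and sof pasuq (U+05C3). The classify helper is hypothetical and exists only to illustrate which group a character falls into.

```javascript
// Assumed definitions; the tool's actual patterns may differ.
// Nikud: U+05B0 to U+05BC, shin dot, sin dot, qamats qatan.
const NIKUD_RE = /[\u05B0-\u05BC\u05C1\u05C2\u05C7]/g;

// Cantillation group: accents U+0591 to U+05AF, meteg (U+05BD), rafe (U+05BF),
// plus, as an assumption, paseq (U+05C0) and sof pasuq (U+05C3).
const CANT_RE = /[\u0591-\u05AF\u05BD\u05BF\u05C0\u05C3]/g;

// Hypothetical helper: report which group a single character belongs to.
// String.prototype.search always scans from index 0, so the global flag
// on the patterns does not cause stateful surprises here.
function classify(ch) {
  if (ch.search(NIKUD_RE) !== -1) return "nikud";
  if (ch.search(CANT_RE) !== -1) return "cantillation";
  return "other";
}

console.log(classify("\u05B8")); // qamats  -> "nikud"
console.log(classify("\u0596")); // tipcha  -> "cantillation"
console.log(classify("\u05D0")); // alef    -> "other"
```

Base letters (U+05D0 to U+05EA) fall outside both classes, which is why they pass through unchanged.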
The replacement is done with two regular expressions:
nikud only -> result = text.replace(NIKUD_RE, "")
also te’amim -> result = result.replace(CANT_RE, "")
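The two steps can be sketched as a small function. This is a minimal illustration, not the tool's actual source: stripMarks and its removeCantillation parameter are hypothetical names, and the pattern definitions are assumptions based on the code points listed in this section (with "verse separators" read as paseq and sof pasuq).

```javascript
// Assumed pattern definitions; the tool's own may differ.
const NIKUD_RE = /[\u05B0-\u05BC\u05C1\u05C2\u05C7]/g;
const CANT_RE = /[\u0591-\u05AF\u05BD\u05BF\u05C0\u05C3]/g;

// Hypothetical wrapper around the two replacements.
// The first pass always removes nikud; the second pass is optional
// and runs on the output of the first.
function stripMarks(text, removeCantillation = false) {
  let result = text.replace(NIKUD_RE, "");
  if (removeCantillation) {
    result = result.replace(CANT_RE, "");
  }
  return result;
}

// "Bereshit" written with nikud and one accent (tipcha, U+0596).
const word =
  "\u05D1\u05B0\u05BC\u05E8\u05B5\u05D0\u05E9\u05C1\u05B4\u0596\u05D9\u05EA";

console.log(stripMarks(word));       // nikud gone, accent still present
console.log(stripMarks(word, true)); // consonants only
```

Because both patterns carry the global flag, each replace call removes every match in the string rather than only the first.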
Because only combining marks are removed, the consonant skeleton — the part that carries the word’s identity — is fully preserved.
Example and notes
The opening of Genesis, בְּרֵאשִׁית בָּרָא אֱלֹהִים, becomes בראשית ברא אלהים
once the vowel points are stripped — exactly how the same words appear in a
modern newspaper. The counter shows how many marks were removed; a count of zero
means the input contained none, which is handy for confirming a file is fully
unvowelled before importing it into a system that expects plain text. Leave
cantillation removal on for liturgical sources, since those carry both nikud and
te’amim.
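A rough sketch of how the example and the counter might fit together follows. The function name stripAndCount, its return shape, and the pattern definitions are all assumptions made for illustration; the tool's own counter may be implemented differently.

```javascript
// Assumed pattern definitions, based on the code points named in this section.
const NIKUD_RE = /[\u05B0-\u05BC\u05C1\u05C2\u05C7]/g;
const CANT_RE = /[\u0591-\u05AF\u05BD\u05BF\u05C0\u05C3]/g;

// Hypothetical helper: strip marks and count how many were removed.
// A replacer function is called once per match, so it doubles as a counter.
function stripAndCount(text, removeCantillation = true) {
  let removed = 0;
  const dropAndCount = () => {
    removed += 1;
    return "";
  };
  let result = text.replace(NIKUD_RE, dropAndCount);
  if (removeCantillation) {
    result = result.replace(CANT_RE, dropAndCount);
  }
  return { result, removed };
}

const genesis = "בְּרֵאשִׁית בָּרָא אֱלֹהִים";
const first = stripAndCount(genesis);
console.log(first.result);  // the plain consonantal text
console.log(first.removed); // number of marks that were deleted

// Running the output through again should remove nothing,
// which is the "fully unvowelled" check described in the notes.
const second = stripAndCount(first.result);
console.log(second.removed === 0);
```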