Catch the words that slipped through
When a document is meant to be fully vocalised — a Quran transcription, a graded reader, a vowelled poem — a single word left bare is easy to miss and jarring to the reader. This checker scans the whole passage and boxes the words that carry too few diacritics, turning a tedious manual proofread into a glance.
How it works
For every Arabic word the tool counts two things: the base letters, and the combining tashkeel marks attached to them. The alef forms (ا آ إ أ) and alef maqsura (ى) are subtracted from the expected slots, since the tool treats them as letters that do not need a short-vowel mark of their own. It then computes a ratio:
ratio = marks / max(1, letters - alefForms)
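The counting and the ratio can be sketched in a few lines of Python. This is a minimal illustration, not the checker's actual code. The Unicode ranges are assumptions: "letters" are taken as U+0621 to U+064A minus the tatweel U+0640, and "marks" as the tashkeel U+064B to U+0652 (tanween, fatha, damma, kasra, shadda, sukun). The function names are hypothetical.

```python
# Sketch of the per-word ratio under ASSUMED character sets:
#   base letters:   U+0621..U+064A, excluding the tatweel U+0640
#   tashkeel marks: U+064B..U+0652 (tanween, short vowels, shadda, sukun)
# The real checker may define these sets differently.

# Alef forms subtracted from the expected slots: ا آ إ أ ى
ALEF_FORMS = {"\u0627", "\u0622", "\u0625", "\u0623", "\u0649"}


def is_base_letter(ch: str) -> bool:
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def is_tashkeel(ch: str) -> bool:
    return "\u064B" <= ch <= "\u0652"


def diacritic_ratio(word: str) -> float:
    """ratio = marks / max(1, letters - alefForms)"""
    letters = sum(1 for ch in word if is_base_letter(ch))
    marks = sum(1 for ch in word if is_tashkeel(ch))
    alef_forms = sum(1 for ch in word if ch in ALEF_FORMS)
    # max(1, ...) keeps the denominator positive, e.g. for a lone alef
    return marks / max(1, letters - alef_forms)
```

For example, كَتَبَ (three letters, three marks) scores 1.0 and the bare كتب scores 0.0, while كِتَاب scores 2/3: its alef is subtracted from the four letters, leaving three slots for two marks.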
A fully vocalised word scores around 1.0 (roughly one mark for each letter that can take one), and can exceed 1.0 when a shadda adds a second mark to a letter, so it clears the 0.5 threshold comfortably. Any word whose ratio falls below 0.5 is highlighted in red. Punctuation, digits, and non-Arabic tokens are ignored so they cannot distort the result.
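Scanning a passage and applying the threshold might look like the following sketch. It assumes the same character sets as above (letters U+0621 to U+064A minus tatweel, marks U+064B to U+0652), and `flag_sparse_words` is a hypothetical name. Words are extracted as runs of Arabic letters and marks, so punctuation, digits and Latin text never enter the count.

```python
import re

# ASSUMED character sets, not taken from the tool itself:
#   base letters U+0621..U+064A except tatweel U+0640,
#   tashkeel marks U+064B..U+0652.
ALEF_FORMS = {"\u0627", "\u0622", "\u0625", "\u0623", "\u0649"}  # ا آ إ أ ى

# A word is a run of letters, tatweel and combining marks. U+0653..U+065F and
# the superscript alef U+0670 are kept inside the run so words like هٰذَا are
# not split, but they are not counted as tashkeel. Arabic punctuation (، ؛ ؟),
# Arabic-Indic digits and Latin text fall outside, so findall() skips them.
ARABIC_RUN = re.compile(r"[\u0621-\u065F\u0670]+")


def is_base_letter(ch: str) -> bool:
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def ratio(word: str) -> float:
    letters = sum(1 for ch in word if is_base_letter(ch))
    marks = sum(1 for ch in word if "\u064B" <= ch <= "\u0652")
    alef_forms = sum(1 for ch in word if ch in ALEF_FORMS)
    return marks / max(1, letters - alef_forms)


def flag_sparse_words(text: str, threshold: float = 0.5) -> list[str]:
    """Return the Arabic words whose ratio falls below `threshold`."""
    flagged = []
    for word in ARABIC_RUN.findall(text):
        # Skip runs with no base letter (e.g. a stray tatweel or mark).
        if any(is_base_letter(ch) for ch in word) and ratio(word) < threshold:
            flagged.append(word)
    return flagged
```

With this sketch, `flag_sparse_words("كَتَبَ كتب، 42 hello")` returns `['كتب']`: the vocalised word passes, the Arabic comma, the digits and the Latin word are never tokenised, and only the bare word is flagged.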
Tips and notes
Because the check is statistical rather than grammatical, use it as a triage tool: it points your eye straight at the suspicious words, and you confirm each one. It pairs naturally with the Harakat Remover: strip a copy to see the consonantal skeleton, then compare it against the vocalised original. On very short fragments the threshold can be sensitive, because a word of only two or three letters has few slots and a single missing mark swings its ratio sharply. The checker works best on full sentences and paragraphs, where the pattern of marks is clear.
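The strip-and-compare step can be approximated in a few lines. This is a hypothetical stand-in, not the Harakat Remover's actual code: `strip_harakat` is assumed to delete the tashkeel range U+064B to U+0652 plus the superscript alef U+0670, and `side_by_side` pairs each whitespace-separated Arabic word with its skeleton.

```python
import re

# ASSUMED mark set: tanween, short vowels, shadda, sukun (U+064B..U+0652)
# and the superscript alef (U+0670).
HARAKAT = re.compile(r"[\u064B-\u0652\u0670]")
# Any Arabic base letter, used only to skip non-Arabic tokens.
ARABIC_LETTER = re.compile(r"[\u0621-\u063A\u0641-\u064A]")


def strip_harakat(text: str) -> str:
    """Remove the assumed vocalisation marks, keeping the consonantal skeleton."""
    return HARAKAT.sub("", text)


def side_by_side(vocalised: str) -> list[tuple[str, str]]:
    """Pair each Arabic word of the original with its stripped skeleton."""
    return [
        (word, strip_harakat(word))
        for word in vocalised.split()
        if ARABIC_LETTER.search(word)
    ]
```

A pair whose two halves are identical carried no marks at all, which makes fully bare words easy to spot in the comparison.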