Arabic Syllable Structure Analyzer

Analyze CV/CVC/CVVC syllable patterns in Arabic text

Break vocalised Arabic text into syllables and classify each by pattern (CV, CVV, CVC, CVVC or CVCC) and by weight (light, heavy or superheavy). The tool parses consonants, short vowels, long vowels and sukun, and is intended for linguistics work. Runs in your browser.

What syllable types does Arabic have?

Arabic syllables always begin with exactly one consonant followed by a vowel. The core types are CV and CVV (open) and CVC, CVVC and CVCC (closed). CV is light, CVC and CVV are heavy, and CVVC and CVCC are superheavy and usually appear word-finally.

Arabic has a tightly constrained syllable structure: every syllable opens with a single consonant followed by a vowel, and may be closed by one coda consonant or, in superheavy CVCC, by two. This analyzer parses fully vocalised Arabic into those syllables and labels each one by its CV pattern and weight.

How it works

The text is scanned character by character, and each letter or diacritic is mapped to a consonant (C), a short vowel (V) or a long vowel (VV). Consonants are then assigned to onset or coda position:

  • A consonant letter that carries a vowel becomes an onset C.
  • A short vowel (fatha, damma, kasra) is a V nucleus — unless it is followed by its matching madd letter (alef, waw, ya), in which case the pair is a long VV nucleus.
  • A sukun marks the preceding consonant as a coda, closing the syllable.
  • A shadda (gemination) doubles the consonant, which splits into a coda on the previous syllable plus an onset on the next.
  • Tanwin is read as a short vowel plus a final /n/ coda.
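
The segmentation rules above can be sketched roughly as follows. This is an illustrative Python version, not the tool's actual source: the names segment and is_letter are hypothetical, and the sketch assumes a single fully vocalised word. Marks after each letter are collected as a set because typed and Unicode-normalised text may order the shadda and the vowel mark differently. It does not handle the silent alef after fathatan, hamzat al-wasl, or the definite article.

```python
# Illustrative sketch only; all names here are assumptions, not the tool's API.
FATHA, DAMMA, KASRA = "\u064E", "\u064F", "\u0650"
SHADDA, SUKUN = "\u0651", "\u0652"
TANWIN = {"\u064B", "\u064C", "\u064D"}  # fathatan, dammatan, kasratan
# Each short vowel and the madd letter that lengthens it (alef, waw, ya).
MADD = {FATHA: "\u0627", DAMMA: "\u0648", KASRA: "\u064A"}
MARKS = set(MADD) | TANWIN | {SHADDA, SUKUN}


def is_letter(ch):
    """True for basic Arabic letters (U+0621..U+064A), excluding tatweel."""
    return "\u0621" <= ch <= "\u064A" and ch != "\u0640"


def segment(word):
    """Map one vocalised word to a flat list of 'C', 'V' and 'VV' symbols."""
    symbols = []
    i, n = 0, len(word)
    while i < n:
        if not is_letter(word[i]):
            i += 1  # skip spaces, punctuation, stray marks
            continue
        i += 1
        # Gather every diacritic attached to this letter, in any order.
        marks = set()
        while i < n and word[i] in MARKS:
            marks.add(word[i])
            i += 1
        # Shadda doubles the consonant: coda of one syllable, onset of the next.
        symbols.extend(["C", "C"] if SHADDA in marks else ["C"])
        short = next((m for m in MADD if m in marks), None)
        if short is not None:
            # Long vowel: the matching madd letter follows and carries no mark.
            bare_madd = (
                i < n
                and word[i] == MADD[short]
                and (i + 1 == n or word[i + 1] not in MARKS)
            )
            if bare_madd:
                symbols.append("VV")
                i += 1  # consume the madd letter as part of the nucleus
            else:
                symbols.append("V")
        elif marks & TANWIN:
            symbols.extend(["V", "C"])  # short vowel plus final /n/
        # A sukun, or no mark at all, adds nothing: the consonant stands alone.
    return symbols


print(segment("\u0643\u064E\u062A\u064E\u0628\u064E"))              # kataba
print(segment("\u062F\u064E\u0631\u0652\u0633\u064E\u0647\u064F"))  # darsahu
```

For kataba this yields C V C V C V; for darsahu it yields C V C C V C V, where the second C is the rāʾ left without a vowel by its sukun.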

Syllables are then built greedily as C + nucleus + optional coda (one consonant, or two in CVCC). Weight is classified as light (CV), heavy (CVC, CVV), or superheavy (CVVC, CVCC).
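
One possible reading of this step, as a minimal Python sketch: it takes a flat list of C, V and VV symbols and groups them into syllable patterns. The names syllabify and weight are hypothetical, and the rule used here (a consonant is a coda unless a nucleus follows it directly) is an assumption about what "greedily" means, not a description of the tool's internals.

```python
# Illustrative sketch only; function names and the coda rule are assumptions.
WEIGHT = {
    "CV": "light",
    "CVV": "heavy",
    "CVC": "heavy",
    "CVVC": "superheavy",
    "CVCC": "superheavy",
}


def syllabify(symbols):
    """Group 'C', 'V', 'VV' symbols into patterns such as 'CVC'."""
    syllables = []
    i, n = 0, len(symbols)
    while i < n:
        # Every syllable must open with one consonant followed by a nucleus.
        if symbols[i] != "C" or i + 1 >= n or symbols[i + 1] not in ("V", "VV"):
            raise ValueError(f"no onset + nucleus at position {i}")
        pattern = "C" + symbols[i + 1]
        i += 2
        # A consonant is a coda unless a nucleus follows it directly,
        # in which case it is kept as the onset of the next syllable.
        while i < n and symbols[i] == "C" and (i + 1 == n or symbols[i + 1] == "C"):
            pattern += "C"
            i += 1
        syllables.append(pattern)
    return syllables


def weight(pattern):
    """Return the weight class, or 'unclassified' for unexpected patterns."""
    return WEIGHT.get(pattern, "unclassified")


# Symbol sequence for darsahu: d a r s a h u
patterns = syllabify(["C", "V", "C", "C", "V", "C", "V"])
print(patterns)                       # ['CVC', 'CV', 'CV']
print([weight(p) for p in patterns])  # ['heavy', 'light', 'light']
```

Keeping the pattern strings separate from the weight lookup means an unexpected pattern is reported as unclassified instead of being silently forced into one of the five types.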

Example

The verb:

كَتَبَ  (kataba)

parses as CV · CV · CV — three light open syllables. The word:

دَرْسَهُ  (darsahu)

parses as CVC · CV · CV, where the sukun on the rāʾ closes the first syllable into a heavy CVC.

Notes

  • Always vocalise the text; without harakat the vowels are invisible to any syllabifier.
  • CVVC and CVCC superheavy syllables normally occur only at the end of a word.
  • This is a phonological approximation for linguistics and teaching, not a complete prosodic engine.
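
The note on superheavy syllables suggests a simple sanity check on a parsed word. The sketch below is a hypothetical helper (the name misplaced_superheavy is an assumption) that returns the positions of any CVVC or CVCC syllable that is not the last one in the word. A non-empty result may point to missing or misplaced harakat, though it can also be a legitimate exception.

```python
# Illustrative sketch only; the helper name is an assumption.
SUPERHEAVY = {"CVVC", "CVCC"}


def misplaced_superheavy(patterns):
    """Indices of superheavy syllables that are not word-final."""
    return [i for i, p in enumerate(patterns[:-1]) if p in SUPERHEAVY]


print(misplaced_superheavy(["CV", "CVVC"]))        # []
print(misplaced_superheavy(["CVCC", "CV", "CV"]))  # [0]
```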