The Arabic ↔ Buckwalter transliterator converts between Arabic script and the Buckwalter ASCII encoding, a strict one-to-one mapping used throughout Arabic computational linguistics. Because every Arabic letter and diacritic in the scheme corresponds to exactly one ASCII character, the encoding is fully reversible, which is why it is widely used for storing Arabic in plain-text NLP corpora.
How it works
Each Arabic Unicode code point in the scheme is paired with one ASCII character. Consonants map to intuitive letters where possible: ب→b, ت→t, د→d, ر→r, س→s, ق→q. Letters with no obvious Latin match take an uppercase letter, an otherwise unused lowercase letter, or a punctuation mark: ء (hamza)→', أ→>, إ→<, ذ→*, ث→v, ح→H, خ→x, ص→S, ض→D, ط→T, ظ→Z, ع→E, غ→g. Diacritics map too: fatha→a, damma→u, kasra→i, sukun→o, shadda→~, and the three tanwin marks fathatan→F, dammatan→N, kasratan→K.
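Since the mapping is a bijection, conversion in either direction is a simple character-by-character substitution. Anything not in the table (Latin letters, digits, spaces, ASCII punctuation) passes through unchanged. The round trip is therefore exact for pure Arabic text; ASCII characters already present in the Arabic input that double as Buckwalter codes, such as b or *, will be read as Arabic letters when the output is converted back.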
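The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not this tool's actual source: the names ARABIC, BUCKWALTER, to_buckwalter and to_arabic are hypothetical, and the table follows the commonly published Buckwalter scheme (hamza through ghain, tatweel through sukun, plus dagger alef and alef wasla), which may differ in detail from the table used here.

```python
# Arabic code points in Unicode order, paired position by position with
# their Buckwalter characters. Assumed table: check it against the
# variant of the scheme you actually use.
ARABIC = (
    "".join(chr(cp) for cp in range(0x0621, 0x063B))    # hamza .. ghain
    + "".join(chr(cp) for cp in range(0x0640, 0x0653))  # tatweel .. sukun
    + "\u0670\u0671"                                    # dagger alef, alef wasla
)
BUCKWALTER = (
    "'|>&<}AbptvjHxd*rzs$SDTZEg"  # U+0621 .. U+063A
    "_fqklmnhwYyFNKaui~o"         # U+0640 .. U+0652
    "`{"                          # U+0670, U+0671
)

# Sanity checks: same length, and no ASCII character used twice,
# which is what makes the mapping invertible.
assert len(ARABIC) == len(BUCKWALTER) == len(set(BUCKWALTER))

TO_BW = dict(zip(ARABIC, BUCKWALTER))
TO_AR = {bw: ar for ar, bw in TO_BW.items()}


def to_buckwalter(text: str) -> str:
    """Replace each Arabic character; leave everything else as-is."""
    return "".join(TO_BW.get(ch, ch) for ch in text)


def to_arabic(text: str) -> str:
    """Replace each Buckwalter character; leave everything else as-is."""
    return "".join(TO_AR.get(ch, ch) for ch in text)


if __name__ == "__main__":
    word = "\u0643\u062a\u0627\u0628"  # kaf, ta, alef, ba
    print(to_buckwalter(word))
    print(to_arabic(to_buckwalter(word)) == word)
```

Using dict.get(ch, ch) is what implements the pass-through rule: a character missing from the table is returned unchanged.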
Example
The word “كتاب” (book) becomes ktAb. The phrase “العربية” (Arabic) becomes AlErbyp. Converting ktAb back returns “كتاب” exactly. Note that A is the plain alef ا, while > and < are the hamza-bearing alefs: > is أ (hamza above) and < is إ (hamza below).
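The two examples can be checked with Python's built-in str.maketrans and str.translate. The sketch below is only an illustration under stated assumptions: it carries a partial table with just the nine letters these words need, and the names LETTERS, CODES, AR_TO_BW and BW_TO_AR are hypothetical.

```python
# Partial table: only the nine letters that occur in the two example words.
# Order: kaf, ta, alef, ba, lam, ain, ra, ya, ta marbuta.
LETTERS = "\u0643\u062a\u0627\u0628\u0644\u0639\u0631\u064a\u0629"
CODES = "ktAblEryp"

# One translation table per direction; both arguments must have equal length.
AR_TO_BW = str.maketrans(LETTERS, CODES)
BW_TO_AR = str.maketrans(CODES, LETTERS)

for word in ("كتاب", "العربية"):
    encoded = word.translate(AR_TO_BW)
    decoded = encoded.translate(BW_TO_AR)
    # Print the Buckwalter form and whether the round trip is lossless.
    print(encoded, decoded == word)
```

Characters absent from a translation table are left untouched by str.translate, so the same pass-through behaviour applies here.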
Notes
Buckwalter is case-sensitive: the upper- and lowercase forms of a Latin letter stand for entirely different Arabic letters. For example, s is س (seen) while S is ص (sad), so changing the case of Buckwalter text changes the Arabic it encodes. It is not meant to be readable as English; it is an exact, machine-friendly representation. Everything runs locally, and your text is never uploaded.