What is Buckwalter transliteration?

Buckwalter transliteration is a strict one-to-one mapping between Arabic letters (including diacritics) and ASCII characters, devised by Tim Buckwalter. It is widely used in Arabic natural-language processing because every Arabic character in the scheme maps to exactly one ASCII symbol, with no ambiguity.

Why does it use symbols like > and < and *?

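Because the scheme is strictly one-to-one, it cannot fall back on digraphs such as "th" or "sh", so it reuses punctuation where no plain letter fits. For example, > is alef with hamza above, < is alef with hamza below, and * is the letter thal. Every Arabic character stays a single ASCII character, which keeps the mapping reversible without digraphs or accented Latin letters.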

Does it handle short vowels and tanwin?

Yes. The full Buckwalter table includes the diacritic marks: fatha=a, damma=u, kasra=i, sukun=o, shadda=~, and the tanwin marks F, N, K. Vocalised text round-trips exactly.

Is the conversion reversible?

Yes — Buckwalter was designed as a bijection, so converting Arabic to Buckwalter and back returns the original text exactly. This tool keeps the mapping strictly one-to-one in both directions.

Is my text uploaded anywhere?

No. The conversion uses a local lookup table in your browser, so nothing is sent to a server.

What is the Arabic ↔ Buckwalter Transliterator?

A free Buckwalter transliterator on Gera Tools that converts Arabic script to the strict one-to-one Buckwalter ASCII scheme and back, instantly in your browser. Nothing is uploaded and no account is needed.

Arabic ↔ Buckwalter Transliterator

Name: Arabic ↔ Buckwalter Transliterator
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The Arabic ↔ Buckwalter transliterator converts between Arabic script and the Buckwalter ASCII encoding, a strict one-to-one mapping used throughout Arabic computational linguistics. Because every Arabic letter and diacritic in the scheme corresponds to exactly one ASCII character, the encoding is fully reversible, a property that has made it a common choice for storing Arabic in plain-text NLP corpora.

Why Buckwalter exists

Before Unicode support was widespread, researchers needed a way to store Arabic text in plain-text environments (databases, spreadsheets, early computational tools) that could only handle 7-bit ASCII. Tim Buckwalter’s scheme, introduced while he was at Xerox and later refined at the Linguistic Data Consortium (LDC), solved that by mapping each Arabic letter and diacritic to a unique printable ASCII character. The result is compact, unambiguous, and reversible, which is why it remains the internal format for many Arabic NLP corpora, treebanks, and morphological analysers decades later.

How it works

Each Arabic character covered by the scheme is paired with one ASCII character. Consonants map to intuitive letters where possible: ب→b, ت→t, د→d, ر→r, س→s, ع→E, ق→q. Letters with no obvious Latin match take punctuation, capitals, or otherwise unused letters: ء (hamza)→', أ→>, إ→<, ذ→*, ث→v, ح→H, خ→x, ص→S, ض→D, ط→T, ظ→Z, غ→g. Diacritics map too: fatha→a, damma→u, kasra→i, sukun→o, shadda→~, and the tanwin marks→F, N, K.

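Since the mapping is a bijection, conversion in either direction is a simple character-by-character substitution. Anything not in the table, such as digits and spaces, passes through unchanged. Latin letters and ASCII punctuation in Arabic input also pass through, but because many of them double as Buckwalter symbols, text that mixes scripts may not round-trip exactly.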
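The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names `ARABIC_TO_BW`, `BW_TO_ARABIC`, `to_buckwalter`, and `to_arabic` are hypothetical, and the table covers only the letters listed in the quick-reference table (no diacritics).

```python
# Letters from the quick-reference table, written as Unicode escapes.
ARABIC_TO_BW = {
    "\u0627": "A",  # alef
    "\u0628": "b",  # ba
    "\u062A": "t",  # ta
    "\u062B": "v",  # tha
    "\u062C": "j",  # jeem
    "\u062D": "H",  # ha (pharyngeal)
    "\u062E": "x",  # kha
    "\u062F": "d",  # dal
    "\u0630": "*",  # thal
    "\u0631": "r",  # ra
    "\u0632": "z",  # zayn
    "\u0633": "s",  # seen
    "\u0634": "$",  # sheen
    "\u0635": "S",  # sad
    "\u0636": "D",  # dad
    "\u0637": "T",  # tah
    "\u0638": "Z",  # thah
    "\u0639": "E",  # ayn
    "\u063A": "g",  # ghain
    "\u0641": "f",  # fa
    "\u0642": "q",  # qaf
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
    "\u0647": "h",  # ha
    "\u0648": "w",  # waw
    "\u064A": "y",  # ya
    "\u0629": "p",  # ta marbuta
    "\u0623": ">",  # alef with hamza above
    "\u0625": "<",  # alef with hamza below
    "\u0622": "|",  # alef madda
    "\u0621": "'",  # hamza
    "\u0640": "_",  # tatweel
}

# A one-to-one table is inverted by swapping keys and values.
BW_TO_ARABIC = {bw: ar for ar, bw in ARABIC_TO_BW.items()}
# If two letters shared a symbol, the inverse would be smaller.
assert len(BW_TO_ARABIC) == len(ARABIC_TO_BW)


def to_buckwalter(text: str) -> str:
    """Replace each known Arabic character; pass everything else through."""
    return "".join(ARABIC_TO_BW.get(ch, ch) for ch in text)


def to_arabic(text: str) -> str:
    """Replace each known Buckwalter symbol; pass everything else through."""
    return "".join(BW_TO_ARABIC.get(ch, ch) for ch in text)


word = "\u0643\u062A\u0627\u0628"  # kaf, ta, alef, ba
print(to_buckwalter(word))  # ktAb
print(to_arabic(to_buckwalter(word)) == word)  # True
```

The length check on the inverted dictionary is a cheap way to confirm that a hand-written table really is one-to-one before relying on the round trip.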

Worked examples

The word كتاب (book) becomes ktAb. Breaking it down: k=ك, t=ت, A=ا (long alef), b=ب. The phrase العربية (Arabic, feminine adjective) becomes AlErbyp: A=ا, l=ل, E=ع, r=ر, b=ب, y=ي, p=ة (ta marbuta). Converting ktAb back returns كتاب exactly — illustrating the bijection.

A vocalised form such as كَتَبَ (he wrote), with every short vowel written, becomes kataba: k=ك, a=fatha, t=ت, a=fatha, b=ب, a=fatha. That six-character ASCII string round-trips back to the diacritically marked Arabic character for character.
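The vocalised round trip can be checked with Python's built-in `str.maketrans` and `str.translate`. The sketch below is an assumption-laden illustration rather than the tool's implementation: `LETTERS`, `DIACRITICS`, `ENCODE`, and `DECODE` are hypothetical names, and the letter table holds only the three consonants this example needs.

```python
# Diacritic marks and their Buckwalter symbols, as Unicode escapes.
DIACRITICS = {
    "\u064E": "a",  # fatha
    "\u064F": "u",  # damma
    "\u0650": "i",  # kasra
    "\u0652": "o",  # sukun
    "\u0651": "~",  # shadda
    "\u064B": "F",  # tanwin fath
    "\u064C": "N",  # tanwin damm
    "\u064D": "K",  # tanwin kasr
}
# Only the consonants used below: kaf, ta, ba.
LETTERS = {"\u0643": "k", "\u062A": "t", "\u0628": "b"}

TABLE = {**LETTERS, **DIACRITICS}
ENCODE = str.maketrans(TABLE)
DECODE = str.maketrans({bw: ar for ar, bw in TABLE.items()})

# kaf + fatha, ta + fatha, ba + fatha
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"

encoded = kataba.translate(ENCODE)
print(encoded)                             # kataba
print(len(encoded))                        # 6
print(encoded.translate(DECODE) == kataba)  # True
```

Each diacritic is its own code point following the consonant it marks, so the encoded string has one ASCII character per consonant and one per vowel mark.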

Buckwalter quick-reference table

Arabic	Buckwalter	Common name
ا	A	alef
ب	b	ba
ت	t	ta
ث	v	tha
ج	j	jeem
ح	H	ha (pharyngeal)
خ	x	kha
د	d	dal
ذ	*	thal
ر	r	ra
ز	z	zayn
س	s	seen
ش	$	sheen
ص	S	sad
ض	D	dad
ط	T	tah
ظ	Z	thah
ع	E	ayn
غ	g	ghain
ف	f	fa
ق	q	qaf
ك	k	kaf
ل	l	lam
م	m	meem
ن	n	noon
ه	h	ha
و	w	waw
ي	y	ya
ة	p	ta marbuta
أ	>	alef with hamza above
إ	<	alef with hamza below
آ	|	alef madda
ء	'	hamza
ـ	_	tatweel

Diacritics: fatha a · damma u · kasra i · sukun o · shadda ~ · tanwin fath F · tanwin damm N · tanwin kasr K.

Practical uses

NLP preprocessing: feeding Arabic into morphological analysers, POS taggers, and parsers that expect Buckwalter-encoded input (for example the LDC Arabic Treebank or the MADA+TOKAN toolkit).
Spreadsheet work: storing vocalised Arabic in cells without encoding headaches.
Corpus annotation: annotators working in ASCII-only tools can mark up Arabic then convert back.
Regex searching: Buckwalter-encoded text can be searched with standard ASCII regex patterns, which is often simpler than writing Unicode-aware Arabic patterns, as long as symbols that double as regex metacharacters (such as *, $, and |) are escaped.
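As a rough illustration of the regex use case, the Python sketch below searches a Buckwalter-encoded string for tokens beginning with Al, the usual spelling of the definite article. The sample text and variable names are made up for this example. It also shows two pitfalls: Buckwalter symbols such as $ are not regex word characters, so `\b` can fire inside a token, and symbols such as * need escaping.

```python
import re

# Buckwalter-encoded sample: five whitespace-separated tokens.
text = "ktAb AlErbyp fy AlmktbAt $Al"

# Naive word boundary: "$" (sheen) is not a regex word character, so \b
# also matches between "$" and "A" and picks up part of the last token.
naive = re.findall(r"\bAl\S*", text)

# Token-aware boundary: "Al" must not be preceded by a non-space character.
article = re.findall(r"(?<!\S)Al\S+", text)

# "*" (thal) is a regex metacharacter, so escape it to match it literally.
thal_word = re.search(re.escape("*lk"), "fy *lk AlktAb")

print(naive)              # ['AlErbyp', 'AlmktbAt', 'Al']
print(article)            # ['AlErbyp', 'AlmktbAt']
print(thal_word.group())  # *lk
```

Anchoring on whitespace instead of `\b` keeps token boundaries aligned with the encoded text, whichever Buckwalter symbols a word happens to start or end with.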

Important caveats

Buckwalter is case-sensitive and unrelated to ALA-LC, Hans Wehr, or other transliteration schemes: s is س (seen) while S is ص (sad), and h is ه while H is ح. Do not mix it with phonemic romanization systems. It is not meant to be readable as English; it is an exact, machine-friendly representation. Everything runs locally, so your text is never uploaded.
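The case-sensitivity caveat is easy to demonstrate. In the Python sketch below (a hypothetical five-entry table and a made-up helper `to_arabic`, not the tool's code), lowercasing a Buckwalter string before decoding silently turns one Arabic word into a different sequence of letters.

```python
# Minimal decoding table: two case pairs plus ya.
BW_TO_ARABIC = {
    "s": "\u0633",  # seen
    "S": "\u0635",  # sad
    "h": "\u0647",  # ha
    "H": "\u062D",  # ha (pharyngeal)
    "y": "\u064A",  # ya
}


def to_arabic(text: str) -> str:
    """Decode known Buckwalter symbols; pass everything else through."""
    return "".join(BW_TO_ARABIC.get(ch, ch) for ch in text)


original = "SHyH"            # sad, ha (pharyngeal), ya, ha (pharyngeal)
lowered = original.lower()   # "shyh": seen, ha, ya, ha

print(to_arabic(original) == to_arabic(lowered))  # False
```

Any pipeline step that normalises case, such as a case-insensitive search or a lowercasing tokenizer, should therefore be applied before encoding or not at all.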