ISO 15924 script codes
ISO 15924 assigns each registered script a four-letter code (Latn, Cyrl,
Arab, Hani) that identifies the script independently of any language. Most
of these codes double as Unicode Script property value aliases, so
\p{Script=Latn} works in regex engines that support Unicode property escapes,
and they appear as the script subtag in BCP 47 language tags like zh-Hant.
This reference lists common scripts with their code, numeric code, Unicode
alias and a sample glyph.
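As a rough illustration of the regex use, the sketch below counts the characters of a string that belong to one script. It assumes a JavaScript or TypeScript runtime with Unicode property escapes (ES2018 or later); countScript is a hypothetical helper name, not part of any standard.

```typescript
// Count the characters in `text` whose Script property equals `script`.
// `countScript` is a hypothetical helper. The `u` flag is required for
// \p{...}; an unknown script name makes the RegExp constructor throw.
function countScript(text: string, script: string): number {
  const pattern = new RegExp(`\\p{Script=${script}}`, "gu");
  return (text.match(pattern) ?? []).length;
}

const sample = "Привет, world";
console.log(countScript(sample, "Cyrl")); // 6
console.log(countScript(sample, "Latn")); // 5
// The long name and the four-letter alias select the same characters.
console.log(countScript(sample, "Cyrillic") === countScript(sample, "Cyrl")); // true
```

The comma and the space are counted under neither script, because Unicode assigns them to Common.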
How it works
The code is always four letters: an initial capital followed by three
lowercase letters (Deva, Hang). Each also has a three-digit numeric code
(Latin = 215). Unicode pairs each script's long name with the same short
alias:
Script        ISO 15924   Numeric   Sample
Latin         Latn        215       A B c
Cyrillic      Cyrl        220       Ж д
Arabic        Arab        160       ا ب
Han (Trad.)   Hant        502       書
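The rows above could be held in a small lookup structure. This is only a sketch: the ScriptEntry shape and the helper names isWellFormedCode, findByCode and findByNumeric are assumptions made for illustration, and the data is limited to the four rows listed here.

```typescript
// Illustrative record for one table row (hypothetical shape).
interface ScriptEntry {
  name: string;
  code: string;    // ISO 15924 four-letter code
  numeric: number; // ISO 15924 three-digit numeric code
}

// Data copied from the table above.
const SCRIPTS: ScriptEntry[] = [
  { name: "Latin", code: "Latn", numeric: 215 },
  { name: "Cyrillic", code: "Cyrl", numeric: 220 },
  { name: "Arabic", code: "Arab", numeric: 160 },
  { name: "Han (Trad.)", code: "Hant", numeric: 502 },
];

// Checks the written form only: one uppercase ASCII letter, then three
// lowercase ones. It does not check that the code is actually registered.
function isWellFormedCode(code: string): boolean {
  return /^[A-Z][a-z]{3}$/.test(code);
}

function findByCode(code: string): ScriptEntry | undefined {
  return SCRIPTS.find((entry) => entry.code === code);
}

function findByNumeric(numeric: number): ScriptEntry | undefined {
  return SCRIPTS.find((entry) => entry.numeric === numeric);
}

console.log(isWellFormedCode("Deva")); // true
console.log(isWellFormedCode("LATN")); // false
console.log(findByCode("Cyrl")?.numeric); // 220
console.log(findByNumeric(502)?.code); // "Hant"
```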
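Use the code as a Unicode property value (\p{Script=Cyrl}, equivalent to
\p{Script=Cyrillic}) or as the script subtag in a language tag (sr-Cyrl).
Hant is the exception in the table: it is valid in language tags but is not
a Unicode Script value. The numeric code is mostly for fixed-width legacy
systems.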
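For the language-tag side, one possible way to read the script subtag is the built-in Intl.Locale API, assuming a runtime that provides it (current browsers and Node.js do). The helper names scriptOf and likelyScript are hypothetical.

```typescript
// Return the script subtag of a BCP 47 tag, or undefined if it has none.
// `scriptOf` is a hypothetical helper; Intl.Locale throws a RangeError
// when the tag is not structurally valid.
function scriptOf(tag: string): string | undefined {
  return new Intl.Locale(tag).script;
}

// Ask the runtime's locale data which script the tag most likely implies.
// `likelyScript` is a hypothetical helper; the answer depends on that data.
function likelyScript(tag: string): string | undefined {
  return new Intl.Locale(tag).maximize().script;
}

console.log(scriptOf("sr-Cyrl")); // "Cyrl"
console.log(scriptOf("zh-Hant-TW")); // "Hant"
console.log(scriptOf("sr-cyrl")); // "Cyrl": the subtag case is canonicalised
console.log(scriptOf("en")); // undefined: the tag carries no script subtag
console.log(likelyScript("en")); // "Latn" with standard locale data
```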
Tips and notes
- Codes are case-normalised: Latn is canonical, not LATN or latn.
- Zyyy (Common) and Zinh (Inherited) cover punctuation and combining marks shared across scripts, which is useful when a regex must skip script-neutral glyphs.
- Hani is the umbrella for Han; Hans/Hant distinguish Simplified/Traditional in language tags.
- In BCP 47, omit a script subtag when the language already implies it (write en, not en-Latn).
- The Unicode Script_Extensions property handles glyphs used by several scripts; the plain Script property assigns each code point to exactly one script.
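Two of the notes above, skipping Common and Inherited characters and the difference between Script and Script_Extensions, can be sketched as follows. This assumes a runtime with Unicode property escapes; stripNeutral and isWrittenIn are hypothetical helper names.

```typescript
// Characters whose Script is Common (Zyyy) or Inherited (Zinh).
const NEUTRAL = /[\p{Script=Zyyy}\p{Script=Zinh}]/gu;

// Remove script-neutral characters (spaces, punctuation, digits, many
// combining marks) so that only script-specific characters remain.
function stripNeutral(text: string): string {
  return text.replace(NEUTRAL, "");
}

// True if every script-specific character in `text` belongs to `script`.
// Text made up only of neutral characters yields false.
function isWrittenIn(text: string, script: string): boolean {
  const rest = stripNeutral(text);
  return rest.length > 0 && new RegExp(`^\\p{Script=${script}}+$`, "u").test(rest);
}

console.log(stripNeutral("Привет, мир!")); // "Приветмир"
console.log(isWrittenIn("Привет, мир!", "Cyrl")); // true
console.log(isWrittenIn("Привет, world", "Cyrl")); // false

// U+3001 IDEOGRAPHIC COMMA: its Script is Common, but its
// Script_Extensions include Han.
const comma = "\u3001";
console.log(/\p{Script=Hani}/u.test(comma)); // false
console.log(/\p{Script_Extensions=Hani}/u.test(comma)); // true
```

Matching on Script_Extensions is the broader test, so it suits cases where shared punctuation should count as part of the surrounding script.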