What is an ISO 15924 script code?

ISO 15924 assigns each writing system a four-letter code with an initial capital, like Latn for Latin or Cyrl for Cyrillic, plus a three-digit numeric code. These codes label scripts independently of language and are used in BCP 47 language tags and Unicode data.

How do script codes relate to Unicode property aliases?

Unicode's Script property uses long names (Latin, Cyrillic) and short aliases that match the ISO 15924 codes (Latn, Cyrl). In a regex you can write \p{Script=Latin} or \p{Script=Latn}; both select the same set of code points.

What is the script subtag in a language tag?

In BCP 47, the optional script subtag is the ISO 15924 code placed after the language, as in zh-Hant (Traditional Han) or sr-Cyrl (Serbian in Cyrillic). Include it only when it adds information the language subtag does not already imply.

Are Hans and Hant separate scripts?

They are ISO 15924 variants of Han (Hani): Hans is Simplified and Hant is Traditional. Hani is the umbrella code. Use Hans or Hant in language tags to distinguish the two written forms of Chinese. Unicode's Script property does not split them: both forms are matched by \p{Script=Han}.

Does this tool store the glyphs I paste?

No. Filtering runs entirely in your browser. Anything you type or paste stays on your device — nothing is uploaded, logged or stored.

What is Unicode Script Codes?

Unicode Script Codes is a searchable reference of ISO 15924 four-letter script codes, with the numeric code, the Unicode Script property alias and example glyphs for each writing system. It runs free in your browser on Gera Tools, with nothing uploaded.

Unicode Script Codes

Name: Unicode Script Codes
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

ISO 15924 script codes

ISO 15924 gives each registered writing system a four-letter code (Latn, Cyrl, Arab, Hani) that identifies the script independently of any language. Most of these codes double as short aliases for Unicode Script property values, so \p{Script=Latn} works in regex engines with Unicode property support, and they appear as the script subtag in BCP 47 language tags like zh-Hant. This reference lists common scripts with their code, numeric code, Unicode alias and a sample glyph.

How it works

The code is always four letters, initial-capital then lowercase (Deva, Hang). Each also has a three-digit numeric code (Latin = 215). Unicode pairs the long name with the same short alias:

Script           ISO 15924   Numeric   Sample
Latin            Latn        215       A B c
Cyrillic         Cyrl        220       Ж д
Arabic           Arab        160       ا ب
Han (Trad.)      Hant        502       書
Devanagari       Deva        315       क ख ग
Hebrew           Hebr        125       א ב ג
Georgian         Geor        240       ა ბ გ
Hangul           Hang        286       가 나 다

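Use the code as a Unicode property value (\p{Script=Cyrl}, equivalent to the long form \p{Script=Cyrillic}) or as the script subtag in a language tag (sr-Cyrl). Hans and Hant are the exception: they are valid in language tags, but the Unicode Script property only knows Hani (Han). The numeric code is mostly for fixed-width legacy systems.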
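The table can be turned into a small lookup, sketched below. The names SCRIPTS, REGEX_ALIAS, scriptRegex and findByNumeric are hypothetical helpers for illustration, not part of this tool; the fallback from Hans/Hant to Hani is an assumption that suits most matching tasks.

```javascript
// Illustrative lookup built from the table above (hypothetical names).
const SCRIPTS = {
  Latn: { name: 'Latin', numeric: 215 },
  Cyrl: { name: 'Cyrillic', numeric: 220 },
  Arab: { name: 'Arabic', numeric: 160 },
  Hant: { name: 'Han (Traditional)', numeric: 502 },
  Deva: { name: 'Devanagari', numeric: 315 },
  Hebr: { name: 'Hebrew', numeric: 125 },
  Geor: { name: 'Georgian', numeric: 240 },
  Hang: { name: 'Hangul', numeric: 286 },
};

// Hans and Hant are ISO 15924 codes but not Unicode Script property
// values, so they fall back to the umbrella code Hani for regex use.
const REGEX_ALIAS = { Hans: 'Hani', Hant: 'Hani' };

// Build a Unicode-aware regex that matches one code point of the script.
function scriptRegex(code) {
  if (!/^[A-Z][a-z]{3}$/.test(code)) {
    throw new RangeError('Expected a four-letter code such as "Latn"');
  }
  const alias = REGEX_ALIAS[code] || code;
  return new RegExp(`\\p{Script=${alias}}`, 'u');
}

// Reverse lookup: numeric code to four-letter code (undefined if absent).
function findByNumeric(numeric) {
  return Object.keys(SCRIPTS).find((c) => SCRIPTS[c].numeric === numeric);
}

console.log(scriptRegex('Cyrl').test('Ж'));
console.log(scriptRegex('Hant').test('書'));
console.log(findByNumeric(215));
```

A code the engine does not recognise makes the RegExp constructor throw a SyntaxError, so a caller handling arbitrary input may want to wrap scriptRegex in try/catch.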

Using script codes in regular expressions

Modern regex engines with Unicode property support allow you to match code points by their script. This is cleaner and more correct than hand-built character classes:

// JavaScript (ES2018+)
/\p{Script=Arabic}/u.test('مرحبا')   // true
/\p{Script=Latn}/u.test('hello')     // true
/\p{Script=Deva}/u.test('नमस्ते')    // true

# Python (third-party regex module; the standard re module does not support \p{...})
import regex
regex.fullmatch(r'\p{Script=Cyrl}+', 'Привет')  # returns a match object

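Short aliases work too: \p{Script=Latn} and \p{Script=Latin} are equivalent. The advantage over character ranges like [А-я] is that a property escape covers every code point assigned to the script, regardless of which block it sits in, and picks up letters added in later Unicode versions once the engine updates its Unicode data. The range [А-я], by contrast, misses ё and the Ukrainian letter ї.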
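The difference between a hand-built range and a property escape can be seen in a short comparison. This is a minimal sketch; the helper name compare and the sample words are chosen only for illustration.

```javascript
// A hand-built range versus the Script property escape.
const byRange = /^[А-я]+$/;                  // U+0410 to U+044F only
const byScript = /^\p{Script=Cyrillic}+$/u;  // every Cyrillic code point

// Report how each pattern classifies a whole word.
function compare(word) {
  return { range: byRange.test(word), script: byScript.test(word) };
}

for (const word of ['Привет', 'ёлка', 'Київ', 'hello']) {
  console.log(word, compare(word));
}
```

The word ёлка fails the range because ё (U+0451) lies just past я, and Київ fails because ї (U+0457) does too; both pass the property escape.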

Script codes in BCP 47 language tags

In an IETF BCP 47 language tag, the optional script subtag is placed between the language and region subtags:

zh-Hant-TW   Traditional Chinese used in Taiwan
zh-Hans-CN   Simplified Chinese used in mainland China
sr-Cyrl      Serbian in Cyrillic script
sr-Latn      Serbian in Latin script
uz-Arab      Uzbek in Arabic script (used in Afghanistan)
uz-Latn      Uzbek in Latin script (current standard in Uzbekistan)

A script subtag is needed only when the language does not already imply a single script. Write en, not en-Latn, because English is written in Latin script by default. Serbian and Uzbek are each written in more than one script, so the subtag carries real information there.
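In JavaScript, the built-in Intl.Locale class can parse a tag and expose its script subtag. The sketch below assumes a runtime that ships Intl.Locale (current browsers and Node.js); the helper name scriptSubtag is hypothetical.

```javascript
// Return the script subtag of a BCP 47 tag, or undefined if it has none.
function scriptSubtag(tag) {
  return new Intl.Locale(tag).script;
}

console.log(scriptSubtag('zh-Hant-TW')); // 'Hant'
console.log(scriptSubtag('sr-Cyrl'));    // 'Cyrl'
console.log(scriptSubtag('en'));         // undefined

// Intl.Locale also canonicalises case: lowercase language,
// title-case script, uppercase region.
console.log(new Intl.Locale('SR-latn-rs').toString()); // 'sr-Latn-RS'
```

Intl.Locale.prototype.maximize() goes one step further and fills in the likely script for a bare language tag, using the locale data bundled with the engine, which is one way to check whether a script subtag would be redundant.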

Tips and notes

Title case is the canonical form: write Latn, not LATN or latn. BCP 47 tags are matched case-insensitively, so other casings are accepted but not recommended.
Zyyy (Common) and Zinh (Inherited) cover punctuation and combining marks shared across scripts — useful when a regex must skip script-neutral glyphs.
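Hani is the umbrella code for Han. Hans and Hant distinguish Simplified and Traditional in language tags, but Unicode's Script property has only Han (Hani) as a value, so use \p{Script=Han} in a regex.
The plain Script property assigns each code point exactly one value, and characters shared across scripts usually get Common or Inherited. The Script_Extensions property additionally lists the specific scripts that use such a character: the danda (U+0964) has Script=Common, but its Script_Extensions include Devanagari and Bengali. ASCII digits 0–9 are simply Common under both properties.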
Filtering runs entirely in your browser — anything you type or paste stays on your device.
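The notes on Common, Inherited and Script_Extensions can be checked directly in a regex engine with Unicode property support. This is a minimal sketch; stripNeutral is a hypothetical helper, and the danda and combining acute accent are just convenient sample characters.

```javascript
const DANDA = '\u0964'; // DEVANAGARI DANDA, shared by many Indic scripts
const ACUTE = '\u0301'; // COMBINING ACUTE ACCENT

// Script gives one value per code point; the danda is Common.
console.log(/\p{Script=Deva}/u.test(DANDA)); // false
console.log(/\p{Script=Zyyy}/u.test(DANDA)); // true

// Script_Extensions lists the scripts that actually use it.
console.log(/\p{Script_Extensions=Deva}/u.test(DANDA)); // true

// Combining marks are Inherited (Zinh); ASCII digits are Common.
console.log(/\p{Script=Zinh}/u.test(ACUTE)); // true
console.log(/\p{Script=Common}/u.test('7')); // true

// Drop script-neutral characters before checking what script a text uses.
function stripNeutral(text) {
  return text.replace(/[\p{Script=Zyyy}\p{Script=Zinh}]/gu, '');
}

console.log(stripNeutral('Привет, мир!')); // 'Приветмир'
```

Stripping Common and Inherited first means spaces, punctuation and digits do not cause a script check to fail on otherwise single-script text.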