BCP 47 language tag reference
A BCP 47 language tag identifies a human language together with optional script, region, variant and extension information. It is the value used in HTML lang attributes, hreflang, HTTP Accept-Language, and locale APIs. Tags are built by joining registered subtags with hyphens, drawing on ISO 639 (language), ISO 15924 (script), ISO 3166-1 / UN M.49 (region) and the IANA Language Subtag Registry.
How it works
A well-formed tag follows a fixed order, each part optional except the primary language:
language - Script - REGION - variant - extension - privateuse
en         Latn     US       1996      u-ca-...    x-foo
Tags are case-insensitive, but the conventional casing is recommended: language subtags are lowercase (en), script subtags are title-case with 4 letters (Latn), region subtags are uppercase with 2 letters or 3 digits (GB, 419), and variants are lowercase. A subtag's class is determined by its shape and position: 2-3 letters at the front is the language, exactly 4 letters is a script, 2 letters or 3 digits after that is a region, and 5-8 alphanumerics (or 4 starting with a digit) is a variant.
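The shape rules above can be sketched in a few lines of Python. This is a simplified illustration, not a full BCP 47 parser: the names classify and SHAPES are made up for this example, and it ignores extended language subtags, grandfathered tags and tags that consist only of private use.

```python
import re

# (class name, shape, conventional casing), in the order they may appear.
SHAPES = [
    ("script", re.compile(r"[A-Za-z]{4}"), str.title),
    ("region", re.compile(r"[A-Za-z]{2}|[0-9]{3}"), str.upper),
    ("variant", re.compile(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}"), str.lower),
]


def classify(tag: str) -> list[tuple[str, str]]:
    """Label each subtag by shape and apply the conventional casing."""
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError(f"expected a 2-3 letter language subtag, got {parts[0]!r}")
    result = [("language", parts[0].lower())]
    stage = 0  # index of the earliest class still allowed at this position
    for i, part in enumerate(parts[1:], start=1):
        if len(part) == 1:
            # A singleton starts an extension or private use; keep the rest verbatim.
            result.append(("extension-or-privateuse", "-".join(parts[i:]).lower()))
            break
        for j in range(stage, len(SHAPES)):
            name, shape, recase = SHAPES[j]
            if shape.fullmatch(part):
                result.append((name, recase(part)))
                # Script and region may appear once; variants may repeat.
                stage = j if name == "variant" else j + 1
                break
        else:
            raise ValueError(f"unexpected subtag {part!r} in {tag!r}")
    return result


print(classify("EN-latn-us-1996"))
print("-".join(value for _, value in classify("ZH-HANT-tw")))
```

Joining the recased values back together with hyphens gives the tag in its conventional casing.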
Extensions begin with a single-letter singleton: u for Unicode locale keywords (calendar, numbering, collation) and t for transformed content. The singleton x is reserved for private use, and everything after x- is opaque. Matching uses the subtags hierarchically: en-GB falls back to en when no exact match exists.
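The fallback behaviour can be sketched as progressive truncation from the right, in the style of the RFC 4647 lookup scheme. The function names fallback_chain and lookup, and the default parameter, are assumptions made for this example rather than part of any standard API.

```python
def fallback_chain(tag: str) -> list[str]:
    """List the tag followed by its progressively truncated forms."""
    parts = tag.split("-")
    chain = []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
        # A trailing singleton such as "u" or "x" means nothing alone; drop it too.
        while parts and len(parts[-1]) == 1:
            parts.pop()
    return chain


def lookup(requested: str, available: list[str], default: str) -> str:
    """Return the best available tag for the request, or the default."""
    by_lower = {tag.lower(): tag for tag in available}  # compare case-insensitively
    for candidate in fallback_chain(requested.lower()):
        if candidate in by_lower:
            return by_lower[candidate]
    return default


print(fallback_chain("zh-Hant-TW"))                   # ['zh-Hant-TW', 'zh-Hant', 'zh']
print(lookup("en-GB", ["en", "fr"], default="fr"))    # en
```

Truncating from the right means the most specific match wins, and a request such as en-GB never falls sideways to en-US unless the caller adds that rule.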
Tips and examples
- Do not over-specify. en is usually better than en-Latn-US-x-mine unless each subtag is genuinely needed; redundant scripts hurt content negotiation.
- For Chinese, choose script over region for written content: zh-Hans (simplified) and zh-Hant (traditional) are more portable than zh-CN / zh-TW.
- Use the UN M.49 numeric region 419 for “Latin America” when no single country applies, e.g. es-419.
- Validate tags against the IANA Language Subtag Registry. A tag can be well-formed (correct syntax) without being valid; it is valid only if every subtag is registered, with x- private use as the escape hatch for unregistered values.
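As a rough illustration of the first and last tips, the sketch below checks a tag against a tiny hand-copied excerpt of the registry and flags a script that repeats the language's Suppress-Script value. The function review and the three tables are hypothetical; a real check would load the full IANA Language Subtag Registry, and this sketch only handles the language[-script][-region] shape.

```python
import re

# Illustrative excerpt only, NOT the full registry.
# Each language maps to its Suppress-Script value, or None if it has none.
LANGUAGES = {"en": "Latn", "es": "Latn", "de": "Latn", "zh": None}
SCRIPTS = {"Latn", "Hans", "Hant"}
REGIONS = {"US", "GB", "CN", "TW", "419"}


def review(tag: str) -> list[str]:
    """Return the problems found in a language[-script][-region] tag."""
    # Private use is opaque, so everything from "-x-" onwards is ignored.
    head = tag.lower().partition("-x-")[0]
    parts = head.split("-")
    problems = []

    lang = parts.pop(0)
    if lang not in LANGUAGES:
        problems.append(f"language not in registry excerpt: {lang}")

    if parts and re.fullmatch(r"[a-z]{4}", parts[0]):
        script = parts.pop(0).title()
        if script not in SCRIPTS:
            problems.append(f"script not in registry excerpt: {script}")
        elif LANGUAGES.get(lang) == script:
            problems.append(f"redundant script: {script}")

    if parts and re.fullmatch(r"[a-z]{2}|[0-9]{3}", parts[0]):
        region = parts.pop(0).upper()
        if region not in REGIONS:
            problems.append(f"region not in registry excerpt: {region}")

    if parts:
        problems.append(f"not handled by this sketch: {'-'.join(parts)}")
    return problems


for candidate in ["en", "en-Latn-US-x-mine", "zh-Hant", "es-419", "en-UK"]:
    print(candidate, review(candidate))
```

Here en-UK is well-formed, because UK has the shape of a region, but it fails the registry check; the registered region subtag for the United Kingdom is GB.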