BCP 47 Language Tag Reference

IETF BCP 47 language tag syntax with subtag types and IANA registry subtags.

Reference for IETF BCP 47 language tags: primary language, script, region, variant and extension subtags, their syntax rules, ordering, and worked examples like zh-Hans-CN and en-US-u-ca-gregory.

What is the correct order of subtags in a BCP 47 language tag?

The canonical order is language-Script-REGION-variant-extension-privateuse, for example zh-Hans-CN or de-1996. Language is lowercase, script is Title-case, region is UPPERCASE, and variants/extensions follow.

BCP 47 language tag reference

A BCP 47 language tag identifies a human language together with optional script, region, variant and extension information. It is the value used in HTML lang attributes, hreflang, HTTP Accept-Language, and locale APIs. Tags are built by joining registered subtags with hyphens, drawing on ISO 639 (language), ISO 15924 (script), ISO 3166-1 / UN M.49 (region) and the IANA Language Subtag Registry.

How it works

A well-formed tag follows a fixed order, each part optional except the primary language:

language - Script - REGION - variant - extension - privateuse
   de       Latn      DE      1996      u-ca-...     x-foo

Tags are case-insensitive, so casing is a convention rather than a requirement, but following it is recommended: language subtags are lowercase (en), script subtags are Title-case 4 letters (Latn), region subtags are UPPERCASE 2 letters or 3 digits (GB, 419), and variants are lowercase. A subtag’s class is determined by its shape and position: 2-3 letters at the front is the language, exactly 4 letters after it is a script, 2 letters or 3 digits after that is a region, and 5-8 alphanumerics (or 4 starting with a digit) is a variant.
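The shape rules above can be sketched in a few lines of Python. This is a minimal illustration, not a full RFC 5646 parser: it assumes a 2-3 letter primary language and ignores extended language subtags, grandfathered tags and private-use-only tags. The function names classify and canonical are made up for this example.

```python
import re

def classify(tag):
    """Label each subtag by shape and position, applying the conventional casing."""
    parts = tag.split("-")
    if not re.fullmatch(r"[A-Za-z]{2,3}", parts[0]):
        raise ValueError(f"unsupported primary language subtag: {parts[0]!r}")
    out = [("language", parts[0].lower())]
    stage = 0  # 0: a script may follow, 1: a region may follow, 2: variants only
    i = 1
    while i < len(parts) and len(parts[i]) != 1:
        p = parts[i]
        if stage == 0 and re.fullmatch(r"[A-Za-z]{4}", p):
            out.append(("script", p.title()))
            stage = 1
        elif stage <= 1 and re.fullmatch(r"[A-Za-z]{2}|[0-9]{3}", p):
            out.append(("region", p.upper()))
            stage = 2
        elif re.fullmatch(r"[A-Za-z0-9]{5,8}|[0-9][A-Za-z0-9]{3}", p):
            out.append(("variant", p.lower()))
            stage = 2
        else:
            raise ValueError(f"unexpected subtag {p!r} at position {i}")
        i += 1
    # Everything from the first singleton onward is extension or private-use data.
    while i < len(parts):
        singleton = parts[i].lower()
        j = i + 1
        # After "x" every remaining subtag is private use; otherwise stop at the next singleton.
        while j < len(parts) and (singleton == "x" or len(parts[j]) > 1):
            j += 1
        if len(singleton) != 1 or j == i + 1:
            raise ValueError(f"malformed extension at position {i}")
        kind = "privateuse" if singleton == "x" else "extension"
        out.append((kind, "-".join(parts[i:j]).lower()))
        i = j
    return out

def canonical(tag):
    """Rejoin the classified subtags to get the conventionally cased tag."""
    return "-".join(value for _, value in classify(tag))

print(classify("ZH-hans-cn"))
print(canonical("EN-us-U-CA-Gregory"))
```

The sketch checks shape only. It does not consult the IANA registry, so it accepts any tag with plausible-looking subtags.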

Extensions begin with a single-character singleton: u for Unicode locale keywords (calendar, numbering, collation) and t for transformed content. The singleton x is reserved for private use and always comes last (everything after x- is opaque). Matching treats the subtags hierarchically, dropping them from the right: en-GB falls back to en when no exact match exists.
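The fallback behaviour can be illustrated with a small sketch in the style of the RFC 4647 lookup scheme: subtags are removed from the right, and a singleton left dangling at the end is removed along with the subtag that followed it. The helper names fallback_chain and lookup, and the default parameter, are assumptions made for this example.

```python
def fallback_chain(tag):
    """List the progressively shorter tags tried during lookup."""
    parts = tag.split("-")
    chain = []
    while parts:
        chain.append("-".join(parts))
        parts.pop()
        # A singleton such as "u" or "x" cannot end a tag, so drop it as well.
        while parts and len(parts[-1]) == 1:
            parts.pop()
    return chain

def lookup(requested, available, default="en"):
    """Return the first available tag on the fallback chain, compared case-insensitively."""
    by_lower = {t.lower(): t for t in available}
    for candidate in fallback_chain(requested):
        hit = by_lower.get(candidate.lower())
        if hit is not None:
            return hit
    return default

print(fallback_chain("en-US-u-ca-gregory"))
# ['en-US-u-ca-gregory', 'en-US-u-ca', 'en-US', 'en']
print(lookup("en-GB", ["en", "fr"]))
# en
```

A production matcher would also handle a priority list of requested tags and quality values from Accept-Language, which this sketch leaves out.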

Tips and examples

  • Do not over-specify. en is usually better than en-Latn-US-x-mine unless each subtag is genuinely needed; redundant scripts hurt content negotiation.
  • For Chinese, choose script over region for written content: zh-Hans (simplified) and zh-Hant (traditional) are more portable than zh-CN / zh-TW.
  • Use the UN M.49 numeric region 419 for “Latin America” when no single country applies, e.g. es-419.
  • Validate tags against the IANA Language Subtag Registry. A tag is well-formed if it merely follows the syntax, but it is valid only if every subtag is also registered, with x- private use as the escape hatch for unregistered values.
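The difference between a well-formed tag and a valid one can be sketched as follows. The REGISTRY_SUBSET table is a tiny hand-picked sample for illustration only; a real validator would load the full IANA Language Subtag Registry. The regular expression covers ordinary tags and, as an assumption of this sketch, omits extended language subtags and grandfathered tags. The check function is a hypothetical name.

```python
import re

# Syntax only: language, optional script, optional region, variants, extensions, private use.
LANGTAG = re.compile(
    r"^(?P<language>[a-z]{2,3})"
    r"(?:-(?P<script>[a-z]{4}))?"
    r"(?:-(?P<region>[a-z]{2}|[0-9]{3}))?"
    r"(?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)"
    r"(?P<extensions>(?:-[0-9a-wy-z](?:-[a-z0-9]{2,8})+)*)"
    r"(?:-x(?:-[a-z0-9]{1,8})+)?$",
    re.IGNORECASE,
)

# Illustrative sample only, NOT the real registry.
REGISTRY_SUBSET = {
    "language": {"en", "de", "zh", "es"},
    "script": {"latn", "hans", "hant"},
    "region": {"us", "gb", "cn", "tw", "de", "419"},
    "variant": {"1996"},
}

def check(tag):
    """Classify a tag as not well-formed, well-formed but not valid, or valid."""
    m = LANGTAG.match(tag)
    if m is None:
        return "not well-formed"
    for kind in ("language", "script", "region"):
        value = m[kind]
        if value and value.lower() not in REGISTRY_SUBSET[kind]:
            return "well-formed but not valid"
    variants = [v for v in m["variants"].lower().split("-") if v]
    unknown = any(v not in REGISTRY_SUBSET["variant"] for v in variants)
    repeated = len(set(variants)) != len(variants)  # a variant may not appear twice
    if unknown or repeated:
        return "well-formed but not valid"
    return "valid"

for tag in ("zh-Hans-CN", "en-Abcd", "en_US", "de-1996-1996"):
    print(tag, "->", check(tag))
```

Results are relative to the sample table, so a tag reported as not valid here may still be valid against the full registry. The sketch also does not check extension contents or repeated singletons.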