Unicode General Categories

All Unicode general category codes (Lu, Ll, Nd…) with names and character counts.

Searchable Unicode general category reference with two-letter code, full name and example characters. Look up Lu, Ll, Nd, Po and every other category, plus the regex property that matches it.

What is a Unicode general category?

Every Unicode code point is assigned exactly one general category — a two-letter code such as Lu (uppercase letter) or Nd (decimal digit). The first letter names the broad group (L letters, N numbers, P punctuation, and so on) and the second narrows it down.

Look up any Unicode category

The Unicode standard tags every code point with a general category — a compact two-letter code that says what kind of character it is. Lu is an uppercase letter, Nd a decimal digit, Po other punctuation, Zs a space separator. The first letter is the major class (Letter, Mark, Number, Punctuation, Symbol, Separator, Other) and the second letter refines it. This reference lists all 30 categories with names, groups, examples and the regex property that matches them.
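As a small illustration, the Python sketch below looks up the category of a few characters with the standard-library unicodedata module and splits each code into its major class and subclass. The MAJOR_CLASSES table and the describe helper are names invented for this example, and the results depend on the Unicode version bundled with your Python build.

```python
import unicodedata

# Major classes keyed by the first letter of the two-letter code.
# (Illustrative table; the names follow the list given above.)
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch: str) -> tuple:
    # Return (two-letter category, major class name) for one character.
    code = unicodedata.category(ch)
    return code, MAJOR_CLASSES[code[0]]


if __name__ == "__main__":
    for ch in ["A", "7", "!", " "]:
        code, major = describe(ch)
        print(f"U+{ord(ch):04X} {code} {major}")
```

Here describe("A") gives ("Lu", "Letter") and describe(" ") gives ("Zs", "Separator"): the second letter only has meaning within the major class named by the first.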

How it works

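The general category is a property recorded for every code point in the Unicode Character Database. When you ask a Unicode-aware regex engine for \p{Lu}, it consults this same classification and matches every code point whose category is Lu. Major-class escapes work too: \p{L} matches Lu, Ll, Lt, Lm and Lo together. Support and syntax vary by engine: JavaScript, for example, needs the u (or v) flag, and Python's built-in re module has no \p escapes. The categories are mutually exclusive, so a character belongs to exactly one, which is why they are reliable building blocks for tokenisers, validators and text filters. A few assignments have changed between Unicode versions, so results can differ slightly depending on the version your engine ships.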
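The lookup a regex engine performs can be approximated in a few lines. The sketch below is an emulation, not how any particular engine is implemented: it uses Python's unicodedata module because the standard re module does not support \p{...} (the third-party regex package does). The function names matches_property and find_all are made up for this example.

```python
import unicodedata


def matches_property(ch: str, prop: str) -> bool:
    # Emulate \p{prop} for a single character.
    # A one-letter prop such as "L" matches every subcategory of that
    # major class; a two-letter prop such as "Lu" matches only that one.
    return unicodedata.category(ch).startswith(prop)


def find_all(text: str, prop: str) -> list:
    # Return, in order, the characters of text that \p{prop} would match.
    return [ch for ch in text if matches_property(ch, prop)]


if __name__ == "__main__":
    sample = "Aa\u01c5"  # U+01C5 is a titlecase letter (Lt)
    print(find_all(sample, "Lu"))  # uppercase letters only
    print(find_all(sample, "L"))   # every letter subcategory
```

Because each character has exactly one category, the two-letter test never matches a character twice, and the one-letter test is simply the union of its subcategories.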

Tips and notes

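When validating “letters and digits”, prefer \p{L} and \p{Nd} over the ASCII-only [A-Za-z0-9] so you accept international text, and add \p{M} if you need to accept combining marks, which many scripts rely on. Strip layout and control noise by excluding \p{C} (the Other group) and, where appropriate, the line and paragraph separators Zl and Zp. Note that \p{C} includes tab and newline (both Cc), and that excluding the whole separator group \p{Z} would also remove ordinary spaces (Zs). Remember that Nd only covers decimal digits: Roman numeral characters such as Ⅷ are Nl and fractions such as ½ are No, so a “numbers” filter using only Nd will miss them. Surrogates (Cs) and unassigned code points (Cn) should not normally appear in well-formed text.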
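These tips can be tried out with a short Python sketch. It is a minimal illustration using the standard-library unicodedata module, and the helper names is_letters_and_digits, strip_other and number_kinds are hypothetical. The validator deliberately accepts only L and Nd, as described above, so it rejects combining marks.

```python
import unicodedata


def is_letters_and_digits(text: str) -> bool:
    # Accept only letters (any L* category) and decimal digits (Nd).
    # An empty string is rejected.
    return bool(text) and all(
        unicodedata.category(ch)[0] == "L" or unicodedata.category(ch) == "Nd"
        for ch in text
    )


def strip_other(text: str) -> str:
    # Drop every character in the Other group (Cc, Cf, Cs, Co, Cn).
    # Caution: tab and newline are Cc, so they are removed as well.
    return "".join(ch for ch in text if unicodedata.category(ch)[0] != "C")


def number_kinds(text: str) -> dict:
    # Map each numeric character in text to its category (Nd, Nl or No).
    return {
        ch: unicodedata.category(ch)
        for ch in text
        if unicodedata.category(ch)[0] == "N"
    }


if __name__ == "__main__":
    print(is_letters_and_digits("Zo\u00eb42"))    # letters plus ASCII digits
    print(strip_other("a\u200bb\x00c"))           # zero-width space and NUL removed
    print(number_kinds("7\u2167\u00bd"))          # digit, Roman numeral, fraction
```

The last call shows why an Nd-only filter is narrow: of the three numeric characters, only "7" is Nd, while the Roman numeral is Nl and the fraction is No.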