Look up any Unicode category
The Unicode standard tags every code point with a general category, a compact two-letter code that says what kind of character it is: Lu is an uppercase letter, Nd a decimal digit, Po other punctuation and Zs a space separator. The first letter names the major class (Letter, Mark, Number, Punctuation, Symbol, Separator, Other) and the second letter refines it. This reference lists all 30 categories with their names, groups, examples and the regex property that matches each one.
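To see the two-letter codes in practice, Python's standard-library unicodedata module returns the same general category for any character. The sketch below assumes Python 3.9 or later; the describe helper and the MAJOR_CLASSES table are illustrative names, not part of any library. The answers come from the Unicode version bundled with your Python build (reported by unicodedata.unidata_version).

```python
import unicodedata

# Illustrative table: major class names keyed by the first letter of the code.
MAJOR_CLASSES = {
    "L": "Letter",
    "M": "Mark",
    "N": "Number",
    "P": "Punctuation",
    "S": "Symbol",
    "Z": "Separator",
    "C": "Other",
}


def describe(ch: str) -> tuple[str, str]:
    """Hypothetical helper: return (two-letter category, major class name)."""
    cat = unicodedata.category(ch)  # e.g. "Lu", "Nd", "Po", "Zs"
    return cat, MAJOR_CLASSES[cat[0]]


# A letter, a digit, punctuation, a space, a precomposed é and a combining accent.
for ch in ["A", "7", "!", " ", "\u00e9", "\u0301"]:
    cat, major = describe(ch)
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')}: {cat} ({major})")
```

The combining acute accent (U+0301) reports Mn, a Mark rather than a Letter, which matters when you build validators.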
How it works
The general category is a normative property recorded in the Unicode Character Database. When you ask a Unicode-aware regex engine for \p{Lu}, it consults this classification and matches every code point whose category is Lu. Major-class escapes work too: \p{L} matches Lu, Ll, Lt, Lm and Lo together. The categories are mutually exclusive (each code point belongs to exactly one), which makes them reliable building blocks for tokenisers, validators and text filters.
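Python's built-in re module does not support \p{...} escapes (the third-party regex package does), but the matching rule is easy to emulate with unicodedata. This is a sketch under that assumption, with hypothetical helper names: because every code is the major-class letter plus one refining letter, a prefix test on the category handles both \p{L} and \p{Lu}. The last part checks mutual exclusivity by confirming that exactly one of the 30 codes matches each character.

```python
import unicodedata

# All 30 general category codes, grouped by major class.
GENERAL_CATEGORIES = (
    "Lu Ll Lt Lm Lo Mn Mc Me Nd Nl No Pc Pd Ps Pe Pi Pf Po "
    "Sm Sc Sk So Zs Zl Zp Cc Cf Cs Co Cn"
).split()


def has_property(ch: str, prop: str) -> bool:
    """Hypothetical helper emulating a property escape for one character.

    prop is a major class ("L") or a full category ("Lu"); a prefix test
    on the two-letter category code covers both forms.
    """
    return unicodedata.category(ch).startswith(prop)


def find_all(text: str, prop: str) -> list[str]:
    """Hypothetical helper: the characters of text that the escape would match."""
    return [ch for ch in text if has_property(ch, prop)]


# Lu, Ll, Lt (U+01C5), Lm (U+02B0) and Lo (U+4E2D), then a space, a digit and "!".
sample = "Aa\u01c5\u02b0\u4e2d 7!"

print(find_all(sample, "L"))   # the major class: all five letters
print(find_all(sample, "Lu"))  # one category: only "A"

# Mutual exclusivity: each character matches exactly one of the 30 codes.
for ch in sample:
    matches = [c for c in GENERAL_CATEGORIES if has_property(ch, c)]
    assert len(matches) == 1, (ch, matches)
```

With the regex package installed, the equivalent call would be regex.findall(r"\p{L}", sample).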
Tips and notes
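When validating “letters and digits”, prefer \p{L} and \p{Nd} over the ASCII-only [A-Za-z0-9] so you accept international text, and add \p{M} as well: decomposed accents and the vowel signs of scripts such as Devanagari are marks, not letters. Strip control and formatting noise by excluding \p{C} (the Other group). Exclude the separator categories (\p{Z}) only if you also want to drop spaces, because Zs includes the ordinary space, while tab and line feed are Cc rather than separators. Remember that Nd covers only decimal digits: Roman numeral characters such as Ⅻ are Nl and vulgar fractions such as ½ are No, so a “numbers” filter using only Nd will miss them. Lone surrogates (Cs) cannot occur in well-formed UTF-8 or UTF-16 text, and unassigned code points (Cn) usually signal corrupt data or characters newer than the Unicode version your engine supports.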
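The tips above translate into small category-based filters. This is a sketch in Python using unicodedata, with hypothetical function names and a keep parameter chosen for illustration; it shows a letters-marks-digits validator, a numbers filter that can include Nl and No, and an Other-group stripper that preserves tab and line feed by default.

```python
import unicodedata


def is_word_text(text: str) -> bool:
    """Hypothetical validator: non-empty, and only letters (L), marks (M) or decimal digits (Nd)."""
    if not text:
        return False
    for ch in text:
        cat = unicodedata.category(ch)
        if cat[0] not in ("L", "M") and cat != "Nd":
            return False
    return True


def numeric_chars(text: str, decimal_only: bool = False) -> list[str]:
    """Hypothetical filter: Nd only, or every Number category (Nd, Nl, No)."""
    wanted = ("Nd",) if decimal_only else ("Nd", "Nl", "No")
    return [ch for ch in text if unicodedata.category(ch) in wanted]


def strip_other(text: str, keep: str = "\t\n") -> str:
    """Hypothetical cleaner: drop Other-group (C*) characters except those in keep."""
    return "".join(
        ch for ch in text if ch in keep or not unicodedata.category(ch).startswith("C")
    )


# "cafe" followed by a combining acute accent (Mn) passes; so does Devanagari,
# whose virama and vowel sign are Mn. An underscore (Pc) fails.
print(is_word_text("cafe\u0301"))
print(is_word_text("\u0928\u092e\u0938\u094d\u0924\u0947"))
print(is_word_text("snake_case"))

# Roman numeral twelve (Nl) and one half (No) are missed by an Nd-only filter.
print(numeric_chars("Chapter \u216b, 3\u00bd pages", decimal_only=True))
print(numeric_chars("Chapter \u216b, 3\u00bd pages"))

# Zero width space (Cf) and NUL (Cc) are removed; the line feed survives.
print(repr(strip_other("a\u200bb\x00c\n")))
```

Stripping all of Cf also removes ZERO WIDTH JOINER (U+200D), which emoji sequences depend on, so add it to keep if you process emoji.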