Mystery characters in text — an invisible control byte, a look-alike Cyrillic letter, or an emoji that breaks a database column — are easy to misread. This inspector breaks any string into its individual Unicode code points and shows the full identity of each one.
How it works
The tool iterates the string by code point rather than by UTF-16 code unit, so an emoji or other astral character is reported as one code point instead of two surrogate halves. For each code point it reports:
- the code point in U+XXXX notation via codePointAt,
- the general category (such as Lu, Nd, or So), derived from the browser’s Unicode property escapes like \p{Lu},
- the Unicode block, matched against the standard range table,
- the UTF-8 bytes, computed directly from the code point, and the UTF-16 units that make up the JavaScript string.
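The steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the inspector's actual source: the names inspect, formatCodePoint, generalCategory, utf8Bytes, and utf16Units are hypothetical, the category list is a small sample rather than the full set, and the Unicode block lookup is left out because it needs the range table.

```javascript
// Hypothetical sketch: helper names and the sampled category list are
// assumptions, not taken from the tool itself.

// A small sample of general categories, each probed with a property escape.
const CATEGORY_TESTS = ["Lu", "Ll", "Nd", "So", "Cc", "Zs", "Po"].map(
  (name) => [name, new RegExp(`^\\p{${name}}$`, "u")]
);

// Format a code point as U+XXXX, padded to at least four hex digits.
function formatCodePoint(cp) {
  return "U+" + cp.toString(16).toUpperCase().padStart(4, "0");
}

// Return the first sampled category that matches, or "??" if none does.
function generalCategory(ch) {
  for (const [name, pattern] of CATEGORY_TESTS) {
    if (pattern.test(ch)) return name;
  }
  return "??";
}

// Compute UTF-8 bytes directly from the code point.
// Lone surrogates are not given any special handling in this sketch.
function utf8Bytes(cp) {
  if (cp < 0x80) return [cp];
  if (cp < 0x800) return [0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)];
  if (cp < 0x10000) {
    return [0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)];
  }
  return [
    0xf0 | (cp >> 18),
    0x80 | ((cp >> 12) & 0x3f),
    0x80 | ((cp >> 6) & 0x3f),
    0x80 | (cp & 0x3f),
  ];
}

// Collect the UTF-16 code units that make up one character of the string.
function utf16Units(ch) {
  const units = [];
  for (let i = 0; i < ch.length; i++) units.push(ch.charCodeAt(i));
  return units;
}

// for...of walks a string by code point, so a surrogate pair arrives whole.
function inspect(text) {
  const rows = [];
  for (const ch of text) {
    const cp = ch.codePointAt(0);
    rows.push({
      char: ch,
      codePoint: formatCodePoint(cp),
      category: generalCategory(ch),
      utf8: utf8Bytes(cp),
      utf16: utf16Units(ch),
    });
  }
  return rows;
}
```

With this sketch, inspect("A😀") yields two rows: U+0041 with category Lu, and U+1F600 with category So, UTF-8 bytes F0 9F 98 80, and UTF-16 units D83D DE00.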
Tips and notes
Use the UTF-8 column to debug encoding bugs: a character that should be one
byte but shows up as several often means text was double-encoded. The category
column helps when writing regular expressions, since \p{Nd} matches any
decimal digit across scripts, not just 0-9. Watch for control characters
(category Cc), which display as a ctrl marker here because they have no
visible glyph but can still corrupt files and break parsers.
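
The three tips above can be tried out with a short sketch. The helper names doubleEncode and markControls are hypothetical, and the exact text of the ctrl marker is an assumption; the inspector may render it differently.

```javascript
// Hypothetical helpers for illustration; not part of the inspector itself.

// Simulate double encoding: encode to UTF-8, misread each byte as one
// Latin-1 character, then encode that misread text to UTF-8 again.
function doubleEncode(text) {
  const encoder = new TextEncoder();
  const once = encoder.encode(text);
  const misread = String.fromCharCode(...once);
  return Array.from(encoder.encode(misread));
}

// \p{Nd} matches decimal digits in any script; [0-9] matches ASCII only.
const arabicIndicThree = "\u0663"; // ARABIC-INDIC DIGIT THREE
const matchesNd = /^\p{Nd}$/u.test(arabicIndicThree);
const matchesAscii = /^[0-9]$/.test(arabicIndicThree);

// Replace each control character (category Cc) with a visible marker.
// The marker format here is an assumption.
function markControls(text) {
  return text.replace(/\p{Cc}/gu, (ch) => {
    const hex = ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0");
    return `<ctrl U+${hex}>`;
  });
}

console.log(doubleEncode("é"));     // four bytes instead of the expected two
console.log(matchesNd, matchesAscii);
console.log(markControls("a\u0000b"));
```

Here "é" is C3 A9 when encoded once, but C3 83 C2 A9 after the simulated double encoding, which is the kind of pattern to look for in the UTF-8 column.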