Chinese Unique Character Counter

Find all unique Chinese characters in a text and count their occurrences

Extracts every unique CJK character from Chinese text and lists them ranked by frequency, with each character's Unicode code point. Ignores Latin, digits, and punctuation. Useful for vocabulary analysis. Runs in your browser.

Which characters does the tool count?

It keeps only Han characters: the CJK Unified Ideographs block (U+4E00 to U+9FFF), Extension A (U+3400 to U+4DBF), and the CJK Compatibility Ideographs block (U+F900 to U+FAFF). Latin letters, ASCII digits, and punctuation, including Chinese full-width punctuation, are excluded so that only hanzi are tallied.

Chinese text draws on a large character set, but any given passage uses only a fraction of it, and uses it unevenly: a few characters recur constantly while many appear only once. This tool extracts every unique Chinese character, ignores the Latin text and punctuation around it, and ranks the characters by how often they appear.

How it works

The tool walks the text character by character and keeps only those whose Unicode code point lies in a Han (CJK) block:

U+3400 – U+4DBF   CJK Extension A
U+4E00 – U+9FFF   CJK Unified Ideographs (the common hanzi)
U+F900 – U+FAFF   CJK Compatibility Ideographs

Everything else, including Latin letters, digits, and both ASCII and full-width punctuation, is skipped. The surviving characters are tallied and sorted from most to least frequent, and each is shown with its U+XXXX code point and its share of all Chinese characters in the text.
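The filtering and tallying described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's actual source: the names HAN_RANGES, is_han, and count_han are hypothetical, and the order of tied characters (first seen wins) is a choice made here, not something the page specifies.

```python
from collections import Counter

# The three ranges listed above. Names in this sketch are illustrative only.
HAN_RANGES = (
    (0x3400, 0x4DBF),  # CJK Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (the common hanzi)
    (0xF900, 0xFAFF),  # CJK Compatibility Ideographs
)


def is_han(ch: str) -> bool:
    """Return True if the character's code point falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in HAN_RANGES)


def count_han(text: str) -> list[tuple[str, str, int, float]]:
    """Return (character, code point, count, share) rows, most frequent first."""
    counts = Counter(ch for ch in text if is_han(ch))
    total = sum(counts.values())
    # most_common() sorts by count, descending; ties keep first-seen order.
    return [
        (ch, f"U+{ord(ch):04X}", n, n / total)
        for ch, n in counts.most_common()
    ]


# Latin letters, digits, and both kinds of punctuation are skipped.
for ch, code_point, n, share in count_han("我爱中文, I love 中文! 123。"):
    print(ch, code_point, n, f"{share:.1%}")
```

In the sample string only six characters survive the filter. 中 and 文 each appear twice, so they are listed ahead of 我 and 爱.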

Example and tips

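The phrase 我爱学习中文 contains 6 characters, all distinct, so the unique count equals the total. In a longer document the gap widens fast: a few characters such as 的 是 不 我 recur constantly while most appear only once. Watch the unique-to-total ratio (unique characters divided by all Chinese characters in the text). A low ratio means the text reuses a tight vocabulary, which usually makes it easier to read. Because the ratio tends to fall as a text gets longer, it is most meaningful when comparing texts of similar length.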
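As a rough illustration of the unique-to-total ratio, the Python sketch below computes it for the example phrase and for a variant with repeated characters. The function names han_chars and unique_ratio are hypothetical, and returning 0.0 for text with no Chinese characters is an assumption made for this sketch.

```python
def han_chars(text: str) -> list[str]:
    """Keep only characters in the three Han ranges listed under "How it works"."""
    return [
        ch for ch in text
        if 0x3400 <= ord(ch) <= 0x4DBF
        or 0x4E00 <= ord(ch) <= 0x9FFF
        or 0xF900 <= ord(ch) <= 0xFAFF
    ]


def unique_ratio(text: str) -> float:
    """Unique Han characters divided by total Han characters."""
    chars = han_chars(text)
    if not chars:
        return 0.0  # assumption: no Chinese characters means a ratio of 0
    return len(set(chars)) / len(chars)


print(unique_ratio("我爱学习中文"))        # 6 unique / 6 total = 1.0
print(unique_ratio("我爱学习, 我爱中文"))  # 6 unique / 8 total = 0.75
```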