What counts as a character here?

The tool counts CJK Unified Ideographs — the Hanzi block used by both Traditional and Simplified Chinese. Latin letters, digits, spaces, and punctuation are excluded, so the totals reflect only Chinese characters.

Why is the unique count useful?

The ratio of unique to total characters indicates lexical variety, and the unique list is the exact set you would need to recognise to read the passage. Learners use it to measure vocabulary coverage and to build flashcard decks.

Does it distinguish Traditional from Simplified?

Both forms live in the same Unicode block, so the counter works on either; it is simply presented as a tool for Traditional text. It does not convert between scripts and counts the characters exactly as written.

How does frequency ranking help reading?

Chinese character frequency is highly skewed: a small set of common characters covers most running text. Learning the highest-frequency characters first yields the fastest gains in reading coverage, which the ranked list makes explicit.

What is the Traditional Chinese Unique Character Counter?

The Traditional Chinese Unique Character Counter extracts every unique Traditional Chinese character (Hanzi) from a passage and ranks them by frequency, giving total and unique counts for vocabulary-coverage and study planning. It is free on Gera Tools and runs entirely in your browser, with nothing uploaded.

Traditional Chinese Unique Character Counter

Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The Traditional Chinese Unique Character Counter pulls every distinct Hanzi from a passage and ranks them by how often they occur. It reports total and unique character counts, which are the core figures behind vocabulary coverage and study planning.

How it works

The tool scans the text codepoint by codepoint and keeps only CJK Unified Ideographs, tallying each into a frequency map:

total chars   = every Hanzi occurrence
unique chars  = number of distinct Hanzi
coverage rank = characters sorted by descending count
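The three quantities above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual source: the function names are hypothetical, the range U+4E00 to U+9FFF is assumed to be what "CJK Unified Ideographs" means here, and breaking ties by codepoint is an arbitrary choice made so the ranking is deterministic.

```python
from collections import Counter

# Assumption: "CJK Unified Ideographs" is the base block U+4E00..U+9FFF.
HANZI_FIRST, HANZI_LAST = 0x4E00, 0x9FFF


def is_hanzi(ch: str) -> bool:
    """Return True if the single code point falls in the assumed Hanzi range."""
    return HANZI_FIRST <= ord(ch) <= HANZI_LAST


def count_hanzi(text: str):
    """Return (total, unique, ranked) for the Hanzi found in text."""
    # Iterating a Python str yields one code point at a time.
    freq = Counter(ch for ch in text if is_hanzi(ch))
    total = sum(freq.values())   # every Hanzi occurrence
    unique = len(freq)           # number of distinct Hanzi
    # Descending count; ties broken by codepoint (an assumption, see lead-in).
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return total, unique, ranked


# Latin letters, digits, spaces, and punctuation are skipped.
total, unique, ranked = count_hanzi("學而時習之,不亦說乎?Learn 123 學習。")
print(total, unique, ranked[:2])
```

For this sample the sketch finds 11 Hanzi occurrences and 9 distinct characters, with 學 and 習 tied at the top with two occurrences each.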

Because Chinese character frequency is steeply skewed, the top of the ranked list accounts for a large share of the running text. The unique-to-total ratio then summarises how varied the vocabulary is: a low ratio means heavy repetition, a high ratio means dense, diverse vocabulary.
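As a small worked example of the unique-to-total ratio, the sketch below compares a repetitive string with a more varied one. The function name `unique_ratio` is hypothetical and the Hanzi range U+4E00 to U+9FFF is an assumption; the tool itself may compute or display the figure differently.

```python
def unique_ratio(text: str) -> float:
    """Distinct Hanzi divided by total Hanzi occurrences (0.0 if there are none)."""
    # Assumption: Hanzi means code points in U+4E00..U+9FFF.
    hanzi = [ch for ch in text if "\u4e00" <= ch <= "\u9fff"]
    if not hanzi:
        return 0.0
    return len(set(hanzi)) / len(hanzi)


repetitive = "好好好好學習學習"       # 8 Hanzi, 3 distinct
varied = "春眠不覺曉處處聞啼鳥"       # 10 Hanzi, 9 distinct

print(unique_ratio(repetitive))  # 3 / 8
print(unique_ratio(varied))      # 9 / 10
```

Because longer texts naturally repeat more characters, the ratio is most informative when the passages being compared are of similar length.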

Traditional vs Simplified — why a dedicated counter matters

Although both Traditional and Simplified Chinese use the same Unicode CJK block (U+4E00–U+9FFF) and the counter works on both, they differ in two important ways for analysis:

Character forms: Many characters differ substantially between scripts — 國 (Traditional) vs 国 (Simplified), 書 vs 书. A frequency analysis of a Traditional Chinese text should not be compared to a Simplified Chinese frequency list, and this tool counts characters as they appear without any conversion.
Character coverage targets: If you are studying to read Traditional Chinese newspapers (Taiwan, Hong Kong) or classical literature, the highest-frequency characters in those corpora differ modestly from the most common characters in modern Simplified Chinese text. Running this counter on a passage from your target reading material gives you a corpus-specific frequency list, not a generic one.
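The point about character forms can be illustrated directly: the Traditional and Simplified variants are separate code points, so a counter that does no conversion tallies them as different characters. The snippet is a sketch with the Hanzi range assumed to be U+4E00 to U+9FFF.

```python
from collections import Counter

# Traditional and Simplified forms are distinct code points.
print(hex(ord("國")), hex(ord("国")))
print(hex(ord("書")), hex(ord("书")))

# A mixed passage: the first half is Traditional, the second Simplified.
text = "國家的書,国家的书"
freq = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")

# 家 and 的 are shared by both scripts and are counted together;
# 國/国 and 書/书 are counted separately because nothing is converted.
print(freq["家"], freq["的"], freq["國"], freq["国"], freq["書"], freq["书"])
```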

Using the frequency list for vocabulary building

The ranked output is most powerful when you overlay it with what you already know:

Go through the list from the top and mark each character you can already read.
The highest-frequency unknowns immediately below your knowledge threshold are your best next study targets — they appear frequently enough that learning them will noticeably improve your reading comprehension of this material.
Characters that appear only once in a long text are low-priority unless they are recurring terms in your field.

This approach is sometimes called text-targeted vocabulary learning. For the material you actually want to read, it tends to produce faster comprehension gains than studying general-purpose frequency lists, especially in specialised domains such as law, medicine, or classical literature.
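The overlay described above can be sketched as a filter on the ranked list. Everything here is illustrative: `study_targets`, the `known` set, and the `min_count` cutoff of 2 (used to drop characters that appear only once) are assumptions rather than features of the tool.

```python
from collections import Counter


def study_targets(text: str, known: set, min_count: int = 2, limit: int = 10):
    """Return up to `limit` unknown Hanzi, most frequent first."""
    # Assumption: Hanzi means code points in U+4E00..U+9FFF.
    freq = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Descending count, ties broken by codepoint for a stable order.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    # Keep characters the reader has not marked as known and that recur.
    unknown = [(ch, n) for ch, n in ranked if ch not in known and n >= min_count]
    return unknown[:limit]


text = "我們學習中文,我們練習寫字,我們學習閱讀"
known = set("我們中文")          # characters the learner can already read
targets = study_targets(text, known)
print(targets)
```

In this sample, 習 (three occurrences) and 學 (two) come out as the next targets, while the characters that occur once are left for later.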

The coverage curve in Traditional Chinese text

Like most natural-language text, Traditional Chinese follows a Zipf-like frequency distribution: a very small number of characters accounts for a large share of running text. As a rough guide, in typical modern Traditional Chinese prose:

The most frequent ~100 characters cover roughly 40–50% of running text occurrences.
The most frequent ~500 characters cover roughly 75–80%.
The most frequent ~1,000 characters cover roughly 90%.

This is why reading coverage improves rapidly in the early stages of Chinese study and then slows as you push into the long tail of low-frequency characters. The unique list from your specific text shows you exactly where the frontier lies for that material.
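To see where the frontier lies for a specific text, the cumulative coverage of its top-N characters can be computed from the same frequency map. The sketch below uses a toy string rather than real corpus data, so its numbers do not reproduce the percentages quoted above; `coverage_at` is a hypothetical helper and the Hanzi range U+4E00 to U+9FFF is an assumption.

```python
from collections import Counter


def coverage_at(text: str, top_n: int) -> float:
    """Share of Hanzi occurrences covered by the top_n most frequent characters."""
    # Assumption: Hanzi means code points in U+4E00..U+9FFF.
    freq = Counter(ch for ch in text if "\u4e00" <= ch <= "\u9fff")
    total = sum(freq.values())
    if total == 0:
        return 0.0
    top_counts = sorted(freq.values(), reverse=True)[:top_n]
    return sum(top_counts) / total


# Toy sample: counts are 的=4, 一=3, 是=2, 不=1 (10 occurrences in total).
sample = "的的的的一一一是是不"
for n in (1, 2, 3, 4):
    print(n, coverage_at(sample, n))
```

Each additional character adds less coverage than the one before it (4, then 3, then 2, then 1 out of 10), which is the same flattening shape described above, just at a much smaller scale.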