Traditional Chinese Character Counter

Count Traditional Chinese characters and UTF-8 bytes separately.

Free Traditional Chinese character counter. Counts CJK characters separately from other text and reports the exact UTF-8 byte cost (3 bytes per BMP character, 4 bytes per supplementary-plane character such as those in CJK Extension B), useful for SMS limits and database sizing. Runs in your browser.

How many bytes does a Chinese character use in UTF-8?

Most Chinese characters live in the Basic Multilingual Plane and take 3 bytes in UTF-8. Rarer characters from CJK Extension B and beyond sit outside that plane and take 4 bytes. In JavaScript strings, which are UTF-16, such a character is stored as a surrogate pair, but it still counts as one character.
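These figures can be checked in a browser console or in Node.js. The snippet below is a minimal sketch using the standard TextEncoder API; the helper name utf8Bytes is made up for illustration and is not taken from the tool.

```javascript
// Hypothetical helper: UTF-8 byte length of a string via the standard TextEncoder.
function utf8Bytes(text) {
  return new TextEncoder().encode(text).length;
}

const bmpChar = "中";          // U+4E2D, in the Basic Multilingual Plane
const extBChar = "\u{20000}";  // U+20000, first code point of CJK Extension B

console.log(utf8Bytes(bmpChar));    // 3
console.log(utf8Bytes(extBChar));   // 4
console.log(extBChar.length);       // 2 (UTF-16 code units: one surrogate pair)
console.log([...extBChar].length);  // 1 (code points: still one character)
```

The last two lines show why a plain .length check overcounts rare characters: it reports UTF-16 code units, not characters.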

The Traditional Chinese Character Counter measures a passage two ways at once: how many characters it contains and how many bytes it occupies in UTF-8. The distinction matters wherever Traditional Chinese text meets a size limit, such as SMS limits, social-post caps, database column sizing and API payload budgets, because a short message can still be surprisingly large in bytes.

How it works

The counter walks through your text one character (code point) at a time, so a surrogate pair is handled as a single character. Each character is classified as either a CJK ideograph (using the Unified Ideographs, Extension A and compatibility ranges, plus the supplementary extension blocks that JavaScript stores as surrogate pairs) or other text such as Latin letters, digits, spaces and punctuation. For the byte figures it uses the browser’s built-in TextEncoder, which produces the same UTF-8 bytes a server or database storing UTF-8 would receive, so the numbers are precise rather than estimated.
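The classification step described above might look roughly like this. It is a sketch, not the tool's actual source: the function names and the exact range bounds are assumptions, and the supplementary range simply covers planes 2 and 3, where Extension B and later blocks are located.

```javascript
// Assumed code point ranges; the tool's exact bounds are not given in the text.
const CJK_RANGES = [
  [0x4e00, 0x9fff],   // CJK Unified Ideographs
  [0x3400, 0x4dbf],   // CJK Unified Ideographs Extension A
  [0xf900, 0xfaff],   // CJK Compatibility Ideographs
  [0x20000, 0x3ffff], // planes 2 and 3, home of Extension B and later blocks
];

// Hypothetical predicate: true if the code point falls in any range above.
function isCjk(codePoint) {
  return CJK_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

// for...of iterates by code point, so a surrogate pair is visited as one character.
function classify(text) {
  let cjk = 0;
  let other = 0;
  for (const ch of text) {
    if (isCjk(ch.codePointAt(0))) cjk += 1;
    else other += 1;
  }
  return { cjk, other, total: cjk + other };
}

console.log(classify("繁體中文 test")); // { cjk: 4, other: 5, total: 9 }
```

Iterating with for...of rather than an index loop is the key design choice here: an index loop would see each half of a surrogate pair separately and count a rare character twice.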

The result panel shows the CJK count, the other-character count, the total characters, the UTF-8 byte size of the whole text, the byte size of the CJK characters alone, and the average bytes per CJK character.
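As an illustration, the six figures could be derived in a single pass like the sketch below. The function and field names are hypothetical, and the Unicode Han script property is used as a compact stand-in for the tool's own range list, so edge cases may be classified differently.

```javascript
// Stand-in classifier: the Unicode Han script property, not the tool's own range list.
const HAN = /\p{Script=Han}/u;

// Hypothetical summary function returning the six figures named in the text.
function summarize(text) {
  const encoder = new TextEncoder();
  let cjkCount = 0;
  let otherCount = 0;
  let cjkBytes = 0;
  for (const ch of text) {            // one iteration per code point
    if (HAN.test(ch)) {
      cjkCount += 1;
      cjkBytes += encoder.encode(ch).length;
    } else {
      otherCount += 1;
    }
  }
  return {
    cjkCount,
    otherCount,
    totalChars: cjkCount + otherCount,
    totalBytes: encoder.encode(text).length,
    cjkBytes,
    // Avoid dividing by zero when the text has no CJK characters.
    avgBytesPerCjk: cjkCount === 0 ? 0 : cjkBytes / cjkCount,
  };
}

console.log(summarize("繁體中文 test"));
```

For the sample line this yields 4 CJK characters, 5 other characters, 9 in total, 17 bytes overall, 12 CJK bytes and an average of 3 bytes per CJK character.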

Why bytes differ from characters

In UTF-8, a Traditional Chinese character in the Basic Multilingual Plane takes 3 bytes, while a rare character from CJK Extension B and beyond takes 4 bytes. JavaScript stores such a character as one surrogate pair, but it still counts as one character. ASCII letters, spaces and punctuation take just 1 byte each. So a line like 繁體中文 test is 9 characters (4 Chinese characters, 1 space and 4 Latin letters) but 4×3 + 5×1 = 17 bytes. Knowing both numbers helps you stay within SMS segment limits, size VARCHAR columns correctly and predict payload sizes. All counting runs locally in your browser, so your text stays private and the tool works offline once the page has loaded.
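The sample line can be verified directly. This is a small sketch using only standard JavaScript; the 16-byte cap is an invented figure, chosen purely to show a short line failing a byte budget.

```javascript
const line = "繁體中文 test";

const charCount = [...line].length;                       // code points, not UTF-16 units
const byteCount = new TextEncoder().encode(line).length;  // UTF-8 bytes

console.log(charCount); // 9
console.log(byteCount); // 17 = 4 × 3 + 5 × 1

// Hypothetical budget check, e.g. for a field capped at 16 bytes.
const MAX_BYTES = 16;
console.log(byteCount <= MAX_BYTES); // false: only 9 characters, yet over the byte cap
```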