Why is the byte count higher than the character count?

In UTF-8, most Chinese Han characters occupy three bytes (rare characters outside the Basic Multilingual Plane take four), while ASCII letters and digits take one byte each. The byte count is computed with the browser's text encoder, so it reflects exactly how many bytes the text occupies when it is stored or sent over the network as UTF-8.

How are characters counted?

Characters are counted as Unicode code points, so each Han ideograph counts once even though it takes three or four bytes in UTF-8. Rare characters outside the Basic Multilingual Plane, which JavaScript strings store as UTF-16 surrogate pairs, also count as a single character rather than two.
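
The difference between code points and UTF-16 code units can be sketched in TypeScript as follows. `countCodePoints` is a hypothetical name; the sketch relies only on the fact that `Array.from` iterates a string by code point, while `String.length` counts UTF-16 code units.

```typescript
// Hypothetical helper: count Unicode code points rather than UTF-16 code units.
// Array.from uses the string iterator, which yields one element per code point,
// so a surrogate pair (one character outside the BMP) stays a single element.
function countCodePoints(text: string): number {
  return Array.from(text).length;
}

const sample = "𠀀中"; // U+20000 (a surrogate pair in UTF-16) followed by U+4E2D
console.log(sample.length);           // 3: String.length counts UTF-16 code units
console.log(countCodePoints(sample)); // 2: one per code point
```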

How is the word count estimated?

Chinese is written without spaces between words, so the word count has to be estimated. The tool divides the Han character count by about 1.5, a commonly cited average number of characters per Chinese word, which gives a reasonable approximation. For exact segmentation, use the dedicated Chinese word counter.
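
The estimate amounts to a single division. This TypeScript sketch assumes the result is rounded to the nearest whole word; the page does not say how the tool rounds, and the names `CHARS_PER_WORD` and `estimateChineseWords` are hypothetical.

```typescript
// Hypothetical sketch: estimate Chinese word count from the Han character count.
// 1.5 is the approximate characters-per-word average described above;
// rounding to the nearest integer is an assumption, not the tool's documented rule.
const CHARS_PER_WORD = 1.5;

function estimateChineseWords(hanCount: number): number {
  if (hanCount <= 0) return 0;
  return Math.round(hanCount / CHARS_PER_WORD);
}

console.log(estimateChineseWords(300)); // 200
console.log(estimateChineseWords(10));  // 7 (10 / 1.5 is about 6.67)
```

Because the ratio is an average, the estimate is best used for rough length targets rather than precise limits.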

Does it separate Chinese from Latin characters?

Yes. The counter reports how many characters are Han ideographs and how many are other characters such as Latin letters, digits, and punctuation. That lets you measure the genuinely Chinese portion of a bilingual document.

Is my text sent anywhere?

No. All counting runs in your browser with JavaScript, so the text never leaves your device. This makes the tool safe for confidential or unpublished Chinese content.

Chinese Simplified Character Counter

Counting Simplified Chinese text needs more than a naive length check: a Han character counts as one character but usually takes three bytes in UTF-8 storage, and there are no spaces to mark word boundaries. This tool reports characters, UTF-8 bytes, and an estimated word count so you can check your text against whichever limit applies.

How it works

Characters are counted as Unicode code points, so each Han ideograph counts once. The tool classifies a character as Han if it falls in one of the CJK Unified Ideographs blocks or the CJK Compatibility Ideographs block, and reports the Han total separately from all other characters.
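
A block-based classifier of this kind might look like the TypeScript sketch below. The exact ranges the tool checks are not published here, so the table is an assumption: it lists Extension A, the main Unified Ideographs block, the Compatibility Ideographs block, and, as a simplification, treats all of planes 2 and 3 (which hold the later CJK extensions and the Compatibility Ideographs Supplement) as Han. `HAN_RANGES`, `isHan`, and `splitHan` are hypothetical names, and counting whitespace as "other" is also an assumption.

```typescript
// Hypothetical Han classifier based on Unicode block ranges (an assumption,
// not the tool's published table).
const HAN_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x3400, 0x4dbf],   // CJK Unified Ideographs Extension A
  [0x4e00, 0x9fff],   // CJK Unified Ideographs
  [0xf900, 0xfaff],   // CJK Compatibility Ideographs
  [0x20000, 0x3ffff], // Planes 2-3: later extensions and Compatibility Supplement
];

function isHan(codePoint: number): boolean {
  return HAN_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

interface HanSplit {
  han: number;
  other: number; // letters, digits, punctuation, and (in this sketch) whitespace
}

function splitHan(text: string): HanSplit {
  let han = 0;
  let other = 0;
  // Array.from iterates by code point, so surrogate pairs stay together.
  for (const ch of Array.from(text)) {
    const cp = ch.codePointAt(0)!;
    if (isHan(cp)) {
      han++;
    } else {
      other++;
    }
  }
  return { han, other };
}

console.log(splitHan("中文，Chinese123")); // { han: 2, other: 11 }
```

Note that the full-width comma "，" is Chinese punctuation but not a Han ideograph, so it lands in the "other" total.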

The byte count is produced by encoding the text as UTF-8, in which most Chinese characters take three bytes and each ASCII character takes one. The word estimate divides the Han character count by about 1.5, the commonly used average number of characters per Chinese word.

Example and notes

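A sentence mixing Chinese, English, and digits shows more bytes than characters, because each Han character and each full-width Chinese punctuation mark takes three bytes in UTF-8. Use the byte figure when a limit is measured in bytes, such as a database column or API field sized in UTF-8 bytes, and the character or estimated word figure for editorial length targets. SMS is a different case: messages containing Chinese are sent in UCS-2 rather than UTF-8, so the character count is the better guide there. Everything runs locally, so private documents stay on your device.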
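
To make the example concrete, here is a small TypeScript sketch that computes all of the reported figures for one mixed sentence. The `summarize` function and `TextStats` type are hypothetical; Han detection is deliberately simplified to the main CJK Unified Ideographs block (U+4E00 to U+9FFF), and rounding the word estimate is an assumption.

```typescript
// Hypothetical summary combining the figures the tool reports.
interface TextStats {
  characters: number;     // Unicode code points
  utf8Bytes: number;      // length of the UTF-8 encoding
  han: number;            // code points in U+4E00..U+9FFF (simplified check)
  other: number;          // everything else
  estimatedWords: number; // han / 1.5, rounded (assumed)
}

function summarize(text: string): TextStats {
  const chars = Array.from(text); // split by code point
  const han = chars.filter((c) => {
    const cp = c.codePointAt(0)!;
    return cp >= 0x4e00 && cp <= 0x9fff;
  }).length;
  return {
    characters: chars.length,
    utf8Bytes: new TextEncoder().encode(text).length,
    han,
    other: chars.length - han,
    estimatedWords: Math.round(han / 1.5),
  };
}

console.log(summarize("你好，world 2024"));
// { characters: 13, utf8Bytes: 19, han: 2, other: 11, estimatedWords: 1 }
```

Here 13 characters become 19 bytes: the two Han characters and the full-width comma contribute 9 bytes, and the 10 ASCII characters (letters, space, digits) contribute 10.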