Chinese Simplified Character Counter

Count CJK characters, bytes (UTF-8), and words for Simplified Chinese

Count characters, UTF-8 bytes, and estimated words in Simplified Chinese text. Each common Han character counts as one character but takes three UTF-8 bytes, and the word estimate assumes the common ratio of about 1.5 characters per word. Everything runs in your browser.

Why is the byte count higher than the character count?

In UTF-8, each commonly used Han character occupies three bytes, while ASCII letters and digits take only one byte each. Rare characters outside the Basic Multilingual Plane, such as those in CJK Extension B, take four. The byte count is computed with the browser's text encoder, so it reflects the size of the text when it is stored or transmitted as UTF-8.
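As a rough illustration of the difference, the sketch below compares code point counts with UTF-8 byte counts for a few sample strings. It assumes the standard TextEncoder API; the helper name utf8ByteLength is made up for this example and is not taken from the tool's source.

```typescript
// Byte length of a string once encoded as UTF-8.
// TextEncoder always encodes to UTF-8.
function utf8ByteLength(text: string): number {
  return new TextEncoder().encode(text).length;
}

// ASCII letter, ASCII digit, one Han character, two Han characters,
// and U+20000 (a CJK Extension B character outside the BMP).
const samples: string[] = ["A", "7", "中", "中文", "\u{20000}"];

for (const s of samples) {
  // Array.from splits a string by code point, not by UTF-16 code unit.
  const chars = Array.from(s).length;
  console.log(`${s}: ${chars} character(s), ${utf8ByteLength(s)} byte(s)`);
}
// A: 1 character(s), 1 byte(s)
// 7: 1 character(s), 1 byte(s)
// 中: 1 character(s), 3 byte(s)
// 中文: 2 character(s), 6 byte(s)
// 𠀀: 1 character(s), 4 byte(s)
```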

Counting Simplified Chinese text needs more than a naive length check, because a single Han character takes one position on screen but three bytes in storage, and there are no spaces to mark word boundaries. This tool reports characters, UTF-8 bytes, and an estimated word count so you can match whatever limit you are working against.

How it works

Characters are counted as Unicode code points, so each Han ideograph counts once, including rare ideographs that a UTF-16 string length would count twice. The tool classifies a character as Han if it falls in one of the CJK Unified Ideographs blocks or the CJK Compatibility Ideographs block, and reports the Han total separately from other characters.
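One way to sketch this classification is shown below. The exact list of ranges the tool checks is not stated here, so the table is an assumption: it covers the main CJK Unified Ideographs block, Extensions A and B, and the CJK Compatibility Ideographs block. The names HAN_RANGES, isHan, and countCharacters are illustrative only.

```typescript
// Assumed code point ranges; the tool's real list may differ.
const HAN_RANGES: ReadonlyArray<readonly [number, number]> = [
  [0x4e00, 0x9fff],   // CJK Unified Ideographs
  [0x3400, 0x4dbf],   // CJK Unified Ideographs Extension A
  [0x20000, 0x2a6df], // CJK Unified Ideographs Extension B
  [0xf900, 0xfaff],   // CJK Compatibility Ideographs
];

function isHan(codePoint: number): boolean {
  return HAN_RANGES.some(([lo, hi]) => codePoint >= lo && codePoint <= hi);
}

interface CharCounts {
  total: number; // all code points, including spaces and punctuation
  han: number;   // code points inside HAN_RANGES
  other: number; // everything else
}

function countCharacters(text: string): CharCounts {
  let total = 0;
  let han = 0;
  // for...of walks a string by code point, so a surrogate pair counts once.
  for (const ch of text) {
    total += 1;
    if (isHan(ch.codePointAt(0)!)) han += 1;
  }
  return { total, han, other: total - han };
}

console.log(countCharacters("你好, world 123"));
// { total: 13, han: 2, other: 11 }
```

In this sketch spaces and punctuation count toward the total; whether the tool itself does the same is not specified.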

The byte count is produced by encoding the text as UTF-8, where each common Chinese character is three bytes and each ASCII character is one. The word estimate divides the Han character count by about 1.5, a commonly used average number of characters per Chinese word. Because Chinese has no spaces between words, this figure is an approximation and not the result of real word segmentation.
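The word estimate is simple arithmetic, sketched below. The divisor of 1.5 comes from the description above; rounding to the nearest whole number is an assumption, since the tool's rounding rule is not stated. The names CHARS_PER_WORD and estimateWords are hypothetical.

```typescript
// Average number of Han characters per Chinese word, as described above.
const CHARS_PER_WORD = 1.5;

// Estimate the word count from a Han character count.
// Rounding to the nearest integer is an assumption of this sketch.
function estimateWords(hanCount: number): number {
  return Math.round(hanCount / CHARS_PER_WORD);
}

console.log(estimateWords(300)); // 200
console.log(estimateWords(10));  // 7 (10 / 1.5 is about 6.67)
console.log(estimateWords(0));   // 0
```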

Example and notes

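A sentence mixing Chinese, English, and digits will show more bytes than characters because each Chinese character takes three bytes while each ASCII character takes one. Use the byte figure when you face a limit measured in UTF-8 bytes, such as a strict database column width, and the character or estimated word figure for editorial length targets. SMS segments are counted in GSM-7 or UCS-2 units, not UTF-8 bytes, so for Chinese messages the character figure is the closer guide. Everything runs locally, so private documents stay on your device.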
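To make the byte limit case concrete, here is a minimal sketch that checks whether a mixed string fits a byte budget. The limits of 12 and 15 bytes and the function name fitsByteLimit are invented for illustration; substitute the width of your own column or field.

```typescript
// True if the text, encoded as UTF-8, fits within maxBytes.
function fitsByteLimit(text: string, maxBytes: number): boolean {
  return new TextEncoder().encode(text).length <= maxBytes;
}

// 2 Han characters (6 bytes) + 9 ASCII characters (9 bytes) = 15 bytes.
const sample = "订单 Order 42";

console.log(Array.from(sample).length);            // 11 characters
console.log(new TextEncoder().encode(sample).length); // 15 bytes
console.log(fitsByteLimit(sample, 12));            // false
console.log(fitsByteLimit(sample, 15));            // true
```

The string is only 11 characters long, so a check based on character count alone would wrongly pass the 12-byte limit.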