When you store or transmit Chinese text, you usually care about bytes, not characters, because database column limits, SMS segments, and protocol size caps are often measured in bytes. This tool encodes your text as real UTF-8 and shows exactly how many bytes it occupies.
How it works
UTF-8 is a variable-width encoding. The number of bytes a character takes depends on its Unicode code point:
U+0000 – U+007F → 1 byte (ASCII: A-Z, 0-9, punctuation)
U+0080 – U+07FF → 2 bytes (Latin accents, Greek, Cyrillic)
U+0800 – U+FFFF → 3 bytes (most CJK: Chinese, Japanese, Korean)
U+10000 – U+10FFFF → 4 bytes (emoji, rare CJK extensions)
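The ranges above translate directly into code. The following is a minimal sketch, not the tool's actual source; the function names utf8Length and utf8ByteCount are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper: UTF-8 bytes needed for a single Unicode code point.
function utf8Length(codePoint) {
  if (codePoint <= 0x7f) return 1;   // ASCII
  if (codePoint <= 0x7ff) return 2;  // accented Latin, Greek, Cyrillic
  if (codePoint <= 0xffff) return 3; // most CJK
  return 4;                          // U+10000 to U+10FFFF
}

// Hypothetical helper: total UTF-8 bytes for a string.
// for...of walks the string by code point, not by UTF-16 code unit,
// so a character above U+FFFF is counted once as 4 bytes.
function utf8ByteCount(text) {
  let total = 0;
  for (const ch of text) {
    total += utf8Length(ch.codePointAt(0));
  }
  return total;
}
```

For well-formed text this arithmetic gives the same total as a real UTF-8 encoder, which makes it a handy cross-check.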
The tool runs your text through TextEncoder, the browser’s standards-compliant
UTF-8 encoder, so the byte total matches what a server or database would record
when it stores the text as UTF-8. It then groups the bytes by sequence length
(1, 2, 3, or 4 bytes per character) to show where the weight is.
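The encode-then-group step can be sketched as follows. This is an assumption about how such a breakdown might be computed, not the tool's actual code; byteBreakdown and the shape of its return value are hypothetical. Only TextEncoder and its encode method are standard APIs.

```javascript
// Hypothetical sketch: encode text as UTF-8, then tally bytes by sequence length.
function byteBreakdown(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  const bySequenceLength = { 1: 0, 2: 0, 3: 0, 4: 0 };
  let i = 0;
  while (i < bytes.length) {
    const lead = bytes[i];
    // The high bits of the lead byte determine the sequence length:
    // 0xxxxxxx = 1, 110xxxxx = 2, 1110xxxx = 3, 11110xxx = 4.
    const len = lead < 0x80 ? 1 : lead < 0xe0 ? 2 : lead < 0xf0 ? 3 : 4;
    bySequenceLength[len] += len; // count bytes, not characters
    i += len;
  }
  return { total: bytes.length, bySequenceLength };
}
```

Because TextEncoder always emits well-formed UTF-8, the loop can trust each lead byte and skip ahead by the full sequence length without validating continuation bytes.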
Example and tips
The four-character phrase 你好世界 (“hello world”) is 4 characters but 12 bytes,
because each Han character costs 3 bytes. Add the English word Hi and a space in
front (Hi 你好世界) and you get 7 characters but 15 bytes. If you are designing a
byte-limited field, budget roughly three bytes per Chinese character and remember
that a single emoji can quietly cost four or more.
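To check a value against a byte-limited field before saving it, something like the sketch below may help. The helper name fitsByteLimit and the sample limits are hypothetical; the sample with Hi includes a space after the word.

```javascript
// Hypothetical helper: does the text fit a field limited to maxBytes of UTF-8?
function fitsByteLimit(text, maxBytes) {
  return new TextEncoder().encode(text).length <= maxBytes;
}

// Compare character counts (code points) with UTF-8 byte counts.
const samples = ["你好世界", "Hi 你好世界", "😀"];
for (const s of samples) {
  const chars = [...s].length; // spread iterates by code point
  const bytes = new TextEncoder().encode(s).length;
  console.log(`${s}: ${chars} characters, ${bytes} bytes`);
}
// 你好世界: 4 characters, 12 bytes
// Hi 你好世界: 7 characters, 15 bytes
// 😀: 1 characters, 4 bytes
```

Checking the encoded length, rather than the string's length property, avoids accepting text that looks short in characters but overflows the field in bytes.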