Traditional Chinese, used in Taiwan, Hong Kong, and Macau, costs three bytes per Han character in UTF-8, the same as Simplified Chinese; only rare supplementary-plane characters take four. This tool encodes your text with the browser’s real UTF-8 encoder so you can size database columns and JSON payloads with confidence.
How it works
UTF-8 encodes each character in one to four bytes depending on its Unicode code point:
U+0000 – U+007F → 1 byte (ASCII letters, digits)
U+0080 – U+07FF → 2 bytes (Latin accents, Greek, Cyrillic)
U+0800 – U+FFFF → 3 bytes (common Traditional Han characters, kana, Hangul)
U+10000 – U+10FFFF → 4 bytes (rare Hong Kong supplementary characters, most emoji)
The tool feeds your text through TextEncoder for an exact byte total, then
groups characters into the four byte-length bands above so you can see where
the weight comes from.
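The counting and banding steps can be sketched roughly as follows. This is a minimal illustration, not the tool's actual source: the function name utf8Breakdown and the BandCounts shape are hypothetical, and it assumes an environment where TextEncoder is available as a global (modern browsers and Node.js).

```typescript
// Hypothetical helper for illustration; not the tool's real implementation.
type BandCounts = {
  oneByte: number;   // U+0000 – U+007F
  twoByte: number;   // U+0080 – U+07FF
  threeByte: number; // U+0800 – U+FFFF
  fourByte: number;  // U+10000 and above
};

function utf8Breakdown(text: string): {
  totalBytes: number;
  characters: number;
  bands: BandCounts;
} {
  const bands: BandCounts = { oneByte: 0, twoByte: 0, threeByte: 0, fourByte: 0 };
  let characters = 0;

  // for...of walks the string by code point, so a surrogate pair counts once.
  for (const ch of text) {
    const cp = ch.codePointAt(0) as number;
    characters += 1;
    if (cp <= 0x7f) bands.oneByte += 1;
    else if (cp <= 0x7ff) bands.twoByte += 1;
    else if (cp <= 0xffff) bands.threeByte += 1;
    else bands.fourByte += 1;
  }

  // The exact byte total comes from the platform's UTF-8 encoder.
  const totalBytes = new TextEncoder().encode(text).length;
  return { totalBytes, characters, bands };
}
```

Under these assumptions, utf8Breakdown("臺灣 Taipei") reports 9 characters and 13 bytes, with 7 characters in the 1-byte band and 2 in the 3-byte band.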
Example and tips
The phrase 臺灣 (“Taiwan”) is 2 characters but 6 bytes. Append a space and the
English word Taipei (臺灣 Taipei) and you get 9 characters but 13 bytes, because
the seven ASCII characters, the space included, cost 1 byte each. Big5 stores
each Han character in 2 bytes and UTF-8 needs 3, so when you migrate a legacy
Big5 dataset to UTF-8, expect pure Han text to grow by roughly 50 percent in
byte size; ASCII stays at 1 byte, so mixed text grows less. Re-check any
byte-limited column lengths before importing.
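To estimate migration growth before importing, a rough calculation like the one below may help. It is only a sketch: the function name estimateBig5Growth is hypothetical, and because TextEncoder only produces UTF-8, the Big5 size is approximated by assuming 1 byte per ASCII character and 2 bytes for every other character. Text containing characters that have no Big5 mapping would not fit that assumption.

```typescript
// Hypothetical estimator for illustration only.
// Assumption: Big5 stores ASCII in 1 byte and every other character in 2 bytes.
function estimateBig5Growth(text: string): {
  big5Bytes: number;
  utf8Bytes: number;
  growth: number; // fractional growth, e.g. 0.5 means 50 percent larger
} {
  let big5Bytes = 0;
  // Iterate by code point so each character is counted once.
  for (const ch of text) {
    const cp = ch.codePointAt(0) as number;
    big5Bytes += cp <= 0x7f ? 1 : 2;
  }

  // The UTF-8 size is exact, taken from the platform encoder.
  const utf8Bytes = new TextEncoder().encode(text).length;

  // Guard against division by zero for empty input.
  const growth = big5Bytes === 0 ? 0 : (utf8Bytes - big5Bytes) / big5Bytes;
  return { big5Bytes, utf8Bytes, growth };
}
```

With this estimate, 臺灣 goes from 4 bytes to 6 (growth 0.5), while 臺灣 Taipei goes from 11 bytes to 13, which is a much smaller relative increase because the ASCII part does not grow.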