Traditional Chinese UTF-8 Byte Counter

Count UTF-8 byte cost of Traditional Chinese for storage and SMS limits

Counts the exact UTF-8 byte length of Traditional Chinese text (Taiwan/Hong Kong), where a typical Han character costs 3 bytes versus 1 for ASCII. Use it to size VARCHAR columns and payloads precisely. Runs in your browser.

Does Traditional Chinese cost more bytes than Simplified?

In UTF-8 both cost 3 bytes per Han character, so the per-character cost is the same. Differences come only from how many characters a phrase uses, not from the script being Traditional or Simplified.
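A quick way to check this is to encode a Traditional phrase and its Simplified counterpart and compare the totals. The sketch below is illustrative only: the helper name utf8Bytes and the sample phrases are assumptions, not part of the tool.

```typescript
// Hypothetical helper: UTF-8 byte length via the standard TextEncoder API.
const encoder = new TextEncoder();

function utf8Bytes(text: string): number {
  return encoder.encode(text).length;
}

// Same word, same character count: the script makes no difference.
console.log(utf8Bytes("臺灣")); // 6 (Traditional, 2 characters)
console.log(utf8Bytes("台湾")); // 6 (Simplified, 2 characters)

// Different regional wording, different character count: the total changes.
console.log(utf8Bytes("網際網路")); // 12 (4 characters)
console.log(utf8Bytes("互联网"));   // 9  (3 characters)
```

The last pair differs only because the phrases use a different number of characters, not because one script is heavier than the other.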

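Traditional Chinese, used in Taiwan, Hong Kong, and Macau, costs the same three bytes per Han character in UTF-8 as Simplified Chinese. This tool encodes your text with the browser’s real UTF-8 encoder so you can size database columns, SMS gateway requests, and JSON payloads with confidence. Over-the-air SMS segments for Chinese text are counted in UCS-2 characters (70 per single message) rather than UTF-8 bytes, so use the byte total for the request you send, not for the segment count.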

How it works

UTF-8 encodes each character in one to four bytes depending on its Unicode code point:

U+0000 – U+007F   → 1 byte   (ASCII letters, digits)
U+0080 – U+07FF   → 2 bytes  (Latin accents, Greek, Cyrillic)
U+0800 – U+FFFF   → 3 bytes  (Traditional Han, CJK)
U+10000 and above → 4 bytes  (rare HK supplementary chars, emoji)

The tool feeds your text through TextEncoder for an exact byte total, then groups characters into the four byte-length bands above so you can see where the weight comes from.
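The two steps described here can be sketched as follows. This is a minimal illustration, not the tool's actual source: the names Band, ByteReport, bandOf, and analyze are hypothetical, and characters are counted as Unicode code points.

```typescript
// Byte-length bands matching the UTF-8 ranges listed above.
type Band = 1 | 2 | 3 | 4;

interface ByteReport {
  totalBytes: number;          // exact UTF-8 length from TextEncoder
  charCount: number;           // Unicode code points, not UTF-16 units
  bands: Record<Band, number>; // code points counted per byte-length band
}

// Map a code point to the number of bytes UTF-8 needs for it.
function bandOf(codePoint: number): Band {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

function analyze(text: string): ByteReport {
  const bands: Record<Band, number> = { 1: 0, 2: 0, 3: 0, 4: 0 };
  let charCount = 0;
  // for...of walks the string by code point, so a surrogate pair counts once.
  for (const ch of text) {
    bands[bandOf(ch.codePointAt(0)!)] += 1;
    charCount += 1;
  }
  const totalBytes = new TextEncoder().encode(text).length;
  return { totalBytes, charCount, bands };
}

// Two Han characters plus a space and six ASCII letters.
console.log(analyze("臺灣 Taipei"));
// totalBytes: 13, charCount: 9, bands: { 1: 7, 2: 0, 3: 2, 4: 0 }
```

Counting by code point rather than by string length matters for supplementary characters, which JavaScript stores as two UTF-16 units but UTF-8 encodes as a single 4-byte sequence.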

Example and tips

The phrase 臺灣 (“Taiwan”) is 2 characters but 6 bytes. Append a space and the English word Taipei (臺灣 Taipei) and you get 9 characters but 13 bytes, because the seven ASCII characters (the space plus six letters) cost 1 byte each. When you migrate a legacy Big5 dataset to UTF-8, expect pure Han text to grow by about 50 percent in byte size, since each Han character goes from 2 bytes to 3 while ASCII stays at 1 byte. Mixed text grows less, but re-check any byte-limited column lengths before importing.
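The Big5 growth figure can be estimated with simple arithmetic. Browsers do not ship a Big5 encoder, so the sketch below assumes every ASCII character takes 1 byte in Big5 and every other character takes 2 bytes. That holds for characters in the Big5 repertoire but not for text Big5 cannot represent. The function name estimateBig5Growth is hypothetical.

```typescript
interface GrowthEstimate {
  big5Bytes: number;     // estimated, under the 1-or-2-byte assumption
  utf8Bytes: number;     // exact, from TextEncoder
  growthPercent: number; // UTF-8 size relative to the Big5 estimate
}

function estimateBig5Growth(text: string): GrowthEstimate {
  let big5Bytes = 0;
  for (const ch of text) {
    // Assumption: ASCII is 1 byte in Big5, anything else is 2 bytes.
    big5Bytes += ch.codePointAt(0)! <= 0x7f ? 1 : 2;
  }
  const utf8Bytes = new TextEncoder().encode(text).length;
  const growthPercent =
    big5Bytes === 0 ? 0 : ((utf8Bytes - big5Bytes) / big5Bytes) * 100;
  return { big5Bytes, utf8Bytes, growthPercent };
}

// Pure Han text: 4 bytes estimated in Big5, 6 in UTF-8, 50 percent growth.
console.log(estimateBig5Growth("臺灣"));

// Mixed text: 11 bytes estimated in Big5, 13 in UTF-8, about 18 percent growth.
console.log(estimateBig5Growth("臺灣 Taipei"));
```

For a real migration, measuring the converted data itself is more reliable than this estimate, which is only meant to show where the 50 percent figure comes from.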