UTF-8 Byte Counter

Count the exact number of UTF-8 bytes in any text string

Counting real UTF-8 bytes

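Characters and bytes are not the same thing. A database column or an HTTP header is often limited by bytes, while what you see on screen is characters. This counter reports the exact UTF-8 byte length of your text so you can tell whether it will actually fit a byte-based limit. Other limits, such as a tweet or an SMS segment, follow their own counting rules rather than UTF-8 bytes, so treat the byte total as a guide there, not a verdict.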

How it works

The tool encodes your string with the browser’s built-in TextEncoder, which implements the official UTF-8 rules, and reports the length of the resulting byte array. UTF-8 is variable-width:

U+0000 – U+007F      1 byte   (ASCII)
U+0080 – U+07FF      2 bytes  (most accented Latin, Greek, Cyrillic, Hebrew, Arabic)
U+0800 – U+FFFF      3 bytes  (most CJK, symbols)
U+10000 – U+10FFFF   4 bytes  (most emoji, rare scripts)
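
The ranges above can be turned directly into code. The sketch below is illustrative only and is not the tool's own source: utf8Width and utf8LengthByTable are hypothetical helper names. It sums the per-code-point widths from the table and compares the total with what TextEncoder reports.

```javascript
// Hypothetical helper: UTF-8 width of a single code point, per the table above.
function utf8Width(codePoint) {
  if (!Number.isInteger(codePoint) || codePoint < 0 || codePoint > 0x10ffff) {
    throw new RangeError("not a Unicode code point: " + codePoint);
  }
  if (codePoint <= 0x7f) return 1;   // U+0000 – U+007F
  if (codePoint <= 0x7ff) return 2;  // U+0080 – U+07FF
  if (codePoint <= 0xffff) return 3; // U+0800 – U+FFFF
  return 4;                          // U+10000 – U+10FFFF
}

// Hypothetical helper: sum the widths of every code point in a string.
// for...of walks a string by code point, not by UTF-16 code unit.
function utf8LengthByTable(text) {
  let total = 0;
  for (const ch of text) {
    total += utf8Width(ch.codePointAt(0));
  }
  return total;
}

// "a", "é" (U+00E9), "中" (U+4E2D), "🌍" (U+1F30D): one character per row of the table.
const sample = "a\u00e9\u4e2d\u{1F30D}";
console.log(utf8LengthByTable(sample));               // 1 + 2 + 3 + 4 = 10
console.log(new TextEncoder().encode(sample).length); // 10, the same total
```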

Alongside the byte total, it counts code points (logical characters) and UTF-16 code units (the JavaScript .length value), and it splits characters into ASCII versus multi-byte so the difference between the counts is obvious.
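
As a rough sketch of how those counts could be gathered in one pass (countText and its field names are assumptions made for illustration, not the tool's actual code):

```javascript
// Hypothetical helper: collect the counts described above for one string.
function countText(text) {
  const bytes = new TextEncoder().encode(text).length; // UTF-8 bytes
  let codePoints = 0;
  let ascii = 0;
  for (const ch of text) {               // iterates by code point
    codePoints += 1;
    if (ch.codePointAt(0) <= 0x7f) ascii += 1;
  }
  return {
    bytes,
    codePoints,
    utf16Units: text.length,             // what JavaScript .length reports
    ascii,
    multiByte: codePoints - ascii,       // characters needing 2 or more bytes
  };
}

// "café 🌍" with é written as the single code point U+00E9.
const counts = countText("caf\u00e9 \u{1F30D}");
console.log(counts);
// bytes 10, codePoints 6, utf16Units 7, ascii 4, multiByte 2
```

The three totals differ only because of the two non-ASCII characters: é adds one extra byte, and the emoji adds three extra bytes and one extra UTF-16 code unit.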

Tips and notes

If a “255 character” field rejects your text, the limit may really be 255 bytes: café 🌍 is 6 characters but 10 bytes (4 ASCII bytes, 2 for é as the single code point U+00E9, and 4 for 🌍). Watch the multi-byte count: every non-ASCII character costs at least two bytes, and most emoji cost four or more, since many are sequences of several code points. When a system reports a JavaScript .length, remember that this is UTF-16 code units, so a single-code-point emoji such as 🌍 counts as 2 there, 4 in UTF-8 bytes, and just 1 actual character.
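
A byte-based limit can be checked before submitting text. This is a minimal sketch, assuming the limit is expressed in UTF-8 bytes; fitsInBytes is a hypothetical name, and the 255 figure is simply the example limit used above.

```javascript
// Hypothetical helper: does the text fit in maxBytes once encoded as UTF-8?
function fitsInBytes(text, maxBytes) {
  return new TextEncoder().encode(text).length <= maxBytes;
}

const ascii255 = "a".repeat(255);         // 255 characters, 255 bytes
const accented255 = "\u00e9".repeat(255); // 255 characters ("é"), 510 bytes

console.log(fitsInBytes(ascii255, 255));    // true
console.log(fitsInBytes(accented255, 255)); // false
```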
