GB2312 / GBK Encoder/Decoder

Simplified Chinese GBK byte encoding and decoding


GB2312 and its backward-compatible superset GBK are the standard legacy encodings for Simplified Chinese in mainland China. ASCII characters stay single-byte, while each Chinese character is stored as a two-byte sequence. This tool encodes Simplified Chinese text into GBK hex bytes and decodes GBK bytes back into text. Because it uses the larger GBK table, GB2312 data round-trips as well.

How it works

ASCII characters 0x00–0x7F are one byte. Each Chinese character is two bytes: a lead byte in 0x81–0xFE followed by a trail byte in 0x40–0xFE (skipping 0x7F). GB2312 occupies a subset of this space, with both bytes in 0xA1–0xFE, and GBK fills in the rest, which is why a single GBK decoder handles both.
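The byte-range rules above can be sketched as a small splitter. This is a minimal Python illustration, not the tool's own code: the names `gbk_units` and `is_gb2312_pair` are hypothetical, and the check is purely structural, so a pair with valid ranges may still be unassigned in the real GBK table.

```python
from __future__ import annotations


def gbk_units(data: bytes) -> list[bytes]:
    """Hypothetical helper: split GBK bytes into per-character units by byte ranges only."""
    units: list[bytes] = []
    i = 0
    while i < len(data):
        b = data[i]
        if b <= 0x7F:
            # ASCII: one byte per character
            units.append(data[i:i + 1])
            i += 1
        elif 0x81 <= b <= 0xFE:
            # Lead byte: must be followed by a trail byte in 0x40-0xFE, excluding 0x7F
            if i + 1 >= len(data):
                raise ValueError(f"truncated pair at offset {i}")
            t = data[i + 1]
            if not (0x40 <= t <= 0xFE) or t == 0x7F:
                raise ValueError(f"invalid trail byte {t:#04x} at offset {i + 1}")
            units.append(data[i:i + 2])
            i += 2
        else:
            # 0x80 and 0xFF fall outside the ranges described above
            raise ValueError(f"invalid lead byte {b:#04x} at offset {i}")
    return units


def is_gb2312_pair(unit: bytes) -> bool:
    """True if a two-byte unit has both bytes in the GB2312 range 0xA1-0xFE."""
    return len(unit) == 2 and all(0xA1 <= b <= 0xFE for b in unit)


units = gbk_units(bytes.fromhex("41d6d0cec4"))
print(units)                                # [b'A', b'\xd6\xd0', b'\xce\xc4']
print([is_gb2312_pair(u) for u in units])   # [False, True, True]
```

A pair such as 81 40 passes `gbk_units` but fails `is_gb2312_pair`, which is exactly the GBK-only region that GB2312 leaves empty.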

To stay faithful to the real table, the tool enumerates the single-byte range and every valid lead/trail pair, decodes each with the browser’s native GBK decoder, and builds a character-to-bytes map. Encoding looks each character up in that map; decoding runs the hex bytes through the native decoder.
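A minimal Python sketch of the same table-building approach is below. It assumes Python's built-in `gbk` codec as a stand-in for the browser's native decoder, so its table may differ from the browser's in a few edge cases; `build_gbk_map`, `encode_hex`, and `decode_hex` are hypothetical names.

```python
from __future__ import annotations


def build_gbk_map(codec: str = "gbk") -> dict[str, bytes]:
    """Map each decodable character to its bytes by enumerating the GBK byte space.

    Assumption: Python's 'gbk' codec stands in for the browser's native GBK decoder.
    """
    table: dict[str, bytes] = {}
    # Single-byte range 0x00-0x7F
    for b in range(0x80):
        table.setdefault(bytes([b]).decode(codec), bytes([b]))
    # Every lead/trail pair: lead 0x81-0xFE, trail 0x40-0xFE except 0x7F
    for lead in range(0x81, 0xFF):
        for trail in range(0x40, 0xFF):
            if trail == 0x7F:
                continue
            pair = bytes([lead, trail])
            try:
                ch = pair.decode(codec)
            except UnicodeDecodeError:
                continue  # pair is unassigned in this codec's table
            if len(ch) == 1:
                table.setdefault(ch, pair)  # keep the first sequence found
    return table


def encode_hex(text: str, table: dict[str, bytes]) -> tuple[str, list[str]]:
    """Encode by table lookup; return hex bytes plus any unmapped characters."""
    out: list[str] = []
    unmapped: list[str] = []
    for ch in text:
        seq = table.get(ch)
        if seq is None:
            unmapped.append(ch)
        else:
            out.extend(f"{b:02x}" for b in seq)
    return " ".join(out), unmapped


def decode_hex(hex_text: str, codec: str = "gbk") -> str:
    """Decode space-separated hex bytes directly with the codec."""
    return bytes.fromhex(hex_text).decode(codec)


table = build_gbk_map()
print(encode_hex("中文", table))     # ('d6 d0 ce c4', [])
print(encode_hex("A中😀", table))    # ('41 d6 d0', ['😀'])
print(decode_hex("d6 d0 ce c4"))    # 中文
```

Building the table once costs about 24,000 small decode calls (126 lead bytes times 190 trail bytes); after that, encoding each character is a single dictionary lookup.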

Example and notes

  • "中文" encodes to d6 d0 ce c4: two characters, each a two-byte pair, with 中 as D6 D0 and 文 as CE C4.
  • GBK targets Simplified Chinese, though unlike GB2312 its table also includes many Traditional characters; characters and symbols outside the table are flagged as unmapped.
  • For text that mixes scripts or needs emoji and rare characters, UTF-8 is the modern choice; GBK remains useful for interoperating with legacy Chinese files and systems.
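To cross-check the first and third notes, Python's built-in `gbk` and `utf-8` codecs can show the raw bytes directly. This is a sketch assuming Python 3.8+ (for `bytes.hex` with a separator); `hex_bytes` is a hypothetical helper, not part of the tool.

```python
def hex_bytes(text: str, encoding: str) -> str:
    """Hypothetical helper: encoded bytes of text as space-separated lowercase hex."""
    return text.encode(encoding).hex(" ")


print(hex_bytes("中文", "gbk"))     # d6 d0 ce c4
print(hex_bytes("中文", "utf-8"))   # e4 b8 ad e6 96 87

try:
    hex_bytes("😀", "gbk")
except UnicodeEncodeError:
    # The emoji has no GBK mapping, but UTF-8 encodes it in four bytes
    print("no GBK mapping; UTF-8:", hex_bytes("😀", "utf-8"))  # f0 9f 98 80
```

The same two characters take four bytes in GBK and six in UTF-8, while the emoji can only be represented in UTF-8.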