Chinese Simplified Word Counter

Segment and count words in Simplified Chinese text (no spaces)

Count words in Simplified Chinese, which is written without spaces. Uses forward maximum matching against a built-in dictionary, a greedy longest-match strategy that is simpler than the frequency-based approach of segmenters such as Jieba, and shows the segmented words. Runs in your browser.

Why does Chinese need word segmentation?

Chinese is written without spaces between words, so software cannot simply split on whitespace. A single string of characters must be broken into words using a dictionary or statistical model, a step known as word segmentation, before words can be counted.
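As a quick illustration of the problem, the Python snippet below splits a sample sentence on whitespace. The sentence is an arbitrary example chosen for this sketch, not text taken from the tool.

```python
# Illustrative sample sentence ("We like studying Chinese"); any unspaced text behaves the same.
sentence = "我们喜欢学习中文"

# Whitespace splitting finds no boundaries, so the whole sentence comes back as one piece.
pieces = sentence.split()
print(pieces)       # ['我们喜欢学习中文']
print(len(pieces))  # 1
```

A naive whitespace count therefore reports one word for a sentence that a reader would split into four.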

Because Simplified Chinese is written with no spaces between words, counting words means first deciding where one word ends and the next begins. This tool performs that segmentation in the browser and then counts the resulting words, showing you exactly how the text was split.

How it works

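The counter uses forward maximum matching, a greedy dictionary-based approach. (Segmenters such as Jieba also rely on a dictionary, but they choose among candidate splits by word frequency rather than always taking the longest match.) Starting at the beginning of the text, the counter looks for the longest character sequence at the current position that appears in its built-in dictionary, takes that as a single word, and continues from the next unmatched character. If no multi-character word matches, it falls back to counting the lone character as a word.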
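The matching loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's actual code: the function name forward_max_match, the max_len parameter, and the five-word DEMO_DICT are all hypothetical stand-ins for the built-in dictionary.

```python
def forward_max_match(text, dictionary, max_len=None):
    """Greedy left-to-right segmentation: take the longest dictionary word at each step."""
    if max_len is None:
        # Never try candidates longer than the longest dictionary entry.
        max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to two characters.
        for size in range(min(max_len, len(text) - i), 1, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                words.append(candidate)
                i += size
                break
        else:
            # No multi-character match: fall back to the single character.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy dictionary; a real one would hold many thousands of entries.
DEMO_DICT = {"我们", "喜欢", "学习", "中文", "学习者"}

segments = forward_max_match("我们喜欢学习中文", DEMO_DICT)
print(segments)       # ['我们', '喜欢', '学习', '中文']
print(len(segments))  # 4
```

Storing the dictionary as a set keeps each lookup constant-time on average, so the cost per position is bounded by the longest dictionary entry rather than by the dictionary's size.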

Runs of Latin letters or digits are grouped into one token each, and standalone punctuation is skipped. The result is a list of word segments and their total count.

Example and notes

A common everyday phrase will segment into recognisable multi-character words, plus single characters wherever the dictionary has no entry. Inspect the highlighted chips to see how ambiguous sequences were split; greedy matching occasionally prefers a longer word where a human reader would choose two shorter ones. The dictionary is bundled with the page, so segmentation works offline and your text never leaves the browser.
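A well-known ambiguous phrase shows the greedy behaviour concretely. The Python sketch below uses a compact forward maximum matcher and a four-word toy dictionary; both the helper name fmm and TOY_DICT are hypothetical, so the tool's real dictionary may split this phrase differently.

```python
def fmm(text, dictionary):
    """Compact forward maximum matching, for illustration only."""
    longest = max([1, *map(len, dictionary)])
    out, i = [], 0
    while i < len(text):
        # Longest match wins; a single character is always accepted as the fallback.
        for size in range(min(longest, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in dictionary:
                out.append(text[i:i + size])
                i += size
                break
    return out


# Hypothetical toy dictionary: 研究 (research), 研究生 (graduate student),
# 生命 (life), 起源 (origin).
TOY_DICT = {"研究", "研究生", "生命", "起源"}

print(fmm("研究生命起源", TOY_DICT))
# ['研究生', '命', '起源']
```

A human reader would split the phrase as 研究 / 生命 / 起源 ("research the origin of life"), but the greedy matcher commits to the three-character word 研究生 first and leaves 命 stranded as a single character.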