Chinese Sentence Length Distribution

Analyze short vs. long sentence distribution in Chinese text

Calculates sentence length distribution in characters for Simplified Chinese text, splitting on 。！？ punctuation and flagging overly long sentences that hurt readability.

How does the tool count a Chinese sentence?

It splits text on the Chinese sentence-ending punctuation 。！？ (and the ASCII equivalents . ! ?), treating each resulting chunk as one sentence. Trailing empty chunks are ignored.

Long sentences are one of the most common readability problems in Chinese writing. Because Chinese has no spaces between words and packs meaning densely into each character, a sentence that runs past 40–50 characters becomes hard to parse at a glance. This tool splits your Simplified Chinese text into sentences, measures each one in Han characters, and shows you the distribution so you can find and fix the run-ons.

How it works

The tool segments text on the Chinese sentence-terminating punctuation marks (the full stop 。, exclamation mark ！ and question mark ？) plus their ASCII equivalents . ! ?. Each chunk between these marks is one sentence.
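The segmentation step can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names TERMINATORS and split_sentences are hypothetical, and treating a run of consecutive marks (such as ？！) as a single boundary is an assumption.

```python
import re

# Sentence terminators: full-width 。！？ plus the ASCII equivalents . ! ?
# Assumption: a run of consecutive marks (e.g. "？！") counts as one boundary.
TERMINATORS = re.compile(r"[。！？.!?]+")


def split_sentences(text: str) -> list[str]:
    """Split text on sentence-ending punctuation and drop empty chunks."""
    chunks = (chunk.strip() for chunk in TERMINATORS.split(text))
    return [chunk for chunk in chunks if chunk]


print(split_sentences("今天天气很好。你吃饭了吗？太好了!"))
# ['今天天气很好', '你吃饭了吗', '太好了']
```

One consequence of including the ASCII full stop is that a decimal such as 3.14 is also split in two, so text with many numbers may report more sentences than a reader would count.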

For each sentence it counts only CJK ideographs (the Unicode range U+4E00–U+9FFF and common extensions), ignoring Latin letters, spaces and punctuation. Sentences are then sorted into buckets:

  • Short: 1–15 characters
  • Medium: 16–40 characters
  • Long: more than 40 characters (flagged)

You can change the long threshold to match your house style.
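The counting and bucketing rules above might look like this in Python. It is a sketch under stated assumptions: count_han and bucket are hypothetical names, "common extensions" is taken to mean only Extension A (U+3400–U+4DBF), and the "empty" label for a sentence with no Han characters is an addition, since the buckets above start at 1.

```python
def count_han(sentence: str) -> int:
    """Count CJK ideographs, ignoring Latin letters, spaces and punctuation.

    Covers the basic block U+4E00-U+9FFF plus Extension A (U+3400-U+4DBF).
    Limiting the extensions to Extension A is an assumption.
    """
    return sum(
        1
        for ch in sentence
        if "\u4e00" <= ch <= "\u9fff" or "\u3400" <= ch <= "\u4dbf"
    )


def bucket(length: int, long_threshold: int = 40) -> str:
    """Sort a sentence length into short / medium / long."""
    if length <= 0:
        return "empty"  # assumption: no Han characters at all
    if length > long_threshold:
        return "long"  # flagged
    if length >= 16:
        return "medium"
    return "short"


for sentence in ["今天天气很好", "Hello, 世界"]:
    n = count_han(sentence)
    print(n, bucket(n))
# 6 short
# 2 short
```

Passing a different long_threshold, for example bucket(30, long_threshold=25), moves the boundary between medium and long without touching the short bucket.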

Tips and example

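Paste a paragraph such as 今天天气很好。我们去公园散步，看到很多花，还遇到了老朋友，聊了很久才回家。 and the tool reports two sentences: one short (6 characters) and one medium (26 characters), with the average and the percentage over threshold. At the default threshold of 40 neither sentence is flagged; lower the threshold to 25 and the second one is.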
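The per-sentence counts, the average and the percentage over threshold can be checked by hand with a short Python sketch. The function names han_lengths and summarize are hypothetical, and only the basic Han block U+4E00–U+9FFF is counted here to keep the example small.

```python
import re


def han_lengths(text: str) -> list[int]:
    """Han-character count of each sentence in the text."""
    chunks = re.split(r"[。！？.!?]+", text)
    return [
        sum(1 for ch in chunk if "\u4e00" <= ch <= "\u9fff")
        for chunk in chunks
        if chunk.strip()  # skip the empty chunk after the final mark
    ]


def summarize(text: str, long_threshold: int = 40) -> dict:
    """Per-sentence lengths, average length and share over the threshold."""
    lengths = han_lengths(text)
    n_long = sum(1 for n in lengths if n > long_threshold)
    return {
        "lengths": lengths,
        "average": sum(lengths) / len(lengths) if lengths else 0.0,
        "percent_long": 100 * n_long / len(lengths) if lengths else 0.0,
    }


sample = "今天天气很好。我们去公园散步，看到很多花，还遇到了老朋友，聊了很久才回家。"
print(summarize(sample))
# {'lengths': [6, 26], 'average': 16.0, 'percent_long': 0.0}
print(summarize(sample, long_threshold=25)["percent_long"])
# 50.0
```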

A healthy distribution is mostly short and medium sentences. If more than roughly a quarter of your sentences are flagged as long, break them at natural clause boundaries. Chinese commas (，) and semicolons (；) are good places to start a new sentence.
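As a rough illustration of that rule of thumb, the sketch below computes the share of flagged sentences and lists the clauses of a long sentence as candidate break points. Both helper names are hypothetical, and the tool itself is not stated to suggest break points; this only shows how the advice could be applied.

```python
import re

# Clause boundaries: full-width comma and semicolon, plus ASCII equivalents.
CLAUSE_MARKS = re.compile(r"[，；,;]")


def flagged_share(lengths: list[int], long_threshold: int = 40) -> float:
    """Fraction of sentences whose length exceeds the threshold."""
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n > long_threshold) / len(lengths)


def clause_breaks(sentence: str) -> list[str]:
    """Split a sentence into clauses, each a candidate new sentence."""
    return [clause for clause in CLAUSE_MARKS.split(sentence) if clause]


lengths = [6, 26, 45, 52]
print(flagged_share(lengths) > 0.25)
# True
print(clause_breaks("我们去公园散步，看到很多花；然后回家"))
# ['我们去公园散步', '看到很多花', '然后回家']
```

Not every comma is a good place to stop, so the clause list is a starting point for editing rather than an automatic rewrite.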