Czech Word Counter

Word count for Czech text that treats háček and other diacritic letters as word characters

Count words in Czech text with correct Unicode classification of every diacritic letter: háček (č, š, ž, ř, ě, ň, ď, ť), kroužek (ů), and acute accent (á, é, í, ó, ú, ý). Shows word, character, and sentence counts. Runs in your browser.

How does it decide what counts as a word?

A word is a run of letters and digits, with any hyphen or apostrophe that sits between two such characters kept inside the word. Whitespace and all other punctuation separate words, so příliš, žluťoučký, and řeřicha each count as one word.

Czech orthography is rich in diacritics: the háček (caron) gives č, š, ž, ř, ě, ň, ď, ť, the kroužek gives ů, and acute accents give á, é, í, ó, ú, ý. A counter that treats only ASCII letters as word characters splits a word at each diacritic, so žluťoučký would be counted as several fragments. This tool relies on full Unicode letter classification so every Czech letter stays inside its word, and it reports accurate word, character, and sentence totals.
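
To illustrate the difference, the following TypeScript sketch compares an ASCII-only pattern with a Unicode-aware one. It is not the tool's actual source: the names pieces, asciiWord, and unicodeWord are made up for this example, and the NFC normalization step is an assumption, added so that decomposed input (a base letter followed by a combining mark) is recomposed before matching.

```typescript
// Hypothetical comparison, not the tool's real code.
// ASCII-only: any letter with a diacritic ends the current run.
const asciiWord = new RegExp("[A-Za-z0-9]+", "g");
// Unicode-aware: \p{L} is any letter, \p{N} is any number (needs the "u" flag).
const unicodeWord = new RegExp("[\\p{L}\\p{N}]+", "gu");

function pieces(text: string, pattern: RegExp): string[] {
  // Assumption: normalize to NFC so "c" + U+030C becomes the single letter "č".
  // A bare combining mark is category M, not L, and would otherwise end the run.
  const normalized = text.normalize("NFC");
  // match() with a global pattern returns every match, or null when there are none.
  return normalized.match(pattern) ?? [];
}

const sample = "žluťoučký";
const asciiPieces = pieces(sample, asciiWord);     // ["lu", "ou", "k"]
const unicodePieces = pieces(sample, unicodeWord); // ["žluťoučký"]
```

The ASCII pattern reports three fragments for a single word, while the Unicode pattern reports one.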

How it works

The algorithm treats a word as a maximal run of letters and digits with internal hyphens and apostrophes allowed:

  • It matches [\p{L}\p{N}] runs, permitting an internal - or ' between two such characters.
  • Every Czech diacritic letter is a Unicode letter and counts as a word character: č š ž ř ě ň ď ť ů á é í ó ú ý and their uppercase forms.
  • A hyphen inside a word, as in česko-slovenský, keeps the compound as one word; a spaced dash used as punctuation separates words.
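
The three rules above can be sketched as a single regular expression. This is a minimal TypeScript illustration under stated assumptions, not the tool's actual implementation: the names WORD, findWords, and countWords are hypothetical, only the ASCII hyphen and straight apostrophe are treated as internal joiners, and the NFC normalization step is an added assumption.

```typescript
// Sketch only: one or more letters/digits, optionally followed by groups of
// "one hyphen or apostrophe, then one or more letters/digits".
// The joiner must have a letter or digit on both sides to stay inside the word.
const WORD = new RegExp("[\\p{L}\\p{N}]+(?:[-'][\\p{L}\\p{N}]+)*", "gu");

function findWords(text: string): string[] {
  // Assumption: recompose decomposed letters so combining marks do not end a run.
  return text.normalize("NFC").match(WORD) ?? [];
}

function countWords(text: string): number {
  return findWords(text).length;
}

const compound = findWords("česko-slovenský"); // one word: the hyphen is internal
const spaced = findWords("Praha - Brno");      // two words: the dash has spaces around it
```

With this pattern, countWords("2026") and countWords("90-tých") both return 1, and a trailing hyphen or a doubled hyphen is not absorbed into a word.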

Characters are counted two ways: the full length including spaces, and the length with all whitespace removed. Sentences are counted by collapsing each run of terminal punctuation (., !, ?, …) into a single boundary, so an ellipsis or a ?! combination counts once.
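
A rough TypeScript sketch of these counts follows. Several details are assumptions rather than facts about the tool: the names TextStats and textStats are hypothetical, characters are counted as Unicode code points, and the set of terminal marks is taken to be ., !, ?, and the single-character ellipsis …

```typescript
// Sketch only; field and function names are invented for this example.
interface TextStats {
  charsWithSpaces: number;
  charsWithoutSpaces: number;
  sentences: number;
}

function textStats(text: string): TextStats {
  // Assumption: count code points, so a character outside the BMP counts once.
  const charsWithSpaces = Array.from(text).length;
  // Remove every whitespace character (spaces, tabs, line breaks) before counting.
  const charsWithoutSpaces = Array.from(text.replace(/\s+/g, "")).length;
  // Each run of one or more terminal marks is a single sentence boundary.
  const boundaries = text.match(/[.!?…]+/g) ?? [];
  return { charsWithSpaces, charsWithoutSpaces, sentences: boundaries.length };
}

const stats = textStats("Ano?! Ne... Tak.");
// stats.sentences is 3: "?!", "...", and "." are each one boundary.
```

This simple approach has known limits: a final sentence with no terminal mark is not counted, and a period inside a number such as 3.14 is treated as a boundary.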

Example

The text:

Žluťoučký kůň… Příliš ano? Ne-li.

contains five words: Žluťoučký, kůň, Příliš, ano, and Ne-li. Every diacritic stays inside its word, and the hyphen keeps Ne-li whole.

Notes

  • Mixed Czech-English text and Latin product names are counted by the same rule, because Latin letters without diacritics are word characters too.
  • Numbers like 2026 count as one word; a number glued to a suffix by a hyphen, such as 90-tých, stays one compound word.