Russian Word Counter

Word count for Russian using Cyrillic word-boundary rules

Count words in Russian Cyrillic text with proper word-boundary handling: em-dash and en-dash as separators, hyphenated compounds kept as one word, and apostrophes inside words preserved. Also shows characters and sentences. Runs in your browser.

How does it decide what counts as a word?

A word is a run of Cyrillic or Latin letters and digits, with internal hyphens and apostrophes kept inside the word. The counter splits on whitespace, em-dashes, en-dashes, and punctuation, so что-то counts as one word but слово — слово counts as two.

Counting words in Russian needs more care than a naive space split. Russian typography uses the em-dash heavily as punctuation, while hyphens join genuine compound words like кто-то and по-русски. This counter applies those boundary rules so that compounds stay whole, dashes used as punctuation separate words, and the word, character, and sentence totals come out accurate.

How it works

The algorithm treats a word as a maximal run of letters and digits with internal hyphens and apostrophes allowed:

  • It matches [\p{L}\p{N}] runs, permitting an internal - or ' between two such characters.
  • A hyphen inside a word (no surrounding spaces) keeps the compound as one word: что-то, нью-йоркский.
  • An em-dash or en-dash, which in Russian is almost always set off with spaces, falls outside the word pattern and therefore separates the words on either side.
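
The rules above can be sketched in TypeScript as follows. This is a minimal illustration, not the tool's actual source: the names WORD_SOURCE, extractWords, and countWords are hypothetical, and the sketch assumes that only the ASCII hyphen (-) and the straight apostrophe (') act as internal joiners.

```typescript
// Hypothetical sketch of the word pattern described above.
// One or more letters/digits, optionally followed by groups of
// "joiner + letters/digits", so a hyphen or apostrophe is only accepted
// when it sits between two letter/digit characters.
// The pattern is built from a string so \p{L} and \p{N} reach the regex
// engine unchanged; the "u" flag enables Unicode property escapes.
const WORD_SOURCE = "[\\p{L}\\p{N}]+(?:[-'][\\p{L}\\p{N}]+)*";

function extractWords(text: string): string[] {
  // A fresh global regex per call avoids shared lastIndex state.
  const matches = text.match(new RegExp(WORD_SOURCE, "gu"));
  return matches === null ? [] : matches;
}

function countWords(text: string): number {
  return extractWords(text).length;
}
```

With this sketch, countWords("что-то") gives 1 and countWords("слово — слово") gives 2, because the em-dash matches neither the letter/digit class nor the joiner set.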

Characters are counted two ways: every character including spaces, and the length with whitespace removed. Sentences are counted by collapsing runs of terminal punctuation (., !, ?, …) so that an ellipsis or a ?! combination counts as a single sentence boundary.
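
A rough TypeScript sketch of these two counts is shown here. It is an illustration under stated assumptions rather than the tool's own code: the TextStats shape and the textStats name are hypothetical, the terminal set is assumed to be ., !, ? and the single-character ellipsis …, and text with no terminal punctuation is counted as zero sentences.

```typescript
// Hypothetical result shape for the character and sentence totals.
interface TextStats {
  charsWithSpaces: number;
  charsWithoutSpaces: number;
  sentences: number;
}

function textStats(text: string): TextStats {
  // Length in UTF-16 code units; for Cyrillic and Latin letters this
  // equals the number of visible characters.
  const charsWithSpaces = text.length;

  // Remove every whitespace character, then measure what is left.
  const charsWithoutSpaces = text.replace(/\s/g, "").length;

  // Each maximal run of terminal punctuation is one boundary, so
  // "...", "?!" and "?.." each count once. Assumption: the number of
  // runs is taken as the sentence count.
  const runs = text.match(/[.!?…]+/g);
  const sentences = runs === null ? 0 : runs.length;

  return { charsWithSpaces, charsWithoutSpaces, sentences };
}
```

For the input "Что?! Да…" this sketch reports 9 characters with spaces, 8 without, and 2 sentences.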

Example

The text:

Кто-то сказал: «Привет» — и ушёл.

contains five words: Кто-то, сказал, Привет, и, ушёл. The compound Кто-то stays as one word, while the spaced em-dash acts as a separator between Привет and и rather than counting as a word itself.
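
To show why a plain whitespace split miscounts this sentence, the TypeScript sketch below compares the two approaches. The function name compareCounts is hypothetical, and the pattern is a reconstruction from the rules described on this page, not the tool's published code.

```typescript
// Hypothetical comparison of a naive whitespace split with the
// letter/digit pattern (internal - or ' allowed between two such characters).
function compareCounts(text: string): { naive: string[]; pattern: string[] } {
  // Naive approach: every whitespace-separated chunk is a "word",
  // including a lone dash and chunks with punctuation attached.
  const naive = text.split(/\s+/).filter((chunk) => chunk.length > 0);

  // Pattern approach: only runs of letters/digits are words.
  const re = new RegExp("[\\p{L}\\p{N}]+(?:[-'][\\p{L}\\p{N}]+)*", "gu");
  const pattern = text.match(re) || [];

  return { naive, pattern };
}

const result = compareCounts("Кто-то сказал: «Привет» — и ушёл.");
console.log(result.naive.length, result.pattern.length);
```

In this sketch the whitespace split yields six chunks, because the standalone em-dash is counted as a token, while the pattern yields the five words listed above with the colon, guillemets, and final period stripped.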

Notes

  • The Latin-letter allowance means mixed Russian-English text and Latin product names are also counted sensibly.
  • Numbers like 2026 count as one word; a number glued to a unit by a hyphen, such as 5-летний, stays a single compound word as Russian expects.