Ukrainian Word Counter

Word count for Ukrainian Cyrillic with apostrophe-as-letter handling

Count words in Ukrainian Cyrillic text, where the apostrophe (') is part of words like м'яч rather than punctuation. Handles ї, і, є, ґ, and hyphenated compounds, and also reports character and sentence counts. Runs in your browser.

How is the apostrophe handled?

In Ukrainian the apostrophe is an orthographic sign that separates a consonant from a following iotated vowel, as in м'яч, об'єкт, or п'ять. It is written inside the word, so the counter keeps an internal apostrophe and counts м'яч as one word, not two. It recognises both the straight apostrophe (', U+0027) and the typographic right single quotation mark (’, U+2019).

Ukrainian Cyrillic poses a counting subtlety that trips up naive tools: the apostrophe belongs to the word and is not punctuation. A counter that splits on the apostrophe would wrongly report м'яч as two words. This tool keeps the apostrophe inside words and reports accurate word, character, and sentence totals.
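To make the difference concrete, here is a minimal Python sketch that contrasts a naive split with an apostrophe-aware match. The function names naive_words and aware_words are hypothetical and are not taken from the tool itself, which runs in the browser; the sketch only illustrates the idea.

```python
import re


def naive_words(text: str) -> list[str]:
    # Naive approach: \w+ stops at the apostrophe, so it acts as a separator.
    return re.findall(r"\w+", text)


def aware_words(text: str) -> list[str]:
    # [^\W\d_] approximates "any Unicode letter" in Python's built-in re module.
    # An apostrophe (U+0027 or U+2019) is allowed only between two letter runs.
    return re.findall(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*", text)


print(naive_words("м'яч"))  # ['м', 'яч']
print(aware_words("м'яч"))  # ["м'яч"]
```

The naive version reports two fragments, while the apostrophe-aware version reports a single word.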

How it works

The algorithm treats a word as a maximal run of letters and digits with internal hyphens and apostrophes allowed:

  • It matches runs of [\p{L}\p{N}] (Unicode letters and digits), permitting an internal -, ', or typographic ’ (U+2019) between two such characters.
  • An apostrophe inside a word, as in п'ять, keeps the word whole. Both the straight ASCII apostrophe and the right single quotation mark are recognised.
  • A hyphen inside a word, as in будь-який or по-українськи, keeps the compound as one word; a spaced dash used as punctuation separates words.
  • Ukrainian-specific letters ї, і, є, and ґ are Unicode letters and count as word characters.
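The rules above can be sketched in Python as a single regular expression. This is an illustration under stated assumptions, not the tool's actual source: Python's built-in re module has no \p{L} or \p{N} classes, so [^\W_] (a word character other than the underscore) stands in for [\p{L}\p{N}], and the names WORD_RE and find_words are hypothetical.

```python
import re

# Stand-in for [\p{L}\p{N}]: any Unicode word character except the underscore.
WORD_CHAR = r"[^\W_]"
# Joiners allowed between two word characters: hyphen, straight apostrophe
# (U+0027), and right single quotation mark (U+2019).
JOINER = "[-'\u2019]"
# A word is one run of word characters, optionally followed by more runs,
# each introduced by exactly one joiner.
WORD_RE = re.compile(rf"{WORD_CHAR}+(?:{JOINER}{WORD_CHAR}+)*")


def find_words(text: str) -> list[str]:
    """Return the words in text; an internal hyphen or apostrophe stays inside the word."""
    return WORD_RE.findall(text)


print(find_words("будь-який м'яч"))   # ['будь-який', "м'яч"]
print(find_words("слово - слово"))    # ['слово', 'слово']
print(len(find_words("У 2026 році, як і в 90-ті")))  # 7
```

Because a joiner must sit between two word characters, a spaced dash or a quotation mark at the edge of a word is never absorbed into it.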

Characters are counted two ways: every character including spaces, and every character excluding whitespace. Sentences are counted by collapsing runs of terminal punctuation (., !, ?, …), so an ellipsis or a ?! combo counts as one boundary.
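A minimal Python sketch of these two counts follows. It assumes that the single-character ellipsis (U+2026) is among the terminal marks and that trailing text with no terminator is not counted as a sentence; both are assumptions of this sketch, and the function names are hypothetical.

```python
import re

# One or more terminal marks in a row form a single sentence boundary.
# Assumption: the marks are ., !, ? and the one-character ellipsis (U+2026).
SENTENCE_END_RE = re.compile("[.!?\u2026]+")


def count_characters(text: str) -> tuple[int, int]:
    """Return (all characters, characters with whitespace removed).

    len() counts Unicode code points, so each Cyrillic letter counts as one.
    """
    total = len(text)
    without_whitespace = len(re.sub(r"\s+", "", text))
    return total, without_whitespace


def count_sentences(text: str) -> int:
    """Count runs of terminal punctuation; '...' or '?!' is one boundary."""
    return len(SENTENCE_END_RE.findall(text))


sample = "П'ять синьо-жовтих м'ячів… Чи не так?"
print(count_characters(sample))  # (37, 32)
print(count_sentences(sample))   # 2
```

A rule this simple also treats the full stop in an abbreviation or a decimal number as a boundary, so it is best read as an approximation.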

Example

The text:

П'ять синьо-жовтих м'ячів… Чи не так?

contains six words: П'ять, синьо-жовтих, м'ячів, Чи, не, and так. The apostrophe keeps П'ять and м'ячів whole, and the hyphen keeps синьо-жовтих as a single compound. The ellipsis and the question mark each end a sentence, so the sentence count is two.

Notes

  • Mixed Ukrainian-English text and Latin product names are counted by the same rule, because Latin letters are also word characters.
  • Numbers like 2026 count as one word; a number glued to a suffix by a hyphen, such as 90-ті, stays one compound word.