Hungarian Word Counter

Word count for Hungarian with compound and hyphenated word rules

Count words in Hungarian text with correct handling of long agglutinative compounds that stay one word, hyphenated forms, and the full Hungarian alphabet including á é í ó ö ő ú ü ű. Runs in your browser.

How does it decide what counts as a word?

A word is a run of letters and digits, with any internal hyphens and apostrophes kept as part of it. The counter splits on whitespace and on all other punctuation, so a long suffixed form like megszentségteleníthetetlenségeskedéseitekért counts as one word.

Counting words in Hungarian needs awareness of its agglutinative grammar. A single Hungarian word can stack many suffixes onto a stem, producing long forms like házaitokban or the famous megszentségteleníthetetlenségeskedéseitekért. Because these contain no spaces, they are exactly one orthographic word. This counter applies the right boundary rules so compounds stay whole and you get accurate word, character, and sentence totals.

How it works

The algorithm treats a word as a maximal run of letters and digits with internal hyphens and apostrophes allowed:

  • It matches [\p{L}\p{N}] runs, permitting an internal - or ' between two such characters.
  • Every Hungarian accented letter (á é í ó ö ő ú ü ű and their uppercase forms) is a Unicode letter and counts as a word character.
  • A hyphen inside a word, as in dél-afrikai, keeps the form as one word; a spaced dash used as punctuation separates words.
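
The rules above can be sketched in a few lines of TypeScript. This is a minimal illustration, not the tool's actual source: the names WORD_PATTERN and countWords are hypothetical, and the NFC normalization step is an added assumption so that letters typed as a base character plus a combining accent still match \p{L}.

```typescript
// Hypothetical sketch; names and the normalization step are assumptions.
// A word is one or more letters/digits, optionally followed by groups of
// exactly one hyphen or apostrophe plus more letters/digits.
const WORD_PATTERN = /[\p{L}\p{N}]+(?:[-'][\p{L}\p{N}]+)*/gu;

function countWords(text: string): number {
  // Assumption: compose accents first so "o" + combining double acute becomes "ő".
  const matches = text.normalize("NFC").match(WORD_PATTERN);
  return matches === null ? 0 : matches.length;
}

console.log(countWords("dél-afrikai"));   // 1: the internal hyphen stays inside the word
console.log(countWords("dél - afrikai")); // 2: the spaced dash separates the words
```

Because the hyphen or apostrophe is only accepted when letters or digits follow it, a leading, trailing, or spaced dash never becomes part of a word.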

Characters are counted two ways: every character including spaces, and the length with whitespace removed. Sentences are counted by collapsing each run of terminal punctuation (such as ., !, and ?) into a single boundary, so an ellipsis or a ?! combination counts once.
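
A rough sketch of these two counts follows. The function and field names are hypothetical, counting by Unicode code point is an assumption, and so is treating the single-character ellipsis (…) as terminal punctuation. The sketch counts boundaries only, so text that ends without a terminator adds no sentence, and a period inside a number such as 3.14 would count as a boundary.

```typescript
// Hypothetical sketch; names and counting units are assumptions.
interface TextStats {
  charsWithSpaces: number;
  charsWithoutSpaces: number;
  sentences: number;
}

function countCharsAndSentences(text: string): TextStats {
  // Array.from iterates by code point, so a character outside the BMP counts once.
  const charsWithSpaces = Array.from(text).length;
  const charsWithoutSpaces = Array.from(text.replace(/\s/gu, "")).length;
  // Each maximal run of terminal punctuation is one sentence boundary.
  const boundaries = text.match(/[.!?…]+/gu);
  return {
    charsWithSpaces,
    charsWithoutSpaces,
    sentences: boundaries === null ? 0 : boundaries.length,
  };
}

// "?!", "..." and "." are three runs, so three sentences are reported.
console.log(countCharsAndSentences("Tényleg?! Igen... Jó."));
```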

Example

The text:

A nagymama megszentségteleníthetetlenségeskedéseitekért aggódott.

contains four words: A, nagymama, megszentségteleníthetetlenségeskedéseitekért, and aggódott. The very long word counts as a single word because it contains no spaces or punctuation, only letters.

Notes

  • Mixed Hungarian-English text and Latin product names are counted sensibly because Latin letters are also word characters.
  • Numbers like 2026 count as one word; a number joined to a suffix with a hyphen, such as 2026-ban, stays one word, as Hungarian orthography expects.
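
To see how these notes play out, the following sketch lists the tokens instead of counting them. The listWords helper and its pattern are assumptions based on the description on this page, not the tool's real code.

```typescript
// Hypothetical sketch; the helper name and pattern are assumptions.
const TOKEN_PATTERN = /[\p{L}\p{N}]+(?:[-'][\p{L}\p{N}]+)*/gu;

function listWords(text: string): string[] {
  const found = text.match(TOKEN_PATTERN);
  return found === null ? [] : Array.from(found);
}

// Five tokens: 2026-ban, az, iPhone, ára, nőtt
console.log(listWords("2026-ban az iPhone ára nőtt."));
```

Digits, Latin product names, and accented Hungarian letters all fall under the same letter-or-digit class, which is why mixed text needs no special handling.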