German word counting is trickier than English because the language builds long compound nouns and uses both the hyphen (as a joiner) and the dash (as a separator). This counter treats hyphenated compounds as single words and dashes as word boundaries, so your totals match how a German editor would count.
How it works
A token is recognised as a sequence of German word characters — the Latin
letters plus ä ö ü ß ẞ and digits — optionally joined by hyphens or
apostrophes. Because the hyphen joins, E-Mail-Adresse and
Donau-Dampfschiff each count as a single word.
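The token rule above can be sketched as a regular expression. This is a minimal illustration, not the counter's actual source: the names WORD_CHARS, TOKEN_RE and tokenize are hypothetical, and including the capital umlauts Ä Ö Ü and the typographic apostrophe ’ is an assumption.

```python
import re

# Assumed character class: ASCII letters, digits, and the German extras.
# Capital Ä Ö Ü are included here as an assumption.
WORD_CHARS = "A-Za-z0-9äöüÄÖÜßẞ"

# One run of word characters, optionally extended by further runs that are
# joined with an ASCII hyphen or an apostrophe. Dashes are not joiners.
TOKEN_RE = re.compile(rf"[{WORD_CHARS}]+(?:[-'’][{WORD_CHARS}]+)*")


def tokenize(text: str) -> list[str]:
    """Return all word tokens found in text (hypothetical helper)."""
    return TOKEN_RE.findall(text)


print(tokenize("Die E-Mail-Adresse und das Donau-Dampfschiff"))
```

Because a joiner only matches when word characters follow it, a dangling hyphen as in "Ein- und Ausgang" does not become part of the token.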
Before tokenising, every em-dash — and en-dash – is replaced by a space, so
these punctuation dashes act as word boundaries. That means Berlin—München
splits into two words, while E-Mail-Adresse stays as one. Sentences are
counted from terminal punctuation (. ! ? …), and the longest token is tracked
so you can see your biggest compound.
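The dash handling, sentence count and longest-token tracking might look roughly like this. The function names are hypothetical, and treating a run of terminal marks (such as "?!" or "...") as a single sentence end is an assumption the description does not spell out.

```python
import re


def normalize_dashes(text: str) -> str:
    # Em-dash (U+2014) and en-dash (U+2013) become spaces;
    # the ASCII hyphen is left alone so compounds stay joined.
    return re.sub(r"[\u2014\u2013]", " ", text)


def count_sentences(text: str) -> int:
    # Assumption: consecutive terminal marks close one sentence, not several.
    return len(re.findall(r"[.!?\u2026]+", text))


def longest_token(tokens: list[str]) -> str:
    # Returns an empty string when there are no tokens.
    return max(tokens, key=len, default="")
```

A purely punctuation-based count like this one will overcount text containing abbreviations such as "z. B.", so treat the sentence total as an estimate.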
Example
For the text "Die Donaudampfschifffahrtsgesellschaft schickt eine E-Mail-Adresse. Berlin—München ist weit." the counter reports nine words:
Donaudampfschifffahrtsgesellschaft is one long word, E-Mail-Adresse is one
hyphenated compound, Berlin—München is split into two, and the remaining
five (Die, schickt, eine, ist, weit) are ordinary words. Two sentences are
detected, and the longest word is the 34-letter Danube-steamship compound.
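To check the example by hand, the rules can be combined into one small function. This is a sketch under the same assumptions as the description (hyphens and apostrophes join, dashes separate). The name count_german and the shape of the returned dictionary are made up for illustration.

```python
import re

# Word characters joined by hyphens or apostrophes (assumed character set).
TOKEN_RE = re.compile(r"[A-Za-z0-9äöüÄÖÜßẞ]+(?:[-'’][A-Za-z0-9äöüÄÖÜßẞ]+)*")


def count_german(text: str) -> dict:
    # Em-dash and en-dash act as word boundaries.
    normalized = re.sub(r"[\u2014\u2013]", " ", text)
    tokens = TOKEN_RE.findall(normalized)
    longest = max(tokens, key=len, default="")
    return {
        "words": len(tokens),
        # Runs of terminal punctuation are counted once each.
        "sentences": len(re.findall(r"[.!?\u2026]+", text)),
        "longest": longest,
        "longest_length": len(longest),
    }


sample = ("Die Donaudampfschifffahrtsgesellschaft schickt eine "
          "E-Mail-Adresse. Berlin\u2014München ist weit.")
stats = count_german(sample)
print(stats["words"], stats["sentences"], stats["longest_length"])  # 9 2 34
```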
Notes
Use this when you localise English copy into German and need accurate counts for layout, subtitles, or character limits — German runs roughly 10-30% longer than English, and compound handling materially changes the totals.