Swedish adds three vowels (å, ä, and ö) that are distinct letters of the
alphabet, not accented variants. A counter must classify them as letters so a
word like förälder or sjöräddning is never split at a diacritic. Swedish
also writes compounds as a single unbroken word, so a compound counts as one
word however long it is. This tool applies these boundary rules and reports
word, character, and sentence totals.
How it works
The algorithm treats a word as a maximal run of letters and digits with internal hyphens and apostrophes allowed:
- It matches [\p{L}\p{N}] runs, permitting an internal hyphen (-) or apostrophe (') between two such characters.
- The Swedish vowels å, ä, and ö are Unicode letters and count as word characters.
- A hyphen inside a word, as in TV-program or 90-tal, keeps the form as one word; a spaced dash used as punctuation separates words.
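The matching rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's own code: the names WORD_CHAR, WORD_RE, and find_words are hypothetical, the class [^\W_] only approximates [\p{L}\p{N}] because Python's standard re module lacks \p{...} escapes, and the NFC normalization step is an added assumption so that decomposed input (a base letter plus a combining mark) is not split.

```python
import re
import unicodedata

# Assumption: Python's standard re module has no \p{L} or \p{N}, so [^\W_]
# (any Unicode letter or digit, underscore excluded) stands in for [\p{L}\p{N}].
WORD_CHAR = r"[^\W_]"

# One or more word characters, optionally continued by a single internal
# hyphen or apostrophe that is itself followed by more word characters.
WORD_RE = re.compile(rf"{WORD_CHAR}+(?:[-']{WORD_CHAR}+)*")


def find_words(text: str) -> list[str]:
    """Return the words in text, in order of appearance."""
    # Assumption: compose "a" + combining diaeresis into "ä" first, so that
    # decomposed input is not split at the combining mark.
    normalized = unicodedata.normalize("NFC", text)
    return WORD_RE.findall(normalized)


print(find_words("Ett TV-program från 90-talet - eller hur?"))
# ['Ett', 'TV-program', 'från', '90-talet', 'eller', 'hur']
```

The spaced dash is dropped because a hyphen only joins a match when word characters sit directly on both sides of it.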
Characters are counted two ways: every character including spaces, and the
length with whitespace removed. Sentences are counted by collapsing runs of
terminal punctuation (., !, ?, …) so an ellipsis or a ?! combo counts
as one boundary.
Example
The text:
En sjöräddningshelikopter flög… Eller hur? TV-program.
contains the words En, sjöräddningshelikopter, flög, Eller, hur,
TV-program — six words. The long compound stays one word, and the hyphen keeps
TV-program whole.
Notes
- Mixed Swedish-English text and Latin product names are counted sensibly because Latin letters are also word characters.
- Numbers like 2026 count as one word; a number glued to a suffix by a hyphen, such as 90-tal, stays one compound word.