This counter is built for modern Greek, where accents are an essential part of spelling rather than optional marks. The tonos sits on the stressed vowel of almost every word of two or more syllables, and the dialytika marks a vowel that is pronounced separately from the one before it. Vowels carrying either mark must be treated as letters so that an accent never breaks a word apart.
How it works
The counter matches runs of Unicode letters and digits, allowing an internal
apostrophe, as in απ' το, or a hyphen. Because the pattern covers the full
Greek and Greek Extended character ranges, accented vowels such as ά, ή, and
ώ, and vowels with a dialytika such as the ϊ in προϊόντα, all count as word
characters. Sentences are detected from terminal punctuation: the period,
exclamation mark, ellipsis, and the Greek question mark (;), which looks like
a semicolon.
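The matching rules above can be sketched in Python. This is a minimal illustration, not the counter's actual source: the names `WORD_RE`, `SENTENCE_END_RE`, `find_words`, and `count_sentences` are hypothetical, and the NFC normalization step is an assumption added so that an accent typed as a separate combining character still stays inside its word.

```python
import re
import unicodedata

# Assumed word pattern: runs of Unicode letters and digits, optionally joined
# by an apostrophe or hyphen that has letters on both sides.
# [^\W_] means "any \w character except the underscore".
WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")

# Assumed terminators: period, exclamation mark, ellipsis, the Greek question
# mark (U+037E), and the semicolon (U+003B) that it normalizes to under NFC.
SENTENCE_END_RE = re.compile(r"[.!…;\u037e]+")


def find_words(text: str) -> list[str]:
    """Return the words in text, keeping accented vowels inside their word."""
    # NFC composes "α" + combining acute into the single letter "ά".
    return WORD_RE.findall(unicodedata.normalize("NFC", text))


def count_sentences(text: str) -> int:
    """Count the pieces between terminators that contain at least one word."""
    normalized = unicodedata.normalize("NFC", text)
    pieces = SENTENCE_END_RE.split(normalized)
    return sum(1 for piece in pieces if WORD_RE.search(piece))
```

With these definitions, `find_words("Πόσες λέξεις υπάρχουν εδώ;")` returns four words, and the trailing ; closes a single sentence. A sketch this simple also treats the period in a number such as 3.14 as a terminator, which a production counter would likely handle separately.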
Tips and example
In “Πόσες λέξεις υπάρχουν εδώ;” there are four words, and the trailing ; is the
Greek question mark that ends the sentence, not a clause break. The accented-word
statistic is a handy quality check: if you paste Greek text and see far fewer
accented words than expected, the accents were probably lost during copying or
encoding, and they should be restored before publishing.
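One way to compute an accented-word statistic like the one described above is to decompose each word and look for the combining marks behind the tonos and dialytika. The sketch below is an assumption about how such a check could work, not the tool's own code; `has_accent` and `accented_ratio` are hypothetical names, and the word pattern is the same assumed one as before.

```python
import re
import unicodedata

# Assumed word pattern: letters and digits, with an internal apostrophe or hyphen.
WORD_RE = re.compile(r"[^\W_]+(?:['’\-][^\W_]+)*")

TONOS = "\u0301"      # combining acute accent
DIALYTIKA = "\u0308"  # combining diaeresis


def has_accent(word: str) -> bool:
    """True if the word carries a tonos or dialytika on any letter."""
    # NFD splits "ά" into "α" plus a combining acute, so the mark can be found.
    decomposed = unicodedata.normalize("NFD", word)
    return TONOS in decomposed or DIALYTIKA in decomposed


def accented_ratio(text: str) -> float:
    """Share of words with at least one accent mark (0.0 for empty text)."""
    words = WORD_RE.findall(unicodedata.normalize("NFC", text))
    if not words:
        return 0.0
    return sum(has_accent(word) for word in words) / len(words)
```

For the example sentence the ratio is 1.0, since all four words are accented, while the same sentence with its accents stripped ("Ποσες λεξεις υπαρχουν εδω;") gives 0.0. What counts as "far fewer than expected" is a judgment call; this sketch sets no threshold.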