Tamil is written with an abugida script in which every consonant carries an inherent vowel unless it is silenced. That makes counting characters less obvious than in Latin text: the visible unit a reader perceives as one letter often spans two Unicode code points. This free tool classifies each Tamil letter as a uyir, mei, or uyirmei and reports both the perceived-character total and the raw code-point total.
How it works
The tool walks the text one code point at a time and looks ahead one position:
- An independent vowel from அ to ஔ is counted as a uyir.
- A base consonant followed by the pulli mark ் (for example க்) is a mei: a pure consonant with the inherent vowel removed.
- A base consonant followed by a dependent vowel sign such as ா, ி, or ு is a uyirmei. A bare base consonant with no following sign also counts as a uyirmei, because it carries the inherent vowel அ and is read as ka, cha, and so on.
- The aytam ஃ is counted in its own category. Non-Tamil characters such as spaces, punctuation, and Latin letters are counted separately in a non-Tamil column.
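The rules above can be sketched in Python as a single pass with one position of lookahead. This is a minimal sketch, not the tool's actual source: the names `classify_tamil` and `is_consonant` are hypothetical, and the code assumes each vowel sign arrives in precomposed form (one code point). Anything that matches none of the rules, including non-Tamil characters, stray vowel signs, and Tamil digits, lands in an `other` bucket; that grouping is also an assumption.

```python
import unicodedata
from collections import Counter

# Independent vowels அ..ஔ (U+0B85..U+0B94, skipping unassigned code points).
UYIR = {chr(cp) for cp in (*range(0x0B85, 0x0B8B), *range(0x0B8E, 0x0B91), *range(0x0B92, 0x0B95))}
# Dependent vowel signs ா..ௌ; assumes precomposed (single code point) forms.
VOWEL_SIGNS = {chr(cp) for cp in (*range(0x0BBE, 0x0BC3), *range(0x0BC6, 0x0BC9), *range(0x0BCA, 0x0BCD))}
PULLI = "\u0BCD"  # TAMIL SIGN VIRAMA
AYTAM = "\u0B83"  # ஃ


def is_consonant(ch: str) -> bool:
    """True for an assigned Tamil consonant letter (U+0B95 .. U+0BB9)."""
    return 0x0B95 <= ord(ch) <= 0x0BB9 and unicodedata.category(ch) == "Lo"


def classify_tamil(text: str) -> Counter:
    """Count uyir, mei, uyirmei, aytam, and 'other' (hypothetical bucket)."""
    counts: Counter = Counter()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""  # one-position lookahead
        if ch in UYIR:
            counts["uyir"] += 1
        elif is_consonant(ch):
            if nxt == PULLI:
                counts["mei"] += 1  # consonant + pulli: pure consonant
                i += 1  # consume the pulli as part of this letter
            elif nxt in VOWEL_SIGNS:
                counts["uyirmei"] += 1  # consonant + dependent vowel sign
                i += 1  # consume the vowel sign
            else:
                counts["uyirmei"] += 1  # bare consonant with inherent அ
        elif ch == AYTAM:
            counts["aytam"] += 1
        else:
            counts["other"] += 1  # non-Tamil text, stray marks, Tamil digits
        i += 1
    return counts


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ்
counts = classify_tamil(word)
perceived = sum(counts[k] for k in ("uyir", "mei", "uyirmei", "aytam"))
print(perceived, len(word))  # 3 5
```

Because Python strings index by code point, `len(word)` already gives the code-point total, so the sketch only has to compute the perceived total.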
The perceived-character total is the sum of uyir, mei, uyirmei, and aytam. The code-point total counts every Unicode code point, non-Tamil characters included, so a mei, or a uyirmei written with a vowel sign, adds two to the code-point total but only one to the perceived-character total.
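The relationship between the two totals can be written out as a formula. The symbols here are introduced only for illustration: U, M, and A are the uyir, mei, and aytam counts; uyirmei is split into B (bare consonants) and S (consonant plus vowel sign); N is the number of non-Tamil code points. The second line assumes every vowel sign is a single precomposed code point.

```latex
\begin{aligned}
T_{\text{perceived}} &= U + M + (B + S) + A \\
T_{\text{code point}} &= U + 2M + B + 2S + A + N \\
T_{\text{code point}} - T_{\text{perceived}} &= M + S + N
\end{aligned}
```

For அக்கா, U = 1 (அ), M = 1 (க்), S = 1 (கா), and B = A = N = 0, which gives 3 perceived characters and 5 code points.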
Tips and example
Use the perceived-character total when you need to respect a human-facing limit, such as a poster headline. Use the code-point total when a system counts encoded units: checking an SMS body against a segment limit, sizing a database column, or sizing a file (in UTF-8, each Tamil code point takes three bytes). For the word தமிழ், the tool reports three perceived characters: த (uyirmei), மி (uyirmei), and ழ் (mei). It reports five code points, however, because the vowel sign ி and the pulli ் each add one code point to the three base consonants.
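To see where the five code points of தமிழ் come from, Python's standard `unicodedata` module can name each one. The helper `code_point_report` is hypothetical; the word is written with escapes so the exact code points are unambiguous.

```python
import unicodedata


def code_point_report(text: str) -> list[tuple[str, str]]:
    """List each code point of text as (U+XXXX, Unicode character name)."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


word = "\u0BA4\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ்
for code, name in code_point_report(word):
    print(code, name)
print("code points:", len(word))                  # 5
print("UTF-8 bytes:", len(word.encode("utf-8")))  # 15
```

The listing shows TAMIL LETTER TA, TAMIL LETTER MA, TAMIL VOWEL SIGN I, TAMIL LETTER LLLA, and TAMIL SIGN VIRAMA. The 15-byte UTF-8 length reflects that every code point in the Tamil block takes three bytes in UTF-8.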
If a count looks too high, check for invisible characters: zero-width joiners, combining marks pasted from a PDF, or stray Latin letters all show up in the non-Tamil column.
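A quick way to hunt for those invisible characters is to flag format characters (Unicode category `Cf`, which includes the zero-width joiner) and combining marks that lie outside the Tamil block U+0B80 to U+0BFF. This is a minimal sketch: the function name `suspicious_code_points` and the choice of categories are assumptions, and stray Latin letters are not flagged because they are visible.

```python
import unicodedata

TAMIL_BLOCK = range(0x0B80, 0x0C00)  # U+0B80..U+0BFF


def suspicious_code_points(text: str) -> list[tuple[int, str, str]]:
    """Return (index, U+XXXX, name) for invisible format characters and
    combining marks that do not belong to the Tamil block."""
    hits = []
    for i, ch in enumerate(text):
        category = unicodedata.category(ch)
        is_format = category == "Cf"  # e.g. ZERO WIDTH JOINER, soft hyphen
        # Tamil vowel signs and the pulli are marks too, so exclude the Tamil block.
        is_foreign_mark = category.startswith("M") and ord(ch) not in TAMIL_BLOCK
        if is_format or is_foreign_mark:
            hits.append((i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")))
    return hits


pasted = "\u0BA4\u200D\u0BAE\u0BBF\u0BB4\u0BCD"  # தமிழ் with a zero-width joiner after த
print(suspicious_code_points(pasted))  # [(1, 'U+200D', 'ZERO WIDTH JOINER')]
```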