Tamil Character Counter

Count Tamil uyir, mei, and uyirmei characters separately

Free Tamil character counter that classifies each letter as uyir (vowel), mei (pure consonant), or uyirmei (consonant + vowel), and compares perceived characters with raw Unicode code points. Runs in your browser.

Why does the character count differ from my editor's count?

Plain editors count Unicode code points. A Tamil uyirmei such as கா is written with a base letter plus a combining vowel sign, which is two code points but one perceived character. This tool counts the perceived character and also shows the raw code-point total.
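As a quick illustration of that difference, the short Python sketch below inspects the two code points behind கா. It is independent of the tool itself; the variable name `ka_aa` is only illustrative.

```python
import unicodedata

# கா written as escapes: base letter KA followed by vowel sign AA.
ka_aa = "\u0b95\u0bbe"

# Print each code point with its official Unicode name.
for ch in ka_aa:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

# Python's len() counts code points, like a plain editor would.
print(len(ka_aa))  # 2
```

A reader sees one letter here, while `len` reports two, which is the gap the tool makes visible.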

Tamil is written with an abugida script in which every consonant carries an inherent vowel unless it is silenced. That makes counting characters less obvious than in Latin text: the visible unit a reader perceives as one letter often spans two Unicode code points. This free tool classifies each Tamil letter as a uyir, mei, or uyirmei and reports both the perceived-character total and the raw code-point total.

How it works

The tool walks the text one code point at a time and looks ahead one position:

  • An independent vowel from அ to ஔ is counted as a uyir.
  • A base consonant followed by the pulli mark (for example க்) is a mei — a pure consonant with the inherent vowel removed.
  • A base consonant followed by a dependent vowel sign such as ா, ி, or ு is a uyirmei. A bare base consonant with no following sign also counts as a uyirmei, because it carries the inherent vowel and is read as ka, cha, and so on.
  • The aytam (ஃ) is counted separately, as are non-Tamil characters like spaces, punctuation, and Latin letters.
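The rules above can be sketched in Python roughly as follows. This is a minimal sketch, not the tool's actual source: the function name `classify`, the category labels, and the use of code-point ranges from the Unicode Tamil block are assumptions made for illustration. It also assumes that a vowel sign or pulli with no preceding consonant falls into the non-Tamil bucket, which the description does not spell out.

```python
from collections import Counter

AYTAM = "\u0b83"  # ஃ
PULLI = "\u0bcd"  # ் (virama)

def is_uyir(ch: str) -> bool:
    # Independent vowels, அ (U+0B85) to ஔ (U+0B94).
    return "\u0b85" <= ch <= "\u0b94"

def is_consonant(ch: str) -> bool:
    # Base consonants, க (U+0B95) to ஹ (U+0BB9).
    return "\u0b95" <= ch <= "\u0bb9"

def is_vowel_sign(ch: str) -> bool:
    # Dependent vowel signs, ா (U+0BBE) to ௌ (U+0BCC).
    return "\u0bbe" <= ch <= "\u0bcc"

def classify(text: str) -> Counter:
    """Walk the text one code point at a time, looking ahead one position."""
    counts = Counter()
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if is_uyir(ch):
            counts["uyir"] += 1
        elif is_consonant(ch):
            if nxt == PULLI:
                counts["mei"] += 1       # consonant + pulli
                i += 1                   # consume the pulli
            else:
                counts["uyirmei"] += 1   # bare consonant or consonant + sign
                if nxt and is_vowel_sign(nxt):
                    i += 1               # consume the vowel sign
        elif ch == AYTAM:
            counts["aytam"] += 1
        else:
            counts["non_tamil"] += 1     # assumption: stray marks land here too
        i += 1
    return counts

# தமிழ் as escapes: TA, MA, vowel sign I, LLLA, pulli.
result = classify("\u0ba4\u0bae\u0bbf\u0bb4\u0bcd")
print(result["uyirmei"], result["mei"])  # 2 1
```

Using a `Counter` keeps missing categories at zero, so callers can read any label without checking for its presence first.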

The perceived-character total is the sum of uyir, mei, uyirmei, and aytam. The code-point total counts every Unicode scalar value, so a uyirmei written with a vowel sign adds two to the code-point total but one to the perceived-character total.
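Under the rules described here, every uyir, mei, uyirmei, and aytam contains exactly one base letter, so the perceived-character total can be cross-checked by counting base letters alone. The sketch below shows that shortcut in Python; the function name `totals` is hypothetical and this is not necessarily how the tool computes its figures.

```python
def totals(text: str) -> tuple[int, int]:
    """Return (perceived_characters, code_points) for the given text."""
    def is_base(ch: str) -> bool:
        # Aytam (U+0B83), or an independent vowel or consonant (U+0B85..U+0BB9).
        return ch == "\u0b83" or "\u0b85" <= ch <= "\u0bb9"

    perceived = sum(1 for ch in text if is_base(ch))
    return perceived, len(text)

print(totals("\u0b95\u0bbe"))  # கா  -> (1, 2)
print(totals("\u0b95\u0bcd"))  # க்  -> (1, 2)
print(totals("\u0b95"))        # க   -> (1, 1)
```

Non-Tamil characters add to the code-point total only, which matches the definition of the perceived-character total as a sum over the four Tamil categories.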

Tips and example

Use the perceived-character total when you need to respect a human-facing limit, such as a poster headline or an SMS body. Use the code-point total when you are sizing a database column or estimating a file size (in UTF-8, each Tamil code point takes three bytes). For the word தமிழ், the tool reports three perceived characters, த (uyirmei), மி (uyirmei), and ழ் (mei), but five code points, because the vowel sign and the pulli each add one.
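The arithmetic for தமிழ் can be checked by hand with a few lines of Python. This is only a worked example: the grouping into perceived characters is written out manually here rather than computed, and the variable names are illustrative.

```python
# தமிழ் as escapes: TA, MA, vowel sign I, LLLA, pulli.
word = "\u0ba4\u0bae\u0bbf\u0bb4\u0bcd"

# Perceived characters, grouped by hand: த, மி, ழ்.
clusters = ["\u0ba4", "\u0bae\u0bbf", "\u0bb4\u0bcd"]
assert "".join(clusters) == word

perceived = len(clusters)               # 3
code_points = len(word)                 # 5
utf8_bytes = len(word.encode("utf-8"))  # 15, three bytes per Tamil code point

print(perceived, code_points, utf8_bytes)  # 3 5 15
```

The three figures answer different questions: how many letters a reader sees, how many characters a code-point-based limit counts, and how many bytes the text occupies in UTF-8.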

If a count looks too high, check for invisible characters: zero-width joiners, combining marks pasted from a PDF, or stray Latin letters all show up in the non-Tamil column.
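To track down such characters outside the tool, a short Python sketch like the one below can list them by position. The function name `find_suspects` is hypothetical, and the choice of what counts as suspect (any format character, plus combining marks from outside the Tamil block) is an assumption for illustration. It does not flag visible strays such as Latin letters.

```python
import unicodedata

def find_suspects(text: str) -> list[tuple[int, str, str]]:
    """List (index, 'U+XXXX', category) for likely invisible characters."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        in_tamil_block = "\u0b80" <= ch <= "\u0bff"
        # Cf covers zero-width joiners and similar format characters.
        # Marks (Mn, Mc, Me) are expected inside the Tamil block, so only
        # marks from other blocks are reported.
        if cat == "Cf" or (cat.startswith("M") and not in_tamil_block):
            hits.append((i, f"U+{ord(ch):04X}", cat))
    return hits

# க, a zero-width joiner, then pulli: only the joiner is reported.
print(find_suspects("\u0b95\u200d\u0bcd"))  # [(1, 'U+200D', 'Cf')]
```

The reported index is a code-point offset into the string, which makes it straightforward to locate and delete the offending character.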