Telugu is an abugida in which a consonant carries an inherent vowel, vowel signs modify it, and the virama binds consonants into conjuncts. The unit a reader perceives as one letter, the akshara, can therefore span several Unicode code points. This free tool segments Telugu text into aksharas and compares that human-facing total with the raw code-point count.
How it works
The counter walks the text and applies the segmentation rules of the Telugu abugida:
- An independent vowel (అ to ఔ) starts a new akshara.
- A consonant starts a new akshara, unless it immediately follows a virama, in which case it joins the current akshara as part of a conjunct.
- A vowel sign (matra), the virama itself, and marks such as the anusvara and visarga attach to the current akshara without starting a new one.
- Non-Telugu characters such as spaces, punctuation, and Latin letters close any open akshara and are tallied separately.
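The rules above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual source: the function names, the code-point ranges chosen for each character class, and the decision to tally a mark that has no base letter as non-Telugu are all assumptions made for the example.

```python
VIRAMA = "\u0C4D"  # Telugu sign virama


def is_independent_vowel(ch):
    return "\u0C05" <= ch <= "\u0C14"  # అ to ఔ


def is_consonant(ch):
    return "\u0C15" <= ch <= "\u0C39"  # క to హ (assumed range)


def is_dependent_mark(ch):
    # Vowel signs and virama (U+0C3E to U+0C4D), plus candrabindu,
    # anusvara, and visarga (U+0C01 to U+0C03). Assumed ranges.
    return "\u0C3E" <= ch <= "\u0C4D" or "\u0C01" <= ch <= "\u0C03"


def count_aksharas(text):
    """Return (aksharas, other_count, code_point_count) for text."""
    aksharas = []
    current = ""
    other = 0
    for ch in text:
        if is_independent_vowel(ch):
            if current:
                aksharas.append(current)
            current = ch
        elif is_consonant(ch):
            if current.endswith(VIRAMA):
                current += ch  # joins the conjunct
            else:
                if current:
                    aksharas.append(current)
                current = ch
        elif is_dependent_mark(ch) and current:
            current += ch  # attaches without starting a new akshara
        else:
            # Non-Telugu character, or a mark with no base (assumption):
            # close any open akshara and tally separately.
            if current:
                aksharas.append(current)
                current = ""
            other += 1
    if current:
        aksharas.append(current)
    # A Python str is a sequence of code points, so len() counts them.
    return aksharas, other, len(text)


aksharas, other, code_points = count_aksharas("\u0C15\u0C4D\u0C15")  # క్క
print(len(aksharas), other, code_points)  # 1 0 3
```

Keeping the open akshara in a single string makes the conjunct rule a one-line check: a consonant joins only if the last code point collected so far is the virama.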
The akshara total is the count of perceived letters. The code-point total counts every Unicode scalar value, so a single akshara built from a consonant, a virama, a conjunct consonant, and a vowel sign contributes four to the code-point total but only one to the akshara total.
Tips and example
Use the akshara count when a limit is meant for human readers — a headline, a caption, or a name field — and the code-point count when you are sizing a database column or estimating file size. For కి, the tool reports one akshara but two code points; for the cluster క్క, it reports one akshara but three code points, because the virama and the second consonant attach to the first.
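To see where the two and three code points in these examples come from, the short Python sketch below lists each code point with its Unicode name. The helper name `code_points` is hypothetical; `unicodedata.name` is part of the Python standard library.

```python
import unicodedata


def code_points(text):
    """List each code point in text as (U+XXXX, Unicode name)."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in text
    ]


for sample in ("\u0C15\u0C3F", "\u0C15\u0C4D\u0C15"):  # కి and క్క
    print(sample, len(sample), "code points")
    for code, name in code_points(sample):
        print("  ", code, name)
```

When sizing storage in bytes rather than code points, note that the result depends on the encoding: in UTF-8 every code point in the Telugu block takes three bytes, so `len(text.encode("utf-8"))` for క్క is 9.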
If a count looks off, check the non-Telugu column for stray Latin letters, zero-width joiners, or combining marks pasted from a PDF, which inflate code points without adding readable aksharas.
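A quick way to locate such characters yourself is sketched below in Python. It is a rough check under stated assumptions, not the tool's logic: the helper name `stray_characters` is hypothetical, and it flags only invisible format characters (category Cf, which includes the zero-width joiner), combining marks outside the Telugu block, and ASCII letters.

```python
import unicodedata


def stray_characters(text):
    """Return (index, U+XXXX, name) for characters likely to inflate counts."""
    found = []
    for index, ch in enumerate(text):
        if "\u0C00" <= ch <= "\u0C7F":
            continue  # inside the Telugu block, not a stray
        category = unicodedata.category(ch)
        is_invisible = category == "Cf"               # e.g. zero-width joiner
        is_combining = category in ("Mn", "Mc", "Me")  # non-Telugu marks
        is_latin = ch.isascii() and ch.isalpha()
        if is_invisible or is_combining or is_latin:
            name = unicodedata.name(ch, "UNNAMED")
            found.append((index, f"U+{ord(ch):04X}", name))
    return found


# క, a zero-width joiner, the vowel sign ి, then a Latin "a"
for hit in stray_characters("\u0C15\u200D\u0C3Fa"):
    print(hit)
```

Spaces and punctuation are deliberately not reported here, since they are expected in ordinary text; only characters that are hard to spot by eye are listed.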