Thai Word Counter

Segment and count words in space-free Thai text

Free Thai word counter that segments continuous Thai text into words with a client-side longest-match dictionary and counts them, all in your browser. Segmentation is needed because Thai is written without spaces between words.

Why is counting Thai words hard?

Thai is written with no spaces between words; spaces mark phrase or sentence breaks instead. A program must guess where one word ends and the next begins, which is an ambiguous segmentation problem rather than a simple space-split.

Because a space in Thai marks a phrase or sentence break rather than a word boundary, counting words first requires segmenting the text. This free tool splits continuous Thai into words using a client-side longest-match dictionary and then counts them.
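To see why a plain space split is not enough, here is a minimal TypeScript sketch of the naive approach. The function `naiveWordCount` is hypothetical and not part of this tool. It works for text that separates words with spaces, but it reports an entire unspaced Thai phrase as a single word:

```typescript
// Hypothetical helper: counts "words" by splitting on whitespace, the usual
// approach for languages that put spaces between words.
function naiveWordCount(text: string): number {
  // Split on runs of whitespace and drop the empty strings left by leading/trailing spaces.
  return text.split(/\s+/).filter((part) => part.length > 0).length;
}

console.log(naiveWordCount("count these words")); // 3
console.log(naiveWordCount("ไปหามเหสี")); // 1, although the phrase contains several words
```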

How it works

The tool uses greedy maximal matching. Starting at the current position (initially the beginning of the text), it searches its built-in Thai dictionary for the longest word that matches from that point. It emits that word as one token, advances past it, and repeats from the new position. When no dictionary word matches, it consumes a single grapheme cluster (a base consonant plus any stacked vowels and tone marks) as an unknown token, so the scan always advances and never stalls.
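The loop described above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the tool's source: the names `segmentGreedy`, `clusterLength` and `isThaiCombining`, the toy dictionary, and the exact set of combining marks are hypothetical. A production segmenter would more likely walk a trie than probe a `Set` with every candidate length, but the greedy behaviour is the same.

```typescript
// Assumed set of Thai combining marks that attach to the preceding base character:
// MAI HAN-AKAT U+0E31, upper/lower vowels and PHINTHU U+0E34-U+0E3A,
// MAITAIKHU, tone marks and other signs U+0E47-U+0E4E.
function isThaiCombining(code: number): boolean {
  return code === 0x0e31 || (code >= 0x0e34 && code <= 0x0e3a) || (code >= 0x0e47 && code <= 0x0e4e);
}

// Length of the cluster starting at i: one base character plus any combining
// marks after it. Thai lies in the BMP, so each character is one UTF-16 unit.
function clusterLength(text: string, i: number): number {
  let j = i + 1;
  while (j < text.length && isThaiCombining(text.charCodeAt(j))) j++;
  return j - i;
}

// Greedy longest-match segmentation of a run of Thai text.
function segmentGreedy(text: string, dict: Set<string>, maxWordLen: number): string[] {
  const tokens: string[] = [];
  let i = 0;
  while (i < text.length) {
    let matched = 0;
    // Try the longest candidate first and shrink until a dictionary word matches.
    for (let len = Math.min(maxWordLen, text.length - i); len > 0; len--) {
      const end = i + len;
      // Skip candidates that would cut a base character off from its combining marks.
      if (end < text.length && isThaiCombining(text.charCodeAt(end))) continue;
      if (dict.has(text.slice(i, end))) {
        matched = len;
        break;
      }
    }
    // No dictionary word fits: emit one cluster as an unknown token so the scan always advances.
    const step = matched > 0 ? matched : clusterLength(text, i);
    tokens.push(text.slice(i, i + step));
    i += step;
  }
  return tokens;
}

// Hypothetical toy dictionary.
const dict = new Set(["สวัสดี", "ครับ"]);
const maxWordLen = Math.max(...Array.from(dict, (w) => w.length));

// Returns ["สวัสดี", "ครับ", "จิ๊", "บ"]
console.log(segmentGreedy("สวัสดีครับจิ๊บ", dict, maxWordLen));
```

The two dictionary words come out whole, while จิ๊บ, which this toy dictionary lacks, falls back to two unknown clusters and therefore adds two to the count instead of one.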

Runs of Latin letters, runs of digits, and punctuation are each tokenised separately from the Thai text, and explicit spaces act as hard boundaries. The word count is the number of Thai tokens (dictionary matches plus unknown clusters) plus any Latin or numeric tokens; punctuation tokens are not counted.
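The pre-tokenisation and counting rules can be sketched the same way. The character classes below (ASCII letters for Latin, ASCII and Thai digits for numbers, U+0E01 to U+0E4E for Thai) are assumptions, as are the names `splitRuns`, `countWords` and `toySegmenter`. The Thai segmenter is passed in as a function, so any longest-match routine can be plugged in.

```typescript
type RunKind = "thai" | "latin" | "digit" | "punct";

interface Run {
  kind: RunKind;
  text: string;
}

// Assumed character classes; whitespace is reported separately as a boundary.
function classify(ch: string): RunKind | "space" {
  if (/\s/u.test(ch)) return "space";
  if (/[0-9\u0E50-\u0E59]/u.test(ch)) return "digit"; // ASCII and Thai digits
  if (/[\u0E01-\u0E4E]/u.test(ch)) return "thai"; // Thai letters, vowels and marks
  if (/[A-Za-z]/u.test(ch)) return "latin";
  return "punct";
}

// Split text into maximal runs of one kind; whitespace ends the current run.
function splitRuns(text: string): Run[] {
  const runs: Run[] = [];
  let current: Run | null = null;
  for (const ch of text) {
    const kind = classify(ch);
    if (kind === "space") {
      current = null;
      continue;
    }
    if (current !== null && current.kind === kind) {
      current.text += ch;
    } else {
      current = { kind, text: ch };
      runs.push(current);
    }
  }
  return runs;
}

// Thai runs contribute one word per segmented token, Latin and digit runs
// count as one word each, and punctuation runs count as zero.
function countWords(text: string, segmentThai: (run: string) => string[]): number {
  let count = 0;
  for (const run of splitRuns(text)) {
    if (run.kind === "thai") count += segmentThai(run.text).length;
    else if (run.kind === "latin" || run.kind === "digit") count += 1;
  }
  return count;
}

// Hypothetical stand-in segmenter that only knows one phrase.
const toySegmenter = (run: string): string[] =>
  run === "สวัสดีครับ" ? ["สวัสดี", "ครับ"] : [run];

console.log(countWords("สวัสดีครับ iPhone 15, ok!", toySegmenter)); // 5
```

Here the Thai run yields two tokens, `iPhone`, `15` and `ok` add one each, and the comma and exclamation mark add nothing.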

Tips and notes

Longest-match is fast and works well for everyday vocabulary, but it is a heuristic. Proper names, technical terms, and compounds that aren’t in the dictionary may be mis-split, and because each choice is made greedily without looking ahead, taking the longest word at one position can occasionally force a wrong boundary in the text that follows. For important documents, scan the segmented word list shown below the count to confirm the split looks right. Everything runs locally in your browser, so your text is never uploaded.
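To make the greedy pitfall concrete, here is a small TypeScript sketch using ไปหามเหสี, a phrase commonly used to illustrate Thai segmentation ambiguity, whose intended reading is ไป|หา|มเหสี. The function `greedy` and its six-word dictionary are hypothetical, and the unknown-token fallback is simplified to a single character because this input never needs it.

```typescript
// Minimal greedy longest-match over a hypothetical toy dictionary.
function greedy(text: string, dict: Set<string>): string[] {
  const maxLen = Math.max(...Array.from(dict, (w) => w.length));
  const tokens: string[] = [];
  let i = 0;
  while (i < text.length) {
    // Shrink from the longest candidate; if nothing longer matches,
    // the loop stops at length 1 and emits a single character.
    let len = Math.min(maxLen, text.length - i);
    while (len > 1 && !dict.has(text.slice(i, i + len))) len--;
    tokens.push(text.slice(i, i + len));
    i += len;
  }
  return tokens;
}

const dict = new Set(["ไป", "หา", "หาม", "มเหสี", "เห", "สี"]);

// Returns ["ไป", "หาม", "เห", "สี"] rather than the intended ["ไป", "หา", "มเหสี"]
console.log(greedy("ไปหามเหสี", dict));
```

Taking the longer หาม at the second position leaves เหสี, which can then only be split as เห|สี, so the count comes out as 4 instead of the intended 3.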