Count words in Japanese text
Japanese is written without spaces between words, so whitespace-based word counters report one giant “word” per line. This tool segments your text at script boundaries (the points where kanji, hiragana, katakana, and Latin runs meet) to produce a fast, reasonable estimate of the word count.
How it works
The text is scanned character by character and a boundary is inserted whenever the writing system changes:
東京 → 東京 (kanji run = one token)
へ行きました → へ / 行き / ました (split at script + run length)
- Adjacent characters of the same script form a run; a script change starts a new token.
- Long hiragana runs are split further, since they often pack a content word together with grammatical particles.
- Punctuation and whitespace act as hard separators and are not counted.
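The three rules above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names `script_of`, `split_run`, `tokenize`, and `count_words` are hypothetical, the Unicode ranges cover only the most common characters, and the hiragana rule (cut runs into pieces of at most `MAX_HIRAGANA_RUN` characters) is an assumption, since the exact threshold and cut points are not stated here. Its splits of hiragana runs may therefore differ from the tool's.

```python
from typing import List, Optional

# Assumed threshold: how long a hiragana run may grow before it is cut.
MAX_HIRAGANA_RUN = 3


def script_of(ch: str) -> Optional[str]:
    """Classify one character; None marks a separator (punctuation, space)."""
    cp = ord(ch)
    if 0x3041 <= cp <= 0x3096:
        return "hiragana"
    if 0x30A1 <= cp <= 0x30FA or cp == 0x30FC:  # includes the long-vowel mark
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF or cp == 0x3005:  # common CJK ideographs plus 々
        return "kanji"
    if ch.isascii() and ch.isalnum():
        return "latin"
    return None


def split_run(run: str, script: Optional[str]) -> List[str]:
    """Turn one same-script run into tokens; only hiragana is cut further."""
    if not run:
        return []
    if script == "hiragana":
        step = MAX_HIRAGANA_RUN
        return [run[i:i + step] for i in range(0, len(run), step)]
    return [run]


def tokenize(text: str) -> List[str]:
    """Scan character by character and start a new token on a script change."""
    tokens: List[str] = []
    run = ""
    run_script: Optional[str] = None
    for ch in text:
        script = script_of(ch)
        if script != run_script:
            tokens.extend(split_run(run, run_script))
            run, run_script = "", script
        if script is not None:  # separators end a run but are never counted
            run += ch
    tokens.extend(split_run(run, run_script))
    return tokens


def count_words(text: str) -> int:
    return len(tokenize(text))
```

Under these assumptions, `tokenize("東京へ行きました。")` returns `["東京", "へ", "行", "きまし", "た"]`, so `count_words` reports 5. Changing `MAX_HIRAGANA_RUN` shifts the estimate, which is why the result is best read as approximate.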
Example and notes
Because content words tend to be written in kanji or katakana and particles in hiragana, script boundaries approximate real word breaks well for everyday prose. The result is an estimate, not the output of a dictionary-based tokenizer: heavily inflected verbs and compound terms may be over- or under-split. For exact morphological counts, use a morphological analyzer such as MeCab. All processing happens in your browser.