Which scripts does it count?

It counts hiragana (U+3040–U+309F), katakana (U+30A0–U+30FF and half-width forms), kanji (the CJK Unified Ideographs block), and romaji (basic Latin letters). Punctuation, digits, and whitespace are excluded from the percentages so they reflect only script characters.

Why does kanji density indicate difficulty?

Kanji carry the most semantic weight and have the largest learning burden, so passages rich in kanji typically target literate adults. Texts aimed at children or beginners use furigana and lean on hiragana, lowering the kanji share.

How is katakana usually used?

Katakana mainly writes foreign loanwords, onomatopoeia, scientific names, and emphasis. A high katakana ratio often signals technical, marketing, or pop-culture content rather than literary prose.

Does it separate full-width and half-width katakana?

Both full-width and half-width katakana are counted together as katakana, since they represent the same script. Half-width forms are common in older systems and some technical contexts but read as the same characters.
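
As a rough illustration of counting both forms together, the Python sketch below treats the two ranges as one script. The function names are hypothetical, and the half-width range U+FF66–U+FF9F is an assumption, since this page does not state which half-width code points the tool accepts.

```python
import unicodedata

FULL_WIDTH = (0x30A0, 0x30FF)  # Katakana block
HALF_WIDTH = (0xFF66, 0xFF9F)  # half-width katakana (assumed range)


def is_katakana(ch: str) -> bool:
    """True for a full-width or half-width katakana code point."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in (FULL_WIDTH, HALF_WIDTH))


def count_katakana(text: str) -> int:
    """Count katakana characters of either width."""
    return sum(1 for ch in text if is_katakana(ch))


def to_full_width(text: str) -> str:
    """Optional step: NFKC normalization folds half-width katakana to full-width."""
    return unicodedata.normalize("NFKC", text)
```

Under these assumptions, count_katakana gives the same result for カタカナ and ｶﾀｶﾅ. NFKC also folds other compatibility characters, such as full-width Latin letters, so whether to normalize before counting is a design choice; nothing on this page says the tool does so.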

What is the Japanese Mixed-Script Ratio Analyzer?

It calculates the percentage breakdown of hiragana, katakana, kanji, and romaji in a Japanese passage so you can gauge reading difficulty and audience. It runs free and entirely in your browser on Gera Tools, with nothing uploaded.

Japanese Mixed-Script Ratio Analyzer

Name: Japanese Mixed-Script Ratio Analyzer
Creator: Gera Tools
License: https://creativecommons.org/licenses/by/4.0/

The Japanese Mixed-Script Ratio Analyzer breaks a passage into its four writing systems and reports the percentage of each. Because Japanese blends scripts, the mix is a fast proxy for reading difficulty and intended audience.

Why Japanese uses multiple scripts simultaneously

Most written languages use a single script. Japanese is unusual in mixing four systems within a single sentence, and the choice of which script carries which word is not arbitrary. It signals grammar, origin, register, and intent:

Hiragana encodes grammatical elements: verb endings, particles, conjunctions, and function words. It is also the main script of books for young children, who read and write native Japanese words in hiragana before they acquire kanji literacy.
Katakana signals foreign origin (loanwords such as コーヒー koohii for coffee), onomatopoeia (ドキドキ for a pounding heart), scientific names, and emphatic or ironic speech.
Kanji carries the semantic weight: nouns, verb stems, adjectives, and proper names. Each character has meaning independent of its sound.
Romaji (Latin letters) appears in brand names, acronyms, URLs, and some technical contexts. Its presence often indicates internationally oriented or modern content.

How the analysis works

Every character is classified by its Unicode block, and non-script characters (spaces, punctuation, digits) are excluded from the totals:

hiragana  U+3040–U+309F        grammar, particles, easy words
katakana  U+30A0–U+30FF + ﾊﾝｶｸ  loanwords, onomatopoeia, emphasis
kanji     U+4E00–U+9FFF (CJK)  content words, highest learning load
romaji    A–Z a–z              acronyms, brand names, foreign terms

Percentages are computed over the script-character total only, so spaces and punctuation do not dilute the figures. The kanji share is the headline indicator of difficulty because it represents the largest learning burden for both L1 children and L2 adult learners.
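
The classification and percentage steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the names SCRIPT_RANGES, classify, and script_ratios are hypothetical, and the half-width katakana range U+FF66–U+FF9F is an assumption because the table does not spell it out.

```python
from collections import Counter
from typing import Dict, Optional

# Ranges follow the table above. The half-width katakana range is assumed.
# A pure range test also counts block members such as the middle dot (U+30FB).
SCRIPT_RANGES = {
    "hiragana": [(0x3040, 0x309F)],
    "katakana": [(0x30A0, 0x30FF), (0xFF66, 0xFF9F)],
    "kanji": [(0x4E00, 0x9FFF)],
    "romaji": [(0x41, 0x5A), (0x61, 0x7A)],
}


def classify(ch: str) -> Optional[str]:
    """Return the script name for one character, or None if it is not counted."""
    cp = ord(ch)
    for script, ranges in SCRIPT_RANGES.items():
        if any(lo <= cp <= hi for lo, hi in ranges):
            return script
    return None


def script_ratios(text: str) -> Dict[str, float]:
    """Percentage of each script, computed over script characters only."""
    counts = Counter(s for s in map(classify, text) if s is not None)
    total = sum(counts.values())
    if total == 0:
        # No script characters at all: report zeros instead of dividing by zero.
        return {script: 0.0 for script in SCRIPT_RANGES}
    return {script: 100.0 * counts[script] / total for script in SCRIPT_RANGES}
```

For example, under these assumptions script_ratios("東京へ行く。") reports 60% kanji and 40% hiragana: the sentence has three kanji and two hiragana, and the full stop is ignored.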

How to read the breakdown

Kanji share	What it suggests
Under 10%	Children’s text, JLPT N5/N4, or written casual speech
10–25%	General web content, novels, light journalism
25–40%	Newspaper articles, formal emails, business documents
Over 40%	Legal documents, academic papers, classical literature
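
The table can be turned into a simple lookup. The sketch below is only illustrative: the function name kanji_band is hypothetical, and because the table's ranges touch at 25% and 40%, the code assumes 10% and 25% belong to the row that starts with them and that exactly 40% stays in the 25–40% row.

```python
def kanji_band(kanji_pct: float) -> str:
    """Map a kanji percentage (0-100) to the rough reading given in the table."""
    if not 0 <= kanji_pct <= 100:
        raise ValueError("kanji_pct must be between 0 and 100")
    if kanji_pct < 10:
        return "Children's text, JLPT N5/N4, or written casual speech"
    if kanji_pct < 25:  # boundary handling is an assumption, see lead-in
        return "General web content, novels, light journalism"
    if kanji_pct <= 40:
        return "Newspaper articles, formal emails, business documents"
    return "Legal documents, academic papers, classical literature"
```

A passage with a 30% kanji share would land in the newspaper and business row under this reading.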

A high katakana share, say over 20%, typically signals technology, pop culture, food and beverage content, or marketing copy that aims for an international feel. A near-zero romaji share suggests traditional or formal writing with no foreign brand names or URLs.

Two texts can share the same kanji percentage but differ significantly in reading feel: a 30% kanji text built on two-character compounds like 会社 (company) reads more smoothly than one built on rare single-character words with unusual readings. The ratio is a guide, not a complete difficulty score, but it is a fast and reliable first indicator.

Practical applications

Content editors and translators use the ratio to check that a localized Japanese text matches the register of the original — an English business document translated into Japanese should have a kanji density in the newspaper range, not the children’s book range.

Japanese learners can paste a text they want to read and quickly assess whether it is within their current ability range before committing to the effort of working through it.

NLP and text classification pipelines sometimes use script ratio as a lightweight feature for language identification or register detection — it is faster to compute than vocabulary-level analysis and does not require a Japanese dictionary.

Everything runs locally in your browser; no text is uploaded.