N-gram Generator

Extract character and word n-grams from any text.


Break text into n-grams

This tool extracts n-grams (contiguous sequences of n characters or words) from any text and counts how often each one appears. N-grams are a foundational building block in natural language processing, used for language modeling, spelling correction, text classification, and similarity scoring.

How it works

Pick a mode and a window size n:

  • Word n-grams: the text is tokenized into Unicode word tokens, then a window of n consecutive tokens slides one position at a time.
  • Character n-grams: the window of n consecutive characters slides one position at a time over the raw text (optionally with whitespace collapsed).
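
The two modes above can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's actual code: the function names are hypothetical, the regular expression \w+ is only an assumed stand-in for "Unicode word tokens", and collapsing is assumed to mean replacing each run of whitespace with a single space.

```python
import re

def sliding_windows(items, n):
    """Return every run of n consecutive items, moving one position at a time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # range() is empty when len(items) < n, so short inputs yield no n-grams.
    return [items[i:i + n] for i in range(len(items) - n + 1)]

def word_ngrams(text, n):
    # Assumption: \w+ (Unicode-aware for str patterns in Python 3) approximates
    # the tool's word tokenizer. It splits on apostrophes and hyphens.
    tokens = re.findall(r"\w+", text)
    return [" ".join(window) for window in sliding_windows(tokens, n)]

def char_ngrams(text, n, collapse_whitespace=False):
    if collapse_whitespace:
        # Assumption: "collapsed" means each whitespace run becomes one space.
        text = re.sub(r"\s+", " ", text)
    # Slicing a string yields substrings, so the same helper works here.
    return sliding_windows(text, n)

print(word_ngrams("to be or not to be", 2))
print(char_ngrams("abcd", 2))
```

Both modes share one windowing helper. Only the unit differs: tokens for word mode, single characters for character mode.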

A text of L items (tokens or characters) produces L - n + 1 n-grams when L ≥ n, and none otherwise. Each distinct n-gram is tallied, and the results are listed most-frequent first. Case folding can be enabled so that “The” and “the” count as the same n-gram.
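
The tallying step can be sketched as follows. This is a minimal sketch under stated assumptions: count_ngrams and its fold_case parameter are hypothetical names, the input is assumed to be an already tokenized list, and ties are left in first-seen order because the text does not say how the tool breaks them.

```python
from collections import Counter

def count_ngrams(items, n, fold_case=False):
    """Tally the n-grams of a token list and return (n-gram, count) pairs, most frequent first."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if fold_case:
        # Fold case before windowing so "The" and "the" produce the same n-gram.
        items = [item.casefold() for item in items]
    # L items give L - n + 1 windows; the range is empty when L < n.
    grams = [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]
    # most_common() orders by descending count; equal counts keep first-seen order.
    return Counter(grams).most_common()

tokens = "The cat saw the cat".split()
for gram, count in count_ngrams(tokens, 2, fold_case=True):
    print(" ".join(gram), count)
```

With case folding on, the five tokens give 5 - 2 + 1 = 4 bigrams, and “the cat” is counted twice. With it off, “The cat” and “the cat” are tallied separately.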

Example

The word bigrams (n = 2) of “to be or not to be” are:

to be   2
be or   1
or not  1
not to  1

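The pair “to be” appears twice, so it is listed first. The six words yield 6 - 2 + 1 = 5 bigrams in total, four of them distinct. Everything runs locally, so your text stays private.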
