What is a Swahili digraph?

A digraph is a pair of letters that together represent one sound. Swahili uses several, including ch, sh, ng, ny, mb, nd, nj and th, plus ng', which adds an apostrophe to mark the velar nasal. Although they take two or three characters on the page, each one is a single phoneme when spoken.

Why does the phoneme-aware length differ from the character count?

The raw character count treats every letter separately, so the word ngoma counts as five characters. The phoneme-aware length collapses each detected digraph to one unit, so ng counts as a single graphic unit and ngoma becomes four units.

Does it count bytes too?

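Yes. Swahili is written in the Latin alphabet, so most characters are a single UTF-8 byte and the byte count usually matches the character count closely. The byte figure is the one to compare against limits that are defined in bytes, which includes some database VARCHAR columns. SMS gateways and other databases count in their own units (for example GSM-7 septets or characters), so check how your limit is defined.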

Are spaces and punctuation counted?

Total characters include spaces and punctuation. A separate figure excludes all whitespace so you can match whichever limit you are checking against.
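As a rough illustration of the two figures, the sketch below computes a total and a whitespace-free count. The function name countCharacters and the returned field names are hypothetical, and the tool's own code may define whitespace differently; here it is whatever the JavaScript \s class matches.

```typescript
// Hypothetical helper: returns both character figures for a piece of text.
function countCharacters(text: string): { total: number; withoutWhitespace: number } {
  // Array.from iterates by Unicode code point, so each code point counts once.
  const total = Array.from(text).length;
  // Strip spaces, tabs, newlines and other \s whitespace before counting again.
  const withoutWhitespace = Array.from(text.replace(/\s/g, "")).length;
  return { total, withoutWhitespace };
}

const greeting = countCharacters("Habari, dunia!");
console.log(greeting.total);             // 14 (comma, space and "!" included)
console.log(greeting.withoutWhitespace); // 13 (only the space is dropped)
```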

Is my text uploaded anywhere?

No. All counting happens locally in your browser. Nothing you paste is sent to a server or stored.

Swahili Character Counter

Swahili (Kiswahili) is written in the Latin alphabet, but several of its sounds are spelled with two letters. These digraphs — such as ch, sh, ng, ny, mb and nd — look like two characters yet represent a single phoneme. This counter reports the usual character and byte totals and also detects those digraphs so you can see a phoneme-aware length.

How it works

Characters are counted using Unicode-aware string handling, so each code point counts once. Bytes are the UTF-8 length from TextEncoder; because Swahili uses plain Latin letters, the byte count is normally very close to the character count.
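A minimal sketch of those two counts, assuming a browser or Node.js environment where TextEncoder is available globally. The function names countCodePoints and countUtf8Bytes are made up for illustration and are not taken from the tool's source.

```typescript
// Hypothetical helper: number of Unicode code points in the text.
// Array.from walks the string by code point rather than by UTF-16 unit.
function countCodePoints(text: string): number {
  return Array.from(text).length;
}

// Hypothetical helper: length of the text once encoded as UTF-8.
function countUtf8Bytes(text: string): number {
  return new TextEncoder().encode(text).length;
}

console.log(countCodePoints("ngoma"), countUtf8Bytes("ngoma"));
// 5 5 (plain Latin letters are one byte each)

console.log(countCodePoints("ng\u2019ombe"), countUtf8Bytes("ng\u2019ombe"));
// 7 9 (a typographic apostrophe, U+2019, takes three bytes)
```

The second call shows one way the two figures can drift apart: text pasted from a word processor often carries a typographic apostrophe instead of the plain ASCII one.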

For digraphs, the text is scanned left to right. At each position the tool checks the longest candidates first (so ng' is matched before ng). When a digraph is found it is counted once and the scan jumps past all of its characters. The phoneme-aware length is the number of graphic units that result when every detected digraph collapses to one.
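The scan described above can be sketched as a greedy longest-match loop. This is an illustration under assumptions, not the tool's actual code: the DIGRAPHS list is copied from the digraphs named on this page and the real list may differ, and the names scanDigraphs and ScanResult are hypothetical.

```typescript
// Assumed digraph list, taken from the ones named on this page.
const DIGRAPHS: string[] = ["ch", "sh", "ng", "ng'", "ny", "mb", "nd", "nj", "th"];

// Longest candidates first, so "ng'" is tried before "ng".
const BY_LENGTH: string[] = [...DIGRAPHS].sort((a, b) => b.length - a.length);

interface ScanResult {
  units: string[];      // graphic units after collapsing digraphs
  digraphCount: number; // how many digraphs were detected
}

function scanDigraphs(text: string): ScanResult {
  const chars = Array.from(text);                // one entry per code point
  const lower = chars.map((c) => c.toLowerCase()); // case-insensitive matching
  const units: string[] = [];
  let digraphCount = 0;
  let i = 0;
  while (i < chars.length) {
    const start = i;
    const match = BY_LENGTH.find(
      (d) => lower.slice(start, start + d.length).join("") === d
    );
    if (match !== undefined) {
      // Keep the original casing in the unit, then jump past the whole digraph.
      units.push(chars.slice(start, start + match.length).join(""));
      digraphCount += 1;
      i += match.length;
    } else {
      units.push(chars[start]);
      i += 1;
    }
  }
  return { units, digraphCount };
}

console.log(scanDigraphs("ngoma"));   // units: ng, o, m, a (1 digraph, length 4)
console.log(scanDigraphs("ng'ombe")); // units: ng', o, mb, e (2 digraphs, length 4)
```

The phoneme-aware length is simply units.length. Sorting by length before matching is what keeps ng' from being split into ng followed by a stray apostrophe.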

Example

The word:

ngoma

has five raw characters: n, g, o, m, a. Because ng is a digraph, the tool reports one digraph detected and a phoneme-aware length of four units (ng, o, m, a).

Notes

The apostrophe form ng' (the velar nasal) is recognised as its own digraph.
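The byte count is the figure to use for limits defined in bytes, such as byte-sized database columns; the character count is best for word-processing length checks. SMS segment limits depend on the encoding the gateway uses, so treat the byte count as a guide there.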
Counting is case-insensitive for digraph detection, so Ng and ng are treated the same.