Hungarian Alphabetical Sort

Sort Hungarian with cs, dz, dzs, gy, ly, ny, sz, ty, zs treated as single letters

Sort a list of Hungarian words in correct alphabetical order. The tool treats the 8 digraphs and the trigraph dzs as single letters per Hungarian CLDR collation, so 'cs' sorts after 'c' and 'sz' after 's'.

How is Hungarian alphabetical order different?

Hungarian has 8 digraphs (cs, dz, gy, ly, ny, sz, ty, zs) and one trigraph (dzs) that each count as a single letter. So 'csak' sorts after 'cukor' because 'cs' comes after 'c' in the alphabet, even though a plain string comparison would put 'csak' first.

A plain string compare cannot get this right, because it sees only individual characters and not the multi-character letters. This tool tokenises each word into Hungarian letters and orders the words by each letter's position in the official alphabet.

How it works

The Hungarian alphabet contains these multi-character letters: cs, dz, dzs, gy, ly, ny, sz, ty, zs. Each one occupies its own slot in the alphabet, sorting right after its base letter (so cs comes after c, sz after s, and the trigraph dzs after dz).

The algorithm builds a rank table for every Hungarian letter, including accented vowels (a < á, e < é, i < í, o < ó < ö < ő, u < ú < ü < ű). It then tokenises each word greedily, preferring the longest match first (so dzs beats dz, which beats d). Each token is mapped to its rank, producing a key array. Comparing two words means comparing their key arrays element by element, which is how a Hungarian dictionary handles the multi-character letters.

Why plain sorting fails

A naive Unicode sort puts csak before cukor: both words start with c, and the next character s (code point 115) is lower than u (117). But in Hungarian, cs is a single letter that comes after c, so the correct order is cukor, then csak. This is the classic trap that this tool fixes.
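The trap is easy to reproduce. The short Python snippet below uses only the built-in sorted and ord, and the variable names are illustrative:

```python
words = ["cukor", "csak"]

# Both words start with "c", so the second character decides the order.
print(ord("s"), ord("u"))  # 115 117

# Built-in sorting compares code points, so "cs..." lands before "cu...".
naive = sorted(words)
print(naive)  # ['csak', 'cukor']
```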

Example

Input:

cukor
csak
alma
ánizs
szar
sajt

Correct Hungarian order:

alma
ánizs
cukor
csak
sajt
szar

Note that csak follows cukor (cs after c) and szar follows sajt (sz after s). All sorting runs locally in your browser.