Identify a file by its first bytes
A byte-order mark (BOM) is the Unicode code point U+FEFF written at the start of a file. Because that single code point encodes to a different byte pattern in each encoding, the leading bytes act as a signature: they tell a reader whether the text is UTF-8, UTF-16 or UTF-32 and, for UTF-16 and UTF-32, which byte order (endianness) was used. This reference lists every BOM with its exact hex bytes and lets you paste a file’s opening bytes to detect which one it is.
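As an illustration of why one code point yields several signatures, the following minimal Python sketch encodes U+FEFF under each encoding form and prints the resulting bytes. The `signatures` dictionary and its labels are names chosen for this example, not part of the tool described here. The endian-specific codec names are used so that Python does not prepend a BOM of its own.

```python
# Encode the single code point U+FEFF under each Unicode encoding form.
# The explicit "-be" / "-le" codecs never add a BOM themselves, so the
# output is exactly the bytes of U+FEFF in that form.
BOM = "\ufeff"

signatures = {
    name: BOM.encode(codec).hex(" ").upper()  # bytes.hex(sep) needs Python 3.8+
    for name, codec in [
        ("UTF-8", "utf-8"),
        ("UTF-16 BE", "utf-16-be"),
        ("UTF-16 LE", "utf-16-le"),
        ("UTF-32 BE", "utf-32-be"),
        ("UTF-32 LE", "utf-32-le"),
    ]
}

for name, hex_bytes in signatures.items():
    print(f"{name:<10} {hex_bytes}")
# UTF-8      EF BB BF
# UTF-16 BE  FE FF
# UTF-16 LE  FF FE
# UTF-32 BE  00 00 FE FF
# UTF-32 LE  FF FE 00 00
```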
How it works
The detector reads your hex bytes and tests them against the known signatures, longest first, so that it cannot mistake a four-byte UTF-32 LE mark (FF FE 00 00) for a two-byte UTF-16 LE mark (FF FE). If the bytes match a signature, the encoding and endianness are reported; if they do not, the bytes do not begin with a BOM, which for UTF-8 is the normal, recommended state. Endianness matters because UTF-16 and UTF-32 store each code unit across multiple bytes, and the BOM records whether the most or least significant byte comes first.
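The longest-first matching described above can be sketched in a few lines of Python. This is an illustrative sketch, not the detector's actual source: the names `SIGNATURES` and `detect_bom` are hypothetical, and the input is assumed to be a string of hex digits, optionally separated by whitespace.

```python
# Known BOM signatures, ordered longest first so that FF FE 00 00
# (UTF-32 LE) is tested before its two-byte prefix FF FE (UTF-16 LE).
SIGNATURES = [
    (bytes.fromhex("0000FEFF"), "UTF-32", "big-endian"),
    (bytes.fromhex("FFFE0000"), "UTF-32", "little-endian"),
    (bytes.fromhex("EFBBBF"), "UTF-8", None),  # UTF-8 has no byte order
    (bytes.fromhex("FEFF"), "UTF-16", "big-endian"),
    (bytes.fromhex("FFFE"), "UTF-16", "little-endian"),
]


def detect_bom(hex_text):
    """Return (encoding, endianness) for the leading bytes, or None if no BOM matches."""
    # bytes.fromhex ignores ASCII whitespace between byte pairs (Python 3.7+).
    data = bytes.fromhex(hex_text)
    for signature, encoding, endianness in SIGNATURES:
        if data.startswith(signature):
            return encoding, endianness
    return None


print(detect_bom("FF FE 00 00"))  # ('UTF-32', 'little-endian')
print(detect_bom("FF FE 41 00"))  # ('UTF-16', 'little-endian')
print(detect_bom("23 21 2F"))     # None
```

One caveat of any signature-based approach: FF FE 00 00 is also what a UTF-16 LE file would contain if its BOM were followed by the code point U+0000, so a match is strong evidence rather than proof.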
Tips and notes
Prefer UTF-8 without a BOM for source code, JSON, CSV and anything consumed by Unix tools: a stray EF BB BF can break shebang lines, strict JSON parsers and string comparisons. When you must interoperate with Windows tools that expect a BOM, add it deliberately and document it. For UTF-16 and UTF-32, always emit a BOM or agree the endianness out of band, since the bytes alone give no reliable way to recover byte order. When debugging “weird first character” bugs, check for an unstripped BOM before anything else.
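For the “weird first character” case, a minimal Python sketch of checking for and removing a UTF-8 BOM follows. It relies on the standard library's `codecs.BOM_UTF8` constant and the `utf-8-sig` codec; the helper name `strip_utf8_bom` and the sample JSON payload are made up for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading EF BB BF, leaving other input unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Sample input: a small JSON document saved with a UTF-8 BOM.
raw = codecs.BOM_UTF8 + b'{"ok": true}'

text_plain = raw.decode("utf-8")    # keeps U+FEFF as the first character
text_sig = raw.decode("utf-8-sig")  # drops a leading BOM if one is present

print(repr(text_plain[0]))  # '\ufeff'
print(text_sig)             # {"ok": true}

# Adding a BOM deliberately, for tools that expect one.
with_bom = "id,name\n".encode("utf-8-sig")
print(with_bom[:3].hex(" ").upper())  # EF BB BF
```

Decoding with `utf-8-sig` is also safe for input that has no BOM, which makes it a reasonable default when reading files of unknown origin.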