Apache Avro schema types
Apache Avro is a compact, schema-driven serialization format used heavily in Kafka and Hadoop ecosystems. An Avro schema is itself JSON, declaring one of three families of types: primitives, complex types, and logical types layered on top of those. This reference lists each type with its JSON schema representation so you can build correct .avsc files quickly.
How it works
A schema is a JSON value: a string naming a primitive ("string") or a previously defined named type, an array forming a union (["null", "long"]), or an object with a "type" field that describes a complex type or an annotated primitive.
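As a rough illustration of these three shapes, the sketch below classifies a parsed schema by its top-level JSON type. The helper name schema_kind is hypothetical and not part of any Avro library; it assumes the schema has already been parsed with the standard json module.

```python
import json

PRIMITIVES = {"null", "boolean", "int", "long", "float", "double", "bytes", "string"}

def schema_kind(schema):
    """Classify a parsed Avro schema by its top-level JSON shape (hypothetical helper)."""
    if isinstance(schema, str):
        # A bare string names a primitive, or refers to a named type defined earlier.
        return "primitive" if schema in PRIMITIVES else "named reference"
    if isinstance(schema, list):
        # A JSON array is a union of the listed schemas.
        return "union"
    if isinstance(schema, dict):
        # A JSON object states its kind in the "type" attribute.
        return "object"
    raise TypeError(f"not a valid schema shape: {schema!r}")

for text in ['"string"', '["null", "long"]', '{"type": "array", "items": "int"}']:
    print(text, "->", schema_kind(json.loads(text)))
```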
The eight primitives are null, boolean, int, long, float, double, bytes and string. The int and long types use a variable-length zig-zag encoding, so values of small magnitude, positive or negative, occupy a single byte.
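To make the integer encoding concrete, here is a minimal sketch of zig-zag plus variable-length encoding for a signed 64-bit value. The function name zigzag_varint is made up for this example; real Avro libraries expose their own encoders.

```python
def zigzag_varint(n: int) -> bytes:
    """Encode a signed 64-bit integer as a zig-zag varint (illustrative sketch)."""
    # Zig-zag maps 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ...
    # so values of small magnitude become small unsigned numbers.
    z = (n << 1) ^ (n >> 63)
    out = bytearray()
    while True:
        low7 = z & 0x7F
        z >>= 7
        if z:
            # More bytes follow: set the continuation bit.
            out.append(low7 | 0x80)
        else:
            out.append(low7)
            return bytes(out)

for value in (0, -1, 1, -64, 64):
    print(value, zigzag_varint(value).hex())
```

Values from -64 to 63 fit in one byte; 64 is the first positive value that needs two.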
The complex types add structure:
record: named fields, each a nested schema.
enum: a fixed set of named symbols.
array: a list of items of one type.
map: string keys to values of one type.
union: a value of exactly one of several listed schemas.
fixed: a fixed number of bytes.
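For reference, one minimal schema per complex type might look like the following, written as Python literals and printed as JSON. The names Point, Suit and Md5 are purely illustrative.

```python
import json

# One minimal schema per complex type; every name here is illustrative.
COMPLEX_EXAMPLES = {
    "record": {
        "type": "record",
        "name": "Point",
        "fields": [
            {"name": "x", "type": "int"},
            {"name": "y", "type": "int"},
        ],
    },
    "enum": {"type": "enum", "name": "Suit",
             "symbols": ["SPADES", "HEARTS", "DIAMONDS", "CLUBS"]},
    "array": {"type": "array", "items": "string"},
    "map": {"type": "map", "values": "long"},  # keys are always strings
    "union": ["null", "string"],               # a union is a bare JSON array
    "fixed": {"type": "fixed", "name": "Md5", "size": 16},
}

for kind, schema in COMPLEX_EXAMPLES.items():
    print(f"{kind}: {json.dumps(schema)}")
```

Note that record, enum and fixed are named types and need a "name", while array, map and union do not.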
Logical types annotate a base type with semantic meaning via a logicalType attribute. For example, decimal annotates bytes or fixed and carries precision and scale attributes; date annotates int as days since the Unix epoch; timestamp-millis annotates long as milliseconds since the Unix epoch. A reader that does not recognize a logical type ignores the annotation and still reads the base value.
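The relationship between a base value and its logical meaning can be sketched with the standard library alone. The three decode_* helpers below are hypothetical names for this example; they assume the decimal bytes hold a big-endian two's-complement unscaled integer, as the Avro specification describes.

```python
from datetime import date, datetime, timedelta, timezone
from decimal import Decimal

def decode_date(days: int) -> date:
    """Interpret an int as days since the Unix epoch (logicalType date)."""
    return date(1970, 1, 1) + timedelta(days=days)

def decode_timestamp_millis(ms: int) -> datetime:
    """Interpret a long as milliseconds since the Unix epoch, in UTC."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)

def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Interpret bytes as an unscaled integer, then shift by the schema's scale."""
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    return Decimal(unscaled).scaleb(-scale)

print(decode_date(365))
print(decode_timestamp_millis(1000))
print(decode_decimal((12345).to_bytes(2, "big", signed=True), scale=2))
```

A reader without logical type support would simply see the int, long, or bytes passed into these functions.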
Tips and example
A nullable timestamp field inside a record:
{
  "type": "record",
  "name": "Event",
  "fields": [
    {
      "name": "createdAt",
      "type": ["null", { "type": "long", "logicalType": "timestamp-millis" }],
      "default": null
    }
  ]
}
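When making a field optional, put "null" first in the union and set "default": null. The specification has historically required a union field's default to match the first branch of the union, and the default lets a reader using the new schema fill in the field when decoding older data that lacks it, which is what makes adding the field a safe schema evolution. Filter the table below by kind to find the exact JSON shape for any Avro type.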
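The effect of a null default on older data can be sketched as follows. This is a toy stand-in for Avro's schema resolution, not the real algorithm: fill_defaults is a hypothetical helper, records are plain dicts rather than decoded Avro data, and the extra "id" field is added only to give the older record something to contain.

```python
import json

# Reader schema: the Event record with an assumed pre-existing "id" field.
READER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "Event",
  "fields": [
    {"name": "id", "type": "string"},
    {
      "name": "createdAt",
      "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}],
      "default": null
    }
  ]
}
""")

def fill_defaults(record: dict, schema: dict) -> dict:
    """Add the reader's default for every field the writer did not supply."""
    resolved = dict(record)
    for field in schema["fields"]:
        if field["name"] not in resolved:
            if "default" not in field:
                raise ValueError(f"no value or default for field {field['name']!r}")
            resolved[field["name"]] = field["default"]
    return resolved

old_record = {"id": "a1"}  # written before createdAt existed
print(fill_defaults(old_record, READER_SCHEMA))
```

Without the "default" entry the same call would raise, which mirrors why a new field with no default breaks reading of older data.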