What is Avro?
12 Jan 2024
Avro is a language-neutral data serialization system that addresses the lack of
language portability in Hadoop Writables.
- Code generation is optional in Avro, unlike Protocol Buffers.
- Schemas are usually written in JSON, and data is usually encoded in a binary format.
- Avro datafiles have a metadata section where the schema is stored.
- Datafiles also support compression and are splittable.
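As a rough illustration of "schema in JSON, data in binary", here is a minimal Python sketch that hand-encodes one record. The `User` schema and the helper names (`write_long`, `read_long`, `encode_record`, `decode_record`) are made up for this example and are not part of any Avro library. It only handles `string`, `int` and `long` fields, using the zig-zag variable-length integer coding and length-prefixed UTF-8 strings described in the Avro specification.

```python
import io
import json

# Hypothetical schema, used only for this illustration.
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
""")


def write_long(out, n):
    """Write an int/long as a zig-zag varint."""
    z = (n << 1) ^ (n >> 63)  # zig-zag: small magnitudes map to small unsigned values
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))  # low 7 bits, continuation bit set
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    """Read a zig-zag varint written by write_long."""
    shift = 0
    z = 0
    while True:
        b = inp.read(1)[0]
        z |= (b & 0x7F) << shift
        if not b & 0x80:
            break
        shift += 7
    return (z >> 1) ^ -(z & 1)


def encode_record(schema, record):
    """Encode fields in schema order; field names are not written to the output."""
    out = io.BytesIO()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "string":
            raw = value.encode("utf-8")
            write_long(out, len(raw))
            out.write(raw)
        elif field["type"] in ("int", "long"):
            write_long(out, value)
        else:
            raise ValueError("type not covered by this sketch: %r" % field["type"])
    return out.getvalue()


def decode_record(schema, data):
    """Decode bytes produced by encode_record, using the same schema."""
    inp = io.BytesIO(data)
    record = {}
    for field in schema["fields"]:
        if field["type"] == "string":
            length = read_long(inp)
            record[field["name"]] = inp.read(length).decode("utf-8")
        else:
            record[field["name"]] = read_long(inp)
    return record


encoded = encode_record(SCHEMA, {"name": "Ann", "age": 30})
print(encoded.hex())  # 06416e6e3c
print(decode_record(SCHEMA, encoded))
```

The five output bytes carry no field names or type tags, which is why a reader needs the schema to make sense of the data.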
Avro Data Types and Schemas
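The Java API supports three kinds of mapping:
- Generic mapping: the schema is mapped dynamically, and fields are referred to by their string names.
- Specific mapping: Java classes are generated from the schema (code generation).
- Reflect mapping: reflection is used to infer the Avro types.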
In-Memory Serialization and Deserialization
Maven or Ant can be used to generate Java code from a schema.
Avro Datafiles
Datafiles have a metadata header, a sync marker and a set of blocks.
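The sync marker is what makes a datafile splittable: a reader dropped at an arbitrary byte offset can scan forward to the next marker and start at a block boundary. The sketch below shows that idea on a deliberately simplified layout. The `HEADER` placeholder, the fixed marker value and the function names are assumptions for illustration only. In real Avro files the marker is 16 bytes chosen per file, and each block also carries an object count and a byte size, which are omitted here.

```python
# Fixed 16-byte marker for a reproducible example; a real file uses its own marker.
SYNC = bytes(range(16))


def build_file(blocks, sync=SYNC):
    """Simplified layout: header placeholder, sync, then each block followed by sync."""
    out = b"HEADER" + sync
    for block in blocks:
        out += block + sync
    return out


def next_block_start(data, offset, sync=SYNC):
    """Return the position just past the next sync marker at or after offset, or -1."""
    pos = data.find(sync, offset)
    return -1 if pos == -1 else pos + len(sync)


data = build_file([b"block-one", b"block-two"])

# A reader starting at byte 0 skips the header and lands on the first block.
first = next_block_start(data, 0)
print(data[first:first + 9])  # b'block-one'

# A reader starting in the middle of the first block resynchronises on the second.
second = next_block_start(data, first + 3)
print(data[second:second + 9])  # b'block-two'
```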
Schema Resolution
Avro handles schema differences by using two schemas: the writer's schema, which was used to write the data, and the reader's schema, which is used to read it.
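To make the reader/writer idea concrete, here is a small Python sketch of record resolution on already-decoded data. The function name `resolve_record` and both schemas are hypothetical. It covers only added and removed fields: fields are matched by name, a field that exists only in the reader's schema takes its default, and a field that exists only in the writer's schema is dropped. Full Avro resolution also covers cases such as type promotion and aliases, which are left out here.

```python
def resolve_record(writer_schema, reader_schema, writer_record):
    """Project a record written with writer_schema onto reader_schema."""
    writer_fields = {f["name"] for f in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in writer_fields:
            # Present in both schemas: take the written value.
            result[name] = writer_record[name]
        elif "default" in field:
            # New in the reader's schema: fall back to the declared default.
            result[name] = field["default"]
        else:
            raise ValueError("no value or default for reader field %r" % name)
    # Fields that only the writer knows about are never copied, so they are ignored.
    return result


# Hypothetical schemas: the reader dropped "age" and added "email" with a default.
writer_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "int"},
    ],
}
reader_schema = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "email", "type": "string", "default": ""},
    ],
}

print(resolve_record(writer_schema, reader_schema, {"name": "Ann", "age": 30}))
# {'name': 'Ann', 'email': ''}
```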
Avro MapReduce
Avro provides a number of classes that make it easy to run MapReduce programs on Avro data.