Implementation:ArroyoSystems Arroyo Avro Serializer
Overview
The Avro serializer converts Arrow RecordBatch data into Apache Avro Value records, mapping Arrow's columnar format to Avro's row-oriented format. It handles nested structs, lists, nullable unions, and a range of numeric and temporal types.
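The columnar-to-row mapping can be illustrated with a minimal, standard-library-only sketch. The `Column` and `Cell` enums and the `pivot` function below are hypothetical stand-ins, not the real `arrow` or `apache_avro` types, and the integer division used for the nanosecond-to-microsecond conversion is an assumption about how that truncation could be done.

```rust
// Toy stand-ins for Arrow columns and Avro values (hypothetical; these are
// NOT the real arrow or apache_avro types).
#[derive(Debug, Clone, PartialEq)]
enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
    TimestampNanos(Vec<i64>),
}

#[derive(Debug, Clone, PartialEq)]
enum Cell {
    Long(i64),
    String(String),
    TimestampMicros(i64),
}

impl Column {
    fn len(&self) -> usize {
        match self {
            Column::Int64(v) | Column::TimestampNanos(v) => v.len(),
            Column::Utf8(v) => v.len(),
        }
    }

    // Convert the value at `row` into its row-oriented representation.
    fn cell(&self, row: usize) -> Cell {
        match self {
            Column::Int64(v) => Cell::Long(v[row]),
            Column::Utf8(v) => Cell::String(v[row].clone()),
            // Assumption: nanoseconds are truncated to microseconds by
            // integer division.
            Column::TimestampNanos(v) => Cell::TimestampMicros(v[row] / 1_000),
        }
    }
}

/// Pivot named columns into one row per index.
/// Assumes every column has the same length as the first one.
fn pivot(columns: &[(&str, Column)]) -> Vec<Vec<(String, Cell)>> {
    let num_rows = columns.first().map_or(0, |(_, c)| c.len());
    let mut rows: Vec<Vec<(String, Cell)>> = vec![Vec::new(); num_rows];
    // Walk one column at a time, appending its value to every row.
    for (name, column) in columns {
        for (row, fields) in rows.iter_mut().enumerate() {
            fields.push((name.to_string(), column.cell(row)));
        }
    }
    rows
}
```

Iterating column by column keeps the type dispatch outside the per-row loop, so each column's type is matched once per value rather than once per field lookup.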
Description
The module defines:
serialize: The main entry point that takes an Avro Schema and an Arrow RecordBatch, producing a Vec<Value> of Avro records. It creates one Record per row and serializes each column into it.
serialize_column: A generic function parameterized by SerializeTarget that handles per-column serialization. It uses macros (write_arrow_value!, write_primitive!) to dispatch based on Arrow data types. Supported types include:
- Primitives: Int8, Int32, Int64, UInt8, UInt32, UInt64, Float16, Float32, Float64
- Strings and booleans
- Timestamps (nanosecond to microsecond conversion)
- Dates (Date32, Date64)
- Decimals (Decimal128)
- Binary data
- Nested lists (recursive serialization)
- Nested structs (recursive serialization with nullable union wrapping)
SerializeTarget trait: An abstraction over the output target, implemented for both Vec<Option<Record>> (top-level records) and Vec<Value> (list items).
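One way such an output abstraction could look is sketched below. The method name `add`, its parameters, the toy `Value` and `Record` types, and the `write_longs` helper are all assumptions made for illustration; the actual trait in ser.rs may be shaped differently.

```rust
// Simplified stand-ins for the Avro value and record types (hypothetical;
// not the apache_avro definitions).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Long(i64),
}

#[derive(Debug, Default, PartialEq)]
struct Record {
    fields: Vec<(String, Value)>,
}

/// Hypothetical output abstraction: the column writer only calls `add`
/// and does not need to know where the value ends up.
trait SerializeTarget {
    fn add(&mut self, index: usize, name: &str, value: Value);
}

// Top-level target: the value becomes field `name` of the record at `index`.
impl SerializeTarget for Vec<Option<Record>> {
    fn add(&mut self, index: usize, name: &str, value: Value) {
        if let Some(Some(record)) = self.get_mut(index) {
            record.fields.push((name.to_string(), value));
        }
    }
}

// List target: items are positional, so index and name are ignored.
impl SerializeTarget for Vec<Value> {
    fn add(&mut self, _index: usize, _name: &str, value: Value) {
        self.push(value);
    }
}

/// A per-column writer that is generic over the target, in the spirit of
/// serialize_column but restricted to a single integer type.
fn write_longs<T: SerializeTarget>(target: &mut T, name: &str, column: &[i64]) {
    for (index, v) in column.iter().enumerate() {
        target.add(index, name, Value::Long(*v));
    }
}
```

With this shape, the same column-writing logic can fill named record fields at the top level and append bare items when recursing into a list.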
Nullable fields are wrapped in Avro Union types with null as variant 0 and the actual type as variant 1.
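The wrapping rule can be sketched with a toy value type. The `Value` enum here is a simplified stand-in whose `Union(u32, Box<Value>)` variant mirrors the shape of the union variant in apache_avro, and `wrap_nullable` is a hypothetical helper name, not a function from the module.

```rust
// Simplified stand-in for an Avro value (hypothetical; only the variants
// needed for the illustration are included).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Long(i64),
    // (branch index, wrapped value)
    Union(u32, Box<Value>),
}

/// Wrap the value of a nullable field in a two-branch union:
/// branch 0 is null, branch 1 carries the actual value.
fn wrap_nullable(value: Option<Value>) -> Value {
    match value {
        None => Value::Union(0, Box::new(Value::Null)),
        Some(v) => Value::Union(1, Box::new(v)),
    }
}
```

For example, `wrap_nullable(Some(Value::Long(7)))` yields `Value::Union(1, Box::new(Value::Long(7)))`, while a missing value selects branch 0. This ordering only works if the Avro schema declares the union as null first and the concrete type second.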
Usage
ArrowSerializer calls this module when a sink requires the Avro output format.
Code Reference
Source Location
crates/arroyo-formats/src/avro/ser.rs
Signature
pub fn serialize(schema: &Schema, batch: &RecordBatch) -> Vec<Value>

fn serialize_column<T: SerializeTarget>(
    schema: &Schema,
    values: &mut T,
    name: &str,
    column: &ArrayRef,
    nullable: bool,
)
Import
use crate::avro::ser::serialize;
I/O Contract
Inputs
| Name | Type | Description |
|---|---|---|
| schema | &Schema | Target Avro schema defining the record structure |
| batch | &RecordBatch | Arrow record batch containing the columnar data to serialize |
Outputs
| Name | Type | Description |
|---|---|---|
| values | Vec<Value> | Avro record values, one per row in the input batch |
Usage Examples
use crate::avro::schema::to_avro;
use crate::avro::ser::serialize;

// `arrow_schema`, `record_batch`, and `writer` are assumed to be in scope.
let avro_schema = to_avro("MyRecord", &arrow_schema.fields);
let avro_values = serialize(&avro_schema, &record_batch);

// Each value is an Avro Record that can be written with apache_avro::Writer
for value in avro_values {
    writer.append(value)?;
}
Related Pages
- ArroyoSystems_Arroyo_Avro_Deserializer - Complementary Avro deserialization
- ArroyoSystems_Arroyo_Avro_Schema_Converter - Schema conversion between Avro and Arrow
- ArroyoSystems_Arroyo_Format_Serializer - The main ArrowSerializer that uses this module