Implementation:ArroyoSystems Arroyo Avro Serializer

From Leeroopedia


Overview

Avro Serializer converts Arrow RecordBatch data into Apache Avro Value records. It handles the mapping from Arrow's columnar format to Avro's row-oriented format, including nested structs, lists, nullable unions, and various numeric/temporal types.
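
The columnar-to-row mapping can be pictured with a minimal, self-contained sketch. The `Column` and `Value` enums and the `pivot` function below are hypothetical stand-ins written for illustration. They are not the Arrow or apache_avro types; the real module works against `RecordBatch` and Avro `Value` records and covers many more data types.

```rust
/// Hypothetical stand-in for an Arrow column: one typed vector per field.
#[derive(Debug, Clone)]
pub enum Column {
    Int64(Vec<i64>),
    Utf8(Vec<String>),
}

/// Hypothetical stand-in for an Avro value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Long(i64),
    String(String),
    Record(Vec<(String, Value)>),
}

/// Pivot named columns into one record per row.
/// Assumes every column holds exactly `num_rows` entries.
pub fn pivot(columns: &[(&str, Column)], num_rows: usize) -> Vec<Value> {
    // Start with one empty record per row, then fill it column by column.
    let mut rows: Vec<Vec<(String, Value)>> = vec![Vec::new(); num_rows];
    for (name, column) in columns {
        for (i, row) in rows.iter_mut().enumerate() {
            let value = match column {
                Column::Int64(v) => Value::Long(v[i]),
                Column::Utf8(v) => Value::String(v[i].clone()),
            };
            row.push((name.to_string(), value));
        }
    }
    rows.into_iter().map(Value::Record).collect()
}
```

The outer loop walks columns and the inner loop walks rows, so each column's type is matched once per column pass rather than re-derived from a row object.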

Description

The module defines:

  • serialize: The main entry point that takes an Avro Schema and an Arrow RecordBatch, producing a Vec<Value> of Avro records. It creates one Record per row and serializes each column into it.
  • serialize_column: A generic function parameterized by SerializeTarget that handles per-column serialization. It uses macros (write_arrow_value!, write_primitive!) to dispatch based on Arrow data types. Supported types include:
    • Primitives: Int8, Int32, Int64, UInt8, UInt32, UInt64, Float16, Float32, Float64
    • Strings and booleans
    • Timestamps (nanosecond to microsecond conversion)
    • Dates (Date32, Date64)
    • Decimals (Decimal128)
    • Binary data
    • Nested lists (recursive serialization)
    • Nested structs (recursive serialization with nullable union wrapping)
  • SerializeTarget trait: An abstraction over the output target, implemented for both Vec<Option<Record>> (top-level records) and Vec<Value> (list items).
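
One way to picture the `SerializeTarget` abstraction is a trait with a single "store the value for row i" method, implemented once for record rows and once for list items. The trait shape below (the `add` method and its parameters), the toy `Value` and `Record` types, and the `write_timestamps` helper are assumptions made for illustration, not the signatures in ser.rs. The timestamp conversion is shown as a truncating division by 1,000, which is also an assumption about rounding.

```rust
/// Toy stand-in for an Avro value (not the apache_avro type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    TimestampMicros(i64),
}

/// Toy stand-in for an Avro record: ordered (field name, value) pairs.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Record {
    pub fields: Vec<(String, Value)>,
}

/// Hypothetical shape of the output abstraction: store `value` for row `index`.
pub trait SerializeTarget {
    fn add(&mut self, index: usize, name: &str, value: Value);
}

/// Top-level target: each row is a record, and the value becomes a named field.
impl SerializeTarget for Vec<Option<Record>> {
    fn add(&mut self, index: usize, name: &str, value: Value) {
        if let Some(record) = self[index].as_mut() {
            record.fields.push((name.to_string(), value));
        }
    }
}

/// List-item target: values are appended in order; the field name is unused.
impl SerializeTarget for Vec<Value> {
    fn add(&mut self, _index: usize, _name: &str, value: Value) {
        self.push(value);
    }
}

/// Convert a nanosecond timestamp to microseconds (truncating division).
pub fn nanos_to_micros(nanos: i64) -> i64 {
    nanos / 1_000
}

/// Write one timestamp column into any target, one value per row.
pub fn write_timestamps<T: SerializeTarget>(target: &mut T, name: &str, nanos: &[Option<i64>]) {
    for (i, v) in nanos.iter().enumerate() {
        let value = match v {
            Some(n) => Value::TimestampMicros(nanos_to_micros(*n)),
            None => Value::Null,
        };
        target.add(i, name, value);
    }
}
```

Because the column writer is generic over the target, the same per-type logic can serve both top-level fields and the items of a nested list.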

Nullable fields are wrapped in Avro Union types with null as variant 0 and the actual type as variant 1.
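
The union convention described above can be sketched with a toy `Value` enum. In apache_avro, a union value carries the index of the chosen variant plus a boxed inner value; the enum and the `wrap_nullable` helper below are simplified, hypothetical stand-ins that mirror that shape rather than the module's actual code.

```rust
/// Toy stand-in mirroring the shape of an Avro union value.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Long(i64),
    /// (index of the chosen variant in the union schema, the value itself)
    Union(u32, Box<Value>),
}

/// Wrap a possibly missing value for a field whose schema is ["null", T].
/// Non-nullable fields are written unwrapped; a missing value there is
/// treated as a bug in this sketch.
pub fn wrap_nullable(value: Option<Value>, nullable: bool) -> Value {
    match (nullable, value) {
        // Variant 0 is null.
        (true, None) => Value::Union(0, Box::new(Value::Null)),
        // Variant 1 is the actual type.
        (true, Some(v)) => Value::Union(1, Box::new(v)),
        (false, Some(v)) => v,
        (false, None) => panic!("null value in a non-nullable field"),
    }
}
```

Keeping the null branch at a fixed index means the writer never has to search the union schema to find the matching variant.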

Usage

ArrowSerializer calls this module when a sink requires the Avro output format.

Code Reference

Source Location

crates/arroyo-formats/src/avro/ser.rs

Signature

pub fn serialize(schema: &Schema, batch: &RecordBatch) -> Vec<Value>

fn serialize_column<T: SerializeTarget>(
    schema: &Schema,
    values: &mut T,
    name: &str,
    column: &ArrayRef,
    nullable: bool,
)

Import

use crate::avro::ser::serialize;

I/O Contract

Inputs

Name | Type | Description
schema | &Schema | Target Avro schema defining the record structure
batch | &RecordBatch | Arrow record batch containing the columnar data to serialize

Outputs

Name | Type | Description
values | Vec<Value> | Avro record values, one per row in the input batch

Usage Examples

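use crate::avro::schema::to_avro;
use crate::avro::ser::serialize;

// Assumes `arrow_schema`, `record_batch`, and an apache_avro::Writer named
// `writer` are already in scope, inside a function that returns a Result.
let avro_schema = to_avro("MyRecord", &arrow_schema.fields);
let avro_values = serialize(&avro_schema, &record_batch);

// Each value is an Avro Record that can be written with apache_avro::Writer
for value in avro_values {
    writer.append(value)?;
}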
