Implementation:ArroyoSystems Arroyo Json Schema Converter

From Leeroopedia
Revision as of 14:27, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/ArroyoSystems_Arroyo_Json_Schema_Converter.md)


Overview

JSON Schema Converter converts JSON Schema definitions into Arrow schemas. It uses the typify crate to parse JSON Schema into a type space, then maps the structural types to Arrow data types, supporting nested objects, arrays, optional fields, and various primitive types.

Description

The module provides:

  • to_arrow: The main entry point. It takes a schema name and a JSON Schema string and returns an Arrow Schema. Internally it creates a TypeSpace from the parsed JSON Schema, locates the root struct type, and recursively converts it.
  • get_type_space: Internal helper that parses the JSON Schema string into a RootSchema, assigns the root name "ArroyoJsonRoot", and creates a TypeSpace with custom derive attributes (bincode::Encode, bincode::Decode, PartialEq, PartialOrd).
  • to_arrow_datatype: Recursive type conversion that maps typify types to Arrow types:
    • Structs -> DataType::Struct with fields from properties
    • Strings -> DataType::Utf8
    • Integers -> DataType::Int64
    • Floats -> DataType::Float64
    • Booleans -> DataType::Boolean
    • Arrays -> DataType::List
    • Optional/nullable types -> marked nullable in the Arrow field
    • Date-time format strings -> DataType::Timestamp(Nanosecond, None)
    • Unrecognized/complex types -> DataType::Utf8 with JSON extension metadata
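The mapping above can be sketched as a small recursive function. This is a minimal illustration only, not the module's code: `JsonType`, `ArrowType`, `ArrowField`, and `to_arrow_type` are hypothetical stand-ins for the typify and Arrow types, and the list item name "item" and its nullability handling are assumptions.

```rust
// Hypothetical stand-in for the type shapes reported by the parsed schema.
#[derive(Debug, Clone, PartialEq)]
pub enum JsonType {
    String,
    DateTime, // a string with "format": "date-time"
    Integer,
    Float,
    Boolean,
    Array(Box<JsonType>),
    Object(Vec<(String, JsonType)>),
    Optional(Box<JsonType>),
    Other, // anything the converter does not recognize
}

// Hypothetical stand-in for the Arrow data types named in the list above.
#[derive(Debug, Clone, PartialEq)]
pub enum ArrowType {
    Utf8,
    Int64,
    Float64,
    Boolean,
    TimestampNanos, // models Timestamp(Nanosecond, None)
    JsonUtf8,       // models Utf8 with JSON extension metadata
    List(Box<ArrowField>),
    Struct(Vec<ArrowField>),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ArrowField {
    pub name: String,
    pub data_type: ArrowType,
    pub nullable: bool,
}

/// Returns the mapped type and whether the value should be marked nullable.
pub fn to_arrow_type(t: &JsonType) -> (ArrowType, bool) {
    match t {
        JsonType::String => (ArrowType::Utf8, false),
        JsonType::DateTime => (ArrowType::TimestampNanos, false),
        JsonType::Integer => (ArrowType::Int64, false),
        JsonType::Float => (ArrowType::Float64, false),
        JsonType::Boolean => (ArrowType::Boolean, false),
        // Optional wrappers keep the inner type and only flip the nullable flag.
        JsonType::Optional(inner) => (to_arrow_type(inner).0, true),
        JsonType::Array(item) => {
            let (data_type, nullable) = to_arrow_type(item);
            // Assumption: the list element field is named "item".
            let field = ArrowField { name: "item".to_string(), data_type, nullable };
            (ArrowType::List(Box::new(field)), false)
        }
        JsonType::Object(props) => {
            // Recurse into each property to build the struct's fields.
            let fields: Vec<ArrowField> = props
                .iter()
                .map(|(name, ty)| {
                    let (data_type, nullable) = to_arrow_type(ty);
                    ArrowField { name: name.clone(), data_type, nullable }
                })
                .collect();
            (ArrowType::Struct(fields), false)
        }
        // Fallback: keep the raw JSON as a string.
        JsonType::Other => (ArrowType::JsonUtf8, false),
    }
}
```

Returning the nullable flag alongside the type keeps nullability on the field rather than on the data type, which matches how the list above describes optional types.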

Usage

to_arrow is called during CREATE TABLE processing, when a user supplies a JSON Schema in the format configuration of a source or sink.

Code Reference

Source Location

crates/arroyo-formats/src/json/schema.rs

Signature

pub const ROOT_NAME: &str = "ArroyoJsonRoot";

pub fn to_arrow(name: &str, schema: &str) -> anyhow::Result<arrow_schema::Schema>

Import

use arroyo_formats::json::schema::to_arrow;

I/O Contract

Inputs

Name    Type  Description
name    &str  Name for the root schema type (for error messages)
schema  &str  JSON Schema string to parse and convert

Outputs

Name          Type                                  Description
arrow_schema  anyhow::Result<arrow_schema::Schema>  Arrow schema derived from the JSON Schema definition, or an error if conversion fails

Usage Examples

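use arroyo_formats::json::schema::to_arrow;

// Wrapper function so that the Result from to_arrow can be returned to the caller.
fn user_schema() -> anyhow::Result<arrow_schema::Schema> {
    // "name" and "age" are required; "email" may be absent.
    let json_schema = r#"{
        "type": "object",
        "properties": {
            "name": { "type": "string" },
            "age": { "type": "integer" },
            "email": { "type": "string" }
        },
        "required": ["name", "age"]
    }"#;

    // Schema with fields: name (Utf8, non-nullable), age (Int64, non-nullable), email (Utf8, nullable)
    to_arrow("User", json_schema)
}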
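
The nullability in the example above follows from the "required" list: a property missing from that list becomes a nullable field. A minimal sketch of that rule, using only the standard library; `nullable_flags` is a hypothetical helper for illustration and is not part of the module:

```rust
use std::collections::HashSet;

/// For each property name, reports whether the resulting field would be
/// nullable, i.e. whether the name is absent from the `required` list.
/// Hypothetical helper; it only illustrates the rule.
pub fn nullable_flags<'a>(properties: &[&'a str], required: &[&str]) -> Vec<(&'a str, bool)> {
    let required: HashSet<&str> = required.iter().copied().collect();
    properties
        .iter()
        .map(|&name| (name, !required.contains(name)))
        .collect()
}
```

Called with the properties ["name", "age", "email"] and the required list ["name", "age"], only "email" is reported as nullable.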
