Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:ArroyoSystems Arroyo Pyarrow Conversion

From Leeroopedia
Revision as of 14:28, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/ArroyoSystems_Arroyo_Pyarrow_Conversion.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Streaming, UDF, Python, Arrow_Processing
Last Updated 2026-02-08 08:00 GMT

Overview

The Converter struct provides bidirectional conversion between Arrow arrays and Python objects, enabling Python UDFs to receive typed arguments and return typed results within Arroyo's streaming pipeline.

Description

The Converter struct in the pyarrow module implements two core conversion functions adapted from the RisingWave Labs arrow-udf-python project:

get_pyobject(py, array, i) -- Extracts a single element from an Arrow array at index i as a Python object. Supports:

  • Null, Boolean, all integer types (Int8 through UInt64), Float32, Float64
  • Utf8, LargeUtf8, Binary, LargeBinary string/bytes types
  • List arrays (recursively converted to Python lists)
  • Struct arrays (converted to Python objects with named attributes via Struct() evaluation)
  • Null values return py.None()

build_array(data_type, py, values) -- Constructs an Arrow array from a slice of PyObject values. Uses macro-generated builder patterns for efficiency:

  • build_array! macro handles primitive types, string/bytes types, and the Null type
  • List and LargeList types flatten nested Python iterables into Arrow offset-based list arrays with null tracking
  • Struct types iterate over fields, extracting either attribute access (getattr) or item access (get_item) from Python objects
  • Null values in Python (is_none) are mapped to Arrow nulls

Both functions handle the full range of Arrow data types needed for Arroyo's SQL type system, with unsupported types producing PyTypeError.

Usage

Used internally by ThreadedUdfInterpreter to convert Arrow record batch columns to Python objects for UDF invocation and to convert Python return values back to Arrow arrays for downstream processing.

Code Reference

Source Location

Signature

pub struct Converter {}

impl Converter {
    /// Get array element as a python object.
    pub fn get_pyobject(
        py: Python<'_>,
        array: &dyn Array,
        i: usize,
    ) -> PyResult<PyObject>;

    /// Build arrow array from python objects.
    pub fn build_array(
        data_type: &DataType,
        py: Python<'_>,
        values: &[PyObject],
    ) -> PyResult<ArrayRef>;
}

Import

use arroyo_udf_python::pyarrow::Converter;

I/O Contract

Inputs

Name Type Required Description
array &dyn Array Yes (for get_pyobject) Arrow array from which to extract a single element
i usize Yes (for get_pyobject) Row index to extract from the array
data_type &DataType Yes (for build_array) Target Arrow data type for the output array
values &[PyObject] Yes (for build_array) Python objects to convert into an Arrow array

Outputs

Name Type Description
pyobject PyObject Python representation of a single Arrow array element (for get_pyobject)
array_ref ArrayRef Arrow array built from Python objects (for build_array)

Usage Examples

use arroyo_udf_python::pyarrow::Converter;
use arrow::array::{Int64Array, ArrayRef};
use arrow::datatypes::DataType;
use pyo3::Python;

Python::with_gil(|py| {
    // Convert Arrow element to Python
    let array = Int64Array::from(vec![Some(42), None, Some(7)]);
    let py_val = Converter::get_pyobject(py, &array, 0)?;
    // py_val is Python int 42

    // Convert Python objects to Arrow array
    let values: Vec<PyObject> = vec![42i64.into_py(py), py.None(), 7i64.into_py(py)];
    let arrow_array: ArrayRef = Converter::build_array(
        &DataType::Int64,
        py,
        &values,
    )?;
    // arrow_array is Int64Array [42, null, 7]
    Ok(())
});

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment