Implementation:ArroyoSystems Arroyo Pyarrow Conversion
| Knowledge Sources | |
|---|---|
| Domains | Streaming, UDF, Python, Arrow_Processing |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
The Converter struct provides bidirectional conversion between Arrow arrays and Python objects, enabling Python UDFs to receive typed arguments and return typed results within Arroyo's streaming pipeline.
Description
The Converter struct in the pyarrow module implements two core conversion functions adapted from the RisingWave Labs arrow-udf-python project:
get_pyobject(py, array, i) -- Extracts a single element from an Arrow array at index i as a Python object. Supports:
- Null, Boolean, all integer types (Int8 through UInt64), Float32, Float64
- Utf8, LargeUtf8, Binary, LargeBinary string/bytes types
- List arrays (recursively converted to Python lists)
- Struct arrays (converted to Python objects with named attributes via Struct() evaluation)
- Null values return py.None()
build_array(data_type, py, values) -- Constructs an Arrow array from a slice of PyObject values. Uses macro-generated builder patterns for efficiency:
- build_array! macro handles primitive types, string/bytes types, and the Null type
- List and LargeList types flatten nested Python iterables into Arrow offset-based list arrays with null tracking
- Struct types iterate over fields, extracting either attribute access (getattr) or item access (get_item) from Python objects
- Null values in Python (is_none) are mapped to Arrow nulls
Both functions handle the full range of Arrow data types needed for Arroyo's SQL type system, with unsupported types producing PyTypeError.
Usage
Used internally by ThreadedUdfInterpreter to convert Arrow record batch columns to Python objects for UDF invocation and to convert Python return values back to Arrow arrays for downstream processing.
Code Reference
Source Location
- Repository: ArroyoSystems_Arroyo
- File: crates/arroyo-udf/arroyo-udf-python/src/pyarrow.rs
Signature
pub struct Converter {}
impl Converter {
/// Get array element as a python object.
pub fn get_pyobject(
py: Python<'_>,
array: &dyn Array,
i: usize,
) -> PyResult<PyObject>;
/// Build arrow array from python objects.
pub fn build_array(
data_type: &DataType,
py: Python<'_>,
values: &[PyObject],
) -> PyResult<ArrayRef>;
}
Import
use arroyo_udf_python::pyarrow::Converter;
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| array | &dyn Array | Yes (for get_pyobject) | Arrow array from which to extract a single element |
| i | usize | Yes (for get_pyobject) | Row index to extract from the array |
| data_type | &DataType | Yes (for build_array) | Target Arrow data type for the output array |
| values | &[PyObject] | Yes (for build_array) | Python objects to convert into an Arrow array |
Outputs
| Name | Type | Description |
|---|---|---|
| pyobject | PyObject | Python representation of a single Arrow array element (for get_pyobject) |
| array_ref | ArrayRef | Arrow array built from Python objects (for build_array) |
Usage Examples
use arroyo_udf_python::pyarrow::Converter;
use arrow::array::{Int64Array, ArrayRef};
use arrow::datatypes::DataType;
use pyo3::Python;
Python::with_gil(|py| {
// Convert Arrow element to Python
let array = Int64Array::from(vec![Some(42), None, Some(7)]);
let py_val = Converter::get_pyobject(py, &array, 0)?;
// py_val is Python int 42
// Convert Python objects to Arrow array
let values: Vec<PyObject> = vec![42i64.into_py(py), py.None(), 7i64.into_py(py)];
let arrow_array: ArrayRef = Converter::build_array(
&DataType::Int64,
py,
&values,
)?;
// arrow_array is Int64Array [42, null, 7]
Ok(())
});