Environment:ArroyoSystems Arroyo Python UDF Runtime

From Leeroopedia


Knowledge Sources
Domains Infrastructure, UDF
Last Updated 2026-02-08 08:00 GMT

Overview

Python 3.12.5 runtime with PyO3 0.21 bindings for executing Python-based User Defined Functions (UDFs) within Arroyo pipelines.

Description

This environment provides the Python runtime required for Python UDF support in Arroyo. Python UDFs are an optional feature, gated behind the `python` Cargo feature flag, that lets users write scalar UDFs in Python with a `@udf` decorator. The integration uses PyO3 for the Rust-Python FFI and runs Python in a sub-interpreter on a dedicated thread, so that GIL acquisition does not block the Tokio async runtime. Arrow data is converted between Rust and Python via a custom PyArrow bridge, enabling vectorized processing of record batches.

Usage

Use this environment when Python UDFs are needed in SQL pipelines. Python UDFs are defined with the `@udf` decorator from the `arroyo_udf` module and can process Arrow arrays directly. The Python feature must be explicitly enabled at build time (`--features python`). In a build without it, attempts to use Python UDFs fail with the error "Python is not enabled in this build of Arroyo".
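As a minimal sketch of what such a definition can look like, the snippet below decorates a plain Python function with `@udf`. The function name, its type hints, and the `ImportError` fallback are illustrative assumptions: the fallback only exists so the snippet runs outside Arroyo, and it mirrors the marker-style decorator quoted under Code Evidence. How Arroyo maps SQL types onto Python arguments is not covered by this page.

```python
try:
    # Inside Arroyo, the decorator comes from the bundled module.
    from arroyo_udf import udf
except ImportError:
    # Assumption: stand-in so the sketch runs outside Arroyo. It copies the
    # marker-style decorator shown under Code Evidence.
    def udf(func):
        func._is_udf = True
        return func


@udf
def celsius_to_fahrenheit(celsius: float) -> float:
    """Hypothetical scalar UDF: convert Celsius to Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0
```

The decorated function stays an ordinary callable, so it can be unit-tested in plain Python before it is used in a pipeline.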

System Requirements

| Category | Requirement | Notes |
|---|---|---|
| Python | 3.12.5 (exact version) | From python-build-standalone |
| CPU | x86_64 or aarch64 | Python standalone builds available for these architectures |
| OS | Linux | python-build-standalone targets linux-gnu |

Dependencies

System Packages

  • `python3.12` (CPython 3.12.5 standalone build)
  • Python headers and libraries (installed from standalone distribution)

Rust Crate Dependencies

  • `pyo3` 0.21 (optional; enabled by the `python-enabled` feature of the `arroyo-udf-python` crate)

Python Packages

  • `pyarrow` (for Arrow array interop)
  • Custom `arroyo_udf` module (bundled with Arroyo)

Credentials

No additional credentials required for Python UDF execution.

Quick Install

# Python is bundled in the Docker image. For a manual install:
PY_ARCH=$(uname -m)   # expected to be x86_64 or aarch64
PY_VERSION=3.12.5
PY_RELEASE=20240814
PY_TARBALL="cpython-${PY_VERSION}+${PY_RELEASE}-${PY_ARCH}-unknown-linux-gnu-install_only.tar.gz"
# -f makes curl fail on HTTP errors instead of saving an error page
curl -fLO "https://github.com/indygreg/python-build-standalone/releases/download/${PY_RELEASE}/${PY_TARBALL}"
# Writing to /usr/local usually requires root privileges
tar xzf "${PY_TARBALL}" -C /usr/local --strip-components=1

# Build Arroyo with Python support
cargo build --release --features python
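The download URL in the install steps follows a fixed pattern, which the following sketch reproduces in Python so it can be validated before fetching anything. The helper name `standalone_url` and the `SUPPORTED_ARCHES` set are assumptions made for illustration; the version, release tag, and URL layout are taken from the commands above.

```python
# Architectures listed under System Requirements.
SUPPORTED_ARCHES = {"x86_64", "aarch64"}

BASE_URL = "https://github.com/indygreg/python-build-standalone/releases/download"


def standalone_url(arch, version="3.12.5", release="20240814"):
    """Build the install_only tarball URL for a given CPU architecture.

    Hypothetical helper: it only formats the URL and does not check that
    the release asset actually exists.
    """
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"unsupported architecture: {arch!r}")
    tarball = f"cpython-{version}+{release}-{arch}-unknown-linux-gnu-install_only.tar.gz"
    return f"{BASE_URL}/{release}/{tarball}"
```

On a Linux host, `standalone_url(platform.machine())` (after `import platform`) should yield the same URL the shell commands construct from `uname -m`.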

Code Evidence

Feature flag configuration from `arroyo-udf-python/Cargo.toml`:

[features]
python-enabled = ["dep:pyo3"]

[dependencies]
pyo3 = { workspace = true, optional = true }

Python UDF decorator from `arroyo_udf.py:1-6`:

import pyarrow

def udf(func):
    func._is_udf = True
    return func
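The decorator does nothing except set a `_is_udf` marker, which suggests that the host finds UDFs by scanning a loaded module for marked functions. The sketch below shows one way that scan could work in pure Python. It is an approximation under that assumption: Arroyo's real registration logic lives on the Rust side and is not shown on this page, and `find_udfs` and `USER_SOURCE` are hypothetical names.

```python
def udf(func):
    # Same marker-style decorator as quoted above.
    func._is_udf = True
    return func


def find_udfs(namespace):
    """Return {name: function} for every callable carrying the UDF marker."""
    return {
        name: obj
        for name, obj in namespace.items()
        if callable(obj) and getattr(obj, "_is_udf", False)
    }


# Hypothetical user-supplied source containing one UDF and one helper.
USER_SOURCE = """
@udf
def double(x):
    return x * 2

def helper(x):
    return x
"""

namespace = {"udf": udf}
exec(USER_SOURCE, namespace)  # load the user code into an isolated namespace
udfs = find_udfs(namespace)   # only `double` carries the marker
```

Because undecorated helpers are skipped, user code can define supporting functions without exposing them as UDFs.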

Threaded sub-interpreter model from `interpreter.rs`:

pub struct SubInterpreter {
    // Runs Python code in a dedicated thread to avoid GIL issues
    // with the Tokio async runtime
}
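To illustrate the dedicated-thread hand-off described in the comment above, here is a small Python analogue: callers enqueue work and receive a future, while a single long-lived thread executes every call. This only sketches the shape of the pattern. It is not Arroyo's Rust implementation, it does not create a sub-interpreter, and `DedicatedWorker` is a hypothetical name.

```python
import queue
import threading
from concurrent.futures import Future


class DedicatedWorker:
    """Run submitted callables one at a time on a single long-lived thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut the worker down
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                # Deliver failures to the caller instead of killing the thread.
                future.set_exception(exc)

    def submit(self, func, *args):
        """Queue a call and return a Future the caller can wait on."""
        future = Future()
        self._tasks.put((func, args, future))
        return future

    def close(self):
        self._tasks.put(None)
        self._thread.join()


worker = DedicatedWorker()
result = worker.submit(lambda x: x + 1, 41).result(timeout=5)
worker.close()
```

The caller never executes the function itself; it only waits on the future, which is the property that keeps a long-running call off the caller's own thread.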

Common Errors

| Error Message | Cause | Solution |
|---|---|---|
| `Python is not enabled in this build of Arroyo` | Built without the `python` feature flag | Rebuild with `cargo build --features python` |
| `ModuleNotFoundError: No module named 'pyarrow'` | PyArrow not installed | `pip install pyarrow` |
| PyO3 build error (exact message varies) | Python 3.12 headers not found | Install the Python 3.12 standalone build |
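Two of these failures can be caught before a pipeline runs. The following preflight sketch checks the interpreter version and whether `pyarrow` is importable. The `preflight` helper and its injectable parameters are assumptions for illustration, and the exact-version check follows the 3.12.5 requirement stated under System Requirements.

```python
import importlib.util
import sys

REQUIRED_VERSION = (3, 12, 5)  # exact version from System Requirements


def _module_available(name):
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(name) is not None


def preflight(version_info=None, has_module=_module_available):
    """Return a list of human-readable problems; empty means no issue found.

    Hypothetical helper. Both parameters are injectable so the check can be
    exercised without changing the real interpreter.
    """
    if version_info is None:
        version_info = sys.version_info
    problems = []
    found = tuple(version_info[:3])
    if found != REQUIRED_VERSION:
        problems.append(
            "expected Python %d.%d.%d, found %d.%d.%d" % (REQUIRED_VERSION + found)
        )
    if not has_module("pyarrow"):
        problems.append("pyarrow is not importable; try `pip install pyarrow`")
    return problems
```

Calling `preflight()` with no arguments inspects the running interpreter; an empty list means neither of the two checked conditions applies.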

Compatibility Notes

  • Feature flag required: Python UDF support is opt-in via `--features python` at build time. The default Docker image includes Python support.
  • Thread model: Python UDFs run in a dedicated thread with a sub-interpreter to avoid blocking the Tokio async runtime with GIL acquisition.
  • PyArrow bridge: A custom conversion layer translates between Rust Arrow arrays and Python PyArrow arrays. Some Arrow types may not be supported.
  • x86_64 and aarch64 only: This environment relies on the x86_64 and aarch64 `unknown-linux-gnu` builds from python-build-standalone; other targets are not covered here.
