Environment:ArroyoSystems Arroyo Python UDF Runtime
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, UDF |
| Last Updated | 2026-02-08 08:00 GMT |
Overview
Python 3.12.5 runtime with PyO3 0.21 bindings for executing Python-based User Defined Functions (UDFs) within Arroyo pipelines.
Description
This environment provides the Python runtime required for Python UDF support in Arroyo. Python UDFs are an optional feature, gated behind the `python` Cargo feature flag, that lets users write scalar UDFs in Python using a `@udf` decorator. The integration uses PyO3 for Rust-Python FFI and runs Python code in a sub-interpreter on a dedicated thread, so that GIL acquisition does not block the Tokio async runtime. Arrow data is converted between Rust and Python via a custom PyArrow bridge, enabling vectorized processing of record batches.
Usage
Use this environment when SQL pipelines need Python UDFs. Python UDFs are defined with the `@udf` decorator from the `arroyo_udf` module and can process Arrow arrays directly. Python support must be explicitly enabled at build time (`--features python`); in a build without it, users receive the error "Python is not enabled in this build of Arroyo".
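As a minimal sketch of what such a function might look like, the example below defines a stand-in `udf` decorator that mirrors the one listed under Code Evidence, so the snippet runs outside Arroyo. The function name `fahrenheit_to_celsius`, its type hints, and the sample column are illustrative assumptions, not taken from the Arroyo source.

```python
# Stand-in for the bundled decorator so this sketch runs outside Arroyo.
# Inside Arroyo, `from arroyo_udf import udf` would be used instead.
def udf(func):
    func._is_udf = True  # marker attribute, as in arroyo_udf.py
    return func


@udf
def fahrenheit_to_celsius(temp_f: float) -> float:
    """Hypothetical scalar UDF: convert one Fahrenheit value to Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Plain-Python illustration of applying the scalar function to a column of values.
column = [32.0, 212.0, 98.6]
converted = [fahrenheit_to_celsius(v) for v in column]
```

The list comprehension only illustrates the per-value semantics; in Arroyo the values would arrive as Arrow arrays through the PyArrow bridge rather than as Python lists.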
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| Python | 3.12.5 (exact version) | From python-build-standalone |
| CPU | x86_64 or aarch64 | Python standalone builds available for these architectures |
| OS | Linux | python-build-standalone targets linux-gnu |
Dependencies
System Packages
- `python3.12` (CPython 3.12.5 standalone build)
- Python headers and libraries (installed from standalone distribution)
Rust Crate Dependencies
- `pyo3` 0.21 (optional; gated behind the `python-enabled` feature of the `arroyo-udf-python` crate)
Python Packages
- `pyarrow` (for Arrow array interop)
- Custom `arroyo_udf` module (bundled with Arroyo)
Credentials
No additional credentials required for Python UDF execution.
Quick Install
```bash
# Python is bundled in the Docker image. For manual install:
PY_ARCH=$(uname -m)
PY_VERSION=3.12.5
PY_RELEASE=20240814
curl -LO "https://github.com/indygreg/python-build-standalone/releases/download/${PY_RELEASE}/cpython-${PY_VERSION}+${PY_RELEASE}-${PY_ARCH}-unknown-linux-gnu-install_only.tar.gz"
tar xzf cpython-*.tar.gz -C /usr/local --strip-components=1

# Build Arroyo with Python support
cargo build --release --features python
```
Code Evidence
Feature flag configuration from `arroyo-udf-python/Cargo.toml`:
```toml
[features]
python-enabled = ["dep:pyo3"]

[dependencies]
pyo3 = { workspace = true, optional = true }
```
Python UDF decorator from `arroyo_udf.py:1-6`:
```python
import pyarrow


def udf(func):
    func._is_udf = True
    return func
```
Threaded sub-interpreter model from `interpreter.rs`:
```rust
pub struct SubInterpreter {
    // Runs Python code in a dedicated thread to avoid GIL issues
    // with the Tokio async runtime
}
```
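The dedicated-thread idea can be illustrated with an analogy in Python's own `asyncio`. This is a sketch of the general pattern only, not the Rust implementation: the executor name `_udf_thread`, the helper `run_udf`, and the doubling logic are all hypothetical stand-ins.

```python
import asyncio
import threading
from concurrent.futures import ThreadPoolExecutor

# One worker thread owns all UDF execution (hypothetical name).
_udf_thread = ThreadPoolExecutor(max_workers=1, thread_name_prefix="py-udf")


def run_udf(values):
    """Stand-in for UDF evaluation; returns results and the executing thread's name."""
    return [v * 2 for v in values], threading.current_thread().name


async def evaluate(values):
    # Hand the batch to the dedicated thread; the event loop stays free meanwhile.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_udf_thread, run_udf, values)


results, thread_name = asyncio.run(evaluate([1, 2, 3]))
```

Because every batch is funneled through a single worker, the async side never executes the blocking work itself; it only awaits the result.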
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `Python is not enabled in this build of Arroyo` | Built without `python` feature flag | Rebuild with `cargo build --features python` |
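| `ModuleNotFoundError: No module named 'pyarrow'` | PyArrow is not installed in the Python environment Arroyo uses | Run `pip install pyarrow` in that environment |
| PyO3 build failure | Python 3.12 headers or libraries not found | Install the Python 3.12.5 standalone build (see Quick Install) |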
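
The interpreter version and PyArrow checks from the table above can be run up front with a small preflight script. This is a sketch under assumptions: the function name `check_environment` and its message strings are hypothetical, and it only checks the Python major/minor version and module availability, not headers or the Cargo feature flag.

```python
import importlib.util
import sys


def check_environment(required_version=(3, 12), required_modules=("pyarrow",)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] != tuple(required_version):
        found = ".".join(map(str, sys.version_info[:3]))
        wanted = ".".join(map(str, required_version))
        problems.append(f"Python {wanted} expected, found {found}")
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print(problem)
```

Running the script with the interpreter that Arroyo will use prints one line per detected problem and nothing when both checks pass.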
Compatibility Notes
- Feature flag required: Python UDF support is opt-in via `--features python` at build time. The default Docker image includes Python support.
- Thread model: Python UDFs run in a dedicated thread with a sub-interpreter to avoid blocking the Tokio async runtime with GIL acquisition.
- PyArrow bridge: A custom conversion layer translates between Rust Arrow arrays and Python PyArrow arrays. Some Arrow types may not be supported.
- x86_64 and aarch64 only: This environment relies on the Python standalone builds for these two architectures; other architectures are not covered.