Implementation:Eventual Inc Daft Arrow Utils
| Knowledge Sources | |
|---|---|
| Domains | Data_Interop, FFI |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Concrete tool for fixing PyArrow array compatibility issues before data passes through the FFI boundary into the Rust engine.
Description
The arrow_utils module provides a compatibility layer that applies two critical fixes to PyArrow arrays, chunked arrays, and tables before they cross the Python-to-Rust FFI boundary:
- _FixEmptyStructArrays: Converts empty StructArrays (with zero fields) to single-field StructArrays with a NullType placeholder, because arrow2's FFI cannot handle empty StructArrays. The `remove_empty_struct_placeholders` function reverses this transformation on the way back.
- _FixSliceOffsets: Propagates struct and fixed-size list array slice offsets to child arrays to prevent them from being silently dropped during record batch conversion. This works around a known PyArrow issue (apache/arrow#34639).
Usage
Call `ensure_table`, `ensure_array`, or `ensure_chunked_array` before passing PyArrow data through FFI into the Rust engine. Call `remove_empty_struct_placeholders` when converting data back from Rust to Python.
Code Reference
Source Location
- Repository: Eventual_Inc_Daft
- File: daft/arrow_utils.py
- Lines: 1-252
Signature
def ensure_array(arr: pa.Array) -> pa.Array:
    """Applies all fixes to an Arrow array."""

def ensure_chunked_array(arr: pa.ChunkedArray) -> pa.ChunkedArray:
    """Applies all fixes to an Arrow chunked array."""

def ensure_table(tbl: pa.Table) -> pa.Table:
    """Applies all fixes to an Arrow table."""

def remove_empty_struct_placeholders(arr: pa.Array) -> pa.Array:
    """Recursively removes the empty struct placeholders."""
Import
from daft.arrow_utils import ensure_table, ensure_array, ensure_chunked_array, remove_empty_struct_placeholders
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
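| arr | pa.Array or pa.ChunkedArray | Yes | Arrow array to fix before FFI transfer (`ensure_array`, `ensure_chunked_array`), or a pa.Array coming back from Rust (`remove_empty_struct_placeholders`) |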
| tbl | pa.Table | Yes | Arrow table to fix before FFI transfer |
Outputs
| Name | Type | Description |
|---|---|---|
| fixed array/table | pa.Array / pa.ChunkedArray / pa.Table | Arrow data with empty struct and slice offset issues resolved |
Usage Examples
Fixing a Table Before FFI
import pyarrow as pa
from daft.arrow_utils import ensure_table

# Create a table with an empty struct column
empty_struct_type = pa.struct([])
table = pa.table({
    "id": [1, 2, 3],
    "metadata": pa.array([{}, {}, {}], type=empty_struct_type),
})

# Fix before sending through FFI
fixed_table = ensure_table(table)
# The "metadata" column is now a single-field struct with a null placeholder