Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:Eventual Inc Daft Arrow Utils

From Leeroopedia
Revision as of 14:52, 16 February 2026 by Admin (talk | contribs) (Auto-imported from implementations/Eventual_Inc_Daft_Arrow_Utils.md)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)


Knowledge Sources
Domains Data_Interop, FFI
Last Updated 2026-02-08 14:00 GMT

Overview

Concrete tool for fixing PyArrow array compatibility issues before data passes through the FFI boundary into the Rust engine.

Description

The arrow_utils module provides a compatibility layer that applies two critical fixes to PyArrow arrays, chunked arrays, and tables before they cross the Python-to-Rust FFI boundary:

  1. _FixEmptyStructArrays: Converts empty StructArrays (with zero fields) to single-field StructArrays with a NullType placeholder, because arrow2's FFI cannot handle empty StructArrays. The `remove_empty_struct_placeholders` function reverses this transformation on the way back.
  1. _FixSliceOffsets: Propagates struct and fixed-size list array slice offsets to child arrays to prevent them from being silently dropped during record batch conversion. This works around a known PyArrow issue (apache/arrow#34639).

Usage

Call `ensure_table`, `ensure_array`, or `ensure_chunked_array` before passing PyArrow data through FFI into the Rust engine. Call `remove_empty_struct_placeholders` when converting data back from Rust to Python.

Code Reference

Source Location

Signature

def ensure_array(arr: pa.Array) -> pa.Array:
    """Applies all fixes to an Arrow array."""

def ensure_chunked_array(arr: pa.ChunkedArray) -> pa.ChunkedArray:
    """Applies all fixes to an Arrow chunked array."""

def ensure_table(tbl: pa.Table) -> pa.Table:
    """Applies all fixes to an Arrow table."""

def remove_empty_struct_placeholders(arr: pa.Array) -> pa.Array:
    """Recursively removes the empty struct placeholders."""

Import

from daft.arrow_utils import ensure_table, ensure_array, ensure_chunked_array, remove_empty_struct_placeholders

I/O Contract

Inputs

Name Type Required Description
arr pa.Array or pa.ChunkedArray Yes Arrow array to fix before FFI transfer
tbl pa.Table Yes Arrow table to fix before FFI transfer

Outputs

Name Type Description
fixed array/table pa.Array / pa.ChunkedArray / pa.Table Arrow data with empty struct and slice offset issues resolved

Usage Examples

Fixing a Table Before FFI

import pyarrow as pa
from daft.arrow_utils import ensure_table

# Create a table with an empty struct column
empty_struct_type = pa.struct([])
table = pa.table({
    "id": [1, 2, 3],
    "metadata": pa.array([{}, {}, {}], type=empty_struct_type),
})

# Fix before sending through FFI
fixed_table = ensure_table(table)
# The "metadata" column is now a single-field struct with a null placeholder

Semantic Links

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment