Implementation:Eventual Inc Daft DataFrame To Arrow
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Interoperability |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
A concrete tool, provided by the Daft library, for converting a Daft DataFrame to a PyArrow Table. This page is a wrapper document for the Arrow interoperability method, DataFrame.to_arrow.
Description
The to_arrow method on DataFrame converts the current Daft DataFrame to a PyArrow Table. It first streams all partitions as Arrow RecordBatches via to_arrow_iter(results_buffer_size=None), then combines them into a single PyArrow Table using pyarrow.Table.from_batches with the DataFrame's schema converted to a PyArrow schema. This is a blocking call that triggers execution of the lazy query plan.
Usage
Call df.to_arrow() on a DataFrame instance. Requires the pyarrow package. Use when you need Arrow-native output for interoperability with other data systems.
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L4438-4468
Signature
def to_arrow(self) -> pyarrow.Table
Import
# Method on DataFrame, no separate import needed
# Requires: pyarrow
table = df.to_arrow()
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| (none) | -- | -- | The method takes no arguments. |
Outputs
| Name | Type | Description |
|---|---|---|
| return | pyarrow.Table | A PyArrow Table containing the materialized data from the Daft DataFrame. |
External Dependencies
- pyarrow - required for output type
Usage Examples
Basic Usage
import daft
df = daft.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})
arrow_table = df.to_arrow()
print(arrow_table)
# pyarrow.Table
# a: int64
# b: int64
# ----
# a: [[1,2,3]]
# b: [[4,5,6]]
Interop with DuckDB
import daft
import duckdb
df = daft.from_pydict({"x": [1, 2, 3], "y": [4, 5, 6]})
arrow_table = df.to_arrow()
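# Query the Arrow table with DuckDB. The name "arrow_table" in the SQL
# is resolved to the local Python variable of the same name.
result = duckdb.query("SELECT * FROM arrow_table WHERE x > 1")
print(result.fetchall())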