Implementation:Eventual Inc Daft DataFrame Iter Rows
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Streaming |
| Last Updated | 2026-02-08 00:00 GMT |
Overview
Concrete tool, provided by the Daft library, for streaming DataFrame rows one at a time as Python dictionaries.
Description
The iter_rows method on DataFrame returns an iterator over rows, where each row is a Python dictionary mapping column names to values. If the DataFrame has already been collected, the method iterates over the precomputed results. Otherwise, it executes the DataFrame in a streaming fashion and iterates through partitions as they become available. The column_format parameter controls whether values are converted to Python objects ("python") or kept as Arrow scalars ("arrow"); the Arrow format is more efficient for nested data types.
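The streaming behaviour described above can be pictured with a small pure-Python sketch. It does not use Daft and is not Daft's implementation: the function `iter_rows_from_partitions`, the generator `lazy_partitions`, and the column-dictionary partition layout are hypothetical stand-ins chosen only for illustration.

```python
from typing import Any, Dict, Iterable, Iterator, List


def iter_rows_from_partitions(
    partitions: Iterable[Dict[str, List[Any]]],
) -> Iterator[Dict[str, Any]]:
    """Yield one row dict at a time from column-oriented partitions.

    Hypothetical stand-in for illustration; not Daft's implementation.
    """
    for partition in partitions:  # a partition is pulled only when reached
        columns = list(partition.keys())
        # zip(*...) walks the column lists in lockstep, one row per step
        for values in zip(*(partition[name] for name in columns)):
            yield dict(zip(columns, values))


def lazy_partitions() -> Iterator[Dict[str, List[Any]]]:
    # A generator stands in for partitions that become available over time.
    yield {"foo": [1, 2], "bar": ["a", "b"]}
    yield {"foo": [3], "bar": ["c"]}


rows = list(iter_rows_from_partitions(lazy_partitions()))
```

Because both functions are generators, no partition is read until the consumer asks for a row from it, which is the property that makes row-by-row iteration possible without holding the whole dataset.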
Usage
Call df.iter_rows() on a DataFrame instance. Use it when you need to iterate over results row by row without materializing the entire dataset in memory.
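Because the return value is an ordinary Python iterator, standard-library tools can consume it lazily. The sketch below uses a plain generator, `fake_rows`, as a hypothetical stand-in for df.iter_rows() so that it runs without Daft. It takes the first two rows and records how many rows the source actually produced.

```python
import itertools
from typing import Any, Dict, Iterator, List

produced: List[int] = []  # records which rows the source actually generated


def fake_rows(n: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical stand-in for df.iter_rows(); yields rows lazily."""
    for i in range(n):
        produced.append(i)
        yield {"foo": i, "bar": str(i)}


# islice stops pulling from the iterator once two rows have been taken.
first_two = list(itertools.islice(fake_rows(1_000_000), 2))
```

With a real DataFrame the same pattern would be itertools.islice(df.iter_rows(), 2). How much upstream work Daft has already started by that point depends on its buffering and is not modelled here.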
Code Reference
Source Location
- Repository: Daft
- File: daft/dataframe/dataframe.py
- Lines: L429-514
Signature
def iter_rows(
    self,
    results_buffer_size: int | None | Literal["num_cpus"] = "num_cpus",
    column_format: Literal["python", "arrow"] = "python",
) -> Iterator[dict[str, Any]]
Import
# Method on DataFrame, no separate import needed
for row in df.iter_rows():
    print(row)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| results_buffer_size | int, None, or Literal["num_cpus"] | No | Maximum number of partitions to buffer. Defaults to "num_cpus" (the total number of CPUs on the machine). None removes the buffer limit. |
| column_format | Literal["python", "arrow"] | No | Format of column values. "python" converts to native Python types; "arrow" keeps Arrow scalars. Defaults to "python". |
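One way to picture what a partition buffer limit does is a bounded queue sitting between a producer and the consumer. The sketch below is an illustration in plain Python under that assumption, not Daft's execution engine: the function `buffered`, the producer thread, and the sentinel object are all hypothetical. A `buffer_size` of None maps to an unbounded queue, mirroring the "removes the buffer limit" behaviour in the table.

```python
import queue
import threading
from typing import Any, Iterable, Iterator, Optional

_DONE = object()  # sentinel marking the end of the stream


def buffered(source: Iterable[Any], buffer_size: Optional[int]) -> Iterator[Any]:
    """Yield items from `source` via a background thread and a bounded queue.

    Hypothetical illustration of buffering; not Daft's implementation.
    """
    # queue.Queue treats maxsize <= 0 as unbounded, so None becomes 0.
    q = queue.Queue(maxsize=buffer_size or 0)

    def produce() -> None:
        for item in source:
            q.put(item)  # blocks while the buffer is full (backpressure)
        q.put(_DONE)

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item


partitions = [{"foo": [1, 2]}, {"foo": [3]}, {"foo": [4, 5]}]
bounded = list(buffered(partitions, buffer_size=1))
unbounded = list(buffered(partitions, buffer_size=None))
```

A smaller buffer caps how far the producer can run ahead of the consumer, trading throughput for lower peak memory. This sketch omits error propagation and cleanup when the consumer stops early, both of which a real implementation would need.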
Outputs
| Name | Type | Description |
|---|---|---|
| return | Iterator[dict[str, Any]] | A streaming iterator where each element is a dictionary mapping column names to row values. |
Usage Examples
Basic Usage
import daft
df = daft.from_pydict({"foo": [1, 2, 3], "bar": ["a", "b", "c"]})
for row in df.iter_rows():
    print(row)
# {'foo': 1, 'bar': 'a'}
# {'foo': 2, 'bar': 'b'}
# {'foo': 3, 'bar': 'c'}
Arrow Format for Nested Data
import daft
df = daft.from_pydict({"data": [[1, 2], [3, 4]]})
for row in df.iter_rows(column_format="arrow"):
    print(row)  # Values are Arrow scalars instead of Python lists