Implementation:Eventual Inc Daft DataFrame Iter Rows

Knowledge Sources	Daft, Daft Docs
Domains	Data_Engineering, Streaming
Last Updated	2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for streaming DataFrame rows one at a time as Python dictionaries.

Description

The iter_rows method on DataFrame returns an iterator of rows, where each row is a Python dictionary mapping column names to values. If the DataFrame has already been collected, the iterator reads from the precomputed results. Otherwise, it executes the DataFrame in a streaming fashion and yields rows from each partition as that partition becomes available. The column_format parameter controls whether values are converted to Python objects ("python") or kept as Arrow scalars ("arrow"); the "arrow" option is more efficient for nested data types.
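To make the shape of the result concrete, here is a minimal pure-Python sketch: column-oriented partitions are consumed one at a time and transposed into per-row dictionaries. The `Partition` alias and the `iter_rows_from` function are hypothetical illustrations, not Daft internals, and the sketch deliberately ignores buffering and the `column_format` conversion.

```python
from typing import Any, Dict, Iterable, Iterator, List

# Hypothetical stand-in for a partition: column name -> list of values.
# Daft's real partitions are not plain dicts; this only models the data layout.
Partition = Dict[str, List[Any]]


def iter_rows_from(partitions: Iterable[Partition]) -> Iterator[Dict[str, Any]]:
    """Yield one dict per row, pulling each partition only when it is needed."""
    for part in partitions:
        names = list(part)
        num_rows = len(part[names[0]]) if names else 0
        for i in range(num_rows):
            # Transpose: gather the i-th value of every column into one row dict.
            yield {name: part[name][i] for name in names}


parts = [{"foo": [1, 2], "bar": ["a", "b"]}, {"foo": [3], "bar": ["c"]}]
for row in iter_rows_from(parts):
    print(row)
```

Because `iter_rows_from` is a generator, the second partition is not touched until every row of the first one has been consumed, which mirrors the streaming behavior described above.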

Usage

Call df.iter_rows() on a DataFrame instance. Use when you need to iterate over results row by row without materializing the entire dataset.
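A `for` loop with `break`, or `itertools.islice`, stops consuming an iterator early. The sketch below models that with a hypothetical `lazy_rows` generator that "computes" a partition only when the consumer asks for one of its rows, so taking the first three rows touches only the first partition. The same consumption pattern works on the iterator returned by `df.iter_rows()`, but how much work Daft has already done when you stop depends on its executor and on `results_buffer_size`, so treat this as a model of the idea rather than a guarantee about Daft.

```python
import itertools
from typing import Any, Dict, Iterator, List

computed: List[int] = []  # which hypothetical partitions were "executed"


def lazy_rows(num_partitions: int, rows_per_partition: int) -> Iterator[Dict[str, Any]]:
    """Hypothetical row stream that computes each partition only on demand."""
    for p in range(num_partitions):
        computed.append(p)  # stand-in for executing partition p
        for i in range(rows_per_partition):
            yield {"partition": p, "id": p * rows_per_partition + i}


# Take the first three rows, then stop pulling from the iterator.
first_three = list(itertools.islice(lazy_rows(num_partitions=10, rows_per_partition=4), 3))
print([row["id"] for row in first_three])  # [0, 1, 2]
print(computed)  # [0]: partitions 1-9 were never computed
```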

Code Reference

Source Location

Repository: Daft
File: daft/dataframe/dataframe.py
Lines: L429-514

Signature

def iter_rows(
    self,
    results_buffer_size: int | None | Literal["num_cpus"] = "num_cpus",
    column_format: Literal["python", "arrow"] = "python",
) -> Iterator[dict[str, Any]]

Import

# Method on DataFrame, no separate import needed
for row in df.iter_rows():
    print(row)

I/O Contract

Inputs

Name	Type	Required	Description
results_buffer_size	int | None | Literal["num_cpus"]	No	How many partitions to hold in the results buffer. An int sets an explicit limit; the default, "num_cpus", uses the total number of CPUs on the machine; None removes the limit.
column_format	Literal["python", "arrow"]	No	Format of column values. "python" converts to native Python types; "arrow" keeps Arrow scalars. Defaults to "python".
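The results_buffer_size parameter is a bounded-buffer (backpressure) setting. As a rough mental model, and not Daft's implementation, the sketch below runs a producer on a background thread and hands items to the consumer through a `queue.Queue` whose `maxsize` plays the role of the buffer size: once the buffer is full, the producer blocks until the consumer catches up, and `None` makes the queue unbounded. The `buffered` helper and its `max_buffered` parameter are hypothetical names used only for illustration.

```python
import queue
import threading
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")

_DONE = object()  # sentinel: the producer has finished


class _Failure:
    """Carries an exception raised by the producer over to the consumer."""

    def __init__(self, exc: Exception) -> None:
        self.exc = exc


def buffered(source: Iterable[T], max_buffered: Optional[int]) -> Iterator[T]:
    """Iterate over `source`, letting a background thread run at most
    `max_buffered` items ahead of the consumer (None means unbounded)."""
    # Queue(maxsize=0) means unbounded in the stdlib, so None maps to 0.
    q = queue.Queue(maxsize=max_buffered or 0)

    def produce() -> None:
        try:
            for item in source:
                q.put(item)  # blocks while the buffer is full (backpressure)
        except Exception as exc:
            q.put(_Failure(exc))  # forward the error instead of hanging
            return
        q.put(_DONE)

    threading.Thread(target=produce, daemon=True).start()
    while True:
        item = q.get()
        if item is _DONE:
            return
        if isinstance(item, _Failure):
            raise item.exc
        yield item


print(list(buffered(range(5), max_buffered=2)))  # [0, 1, 2, 3, 4]
```

A bounded buffer caps how far production can run ahead of consumption while still letting the two overlap; an unbounded one lets the producer run as far ahead as it can. One limitation of this sketch: if the consumer stops early, the daemon producer thread stays blocked on `put` until the process exits.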

Outputs

Name	Type	Description
return	Iterator[dict[str, Any]]	A streaming iterator where each element is a dictionary mapping column names to row values.

Usage Examples

Basic Usage

import daft

df = daft.from_pydict({"foo": [1, 2, 3], "bar": ["a", "b", "c"]})
for row in df.iter_rows():
    print(row)
# {'foo': 1, 'bar': 'a'}
# {'foo': 2, 'bar': 'b'}
# {'foo': 3, 'bar': 'c'}

Arrow Format for Nested Data

import daft

df = daft.from_pydict({"data": [[1, 2], [3, 4]]})
for row in df.iter_rows(column_format="arrow"):
    print(row)  # Values are Arrow scalars instead of Python lists

Related Pages

Implements Principle

Principle:Eventual_Inc_Daft_Streaming_Row_Iteration

Requires Environment

Environment:Eventual_Inc_Daft_Python_PyArrow_Core
