Implementation:Eventual Inc Daft DataFrame Iter Rows

From Leeroopedia


Knowledge Sources
Domains: Data_Engineering, Streaming
Last Updated: 2026-02-08 00:00 GMT

Overview

Concrete tool, provided by the Daft library, for streaming DataFrame rows one at a time as Python dictionaries.

Description

The iter_rows method on DataFrame returns an iterator over rows, where each row is a Python dictionary mapping column names to values. If the DataFrame has already been collected, the iterator reads from the precomputed results. Otherwise, it executes the DataFrame in a streaming fashion and iterates through partitions as they become available. The column_format parameter controls whether values are converted to Python objects ("python") or kept as Arrow scalars ("arrow"); the "arrow" option is more efficient for nested data types. The results_buffer_size parameter limits how many partitions may be held in the results buffer.
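The streaming behaviour described above can be pictured with a small pure-Python sketch. This is not Daft's implementation: the names iter_rows_sketch and lazy_partitions are hypothetical, and each partition is modelled as a plain dictionary of column lists purely for illustration.

```python
from typing import Any, Iterable, Iterator


def iter_rows_sketch(
    partitions: Iterable[dict[str, list[Any]]],
) -> Iterator[dict[str, Any]]:
    """Yield one dict per row, consuming partitions lazily (hypothetical helper)."""
    for partition in partitions:
        names = list(partition)
        # zip over the column lists to walk the partition row by row
        for values in zip(*(partition[name] for name in names)):
            yield dict(zip(names, values))


def lazy_partitions() -> Iterator[dict[str, list[Any]]]:
    """Stand-in for partitions that become available over time."""
    yield {"foo": [1, 2], "bar": ["a", "b"]}
    yield {"foo": [3], "bar": ["c"]}


rows = list(iter_rows_sketch(lazy_partitions()))
print(rows)
```

Because both functions are generators, the second partition is only requested once every row of the first has been yielded, which is the property that makes row-by-row streaming possible.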

Usage

Call df.iter_rows() on a DataFrame instance. Use it when you need to iterate over results row by row without materializing the entire dataset.

Code Reference

Source Location

  • Repository: Daft
  • File: daft/dataframe/dataframe.py
  • Lines: L429-514

Signature

def iter_rows(
    self,
    results_buffer_size: int | None | Literal["num_cpus"] = "num_cpus",
    column_format: Literal["python", "arrow"] = "python",
) -> Iterator[dict[str, Any]]

Import

# iter_rows is a method on DataFrame, so only the daft package itself is imported
import daft

df = daft.from_pydict({"foo": [1, 2, 3]})
for row in df.iter_rows():
    print(row)

I/O Contract

Inputs

Name Type Required Description
results_buffer_size int | None | Literal["num_cpus"] No Maximum number of partitions to buffer. Defaults to "num_cpus" (the total number of CPUs on the machine). None removes the buffer limit.
column_format Literal["python", "arrow"] No Format of column values. "python" converts to native Python types; "arrow" keeps Arrow scalars. Defaults to "python".

Outputs

Name Type Description
return Iterator[dict[str, Any]] A streaming iterator where each element is a dictionary mapping column names to row values.

Usage Examples

Basic Usage

import daft

df = daft.from_pydict({"foo": [1, 2, 3], "bar": ["a", "b", "c"]})
for row in df.iter_rows():
    print(row)
# {'foo': 1, 'bar': 'a'}
# {'foo': 2, 'bar': 'b'}
# {'foo': 3, 'bar': 'c'}

Arrow Format for Nested Data

import daft

df = daft.from_pydict({"data": [[1, 2], [3, 4]]})
for row in df.iter_rows(column_format="arrow"):
    print(row)  # Values are Arrow scalars instead of Python lists
