Principle:Eventual Inc Daft Streaming Row Iteration

Knowledge Sources	Daft, Daft Docs
Domains	Data_Engineering, Streaming
Last Updated	2026-02-08 00:00 GMT

Overview

Technique for streaming DataFrame results row-by-row without materializing the entire dataset in memory.

Description

Streaming iteration yields rows one at a time from a buffered execution pipeline, which makes it possible to process datasets larger than memory. The buffer size is configurable and controls the tradeoff between throughput and memory consumption: a larger buffer lets execution run further ahead of the consumer, while a smaller one keeps fewer results in memory. Rows can be returned either in Python-native format (with type coercion) or as Arrow scalars, which handle nested data more efficiently.

Usage

Use streaming row iteration when you need to process results incrementally without loading all data into memory. Common scenarios include writing rows to an external system, streaming results to a client, or processing datasets that exceed available RAM.
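The "writing rows to an external system" scenario can be sketched as follows. This is a minimal illustration, not Daft's implementation: `fake_row_stream`, `write_in_batches`, and the `sink` callable are hypothetical names introduced here, and the generator stands in for a real row iterator. With Daft, the iterable would presumably be the one returned by the DataFrame's row-iteration method, assuming it yields one dict per row as described on this page.

```python
from itertools import islice
from typing import Any, Callable, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def fake_row_stream(n: int) -> Iterator[Row]:
    """Hypothetical stand-in for a streaming row iterator (one dict per row)."""
    for i in range(n):
        yield {"id": i, "value": i * i}


def write_in_batches(
    rows: Iterable[Row],
    sink: Callable[[List[Row]], None],
    batch_size: int = 3,
) -> int:
    """Pull rows on demand and pass them to `sink` in small batches.

    At most `batch_size` rows are held by this function at any time,
    so the full result set is never materialized here.
    """
    it = iter(rows)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return total
        sink(batch)  # e.g. an insert into an external store
        total += len(batch)


# Usage: collect batches in a list purely for demonstration.
received: List[List[Row]] = []
written = write_in_batches(fake_row_stream(7), received.append, batch_size=3)
print(written, [len(b) for b in received])
```

Because the consumer only ever asks for the next few rows, memory use on the consumer side is bounded by the batch size rather than by the size of the dataset.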

Theoretical Basis

The technique follows an iterator-based streaming pattern in which backpressure is applied through a bounded buffer of configurable size. The execution pipeline produces partitions asynchronously while the consumer pulls rows on demand:

buffer = BoundedQueue(size=results_buffer_size)

# Producer (async)
for each partition P in execution_plan:
    buffer.put(P)  # blocks when buffer full (backpressure)

# Consumer (iterator)
for each partition P in buffer:
    for each row R in P:
        yield dict(col_name -> R[col])

Setting results_buffer_size=None removes the buffer limit, allowing maximum throughput at the cost of higher memory usage.
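The producer/consumer pattern above can be made concrete with Python's standard library. The sketch below is an illustration under stated assumptions, not Daft's actual execution engine: `stream_rows` and `_DONE` are hypothetical names, partitions are modeled as plain lists of dicts, and a background thread plays the role of the asynchronous producer. The parameter name `results_buffer_size` is taken from this page.

```python
import queue
import threading
from typing import Any, Dict, Iterator, List, Optional

Partition = List[Dict[str, Any]]
_DONE = object()  # sentinel marking the end of the stream


def stream_rows(
    partitions: List[Partition],
    results_buffer_size: Optional[int] = 1,
) -> Iterator[Dict[str, Any]]:
    """Yield rows one at a time from partitions produced in the background."""
    # queue.Queue treats maxsize <= 0 as unbounded, so None is mapped to 0.
    buffer = queue.Queue(maxsize=results_buffer_size or 0)

    def produce() -> None:
        for partition in partitions:
            buffer.put(partition)  # blocks while the buffer is full (backpressure)
        buffer.put(_DONE)

    # Daemon thread, so an abandoned iterator does not block interpreter exit.
    threading.Thread(target=produce, daemon=True).start()

    while True:
        item = buffer.get()  # blocks until the producer has something ready
        if item is _DONE:
            return
        for row in item:
            yield row


parts = [[{"x": 1}, {"x": 2}], [{"x": 3}], [{"x": 4}, {"x": 5}]]
bounded = list(stream_rows(parts, results_buffer_size=1))
unbounded = list(stream_rows(parts, results_buffer_size=None))
print(bounded == unbounded)
```

Both settings yield the same rows in the same order; the difference lies only in how far the producer may run ahead of the consumer, and therefore in how many partitions can sit in memory at once.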

Related Pages

Implemented By

Implementation:Eventual_Inc_Daft_DataFrame_Iter_Rows
