Implementation:Online ml River Stream Iter Pandas

Knowledge Sources	Domains	Last Updated
River River Docs	Online Machine Learning, Data Streaming, Pandas Integration	2026-02-08 16:00 GMT

Overview

Concrete tool for converting a pandas DataFrame into a generator of (dict, target) tuples suitable for River's observation-by-observation online learning API.

Description

The stream.iter_pandas function takes a pandas DataFrame of features and an optional pandas Series (or DataFrame) of targets and yields one observation at a time. Each observation is a tuple (x, y) where x is a dictionary mapping column names to feature values and y is the corresponding target (or None if no target is provided). Internally, the function converts the DataFrame to a NumPy array and delegates to stream.iter_array, passing the original column names as feature names and, for multi-target scenarios, the target column names.

Usage

Use stream.iter_pandas whenever you need to iterate over a pandas DataFrame one row at a time and feed the rows to a River estimator. It is the standard entry point for converting batch tabular data into a stream.

Code Reference

Source Location

river/stream/iter_pandas.py:L8-L48

Signature

def iter_pandas(
    X: pd.DataFrame,
    y: pd.Series | pd.DataFrame | None = None,
    **kwargs
) -> base.typing.Stream

Import

from river import stream

I/O Contract

Inputs

Parameter	Type	Description
X	`pd.DataFrame`	A DataFrame of features. Each column becomes a key in the yielded dictionaries.
y	`pd.Series | pd.DataFrame | None`	Optional target values. When a Series, each element is the scalar target. When a DataFrame, each row provides multiple target values, named after the target columns. When None, the target is None for each observation.
**kwargs	`dict`	Extra keyword arguments forwarded to `stream.iter_array`.

Outputs

Output	Type	Description
Stream generator	`base.typing.Stream`	A generator yielding `(x: dict, y)` tuples, one per row of the DataFrame.

Usage Examples

import pandas as pd
from river import stream

X = pd.DataFrame({
    'x1': [1, 2, 3, 4],
    'x2': ['blue', 'yellow', 'yellow', 'blue'],
    'y': [True, False, False, True]
})
y = X.pop('y')

for xi, yi in stream.iter_pandas(X, y):
    print(xi, yi)
# {'x1': 1, 'x2': 'blue'} True
# {'x1': 2, 'x2': 'yellow'} False
# {'x1': 3, 'x2': 'yellow'} False
# {'x1': 4, 'x2': 'blue'} True

Unsupervised usage (no target):

import pandas as pd
from river import stream, cluster

X = pd.DataFrame({
    'feat_a': [1.0, 2.0, 3.0],
    'feat_b': [4.0, 5.0, 6.0]
})

model = cluster.KMeans(n_clusters=2, seed=42)

for x, _ in stream.iter_pandas(X):
    model.learn_one(x)
    label = model.predict_one(x)
    print(f'{x} -> cluster {label}')

Related Pages

Principle:Online_ml_River_DataFrame_Stream_Ingestion
