Implementation:Online ml River Stream Iter Pandas
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Machine Learning, Data Streaming, Pandas Integration | 2026-02-08 16:00 GMT |
Overview
Concrete tool for converting a pandas DataFrame into a generator of (dict, target) tuples suitable for River's observation-by-observation online learning API.
Description
The stream.iter_pandas function takes a pandas DataFrame of features and an optional pandas Series (or DataFrame) of targets and yields one observation at a time. Each observation is a tuple (x, y) where x is a dictionary mapping column names to feature values and y is the corresponding target (or None if no target is provided). Internally, the function converts the DataFrame to a NumPy array and delegates to stream.iter_array, passing the original column names as feature names and, for multi-target scenarios, the target column names.
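The delegation described above can be approximated without River installed. The following is a minimal sketch that assumes only pandas; `iter_pandas_sketch` is a hypothetical name, not River's source, and it leaves out the extra keyword arguments that the real function forwards to stream.iter_array.

```python
import pandas as pd


def iter_pandas_sketch(X, y=None):
    """Hypothetical stand-in for stream.iter_pandas (illustration only)."""
    feature_names = list(X.columns)
    rows = X.to_numpy()  # the DataFrame becomes a NumPy array of rows

    if y is None:
        # No target supplied: every observation is paired with None.
        for row in rows:
            yield dict(zip(feature_names, row)), None
    elif isinstance(y, pd.DataFrame):
        # Multi-target case: target column names become the keys of y.
        target_names = list(y.columns)
        for row, targets in zip(rows, y.to_numpy()):
            yield dict(zip(feature_names, row)), dict(zip(target_names, targets))
    else:
        # Single-target case: y is a Series of scalar targets.
        for row, target in zip(rows, y.to_numpy()):
            yield dict(zip(feature_names, row)), target


X = pd.DataFrame({"x1": [1, 2], "x2": ["blue", "yellow"]})
y = pd.Series([True, False])

pairs = list(iter_pandas_sketch(X, y))
unlabelled = list(iter_pandas_sketch(X))
```

Each element of `pairs` is an `(x, y)` tuple whose `x` is keyed by the original column names, which is the shape River estimators expect.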
Usage
Use stream.iter_pandas whenever you need to feed the rows of a pandas DataFrame, one at a time, to a River estimator. It is the standard entry point for turning batch tabular data that already sits in memory into a stream of observations.
Code Reference
Source Location
river/stream/iter_pandas.py:L8-L48
Signature
def iter_pandas(
    X: pd.DataFrame,
    y: pd.Series | pd.DataFrame | None = None,
    **kwargs
) -> base.typing.Stream
Import
from river import stream
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| X | pd.DataFrame | A DataFrame of features. Each column name becomes a key in the yielded dictionaries. |
| y | pd.Series \| pd.DataFrame \| None | Optional target values. When a Series, each element is the scalar target. When a DataFrame, each row provides multiple target values. When None, the target is None for each observation. |
| **kwargs | dict | Extra keyword arguments forwarded to stream.iter_array. |
Outputs
| Output | Type | Description |
|---|---|---|
| Stream generator | base.typing.Stream | A generator yielding (x: dict, y) tuples, one per row of the DataFrame. |
Usage Examples
import pandas as pd
from river import stream
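X = pd.DataFrame({
    'x1': [1, 2, 3, 4],
    'x2': ['blue', 'yellow', 'yellow', 'blue'],
    'y': [True, False, False, True]
})
# Split the target column off so that X holds only features.
y = X.pop('y')
# Each iteration yields a feature dict and the matching target.
for xi, yi in stream.iter_pandas(X, y):
    print(xi, yi)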
# {'x1': 1, 'x2': 'blue'} True
# {'x1': 2, 'x2': 'yellow'} False
# {'x1': 3, 'x2': 'yellow'} False
# {'x1': 4, 'x2': 'blue'} True
Unsupervised usage (no target):
import pandas as pd
from river import stream, cluster
X = pd.DataFrame({
    'feat_a': [1.0, 2.0, 3.0],
    'feat_b': [4.0, 5.0, 6.0]
})
model = cluster.KMeans(n_clusters=2, seed=42)
# Without y, the second tuple element is always None and can be discarded.
for x, _ in stream.iter_pandas(X):
    model.learn_one(x)
    label = model.predict_one(x)
    print(f'{x} -> cluster {label}')