Implementation:Online ml River Stream Iter Vaex
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Data_Streaming, Big_Data |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Yields rows from a Vaex DataFrame. Vaex is a library optimized for large-scale tabular datasets.
Description
The iter_vaex function streams over Vaex DataFrames, which are designed for out-of-core computing on billion-row datasets. It yields each row as a features dictionary paired with its target, which is the format River's models consume, and it supports multi-output scenarios. This integration allows online learning on datasets too large to fit in memory.
Usage
Use iter_vaex when the data already lives in a Vaex DataFrame, particularly when the dataset does not fit in RAM. Vaex uses memory-mapped files and lazy evaluation, so rows can be fed to a River model one at a time without loading the whole dataset.
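To make the memory-mapping idea concrete, here is a minimal standard-library sketch that reads a column of floats from disk one value at a time. It is not how Vaex is implemented; the file layout (raw little-endian float64 values) and the helper name `iter_mapped_floats` are assumptions made for illustration.

```python
import mmap
import os
import struct
import tempfile

# Write a single float64 "column" to a temporary file (illustrative data).
values = [1.5, 2.5, 3.5, 4.5]
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}d", *values))


def iter_mapped_floats(path):
    """Yield float64 values one at a time from a memory-mapped file.

    Hypothetical helper: the OS pages data in on demand, so the file
    is never read into a Python list up front.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for offset in range(0, len(mm), 8):  # 8 bytes per float64
                yield struct.unpack_from("<d", mm, offset)[0]


streamed = list(iter_mapped_floats(path))
print(streamed)
```

The same access pattern, reading one record at a time from a mapped file, is what makes row-by-row online learning practical on data larger than memory.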
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/stream/iter_vaex.py
Signature
def iter_vaex(
    X: vaex.dataframe.DataFrame,
    y: str | vaex.expression.Expression | None = None,
    features: list[str] | vaex.expression.Expression | None = None,
) -> base.typing.Stream:
    ...
Import
from river import stream
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| X | vaex.dataframe.DataFrame | Vaex DataFrame with training features |
| y | str, Expression, or None | Column or expression for target variable |
| features | list[str], Expression, or None | Features to use (all columns if None, excluding y) |
Returns:
| Type | Description |
|---|---|
| Iterator[(dict, Any)] | Stream of (features dict, target) tuples; in the multi-output case the target is itself a dict keyed by target column |
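The contract in the tables above can be sketched in plain Python without Vaex installed. The function `iter_columns` below is a hypothetical stand-in that operates on a dict of lists. It is not River's implementation, and its handling of `y=None` (yielding `None` as the target) is an assumption of this sketch rather than documented behavior.

```python
# Hypothetical stand-in for illustration only; NOT River's implementation.
def iter_columns(columns, y=None, features=None):
    """Yield (features dict, target) pairs from a dict of equal-length lists."""
    # Normalize y to a list of target column names.
    targets = [y] if isinstance(y, str) else list(y or [])
    # Use all columns when features is None, always excluding the targets.
    names = [c for c in (features or columns) if c not in targets]
    n_rows = len(next(iter(columns.values()))) if columns else 0
    for i in range(n_rows):
        x = {name: columns[name][i] for name in names}
        if len(targets) > 1:
            # Multi-output: the target is a dict keyed by column name.
            yield x, {t: columns[t][i] for t in targets}
        elif targets:
            yield x, columns[targets[0]][i]
        else:
            # Assumption of this sketch: no target column means target is None.
            yield x, None


table = {"x1": [1, 2], "x2": [10, 20], "target": [1.5, 2.5]}

single_rows = list(iter_columns(table, y="target"))
subset_rows = list(iter_columns(table, y="target", features=["x1"]))
multi_rows = list(iter_columns(table, y=["x2", "target"]))
print(single_rows[0], subset_rows[0], multi_rows[0])
```

The three calls mirror the three usage patterns on this page: all columns as features, an explicit feature subset, and several target columns at once.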
Usage Examples
import vaex
from river import stream, linear_model, metrics

# Create a Vaex DataFrame
df = vaex.from_arrays(
    x1=[1, 2, 3, 4, 5],
    x2=[10, 20, 30, 40, 50],
    target=[1.5, 2.5, 3.5, 4.5, 5.5]
)

# Stream all columns
for x, y in stream.iter_vaex(df, y='target'):
    print(f"Features: {x}, Target: {y}")

# Specify which features to use
for x, y in stream.iter_vaex(df, y='target', features=['x1']):
    print(f"Using only x1: {x}, Target: {y}")

# Example with online training
model = linear_model.LinearRegression()
metric = metrics.MAE()

df_train = vaex.from_arrays(
    feature1=[1.0, 2.0, 3.0, 4.0, 5.0] * 100,
    feature2=[0.5, 1.0, 1.5, 2.0, 2.5] * 100,
    y=[2.0, 4.0, 6.0, 8.0, 10.0] * 100
)

print("\nOnline training with Vaex:")
for x, y in stream.iter_vaex(df_train, y='y'):
    y_pred = model.predict_one(x)
    metric.update(y, y_pred)
    model.learn_one(x, y)

print(f"Final MAE: {metric.get():.4f}")

# Multi-output example
df_multi = vaex.from_arrays(
    x=[1, 2, 3],
    y1=[0.1, 0.2, 0.3],
    y2=[0.4, 0.5, 0.6]
)

for x, y_dict in stream.iter_vaex(df_multi, y=['y1', 'y2']):
    print(f"Features: {x}, Targets: {y_dict}")
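Because the function returns a stream of tuples, standard iterator tools apply to it. The sketch below uses `itertools.islice` to take only the first few rows, which can be handy for a quick smoke test before iterating over a very large DataFrame. `fake_stream` is a hypothetical generator standing in for `stream.iter_vaex(df, y='target')` so that the snippet runs without Vaex installed.

```python
from itertools import islice


def fake_stream(n_rows):
    """Hypothetical stand-in for stream.iter_vaex(df, y='target')."""
    for i in range(n_rows):
        yield {"x1": i, "x2": 10 * i}, i + 0.5


# Only the first three rows are ever produced; the remaining rows of the
# million-row stream are never evaluated.
first_three = list(islice(fake_stream(1_000_000), 3))

for x, y in first_three:
    print(f"Features: {x}, Target: {y}")
```

The same pattern should work with the real stream by swapping `fake_stream(...)` for the `iter_vaex` call.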