Implementation:Online ml River Stream Iter Vaex

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Data_Streaming, Big_Data
Last Updated 2026-02-08 16:00 GMT

Overview

Yields rows from a Vaex DataFrame. Vaex is a library optimized for large-scale tabular datasets.

Description

The iter_vaex function streams over Vaex DataFrames, which are designed for out-of-core computing on billion-row datasets. It converts each row into River's dictionary format and yields it together with its target; multiple target columns (multi-output) are supported. This makes online learning possible on datasets too large to fit in memory.

Usage

Use this when working with very large datasets stored in Vaex DataFrames. Vaex is particularly useful for datasets that don't fit in RAM, as it uses memory-mapped files and lazy evaluation. This function bridges Vaex's big data capabilities with River's online learning.
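To illustrate why a lazy row stream pairs well with online learning, here is a minimal predict-then-learn loop over a generator. Everything in it is hypothetical: `lazy_rows` stands in for a stream such as the one iter_vaex produces, and `RunningMean` is a toy model that only mimics the `predict_one` / `learn_one` method names, not a River estimator.

```python
from typing import Iterable, Iterator


def lazy_rows(n: int) -> Iterator[tuple[dict[str, float], float]]:
    """Hypothetical lazy source: each row is produced only when requested."""
    for i in range(n):
        x = float(i % 5 + 1)
        yield {"feature1": x}, 2.0 * x


class RunningMean:
    """Toy model (not a River estimator): predicts the mean target seen so far."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x: dict[str, float]) -> float:
        return self.mean

    def learn_one(self, x: dict[str, float], y: float) -> None:
        # Incremental mean update; no past rows are stored.
        self.n += 1
        self.mean += (y - self.mean) / self.n


def progressive_mae(
    rows: Iterable[tuple[dict[str, float], float]], model: RunningMean
) -> float:
    """Score each row before learning from it, keeping only running totals."""
    total, count = 0.0, 0
    for x, y in rows:
        total += abs(y - model.predict_one(x))  # predict first
        model.learn_one(x, y)                   # then learn
        count += 1
    return total / count if count else 0.0


model = RunningMean()
mae = progressive_mae(lazy_rows(500), model)
```

Memory use stays constant regardless of the number of rows, since both the model and the metric keep only running aggregates.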

Code Reference

Source Location

Signature

def iter_vaex(
    X: vaex.dataframe.DataFrame,
    y: str | vaex.expression.Expression | None = None,
    features: list[str] | vaex.expression.Expression | None = None,
) -> base.typing.Stream:
    ...

Import

from river import stream

I/O Contract

Parameter Type Description
X vaex.dataframe.DataFrame Vaex DataFrame with training features
y str, Expression, or None Column or expression for target variable
features list[str], Expression, or None Features to use (all columns if None, excluding y)

Returns:

Type Description
Iterator[(dict, Any)] Stream of (features dict, target) tuples, one per row; with multiple target columns the target is itself a dict

Usage Examples

import vaex
from river import stream, linear_model, metrics

# Create a Vaex DataFrame
df = vaex.from_arrays(
    x1=[1, 2, 3, 4, 5],
    x2=[10, 20, 30, 40, 50],
    target=[1.5, 2.5, 3.5, 4.5, 5.5]
)

# Stream all columns; 'target' is yielded separately as y, the rest as features
for x, y in stream.iter_vaex(df, y='target'):
    print(f"Features: {x}, Target: {y}")

# Specify which features to use
for x, y in stream.iter_vaex(df, y='target', features=['x1']):
    print(f"Using only x1: {x}, Target: {y}")

# Example with online training
model = linear_model.LinearRegression()
metric = metrics.MAE()

df_train = vaex.from_arrays(
    feature1=[1.0, 2.0, 3.0, 4.0, 5.0] * 100,
    feature2=[0.5, 1.0, 1.5, 2.0, 2.5] * 100,
    y=[2.0, 4.0, 6.0, 8.0, 10.0] * 100
)

print("\nOnline training with Vaex:")
for x, y in stream.iter_vaex(df_train, y='y'):
    y_pred = model.predict_one(x)
    metric.update(y, y_pred)
    model.learn_one(x, y)

print(f"Final MAE: {metric.get():.4f}")

# Multi-output example: a list of target columns yields a dict of targets
df_multi = vaex.from_arrays(
    x=[1, 2, 3],
    y1=[0.1, 0.2, 0.3],
    y2=[0.4, 0.5, 0.6]
)

for x, y_dict in stream.iter_vaex(df_multi, y=['y1', 'y2']):
    print(f"Features: {x}, Targets: {y_dict}")
