
Implementation:Online ml River Compat SklearnToRiver

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Scikit_Learn, Compatibility, Machine_Learning
Last Updated 2026-02-08 16:00 GMT

Overview

Compatibility wrappers that adapt scikit-learn estimators with partial_fit methods to River's online learning interface.

Description

The sklearn-to-river compatibility layer enables the use of scikit-learn models that support incremental learning (via partial_fit) within River's streaming framework. The module provides two main wrapper classes: SKL2RiverRegressor and SKL2RiverClassifier.

These wrappers translate between River's one-sample-at-a-time interface (learn_one, predict_one) and scikit-learn's batch interface (partial_fit, predict). The wrappers maintain feature name ordering to ensure consistent feature alignment across calls, and handle the conversion between dictionaries (River's format) and arrays or DataFrames (scikit-learn's format).
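The feature-alignment idea above can be sketched as follows. This is a minimal illustration, not River's actual implementation: the `FeatureAligner` class and its `to_row` method are hypothetical names, and the sketch assumes the column order is frozen from the first sample seen.

```python
class FeatureAligner:
    """Hypothetical helper: turns feature dicts into rows with a fixed column order."""

    def __init__(self):
        self.feature_names = None  # learned from the first sample seen

    def to_row(self, x: dict) -> list:
        if self.feature_names is None:
            # Freeze the column order the first time a sample arrives.
            self.feature_names = list(x.keys())
        # Always emit values in the frozen order, whatever order the dict uses.
        return [x[name] for name in self.feature_names]


aligner = FeatureAligner()
first = aligner.to_row({"age": 31, "bmi": 22.5})
second = aligner.to_row({"bmi": 27.0, "age": 45})  # keys arrive in a different order
```

Both rows come out in the same column order, which is what a batch estimator expects across successive `partial_fit` and `predict` calls.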

For classifiers, the classes parameter is required at initialization, because a scikit-learn classifier trained incrementally with partial_fit must be told every possible class before it has seen them all. The wrappers also implement the mini-batch methods learn_many and predict_many for efficient processing when several samples are available at once.
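To illustrate why the class list has to be known up front, here is a minimal sketch of a one-sample adapter around a toy estimator. Everything in it is an assumption made for illustration: `ToyCountingClassifier` is a stand-in that only mimics the shape of a scikit-learn-style `partial_fit`, and `OneSampleAdapter` is a hypothetical wrapper, not `SKL2RiverClassifier`. Feature ordering is ignored here for brevity.

```python
from collections import Counter


class ToyCountingClassifier:
    """Stand-in estimator (not a real sklearn class): predicts class frequencies."""

    def __init__(self):
        self.classes_ = None
        self.counts_ = Counter()

    def partial_fit(self, X, y, classes=None):
        if self.classes_ is None:
            if classes is None:
                raise ValueError("classes must be given on the first call")
            self.classes_ = list(classes)
        self.counts_.update(y)  # count how often each label was seen
        return self

    def predict_proba(self, X):
        total = sum(self.counts_.values())
        row = [self.counts_[c] / total for c in self.classes_]
        return [row for _ in X]  # one probability row per input row


class OneSampleAdapter:
    """Hypothetical wrapper: exposes a one-sample interface over a batch estimator."""

    def __init__(self, estimator, classes):
        self.estimator = estimator
        self.classes = list(classes)

    def learn_one(self, x: dict, y):
        # Wrap the single sample in a batch of size one and pass the class list along.
        self.estimator.partial_fit([list(x.values())], [y], classes=self.classes)

    def predict_proba_one(self, x: dict) -> dict:
        row = self.estimator.predict_proba([list(x.values())])[0]
        # Key the probabilities by class label, as a dict-based interface expects.
        return dict(zip(self.classes, row))


model = OneSampleAdapter(ToyCountingClassifier(), classes=[False, True])
model.learn_one({"x1": 0.2}, True)
model.learn_one({"x1": 0.9}, True)
model.learn_one({"x1": 0.4}, False)
proba = model.predict_proba_one({"x1": 0.5})
```

Because the adapter holds the full class list, the probability dictionary always has one entry per class, even for a class that has appeared only rarely in the stream.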

Usage

Use these wrappers to run scikit-learn's incremental learning algorithms (such as SGDRegressor or SGDClassifier) within River pipelines or streaming workflows. This combines scikit-learn's mature implementations with River's streaming-first design and its transformers, such as StandardScaler.

Code Reference

Source Location

Signature

def convert_sklearn_to_river(estimator: sklearn_base.BaseEstimator, classes: list | None = None):
    ...

class SKL2RiverRegressor(SKL2RiverBase, base.Regressor):
    def __init__(self, estimator: sklearn_base.BaseEstimator):
        ...

class SKL2RiverClassifier(SKL2RiverBase, base.Classifier):
    def __init__(self, estimator: sklearn_base.ClassifierMixin, classes: list):
        ...

Import

from river import compat

I/O Contract

Input (convert_sklearn_to_river)
Parameter | Type | Description
estimator | sklearn_base.BaseEstimator | scikit-learn estimator with partial_fit method
classes | list or None | Required for classifiers, list of all possible classes
Output
Method | Return Type | Description
convert_sklearn_to_river(estimator, classes) | River wrapper | Returns River-compatible wrapper
Common Methods (all wrappers)
Method | Parameters | Return Type | Description
learn_one(x, y) | x: dict, y: target | None | Updates model with single sample
predict_one(x) | x: dict | float or class | Predicts single sample
learn_many(X, y) | X: DataFrame, y: Series | None | Updates model with batch
predict_many(X) | X: DataFrame | Series | Predicts batch of samples
Additional Methods (SKL2RiverClassifier)
Method | Parameters | Return Type | Description
predict_proba_one(x) | x: dict | dict | Returns class probabilities for single sample
predict_proba_many(X) | X: DataFrame | DataFrame | Returns class probabilities for batch

Usage Examples

from river import compat
from river import evaluate
from river import metrics
from river import preprocessing
from river import stream
from sklearn import linear_model
from sklearn import datasets

# Convert sklearn regressor to River
dataset = stream.iter_sklearn_dataset(
    dataset=datasets.load_diabetes(),
    shuffle=True,
    seed=42
)

scaler = preprocessing.StandardScaler()
sgd_reg = compat.convert_sklearn_to_river(linear_model.SGDRegressor())
model = scaler | sgd_reg

metric = metrics.MAE()

evaluate.progressive_val_score(dataset, model, metric)
# MAE: 84.501421

# Example with classifier
dataset = stream.iter_sklearn_dataset(
    dataset=datasets.load_breast_cancer(),
    shuffle=True,
    seed=42
)

model = preprocessing.StandardScaler()
model |= compat.convert_sklearn_to_river(
    estimator=linear_model.SGDClassifier(
        loss='log_loss',
        eta0=0.01,
        learning_rate='constant'
    ),
    classes=[False, True]
)

metric = metrics.LogLoss()

evaluate.progressive_val_score(dataset, model, metric)
# LogLoss: 0.198029

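# Use in a River pipeline, one sample at a time.
# The streams created above are generators that progressive_val_score has
# already consumed, so build a fresh regression stream first.
dataset = stream.iter_sklearn_dataset(
    dataset=datasets.load_diabetes(),
    shuffle=True,
    seed=42
)

pipeline = (
    preprocessing.StandardScaler() |
    compat.convert_sklearn_to_river(
        linear_model.SGDRegressor(max_iter=1),
    )
)

# Predict first, then learn, so each sample is scored before the model sees its target
for x, y in dataset:
    y_pred = pipeline.predict_one(x)
    pipeline.learn_one(x, y)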
