Principle:Online ml River Sklearn Compatibility
| Knowledge Sources | Machine Learning Software Engineering |
|---|---|
| Domains | Online_Learning Software_Design Interoperability |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Framework interoperability wrappers provide bidirectional compatibility between online (incremental) and batch machine learning APIs. They allow online models to be used within batch-oriented ecosystems (e.g., scikit-learn's cross-validation, grid search) and batch models to be used within streaming pipelines, bridging two fundamentally different learning paradigms.
Description
Online and batch ML frameworks differ in their core abstractions:
- Batch frameworks (e.g., scikit-learn) expect fit(X, y) on a complete dataset and predict(X) on arrays. They support cross-validation, grid search, and other tools that assume the full dataset is available.
- Online frameworks (e.g., River) expect learn_one(x, y) on individual instances and predict_one(x) on single dictionaries. They are designed for streaming, incremental updates.
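The two interface styles can be contrasted in a minimal sketch using Python's typing.Protocol. BatchEstimator, OnlineEstimator, and the toy MeanOnline model are hypothetical names chosen for illustration. Neither scikit-learn nor River defines these protocols, and the real libraries' estimators carry more methods than shown here.

```python
from typing import Any, Protocol, Sequence, runtime_checkable


@runtime_checkable
class BatchEstimator(Protocol):
    """Hypothetical batch-style interface: whole arrays in, arrays out."""
    def fit(self, X: Sequence[Sequence[float]], y: Sequence[Any]) -> "BatchEstimator": ...
    def predict(self, X: Sequence[Sequence[float]]) -> Sequence[Any]: ...


@runtime_checkable
class OnlineEstimator(Protocol):
    """Hypothetical online-style interface: one feature dict in, one prediction out."""
    def learn_one(self, x: dict, y: Any) -> Any: ...
    def predict_one(self, x: dict) -> Any: ...


class MeanOnline:
    """Hypothetical toy online regressor that predicts the running mean of y."""
    def __init__(self) -> None:
        self.n, self.mean = 0, 0.0

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean update, no stored history

    def predict_one(self, x: dict) -> float:
        return self.mean


model = MeanOnline()
# runtime_checkable protocols only check that the named methods exist
print(isinstance(model, OnlineEstimator))  # True
print(isinstance(model, BatchEstimator))   # False
```

The online model never sees more than one instance at a time, which is exactly the gap the wrappers described next have to bridge.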
Interoperability wrappers bridge this gap in both directions:
Online-to-batch (River-to-sklearn):
- Wraps an online model to expose scikit-learn's fit/predict API.
- fit(X, y) iterates through the dataset, calling learn_one for each instance.
- predict(X) calls predict_one for each instance and collects the results into an array.
- Enables use of scikit-learn's cross-validation, pipeline, and hyperparameter tuning tools.
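A minimal sketch of the online-to-batch direction, assuming a hypothetical RunningMeanRegressor as the online model and a hypothetical OnlineToBatchRegressor adapter. A small hand-written k-fold loop (k_fold_mae, also hypothetical) stands in for scikit-learn's cross-validation to show that only fit/predict are needed. Plugging an adapter into scikit-learn's own tools would additionally require get_params/set_params so that sklearn.base.clone can copy it, for example by subclassing sklearn.base.BaseEstimator.

```python
from statistics import mean


class RunningMeanRegressor:
    """Hypothetical online regressor: predicts the running mean of the targets seen so far."""
    def __init__(self):
        self.n, self.avg = 0, 0.0

    def learn_one(self, x, y):
        self.n += 1
        self.avg += (y - self.avg) / self.n

    def predict_one(self, x):
        return self.avg


class OnlineToBatchRegressor:
    """Hypothetical adapter exposing fit/predict on top of learn_one/predict_one."""
    def __init__(self, make_model):
        self.make_model = make_model  # factory, so every fit starts from a fresh model

    def fit(self, X, y):
        self.model_ = self.make_model()
        for row, target in zip(X, y):
            # Translate each array row into a {column_index: value} dict, one instance at a time
            self.model_.learn_one(dict(enumerate(row)), target)
        return self

    def predict(self, X):
        return [self.model_.predict_one(dict(enumerate(row))) for row in X]


def k_fold_mae(estimator, X, y, k=3):
    """Toy k-fold cross-validation that relies only on the batch fit/predict API."""
    n = len(X)
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, n, k))
        X_tr = [X[i] for i in range(n) if i not in test_idx]
        y_tr = [y[i] for i in range(n) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        preds = estimator.fit(X_tr, y_tr).predict(X_te)
        scores.append(mean(abs(p - t) for p, t in zip(preds, y_te)))  # mean absolute error
    return scores


X = [[float(i)] for i in range(6)]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
print(k_fold_mae(OnlineToBatchRegressor(RunningMeanRegressor), X, y))
```

Taking a model factory rather than a model instance is a design choice: each fold then trains from scratch, which is what batch cross-validation assumes.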
Batch-to-online (sklearn-to-River):
- Wraps a scikit-learn model that supports partial_fit to expose River's learn_one/predict_one API.
- learn_one(x, y) calls partial_fit on a single-instance array.
- predict_one(x) calls predict on a single-instance array and returns the single prediction as a scalar.
- Enables use of scikit-learn's incremental learners within River's streaming pipelines.
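The batch-to-online direction can be sketched with a test-then-train (prequential) loop, the usual way streaming models are evaluated. ToyCentroidClassifier and BatchToOnline are hypothetical. The toy model's partial_fit mimics the shape of scikit-learn's incremental API, but unlike scikit-learn classifiers such as SGDClassifier it does not need a classes argument on its first call. The wrapper takes an explicit feature_names list to fix the column order and fills missing features with 0.0, which is an assumption of this sketch.

```python
class ToyCentroidClassifier:
    """Hypothetical batch-style classifier with an incremental partial_fit (not a scikit-learn class)."""
    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, X, y):
        for row, label in zip(X, y):
            if label not in self.sums:
                self.sums[label] = [0.0] * len(row)
                self.counts[label] = 0
            self.sums[label] = [s + v for s, v in zip(self.sums[label], row)]
            self.counts[label] += 1
        return self

    def predict(self, X):
        preds = []
        for row in X:
            # Pick the label whose centroid is closest in squared Euclidean distance
            def dist(label):
                centroid = [s / self.counts[label] for s in self.sums[label]]
                return sum((a - b) ** 2 for a, b in zip(row, centroid))
            preds.append(min(self.sums, key=dist))
        return preds


class BatchToOnline:
    """Hypothetical adapter exposing learn_one/predict_one on top of partial_fit/predict."""
    def __init__(self, batch_model, feature_names):
        self.batch_model = batch_model
        self.feature_names = list(feature_names)  # fixed column order
        self.fitted = False

    def _row(self, x):
        return [x.get(name, 0.0) for name in self.feature_names]

    def learn_one(self, x, y):
        self.batch_model.partial_fit([self._row(x)], [y])  # a one-row "batch"
        self.fitted = True
        return self

    def predict_one(self, x):
        if not self.fitted:
            return None  # nothing learned yet
        return self.batch_model.predict([self._row(x)])[0]


stream = [({"a": 0.0, "b": 0.0}, "low"), ({"a": 5.0, "b": 5.0}, "high"),
          ({"b": 0.5, "a": 0.2}, "low"), ({"a": 4.8, "b": 5.1}, "high")]
model = BatchToOnline(ToyCentroidClassifier(), feature_names=["a", "b"])
correct = 0
for x, y in stream:
    correct += model.predict_one(x) == y  # test first...
    model.learn_one(x, y)                  # ...then train
print(correct)  # 2
```

The third instance lists its keys in a different order from the others; the fixed feature_names list still places each value in the right column.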
Usage
Use framework interoperability wrappers when:
- You want to use an online model with scikit-learn's evaluation tools.
- You want to include a scikit-learn incremental learner in a River pipeline.
- You need to benchmark online models against batch baselines.
- You are migrating between frameworks and need transitional compatibility.
Theoretical Basis
Adapter pattern: Interoperability wrappers implement the adapter design pattern, translating one interface into another:
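Online-to-Batch Adapter:
```python
class OnlineToBatchAdapter:
    def __init__(self, online_model):
        self.online_model = online_model
    def fit(self, X, y):
        for x_i, y_i in zip(X, y):
            # An array row has no feature names, so key each value by its column index
            self.online_model.learn_one(dict(enumerate(x_i)), y_i)
        return self
    def predict(self, X):
        return [self.online_model.predict_one(dict(enumerate(x_i))) for x_i in X]
```
Batch-to-Online Adapter:
```python
class BatchToOnlineAdapter:
    def __init__(self, batch_model):
        self.batch_model = batch_model
    def learn_one(self, x, y):
        # Sort keys so each feature always lands in the same column, whatever the dict order
        self.batch_model.partial_fit([[x[k] for k in sorted(x)]], [y])
        return self
    def predict_one(self, x):
        return self.batch_model.predict([[x[k] for k in sorted(x)]])[0]
```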
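Data format translation: Online models use dictionaries ({"feature_a": 1.0, "feature_b": 2.0}), while batch models use arrays or DataFrames. Wrappers handle this conversion, which includes keeping a consistent feature ordering so that each dictionary key always maps to the same array column, regardless of the order in which the keys appear.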
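A minimal sketch of the dictionary-to-array translation, assuming string feature names. The hypothetical DictRowConverter freezes the column order on first use (sorted, so it does not depend on dictionary insertion order), fills absent features with a default value, and rejects features it has never seen.

```python
class DictRowConverter:
    """Hypothetical helper converting between {feature: value} dicts and flat rows."""
    def __init__(self, feature_names=None, fill_value=0.0):
        self.feature_names = list(feature_names) if feature_names else None
        self.fill_value = fill_value

    def to_row(self, x):
        if self.feature_names is None:
            # Freeze the column order on first use; sorting ignores dict insertion order
            self.feature_names = sorted(x)
        unknown = set(x) - set(self.feature_names)
        if unknown:
            raise KeyError(f"unseen features: {sorted(unknown)}")
        # Absent features get the fill value so every row has the same width
        return [x.get(name, self.fill_value) for name in self.feature_names]

    def to_dict(self, row):
        if self.feature_names is None:
            raise ValueError("column order is unknown until to_row has been called")
        return dict(zip(self.feature_names, row))


conv = DictRowConverter()
print(conv.to_row({"feature_b": 2.0, "feature_a": 1.0}))  # [1.0, 2.0]
print(conv.to_row({"feature_a": 3.0}))                    # [3.0, 0.0]
print(conv.to_dict([1.0, 2.0]))  # {'feature_a': 1.0, 'feature_b': 2.0}
```

Rejecting unseen features is a design choice: batch models generally fix their input width at the first fit, so a new key cannot simply become a new column. A wrapper could instead drop such keys silently, at the risk of hiding data problems.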
Partial fit requirement: The batch-to-online direction requires that the scikit-learn model supports partial_fit, which limits compatibility to incremental learners (e.g., SGDClassifier, MiniBatchKMeans). Models without partial_fit cannot be meaningfully wrapped for streaming use.
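The partial_fit requirement can be enforced when the wrapper is built, rather than discovered when the first instance arrives. The sketch below uses hypothetical IncrementalOnly, BatchOnly, and StreamingWrapper classes. It also reflects a real scikit-learn constraint: incremental classifiers such as SGDClassifier require the full set of labels through the classes argument on the first partial_fit call, because a single instance cannot reveal every class. The wrapper therefore accepts classes up front.

```python
class IncrementalOnly:
    """Hypothetical model mimicking the partial_fit(X, y, classes=None) signature."""
    def __init__(self):
        self.classes_ = None
        self.calls = 0

    def partial_fit(self, X, y, classes=None):
        if self.classes_ is None:
            if classes is None:
                raise ValueError("classes must be passed on the first call to partial_fit")
            self.classes_ = list(classes)
        self.calls += 1
        return self

    def predict(self, X):
        return [self.classes_[0] for _ in X]  # trivial placeholder prediction


class BatchOnly:
    """Hypothetical model with fit/predict but no partial_fit."""
    def fit(self, X, y):
        return self

    def predict(self, X):
        return [None for _ in X]


class StreamingWrapper:
    """Hypothetical batch-to-online wrapper that refuses models without partial_fit."""
    def __init__(self, model, classes=None):
        if not callable(getattr(model, "partial_fit", None)):
            raise TypeError(f"{type(model).__name__} has no partial_fit; "
                            "it cannot learn one instance at a time")
        self.model = model
        self.classes = classes
        self._first = True

    def learn_one(self, x, y):
        row = [x[k] for k in sorted(x)]
        if self._first and self.classes is not None:
            # Pass the full label set once, on the first incremental update
            self.model.partial_fit([row], [y], classes=self.classes)
        else:
            self.model.partial_fit([row], [y])
        self._first = False
        return self

    def predict_one(self, x):
        return self.model.predict([[x[k] for k in sorted(x)]])[0]


wrapped = StreamingWrapper(IncrementalOnly(), classes=["spam", "ham"])
wrapped.learn_one({"len": 3.0}, "ham").learn_one({"len": 9.0}, "spam")
print(wrapped.model.calls)  # 2

try:
    StreamingWrapper(BatchOnly())
except TypeError as err:
    print(err)  # BatchOnly has no partial_fit; it cannot learn one instance at a time
```

Failing at construction time gives a clear error before any data is consumed from the stream.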