Workflow:Online ml River Binary Classification Pipeline

Knowledge Sources	River, River Documentation, River JMLR Paper
Domains	Online_ML, Classification, Streaming_Data
Last Updated	2026-02-08 16:00 GMT

Overview

End-to-end process for building and evaluating a binary classification model on streaming data using composable pipelines and progressive validation.

Description

This workflow covers the standard procedure for binary classification with River on data that arrives one observation at a time. It uses River's composable pipeline system to chain preprocessing transformers (e.g., feature scaling) with a classifier (e.g., logistic regression), then evaluates performance with progressive validation, in which each observation is used to test the model before the model learns from it, so every prediction is scored on data the model has not yet seen. The process covers dataset loading, pipeline construction, incremental learning, and real-time metric tracking.

Usage

Execute this workflow when you have a labeled binary classification dataset (or a live data stream producing binary labels) and want to train and evaluate a model incrementally, without storing the entire dataset in memory. This is the canonical starting point for any classification task with River.

Execution Steps

Step 1: Load or Connect to a Data Stream

Obtain a stream of observations, where each observation is a dictionary of features paired with a binary label. River provides built-in datasets (e.g., Phishing, Elec2, CreditCard) that yield (x, y) tuples one at a time. For custom data, use stream utilities to iterate over CSV files, pandas DataFrames, or SQL queries.

Key considerations:

Each observation is a Python dict of features and a target value
Built-in datasets are iterables that yield (dict, bool/int) tuples
For custom data, use river.stream.iter_csv, river.stream.iter_pandas, or river.stream.iter_sql
Data never needs to fit entirely in memory
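
To make the observation format concrete, here is a minimal standard-library sketch of turning CSV rows into (features, label) pairs one at a time. It is a stand-in for the role the stream utilities play, not River's own code; the function name iter_labeled_rows, the column names, and the sample rows are all hypothetical.

```python
import csv
import io


def iter_labeled_rows(file_obj, target, converters):
    """Yield (features, label) pairs from a CSV file, one row at a time."""
    for row in csv.DictReader(file_obj):
        # Pop the target so that it does not leak into the features.
        y = row.pop(target) == "1"
        # Every CSV value is a string, so convert each feature explicitly.
        x = {name: converters[name](value) for name, value in row.items()}
        yield x, y


# Hypothetical data; an in-memory buffer stands in for a file on disk.
raw = io.StringIO(
    "url_length,has_ip,is_phishing\n"
    "54,0,0\n"
    "112,1,1\n"
    "73,0,0\n"
)
stream = iter_labeled_rows(
    raw,
    target="is_phishing",
    converters={"url_length": float, "has_ip": int},
)
first_x, first_y = next(stream)  # only one row has been read so far
```

Because the function is a generator, rows are parsed only when the consumer asks for them, which is what lets the dataset be larger than memory.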

Step 2: Construct a Preprocessing Pipeline

Build a composable pipeline by chaining one or more transformers before the classifier. Transformers handle tasks like feature scaling (StandardScaler, MinMaxScaler), encoding (OneHotEncoder), and feature extraction (PolynomialExtender). Use the pipe operator to connect transformers sequentially.

Key considerations:

Use the pipe operator (|) to chain transformers: scaler | encoder | model
Unsupervised transformers are updated during learn_one automatically
Pipelines implement the same learn_one/predict_one interface as individual models
Feature unions can be created with the plus operator (+)
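
The chaining behaviour described above can be sketched in plain Python. This is an illustration of the idea only, not River's implementation: the classes Step, RunningScaler, SignClassifier, and Pipeline are hypothetical, and the scaler assumes every observation carries the same features.

```python
import math


class Step:
    """Base class that gives every step the | operator."""

    def __or__(self, other):
        return Pipeline(self, other)


class RunningScaler(Step):
    """Standardizes features with a running mean and variance (Welford)."""

    def __init__(self):
        self.n = 0
        self.mean = {}
        self.m2 = {}  # running sum of squared deviations per feature

    def learn_one(self, x):
        self.n += 1
        for k, v in x.items():
            mean = self.mean.get(k, 0.0)
            delta = v - mean
            mean += delta / self.n
            self.mean[k] = mean
            self.m2[k] = self.m2.get(k, 0.0) + delta * (v - mean)

    def transform_one(self, x):
        out = {}
        for k, v in x.items():
            var = self.m2.get(k, 0.0) / self.n if self.n else 0.0
            out[k] = (v - self.mean.get(k, 0.0)) / math.sqrt(var) if var > 0 else 0.0
        return out


class SignClassifier(Step):
    """Toy model: predicts True when the sum of the features is positive."""

    def learn_one(self, x, y):
        pass  # nothing to learn; it exists only to terminate the pipeline

    def predict_one(self, x):
        return sum(x.values()) > 0


class Pipeline(Step):
    def __init__(self, *steps):
        self.steps = list(steps)

    def __or__(self, other):
        # Keep the pipeline flat when more steps are appended.
        return Pipeline(*self.steps, other)

    def learn_one(self, x, y):
        for transformer in self.steps[:-1]:
            transformer.learn_one(x)  # unsupervised update, no label needed
            x = transformer.transform_one(x)
        self.steps[-1].learn_one(x, y)

    def predict_one(self, x):
        for transformer in self.steps[:-1]:
            x = transformer.transform_one(x)
        return self.steps[-1].predict_one(x)


model = RunningScaler() | SignClassifier()
for x, y in [({"a": 1.0}, False), ({"a": 3.0}, True), ({"a": 5.0}, True)]:
    model.learn_one(x, y)
prediction = model.predict_one({"a": 7.0})  # 7.0 is above the running mean of 3.0
```

The point of the design is that the caller only ever sees learn_one and predict_one, whether it holds a single model or a chain of steps.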

Step 3: Select and Configure a Classifier

Choose an appropriate binary classifier. For linear boundaries, use LogisticRegression with an optimizer (SGD, Adam, etc.). For non-linear boundaries, use tree-based models (HoeffdingTreeClassifier) or ensemble methods (AdaBoostClassifier, BaggingClassifier). Configure hyperparameters such as learning rate, regularization, or tree growth parameters.
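
For intuition about what a linear online classifier does on each observation, here is a minimal sketch of logistic regression trained with plain stochastic gradient descent. It is not River's LogisticRegression; the class OnlineLogisticRegression, its lr and l2 parameters, and the sample points are assumptions made for illustration.

```python
import math


class OnlineLogisticRegression:
    """Logistic regression trained one observation at a time with plain SGD."""

    def __init__(self, lr=0.1, l2=0.0):
        self.lr = lr  # learning rate
        self.l2 = l2  # L2 regularization strength
        self.weights = {}
        self.bias = 0.0

    def predict_proba_one(self, x):
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def predict_one(self, x):
        return self.predict_proba_one(x) >= 0.5

    def learn_one(self, x, y):
        # Gradient of the log loss with respect to the linear output z.
        error = self.predict_proba_one(x) - float(y)
        for k, v in x.items():
            w = self.weights.get(k, 0.0)
            self.weights[k] = w - self.lr * (error * v + self.l2 * w)
        self.bias -= self.lr * error


model = OnlineLogisticRegression(lr=0.5)
# Hypothetical, linearly separable points: the label is True when x1 > x2.
points = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.3), (0.1, 0.6), (0.8, 0.4), (0.3, 0.9)]
# The points are replayed only to keep the example short; a real stream
# would supply fresh observations instead.
for _ in range(50):
    for x1, x2 in points:
        model.learn_one({"x1": x1, "x2": x2}, x1 > x2)
```

Weights are stored in a dict keyed by feature name, so features that appear for the first time mid-stream simply start from a weight of zero.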

Key considerations:

LogisticRegression supports configurable optimizers and loss functions
Tree classifiers (HoeffdingTreeClassifier) grow incrementally using Hoeffding bounds
Ensemble methods (BaggingClassifier) use Poisson sampling for online bagging
All classifiers implement predict_one(x) and learn_one(x, y)
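
The Poisson sampling behind online bagging can be sketched as follows. This is a plain-Python illustration of the technique, not River's BaggingClassifier; the names poisson, MajorityClass, and OnlineBagging are hypothetical, and the base model is a deliberately trivial one that ignores the features.

```python
import math
import random


def poisson(rate, rng):
    """Draw from a Poisson distribution using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


class MajorityClass:
    """Toy base model: predicts the label seen most often so far."""

    def __init__(self):
        self.counts = {True: 0, False: 0}

    def learn_one(self, x, y):
        self.counts[y] += 1

    def predict_one(self, x):
        return self.counts[True] > self.counts[False]


class OnlineBagging:
    def __init__(self, make_model, n_models=5, seed=42):
        self.models = [make_model() for _ in range(n_models)]
        self.rng = random.Random(seed)

    def learn_one(self, x, y):
        for model in self.models:
            # Each member sees the observation k ~ Poisson(1) times, which
            # approximates bootstrap resampling without storing the data.
            for _ in range(poisson(1.0, self.rng)):
                model.learn_one(x, y)

    def predict_one(self, x):
        votes = sum(model.predict_one(x) for model in self.models)
        return votes * 2 > len(self.models)  # simple majority vote


ensemble = OnlineBagging(MajorityClass, n_models=5, seed=42)
for i in range(100):
    ensemble.learn_one({}, i % 5 != 0)  # 80% of the labels are True
prediction = ensemble.predict_one({})
```

Drawing a count per member and per observation gives each member a different weighted view of the stream, which is where the diversity of the ensemble comes from.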

Step 4: Choose Evaluation Metrics

Select one or more streaming metrics to track model performance. Binary classification metrics include Accuracy, Precision, Recall, F1, ROCAUC, and LogLoss. Metrics are updated incrementally with each prediction-label pair. Multiple metrics can be combined.

Key considerations:

Metrics are updated one observation at a time via metric.update(y_true, y_pred)
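Use ROCAUC (or LogLoss) when the quality of the predicted probabilities matters; these metrics are fed the output of predict_proba_one rather than a hard label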
Rolling variants allow measuring performance over a sliding window
ClassificationReport provides a comprehensive multi-metric view
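
The difference between a cumulative metric and its rolling variant can be sketched in a few lines. The classes below are hypothetical stand-ins written for illustration, not River's metric classes, and the prediction pairs are made up.

```python
from collections import deque


class Accuracy:
    """Cumulative accuracy over every pair seen so far."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, y_true, y_pred):
        self.correct += int(y_true == y_pred)
        self.total += 1

    def get(self):
        return self.correct / self.total if self.total else 0.0


class RollingAccuracy:
    """Accuracy over the most recent `window_size` pairs only."""

    def __init__(self, window_size):
        # A bounded deque drops the oldest outcome automatically.
        self.hits = deque(maxlen=window_size)

    def update(self, y_true, y_pred):
        self.hits.append(y_true == y_pred)

    def get(self):
        return sum(self.hits) / len(self.hits) if self.hits else 0.0


overall, recent = Accuracy(), RollingAccuracy(window_size=3)
# Hypothetical (y_true, y_pred) pairs: wrong twice early on, then right.
pairs = [(True, False), (False, True), (True, True), (False, False), (True, True)]
for y_true, y_pred in pairs:
    overall.update(y_true, y_pred)
    recent.update(y_true, y_pred)
```

Here the cumulative value still carries the two early mistakes, while the rolling value reflects only the last three predictions, which is why rolling metrics are the better signal for recent behaviour on a drifting stream.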

Step 5: Run Progressive Validation

Evaluate the model using progressive validation (prequential evaluation). For each observation: first predict, then update the metric, then train the model. This order prevents data leakage and simulates production conditions. Use evaluate.progressive_val_score for automated evaluation, or implement the loop manually for custom logic.
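
A manual version of the predict-evaluate-learn loop might look like the sketch below. It is written in plain Python against the learn_one/predict_one interface and is not the body of progressive_val_score; the MajorityClass model, the function name progressive_validation, and the four-observation stream are hypothetical.

```python
class MajorityClass:
    """Toy model: predicts the most frequent label so far (True on ties)."""

    def __init__(self):
        self.counts = {True: 0, False: 0}

    def predict_one(self, x):
        return self.counts[True] >= self.counts[False]

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_validation(stream, model):
    correct = total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # 1. predict before seeing the label
        correct += int(y_pred == y)    # 2. score that prediction
        total += 1
        model.learn_one(x, y)          # 3. only then learn from the label
    return correct / total if total else 0.0


stream = [
    ({"f": 1.0}, True),
    ({"f": 2.0}, False),
    ({"f": 3.0}, False),
    ({"f": 4.0}, False),
]
accuracy = progressive_validation(stream, MajorityClass())
```

Swapping steps 1 and 3 would let the model see each label before being scored on it, which is exactly the leakage the ordering is meant to rule out.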

Key considerations:

The predict-evaluate-learn order prevents data leakage
progressive_val_score handles the entire loop with optional time measurement
iter_progressive_val_score yields intermediate results for plotting
Supports delayed label arrival via the moment and delay parameters
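
Delayed label arrival can be illustrated with a small queue-based sketch. It shows the general idea only and is not how River implements the moment and delay parameters: the function delayed_progressive_validation, the integer timestamps, the constant delay, and the toy model are all assumptions.

```python
import heapq


class MajorityClass:
    """Toy model: predicts the most frequent label so far (True on ties)."""

    def __init__(self):
        self.counts = {True: 0, False: 0}

    def predict_one(self, x):
        return self.counts[True] >= self.counts[False]

    def learn_one(self, x, y):
        self.counts[y] += 1


def delayed_progressive_validation(stream, model, delay):
    """Score a stream of (t, x, y) whose labels are revealed at t + delay."""
    pending = []  # min-heap of (reveal_time, order, x, y, y_pred)
    correct = total = 0

    def reveal(item):
        nonlocal correct, total
        _, _, x, y, y_pred = item
        correct += int(y_pred == y)  # score the prediction made at arrival
        total += 1
        model.learn_one(x, y)

    for order, (t, x, y) in enumerate(stream):
        # Labels whose reveal time has passed are scored and learned first.
        while pending and pending[0][0] <= t:
            reveal(heapq.heappop(pending))
        y_pred = model.predict_one(x)
        heapq.heappush(pending, (t + delay, order, x, y, y_pred))

    while pending:  # the stream has ended; flush the remaining labels
        reveal(heapq.heappop(pending))
    return correct / total if total else 0.0


stream = [(t, {"f": float(t)}, False) for t in range(4)]
with_delay = delayed_progressive_validation(stream, MajorityClass(), delay=2)
without_delay = delayed_progressive_validation(stream, MajorityClass(), delay=0)
```

On this stream the delayed run scores lower than the immediate one, because the model has to make its first two predictions before any label has reached it.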

Step 6: Inspect Results and Iterate

Examine the final metric values and optionally plot learning curves over time. Adjust the pipeline by trying different preprocessing steps, classifiers, or hyperparameters. Compare multiple models using evaluation tracks or by running parallel progressive validations.

Key considerations:

Learning curves show how performance evolves over time
Model memory usage can be measured during evaluation
Use model_selection.BanditClassifier for automated online model comparison
Pipelines can be cloned and modified for systematic comparison
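
Comparing candidates on the same stream can be sketched as running several progressive validations side by side. The example below is plain Python, with copy.deepcopy standing in for cloning; the two toy models, the compare function, and the stream are hypothetical and chosen only to keep the sketch small.

```python
import copy


class MajorityClass:
    """Toy model: predicts the most frequent label so far (False on ties)."""

    def __init__(self):
        self.counts = {True: 0, False: 0}

    def predict_one(self, x):
        return self.counts[True] > self.counts[False]

    def learn_one(self, x, y):
        self.counts[y] += 1


class NearestMean:
    """Toy model: picks the class whose running mean of "f" is closest."""

    def __init__(self):
        self.sums = {True: 0.0, False: 0.0}
        self.counts = {True: 0, False: 0}

    def predict_one(self, x):
        best, best_dist = False, float("inf")
        for label in (False, True):
            if self.counts[label]:
                dist = abs(x["f"] - self.sums[label] / self.counts[label])
                if dist < best_dist:
                    best, best_dist = label, dist
        return best

    def learn_one(self, x, y):
        self.sums[y] += x["f"]
        self.counts[y] += 1


def compare(stream, templates):
    # Work on fresh copies so the templates stay untrained and reusable.
    models = {name: copy.deepcopy(m) for name, m in templates.items()}
    hits = {name: 0 for name in models}
    total = 0
    for x, y in stream:
        total += 1
        for name, model in models.items():
            hits[name] += int(model.predict_one(x) == y)  # predict, score
            model.learn_one(x, y)                         # then learn
    return {name: h / total for name, h in hits.items()}


templates = {"majority": MajorityClass(), "nearest_mean": NearestMean()}
# Hypothetical stream: the label is True when f is above 5.
stream = [({"f": float(f)}, f > 5) for f in (1, 9, 2, 8, 3, 7)]
results = compare(stream, templates)
```

Every candidate sees each observation in the same order and under the same predict-then-learn rule, so the resulting scores are directly comparable.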
