Workflow:Online ml River Drift Adaptive Classification

Knowledge Sources	River, River Documentation, River JMLR Paper
Domains	Online_ML, Concept_Drift, Classification, Streaming_Data
Last Updated	2026-02-08 16:00 GMT

Overview

End-to-end process for building classifiers that detect and adapt to concept drift in non-stationary data streams using drift detectors and automatic model retraining.

Description

This workflow addresses concept drift, where the relationship between the input features and the target variable changes over time and causes model performance to degrade. It combines a base classifier with a drift detection algorithm that monitors the stream of prediction errors for statistically significant changes. When drift is detected, the model is either reset and retrained from scratch or replaced with a background model that has been training on recent data. The process covers drift detector selection, model wrapping, background training, and evaluation on non-stationary streams.

Usage

Execute this workflow when the data-generating process is expected to change over time, such as in electricity price prediction, spam filtering, or user behavior modeling. Use it when a standard online learner shows performance degradation on evolving data, or when the application domain inherently involves non-stationary distributions.

Execution Steps

Step 1: Load a Non-stationary Data Stream

Obtain a data stream where the underlying distribution changes over time. River provides datasets with known drift characteristics (Elec2, Insects with drift variants). Synthetic generators with injected drift points can also be used for controlled experimentation. Identify or hypothesize the nature of drift (sudden, gradual, recurring).

Key considerations:

Elec2 dataset contains natural concept drift in electricity pricing
Insects dataset provides variants: abrupt, gradual, and incremental drift
Synthetic generators allow controlled drift injection for testing
Real-world streams often exhibit multiple types of drift simultaneously
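
For controlled experiments, a stream with a known drift point can be sketched in a few lines of plain Python. The generator below is a hypothetical helper, not part of River: its name, parameters, and the labelling rule are assumptions chosen for illustration. It yields (features, label) pairs in the dict-based shape River models consume and inverts the labelling rule at a chosen step to simulate a sudden drift.

```python
import random


def sudden_drift_stream(n_samples=2000, drift_at=1000, noise=0.1, seed=42):
    """Yield (features, label) pairs whose labelling rule flips at `drift_at`."""
    rng = random.Random(seed)
    for i in range(n_samples):
        x = {"a": rng.random(), "b": rng.random()}
        # Concept 1: positive when a > b. Concept 2 (after the drift): the opposite.
        y = x["a"] > x["b"] if i < drift_at else x["a"] <= x["b"]
        # Flip a small fraction of labels so the stream is not perfectly separable.
        if rng.random() < noise:
            y = not y
        yield x, y


# Collect the labels of samples with a > b, before and after the drift point.
before, after = [], []
for i, (x, y) in enumerate(sudden_drift_stream()):
    if x["a"] > x["b"]:
        (before if i < 1000 else after).append(y)

print(f"P(y=True | a > b) before drift: {sum(before) / len(before):.2f}")
print(f"P(y=True | a > b) after drift:  {sum(after) / len(after):.2f}")
```

The same feature region maps to opposite labels on either side of the drift point, which is what forces a previously accurate model to fail. Gradual drift could be approximated by mixing the two rules with a probability that ramps up over a window instead of switching at a single step.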

Step 2: Select a Base Classifier

Choose a classifier that will serve as the primary learner. Any River classifier works, but tree-based models (HoeffdingTreeClassifier, HoeffdingAdaptiveTreeClassifier) are common choices because they can grow and adapt their structure. Alternatively, use pipeline-based classifiers with preprocessing stages.

Key considerations:

HoeffdingAdaptiveTreeClassifier has built-in drift adaptation via ADWIN at each node
Simpler models like Naive Bayes have few parameters to relearn, so they tend to recover quickly after a reset
Pipeline models (scaler | classifier) work as base classifiers too
The base model will be cloned for background training

Step 3: Configure a Drift Detector

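Select a drift detection algorithm to monitor the stream of prediction errors, encoded as binary values (1 for an incorrect prediction, 0 for a correct one). DDM (Drift Detection Method) models the error rate with a binomial distribution and flags a change when the rate rises significantly above its recorded minimum. ADWIN maintains an adaptive window and shrinks it when two sub-windows have significantly different means, with theoretical bounds on its false positive and false negative rates. KSWIN applies a non-parametric Kolmogorov-Smirnov test to a sliding window. PageHinkley applies a sequential test based on cumulative sums of deviations from the running mean.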

Key considerations:

DDM provides both warning and drift signals, enabling background model training
ADWIN dynamically adjusts its window size based on observed change rate
KSWIN is non-parametric and makes no distributional assumptions
PageHinkley is effective for detecting gradual changes in mean
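
To make the DDM rule concrete, the following is a minimal from-scratch sketch. It is a simplified illustration, not River's implementation: the class name SimpleDDM, its parameter names, and the deterministic error stream are assumptions. The detector tracks the running error rate p and its binomial standard deviation s, remembers the point where p + s was lowest, and signals a warning or a drift once p + s exceeds that minimum by two or three standard deviations.

```python
import math


class SimpleDDM:
    """Simplified DDM rule for illustration; not River's implementation."""

    def __init__(self, warm_start=30, warning_level=2.0, drift_level=3.0):
        self.warm_start = warm_start
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """Consume one 0/1 error flag and return 'stable', 'warning' or 'drift'."""
        self.n += 1
        self.errors += error
        p = self.errors / self.n             # running error rate
        s = math.sqrt(p * (1 - p) / self.n)  # binomial standard deviation
        if self.n <= self.warm_start:
            return "stable"
        if p + s < self.p_min + self.s_min:  # remember the best point seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"


# Deterministic error flags: 10% error rate for 1000 steps, then 50%.
error_stream = [int(i % 10 == 0) for i in range(1000)]
error_stream += [int(i % 2 == 0) for i in range(1000)]

detector = SimpleDDM()
first_warning = first_drift = None
for step, error in enumerate(error_stream):
    state = detector.update(error)
    if state == "warning" and first_warning is None:
        first_warning = step
    if state == "drift":
        first_drift = step
        break

print("first warning at step", first_warning)
print("first drift at step", first_drift)
```

The warning fires some steps before the drift signal. A wrapper can use that gap to start training a replacement model before the switch.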

Step 4: Wrap the Classifier with Drift Retraining

Combine the base classifier and drift detector using DriftRetrainingClassifier. This wrapper monitors prediction errors and manages the model lifecycle. When a warning is detected, a background model clone begins training. When drift is confirmed, the background model replaces the primary model. This provides seamless adaptation without manual intervention.

Key considerations:

train_in_background=True enables proactive background model training during the warning phase, so it needs a detector that emits warning signals (such as DDM)
With train_in_background=False, a detected drift simply replaces the model with a fresh, untrained clone
The background model trains only on data arriving after the warning signal
Multiple successive drifts are handled automatically

Step 5: Evaluate with Progressive Validation

Run progressive validation on the non-stationary stream. The drift-adaptive model handles prediction, evaluation, and learning in the standard predict-evaluate-learn loop. Track metrics over time using iter_progressive_val_score to observe performance drops at drift points and recovery after adaptation.

Key considerations:

Use iter_progressive_val_score to observe performance over time
Performance dips at drift points are expected; monitor recovery speed
Compare against a non-adaptive baseline to quantify drift handling benefit
Rolling metrics provide a windowed view of recent performance

Step 6: Analyze Drift Points and Model Behavior

Examine when and how often drift was detected. Access the drift detector state to understand the timing and frequency of adaptations. Tune detector sensitivity (warning and drift thresholds) to balance false alarms against missed drifts. Optionally combine multiple drift detectors for more robust detection.

Key considerations:

Overly sensitive detectors cause unnecessary retraining (false alarms)
Insensitive detectors miss real drifts, prolonging performance degradation
Ensemble methods like ADWINBaggingClassifier and SRPClassifier have built-in per-member drift handling
Adaptive Random Forest (ARFClassifier) handles drift at the tree level
