Principle:Online ml River Non Stationary Stream Loading
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Machine Learning, Concept Drift, Data Streams | 2026-02-08 16:00 GMT |
Overview
Non-stationary stream loading is the technique of loading benchmark datasets that exhibit concept drift, where the underlying data distribution changes over time.
Description
In many real-world streaming scenarios, the statistical properties of the data change over time -- a phenomenon known as concept drift. Evaluating drift-adaptive models requires datasets that contain such distributional shifts, either naturally or through controlled injection. Non-stationary stream loading provides access to curated benchmark datasets specifically designed for this purpose.
The Elec2 dataset captures electricity pricing data from the Australian New South Wales Electricity Market, where prices fluctuate with supply and demand and with electricity transfers from the neighboring state of Victoria. These natural temporal dynamics introduce genuine concept drift, which makes the dataset a widely used benchmark for drift detection and adaptation research.
The Insects dataset, derived from sensor data used to classify insect species, offers a family of controlled drift variants -- including abrupt, gradual, incremental, and reoccurring drift patterns. Each variant lets researchers isolate and study one specific type of distributional change, which makes the family well suited to systematic evaluation of drift-adaptive algorithms.
Usage
Use non-stationary stream loading when:
- You need to evaluate a drift-adaptive classifier or detector on realistic data with known distributional shifts.
- You want to benchmark the recovery speed and accuracy retention of a model after concept drift events.
- You need a reproducible dataset with well-characterized drift properties for comparing multiple approaches.
- You are studying different drift types (abrupt vs. gradual vs. incremental) and need controlled variants.
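Recovery speed after a drift event is commonly measured with prequential (test-then-train) evaluation over a sliding window. The block below is a minimal pure-Python sketch of that idea, not River's own evaluation code. The names `MajorityClass`, `prequential_accuracy`, and `abrupt_stream` are hypothetical, and the stream is synthetic rather than one of the benchmark datasets.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy online classifier: predicts the most frequent label among recent samples."""

    def __init__(self, memory=50):
        self.recent = deque(maxlen=memory)

    def predict_one(self, x):
        if not self.recent:
            return None  # nothing learned yet
        return Counter(self.recent).most_common(1)[0][0]

    def learn_one(self, x, y):
        self.recent.append(y)


def prequential_accuracy(stream, model, window=100):
    """Test-then-train: predict each sample first, then learn from it.

    Returns the sliding-window accuracy observed after every sample.
    """
    hits = deque(maxlen=window)
    curve = []
    for x, y in stream:
        hits.append(model.predict_one(x) == y)
        model.learn_one(x, y)
        curve.append(sum(hits) / len(hits))
    return curve


def abrupt_stream(n=1000, drift_at=500):
    """Synthetic stream (an assumption for illustration): the label flips at drift_at."""
    for t in range(n):
        yield {"t": t}, t >= drift_at


curve = prequential_accuracy(abrupt_stream(), MajorityClass())
```

In this sketch the windowed accuracy dips right after the drift point and then climbs back as the model's memory fills with the new concept. The depth and length of that dip are the quantities a recovery benchmark would compare across models.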
Theoretical Basis
Non-stationary data streams violate the standard i.i.d. assumption of traditional machine learning. Formally, given a stream of observations $(x_1, y_1), (x_2, y_2), \ldots$, the joint distribution $P_t(X, y)$ changes over time:
$$\exists\, t : P_t(X, y) \neq P_{t+1}(X, y)$$
This change can manifest in several forms:
- Real concept drift: The posterior $P(y \mid X)$ changes while $P(X)$ may remain the same.
- Virtual drift: The input distribution $P(X)$ changes but $P(y \mid X)$ remains constant.
- Abrupt drift: The distribution changes suddenly at a single point in time.
- Gradual drift: The old and new concepts alternate over a transition period, with increasing probability of the new concept.
- Incremental drift: The distribution slowly morphs from one concept to another through intermediate states.
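The three temporal patterns (abrupt, gradual, incremental) can be made concrete with a small synthetic sketch. Each "concept" is reduced here to a single number, 0.0 for the old concept and 1.0 for the new one. The function names, drift positions, and linear transition are assumptions chosen for illustration; none of this is part of River or of the benchmark datasets.

```python
import random


def abrupt(t, drift_at=500):
    """The concept switches from 0.0 to 1.0 at a single time step."""
    return 1.0 if t >= drift_at else 0.0


def gradual(t, start=400, end=600, rng=random):
    """Old and new concepts alternate; the chance of the new one rises linearly."""
    p_new = min(1.0, max(0.0, (t - start) / (end - start)))
    return 1.0 if rng.random() < p_new else 0.0


def incremental(t, start=400, end=600):
    """The concept morphs through intermediate values between 0.0 and 1.0."""
    return min(1.0, max(0.0, (t - start) / (end - start)))


rng = random.Random(42)  # fixed seed so the gradual stream is reproducible
abrupt_values = [abrupt(t) for t in range(1000)]
gradual_values = [gradual(t, rng=rng) for t in range(1000)]
incremental_values = [incremental(t) for t in range(1000)]
```

The gradual stream only ever emits the two pure concepts and mixes them during the transition, whereas the incremental stream passes through in-between values. That difference is what separates the two definitions above.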
The Elec2 dataset contains natural real concept drift due to market dynamics. The Insects dataset provides controlled variants:
| Variant | Drift Type | Samples |
|---|---|---|
| abrupt_balanced | Sudden distributional change | 52,848 |
| gradual_balanced | Gradual transition between concepts | 24,150 |
| incremental_balanced | Slow morphing between distributions | 57,018 |
| incremental-abrupt_balanced | Incremental followed by abrupt change | 79,986 |
| incremental-reoccurring_balanced | Incremental drift that reoccurs | 79,986 |
These datasets yield samples one at a time via an iterator interface, which is consistent with River's online learning paradigm where models process each observation exactly once in arrival order.
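The iterator interface can be sketched as follows. The `peek` helper works on any iterable of `(x, y)` pairs, and `toy_stream` is a hypothetical stand-in with the same shape as a River dataset. `load_river_stream` is also a hypothetical helper; it assumes River is installed, that `river.datasets` exposes `Elec2` and `Insects` (the latter taking a `variant` argument), and that the data can be downloaded on first use.

```python
from itertools import islice


def peek(stream, n=3):
    """Return the first n (x, y) pairs of an iterable stream without reading the rest."""
    return list(islice(stream, n))


def load_river_stream(name="Elec2", **kwargs):
    """Hypothetical helper. Assumes River is installed and the data is downloadable."""
    from river import datasets  # lazy import, so peek() also works without River

    return getattr(datasets, name)(**kwargs)


def toy_stream():
    """Stand-in stream: yields (feature dict, label) pairs one at a time."""
    for t in range(5):
        yield {"t": t, "value": t * 0.5}, t % 2 == 0


samples = peek(toy_stream(), n=2)
```

Under those assumptions, `peek(load_river_stream("Elec2"))` or `peek(load_river_stream("Insects", variant="abrupt_balanced"))` would return the first few observations in arrival order. Because `islice` stops after `n` items, the rest of the stream is never read.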