Principle:Online ml River Non Stationary Stream Loading

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online Machine Learning, Concept Drift, Data Streams
Last Updated: 2026-02-08 16:00 GMT

Overview

Non-stationary stream loading is the technique of loading benchmark datasets that exhibit concept drift, where the underlying data distribution changes over time.

Description

In many real-world streaming scenarios, the statistical properties of the data change over time -- a phenomenon known as concept drift. Evaluating drift-adaptive models requires datasets that contain such distributional shifts, either naturally or through controlled injection. Non-stationary stream loading provides access to curated benchmark datasets specifically designed for this purpose.

The Elec2 dataset captures electricity pricing data from the Australian New South Wales Electricity Market, where prices fluctuate with supply and demand and with transfers from the neighboring state of Victoria. These natural temporal dynamics introduce genuine concept drift, which has made Elec2 a widely used benchmark for drift detection and adaptation research.

The Insects dataset, built from sensor readings used to classify insect species, comes in a family of controlled drift variants covering abrupt, gradual, incremental, and reoccurring drift patterns. Each variant lets researchers isolate and study a specific type of distributional change, which makes the family well suited to systematic evaluation of drift-adaptive algorithms.

Usage

Use non-stationary stream loading when:

  • You need to evaluate a drift-adaptive classifier or detector on realistic data with known distributional shifts.
  • You want to benchmark the recovery speed and accuracy retention of a model after concept drift events.
  • You need a reproducible dataset with well-characterized drift properties for comparing multiple approaches.
  • You are studying different drift types (abrupt vs. gradual vs. incremental) and need controlled variants.
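For the recovery-speed and accuracy-retention use cases above, one possible way to summarise a model's behaviour around a drift point whose position is known in advance is sketched below. It assumes you have recorded, in stream order, whether each prediction was correct; the function name `drift_recovery` and the choice of a sliding window plus a tolerance are illustrative assumptions, not a standard definition.

```python
def drift_recovery(correct, drift_at, window=100, tolerance=0.05):
    """Summarise accuracy around a drift point whose position is known.

    correct   -- booleans in stream order, True where the prediction was right
    drift_at  -- index of the first observation generated by the new concept
    Returns (pre_drift_accuracy, worst_post_drift_accuracy, recovery_delay),
    where recovery_delay counts samples from drift_at to the end of the first
    post-drift window whose accuracy is within `tolerance` of the pre-drift
    level (None if that never happens).
    """
    pre = correct[max(0, drift_at - window):drift_at]
    if not pre or len(correct) - drift_at < window:
        raise ValueError("need observations on both sides of the drift point")
    pre_acc = sum(pre) / len(pre)

    # Sliding windows of length `window` that lie entirely after the drift.
    hits = sum(correct[drift_at:drift_at + window])
    worst, delay = hits / window, None
    for end in range(drift_at + window, len(correct) + 1):
        if end > drift_at + window:
            # Slide by one position: add the newest outcome, drop the oldest.
            hits += correct[end - 1] - correct[end - 1 - window]
        acc = hits / window
        worst = min(worst, acc)
        if delay is None and acc >= pre_acc - tolerance:
            delay = end - drift_at
    return pre_acc, worst, delay
```

Applied to the outcomes of a test-then-train run on an abrupt-drift stream, a smaller delay indicates faster recovery and a higher worst-case accuracy indicates better retention.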

Theoretical Basis

Non-stationary data streams violate the standard i.i.d. assumption of traditional machine learning. Formally, given a stream of observations $(x_t, y_t)$ drawn from a joint distribution $P_t(X, Y)$, the stream is non-stationary when that distribution changes over time:

$$\exists\, t_0, t_1 \ \text{such that}\ P_{t_0}(X, Y) \neq P_{t_1}(X, Y)$$
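Writing the joint distribution as the product of the input distribution and the posterior shows why any change in $P_t(X, Y)$ must appear in at least one of the two factors: if both were unchanged between $t_0$ and $t_1$, the joint would be unchanged too. This standard decomposition is the basis for the real versus virtual distinction listed below:

$$
\begin{aligned}
P_t(X, Y) &= P_t(X)\, P_t(Y \mid X) \\
\text{real drift:}\quad & P_{t_0}(Y \mid X) \neq P_{t_1}(Y \mid X) \\
\text{virtual drift:}\quad & P_{t_0}(X) \neq P_{t_1}(X) \ \text{and}\ P_{t_0}(Y \mid X) = P_{t_1}(Y \mid X)
\end{aligned}
$$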

This change can manifest in several forms:

  • Real concept drift: The posterior P(Y|X) changes while P(X) may remain the same.
  • Virtual drift: The input distribution P(X) changes but P(Y|X) remains constant.
  • Abrupt drift: The distribution changes suddenly at a single point in time.
  • Gradual drift: The old and new concepts alternate over a transition period, with increasing probability of the new concept.
  • Incremental drift: The distribution slowly morphs from one concept to another through intermediate states.
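To make the abrupt, gradual, and incremental patterns above concrete, here is a small synthetic generator in plain Python. It is a hypothetical illustration, not River's own synthetic-stream API: features are dictionaries (mirroring River's convention), x stays uniform on [0, 1) so P(X) is fixed, and only the labelling threshold moves from 0.3 to 0.7, so every variant is real concept drift.

```python
import random


def drifting_stream(n, drift_at, width=1, kind="abrupt", seed=42):
    """Yield n (x, y) pairs whose labelling threshold moves from 0.3 to 0.7.

    x is drawn uniformly from [0, 1) at every step, so P(X) never changes and
    every variant is real concept drift: only P(Y | X) moves. `width` (>= 1)
    is the length of the transition window for gradual and incremental drift.
    """
    rng = random.Random(seed)
    old, new = 0.3, 0.7
    for t in range(n):
        # Progress through the transition window: 0 before it, 1 after it.
        p = min(max((t - drift_at) / width, 0.0), 1.0)
        if kind == "abrupt":
            # The new concept takes over at a single point in time.
            threshold = new if t >= drift_at else old
        elif kind == "gradual":
            # Old and new concepts alternate; the new one grows more likely.
            threshold = new if rng.random() < p else old
        elif kind == "incremental":
            # The concept itself passes through intermediate thresholds.
            threshold = (1 - p) * old + p * new
        else:
            raise ValueError(f"unknown drift kind: {kind!r}")
        x = rng.random()
        yield {"x": x}, x > threshold
```

For example, `drifting_stream(1_000, drift_at=500, width=200, kind="gradual")` mixes the two concepts between samples 500 and 700 and uses only the new one afterwards.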

The Elec2 dataset exhibits naturally occurring real concept drift driven by market dynamics, whereas the Insects dataset provides controlled variants:

Variant                            Drift Type                               Samples
abrupt_balanced                    Sudden distributional change              52,848
gradual_balanced                   Gradual transition between concepts       24,150
incremental_balanced               Slow morphing between distributions       57,018
incremental_abrupt_balanced        Incremental followed by abrupt change     79,986
incremental_reoccurring_balanced   Incremental drift that reoccurs           79,986

Both datasets are exposed as iterables that yield one (x, y) pair at a time, where x is a dictionary of feature values and y is the target. This matches River's online learning paradigm, in which a model processes each observation exactly once, in arrival order.
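The one-pass, arrival-order protocol described above is usually evaluated prequentially (test-then-train): each observation is predicted before it is learned from. The helper below is a minimal, hypothetical sketch written against the `predict_one`/`learn_one` method names that River estimators expose; `MajorityClass` is a toy stand-in model so the example runs without River.

```python
import itertools


def prequential_accuracy(model, stream, limit=None):
    """Test-then-train accuracy over an iterable of (x, y) pairs.

    Each observation is first used to score the model's prediction and only
    then to update the model, so every sample is seen once, in arrival order.
    The model only needs predict_one(x) and learn_one(x, y) methods.
    """
    correct = total = 0
    # islice(stream, None) consumes the whole stream.
    for x, y in itertools.islice(stream, limit):
        y_pred = model.predict_one(x)  # may be None before any training
        correct += y_pred == y
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


class MajorityClass:
    """Toy stand-in model: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = {}

    def predict_one(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else None

    def learn_one(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1
```

With River installed, the same helper can be pointed at a drifting benchmark, e.g. `prequential_accuracy(tree.HoeffdingAdaptiveTreeClassifier(), datasets.Elec2())`. River also ships `evaluate.progressive_val_score`, which implements the same protocol with any of its metrics.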
