Implementation:Online ml River Datasets Elec2
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs | Online Machine Learning, Concept Drift, Benchmark Datasets | 2026-02-08 16:00 GMT |
Overview
Concrete tool for loading non-stationary benchmark datasets (Elec2 and Insects) that exhibit concept drift, providing iterators suitable for evaluating drift-adaptive online learning models.
Description
The datasets.Elec2 class loads the Electricity pricing dataset from the Australian New South Wales Electricity Market. It contains 45,312 samples with 8 numerical features (date, day, period, nswprice, nswdemand, vicprice, vicdemand, transfer) and a binary target indicating whether the price went UP or DOWN. The dataset is downloaded automatically from a remote URL and cached locally.
The datasets.Insects class loads the Insects dataset, which provides multiple variants for concept drift evaluation. Each variant offers a different type of drift (abrupt, gradual, incremental, or combinations thereof) with 33 numerical features and 6 classes (24 for the out-of-control variant). The variant is selected at construction time. Both classes inherit from base.RemoteDataset and implement the standard River dataset iterator interface.
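The drift types named above can be illustrated with a small synthetic stream. The following is a minimal pure-Python sketch and not how the Insects data was produced: the helper names `threshold_at` and `drifting_stream`, the single feature `value`, and the thresholds 0.3 and 0.7 are all assumptions made for illustration. It uses the usual textbook reading of the terms: an abrupt drift switches concepts at once, a gradual drift alternates between old and new concepts with the new one becoming more likely, and an incremental drift moves the concept itself in small steps.

```python
import random


def threshold_at(progress, kind, rng):
    """Return the decision threshold in effect at a given point in the stream.

    progress runs from 0.0 (first sample) to 1.0 (last sample).
    The old concept uses threshold 0.3 and the new concept uses 0.7.
    """
    if kind == "abrupt":
        # The concept switches all at once at the midpoint.
        return 0.3 if progress < 0.5 else 0.7
    if kind == "gradual":
        # Old and new concepts alternate; the new one becomes more likely over time.
        return 0.7 if rng.random() < progress else 0.3
    if kind == "incremental":
        # The concept itself moves slowly from the old threshold to the new one.
        return 0.3 + 0.4 * progress
    raise ValueError(f"Unknown drift kind: {kind!r}")


def drifting_stream(n_samples, kind, seed=0):
    """Yield (x, y) pairs whose labelling rule drifts according to `kind`."""
    rng = random.Random(seed)
    for t in range(n_samples):
        progress = t / max(n_samples - 1, 1)
        threshold = threshold_at(progress, kind, rng)
        value = rng.random()
        yield {"value": value}, value > threshold


for kind in ("abrupt", "gradual", "incremental"):
    labels = [y for _, y in drifting_stream(1000, kind)]
    # Share of positive labels in the first and second half of the stream.
    first, second = labels[:500], labels[500:]
    print(kind, sum(first) / 500, sum(second) / 500)
```

In every case the share of positive labels drops in the second half, but the transition between the two regimes differs, which is what the separate Insects variants are meant to exercise.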
Usage
Import these datasets when you need a non-stationary data stream for evaluating drift-adaptive classifiers, drift detectors, or ensemble methods. They are commonly used with evaluate.progressive_val_score to benchmark models on concept drift scenarios.
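Progressive validation follows a test-then-train protocol: each sample is first used to score the model's prediction and only afterwards to update the model. The sketch below shows that loop in plain Python so the idea can be read without River installed. It is not the implementation of evaluate.progressive_val_score; the `MajorityClass` model, the `progressive_accuracy` helper, and the toy stream are hypothetical stand-ins.

```python
from collections import Counter


class MajorityClass:
    """Toy online classifier: predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # Before any label has been seen there is nothing to predict.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def progressive_accuracy(stream, model):
    """Test-then-train: score the prediction for each sample, then learn from it."""
    correct = 0
    total = 0
    for x, y in stream:
        y_pred = model.predict_one(x)
        correct += int(y_pred == y)
        total += 1
        model.learn_one(x, y)
    return correct / total if total else 0.0


# Hypothetical stream with an abrupt drift: the label flips halfway through.
toy_stream = [({"f": float(i)}, i >= 50) for i in range(100)]
toy_accuracy = progressive_accuracy(toy_stream, MajorityClass())
print(f"Accuracy: {toy_accuracy:.2f}")
```

The toy model never recovers after the flip because it keeps weighting old labels equally, which is the failure mode that drift-adaptive models are designed to avoid.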
Code Reference
Source Location
- Elec2: river/datasets/elec2.py:L8-L51
- Insects: river/datasets/insects.py:L8-L131
Signature
class Elec2(base.RemoteDataset):
    def __init__(self) -> None
class Insects(base.RemoteDataset):
    def __init__(self, variant: str = "abrupt_balanced")
Import
from river import datasets
Key Parameters
Elec2 takes no parameters.
Insects parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| variant | str | "abrupt_balanced" | Which drift variant to load. Options include: "abrupt_balanced", "abrupt_imbalanced", "gradual_balanced", "gradual_imbalanced", "incremental-abrupt_balanced", "incremental-reoccurring_balanced", "incremental_balanced" |
I/O Contract
Inputs
| Parameter | Type | Description |
|---|---|---|
| (constructor) | — | No input data needed at construction; datasets are downloaded on first iteration |
Outputs
| Output | Type | Description |
|---|---|---|
| Elec2 iterator | Iterator yielding (x: dict, y: bool) | Each x is a dict of 8 float features; y is True (UP) or False (DOWN) |
| Insects iterator | Iterator yielding (x: dict, y: str) | Each x is a dict of 33 features (f1..f33 as strings); y is the class label string |
Elec2 dataset properties: 45,312 samples, 8 features, binary classification (UP/DOWN).
Insects dataset properties: Varies by variant (24,150 to 355,275 samples), 33 features, 6-class classification.
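When writing unit tests or working offline, it can help to have a stand-in stream that honours the same output contract as the Elec2 iterator without downloading anything. The sketch below is an assumption-laden placeholder: `fake_elec2` and `ELEC2_FEATURES` are hypothetical names, the feature values are random numbers with no relation to the real data, and `itertools.islice` is used here simply as the standard-library way to truncate any iterator.

```python
import itertools
import random

# Feature names as listed in the Description section.
ELEC2_FEATURES = (
    "date", "day", "period", "nswprice",
    "nswdemand", "vicprice", "vicdemand", "transfer",
)


def fake_elec2(n_samples, seed=0):
    """Yield (x, y) pairs shaped like Elec2 samples, filled with random values."""
    rng = random.Random(seed)
    for _ in range(n_samples):
        x = {name: rng.random() for name in ELEC2_FEATURES}
        y = rng.random() < 0.5  # bool target, True standing in for UP
        yield x, y


# Keep only the first five samples of the stream.
first_five = list(itertools.islice(fake_elec2(100), 5))
for x, y in first_five:
    print(sorted(x), y)
```

Code written against this placeholder only needs to rely on the (x: dict, y: bool) shape, so it can later be pointed at the real dataset iterator.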
Usage Examples
Loading the Elec2 Dataset
from river import datasets
dataset = datasets.Elec2()
for x, y in dataset:
    print(x, y)
    break
# x is a dict with keys: date, day, period, nswprice, nswdemand, vicprice, vicdemand, transfer
# y is True (UP) or False (DOWN)
Loading the Insects Dataset with a Specific Variant
from river import datasets
# Load the gradual drift variant
dataset = datasets.Insects(variant="gradual_balanced")
for x, y in dataset:
    print(x, y)
    break
# x is a dict with keys f1..f33
# y is a class label string
Using Elec2 with Progressive Validation
from river import datasets, evaluate, metrics, tree
dataset = datasets.Elec2().take(3000)
model = tree.HoeffdingAdaptiveTreeClassifier(seed=42)
metric = metrics.Accuracy()
evaluate.progressive_val_score(dataset, model, metric)
# Accuracy: ...