Implementation:Online ml River Datasets Insects
| Knowledge Sources | |
|---|---|
| Domains | Online_Learning, Datasets, Multi_Class_Classification, Concept_Drift |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
A concrete multi-class classification dataset with concept drift, provided by the River library.
Description
The Insects dataset comes in several variants designed for evaluating concept drift. The number of samples and the difficulty vary from one variant to another. Every variant has 6 classes, except the "out-of-control" variant, which has 24.
Available variants:
- abrupt_balanced
- abrupt_imbalanced
- gradual_balanced
- gradual_imbalanced
- incremental_abrupt_balanced
- incremental_reoccurring_balanced
- incremental_balanced
The default variant is "abrupt_balanced" with 52,848 samples and 33 features.
Usage
This dataset is useful for:
- Evaluating concept drift detection algorithms
- Testing stream learning algorithms on non-stationary data
- Comparing performance on balanced vs imbalanced scenarios
- Benchmarking classification algorithms with different drift patterns
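The evaluation use cases above usually rely on a test-then-train (prequential) loop. The following is a minimal sketch of that loop in plain Python, not River's own evaluation code: `synthetic_stream`, `BinnedMajority`, and `prequential_accuracy` are hypothetical names, and the stream is a made-up stand-in with a single abrupt drift rather than the Insects data. The `predict_one` / `learn_one` method names mirror River's convention.

```python
import random
from collections import Counter, deque


def synthetic_stream(n=2000, drift_at=1000, seed=42):
    """Yield (x, y) pairs; the feature-to-label mapping flips at `drift_at`."""
    rng = random.Random(seed)
    for i in range(n):
        x = {"f1": rng.random()}
        label = "a" if x["f1"] < 0.5 else "b"
        if i >= drift_at:  # abrupt concept drift: the two labels swap
            label = "b" if label == "a" else "a"
        yield x, label


class BinnedMajority:
    """Toy learner: predicts the most frequent label seen in each half of f1."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}

    def _bin(self, x):
        return 0 if x["f1"] < 0.5 else 1

    def predict_one(self, x):
        seen = self.counts[self._bin(x)]
        return seen.most_common(1)[0][0] if seen else None

    def learn_one(self, x, y):
        self.counts[self._bin(x)][y] += 1


def prequential_accuracy(stream, model, window=200):
    """Test-then-train loop returning the rolling accuracy after each sample."""
    recent = deque(maxlen=window)
    history = []
    for x, y in stream:
        recent.append(model.predict_one(x) == y)  # predict before learning
        model.learn_one(x, y)
        history.append(sum(recent) / len(recent))
    return history


history = prequential_accuracy(synthetic_stream(), BinnedMajority())
print(f"before drift: {history[999]:.2f}, after drift: {history[1199]:.2f}")
```

Because the toy learner never forgets old counts, its rolling accuracy collapses after the drift point, which is the kind of behaviour the drift variants are meant to expose.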
Code Reference
Source Location
- Repository: Online_ml_River
- File: river/datasets/insects.py
Signature
class Insects(base.RemoteDataset):
    def __init__(self, variant="abrupt_balanced"):
        if variant not in self.variant_configs:
            variants = "\n".join(f"- {v}" for v in self.variant_configs)
            raise ValueError(f"Unknown variant, possible choices are:\n{variants}")

        config = self.variant_configs[variant]
        n_samples = config["n_samples"]
        size = config["size"]
        url = config["url"]
        filename = config["filename"]
        n_classes = 24 if variant == "out-of-control" else 6

        super().__init__(
            n_classes=n_classes,
            n_samples=n_samples,
            n_features=33,
            task=base.MULTI_CLF,
            url=url,
            size=size,
            unpack=False,
            filename=filename,
        )
        self.variant = variant

    def _iter(self):
        cols = [f"f{i}" for i in range(1, 34)] + ["class"]
        return stream.iter_csv(self.path, target="class", fieldnames=cols)
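To illustrate what `_iter` does with the column names, here is a rough standard-library sketch that reads a headerless CSV with the same `f1` to `f33` plus `class` layout. It is not River's `stream.iter_csv`; `iter_rows` and the two sample rows are invented for illustration, and the cast to `float` is an assumption of this sketch.

```python
import csv
import io

# Two made-up rows in the assumed layout: 33 feature values, then the label.
SAMPLE = "\n".join(
    ",".join([f"{0.1 * r + 0.01 * i:.3f}" for i in range(1, 34)] + [label])
    for r, label in enumerate(["class_a", "class_b"])
)


def iter_rows(buffer):
    """Yield (features, target) pairs from a headerless CSV buffer."""
    cols = [f"f{i}" for i in range(1, 34)] + ["class"]
    for row in csv.DictReader(buffer, fieldnames=cols):
        y = row.pop("class")  # split the target off from the features
        # Assumption: features are numeric, so they are cast to float here.
        x = {name: float(value) for name, value in row.items()}
        yield x, y


rows = list(iter_rows(io.StringIO(SAMPLE)))
x, y = rows[0]
print(len(x), y)
```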
Import
from river import datasets
dataset = datasets.Insects() # default: abrupt_balanced
# Or specify a variant:
dataset = datasets.Insects(variant="gradual_balanced")
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| variant | str | No | Variant name (default: "abrupt_balanced") |
Outputs
| Name | Type | Description |
|---|---|---|
| Iteration (`for x, y in dataset`) | tuple(dict, str) | Yields (features_dict, target) pairs; the feature keys are f1 to f33 and the target is the class label |
Dataset Properties
| Property | Value (default variant) |
|---|---|
| Number of samples | 52,848 (varies by variant) |
| Number of features | 33 |
| Number of classes | 6 (24 for "out-of-control") |
| Task | Multi-class classification |
| Format | CSV |
Variants
| Variant | Samples | Size (bytes) | Description |
|---|---|---|---|
| abrupt_balanced | 52,848 | 14,151,769 | Balanced classes with abrupt drift |
| abrupt_imbalanced | 355,275 | 94,893,622 | Imbalanced classes with abrupt drift |
| gradual_balanced | 24,150 | 6,474,831 | Balanced classes with gradual drift |
| gradual_imbalanced | 143,323 | 38,339,554 | Imbalanced classes with gradual drift |
| incremental_abrupt_balanced | 79,986 | 21,421,452 | Balanced with incremental abrupt drift |
| incremental_reoccurring_balanced | 79,986 | 21,433,047 | Balanced with reoccurring drift |
| incremental_balanced | 57,018 | 15,258,997 | Balanced with incremental drift |
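As a quick way to work with the figures in the table above, the sketch below encodes them as a lookup and reproduces the unknown-variant check from the constructor. `VARIANTS` and `describe` are hypothetical names, not part of River; the numbers are copied from the table.

```python
# Sample counts and file sizes (bytes) copied from the variants table.
VARIANTS = {
    "abrupt_balanced": (52_848, 14_151_769),
    "abrupt_imbalanced": (355_275, 94_893_622),
    "gradual_balanced": (24_150, 6_474_831),
    "gradual_imbalanced": (143_323, 38_339_554),
    "incremental_abrupt_balanced": (79_986, 21_421_452),
    "incremental_reoccurring_balanced": (79_986, 21_433_047),
    "incremental_balanced": (57_018, 15_258_997),
}


def describe(variant="abrupt_balanced"):
    """Return basic size information for a variant, or raise on unknown names."""
    if variant not in VARIANTS:
        choices = "\n".join(f"- {v}" for v in VARIANTS)
        raise ValueError(f"Unknown variant, possible choices are:\n{choices}")
    n_samples, size = VARIANTS[variant]
    return {
        "variant": variant,
        "n_samples": n_samples,
        "size_mib": round(size / 2**20, 1),
        "bytes_per_sample": round(size / n_samples),
    }


largest = max(VARIANTS, key=lambda v: VARIANTS[v][0])
print(describe())
print("largest variant:", largest)
```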
Usage Examples
from river import datasets

# Use default variant
dataset = datasets.Insects()
for x, y in dataset:
    print(x, y)
    break

# Use specific variant for gradual drift
dataset = datasets.Insects(variant="gradual_balanced")
for x, y in dataset:
    print(x, y)
    break
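When comparing balanced and imbalanced variants, it can help to inspect the label distribution over a prefix of the stream. The helper below is a small sketch that works on any iterable of `(x, y)` pairs; `class_distribution` is a hypothetical name, and `toy_stream` is an invented stand-in so the example runs without downloading anything.

```python
from collections import Counter
from itertools import islice


def class_distribution(stream, limit=None):
    """Return relative label frequencies over the first `limit` samples."""
    counts = Counter(y for _, y in islice(stream, limit))
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Stand-in stream with invented labels; a dataset object yielding (x, y)
# pairs could be passed in its place.
toy_stream = [({"f1": 0.0}, "a")] * 6 + [({"f1": 1.0}, "b")] * 2
dist = class_distribution(toy_stream)
print(dist)
```

Passing `limit` keeps the check cheap on the larger variants, since only the first samples are read.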
References
- USP DS repository
- Souza, V., Reis, D.M.D., Maletzke, A.G. and Batista, G.E., 2020. Challenges in Benchmarking Stream Learning Algorithms with Real-world Data. arXiv preprint arXiv:2005.00113.