Implementation:Online ml River Datasets Insects

From Leeroopedia


Knowledge Sources
Domains Online_Learning, Datasets, Multi_Class_Classification, Concept_Drift
Last Updated 2026-02-08 16:00 GMT

Overview

Concrete dataset, provided by the River library, for multi-class classification under concept drift.

Description

The Insects dataset has different variants specifically designed for concept drift evaluation. The number of samples and the difficulty change from one variant to another. The number of classes is always 6, except for the "out-of-control" variant which has 24 classes.

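Available variants (the "out-of-control" variant is special-cased in the constructor but is not among the variants documented on this page):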

  • abrupt_balanced
  • abrupt_imbalanced
  • gradual_balanced
  • gradual_imbalanced
  • incremental_abrupt_balanced
  • incremental_reoccurring_balanced
  • incremental_balanced

The default variant is "abrupt_balanced" with 52,848 samples and 33 features.

Usage

This dataset is useful for:

  • Evaluating concept drift detection algorithms
  • Testing stream learning algorithms on non-stationary data
  • Comparing performance on balanced vs imbalanced scenarios
  • Benchmarking classification algorithms with different drift patterns
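The evaluation use cases listed above usually follow a test-then-train (prequential) loop: predict on each sample first, then learn from it. The sketch below illustrates that loop using only the standard library, so it runs without downloading anything. The stream generator, the baseline model, and the window size are all hypothetical stand-ins and are not part of River; the same loop shape should apply to any iterable of (features, target) pairs, which is what this dataset yields.

```python
import random
from collections import Counter


def synthetic_drift_stream(n_samples=2000, drift_at=1000, seed=42):
    """Yield (features, label) pairs shaped like the Insects stream.

    Hypothetical stand-in data: the dominant label flips at `drift_at`
    to imitate an abrupt concept drift.
    """
    rng = random.Random(seed)
    for i in range(n_samples):
        dominant = "class_a" if i < drift_at else "class_b"
        other = "class_b" if dominant == "class_a" else "class_a"
        label = dominant if rng.random() < 0.9 else other
        features = {f"f{j}": rng.random() for j in range(1, 34)}
        yield features, label


class MajorityClassBaseline:
    """Predicts the most frequent label seen so far (ignores the features)."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        if not self.counts:
            return None  # nothing learned yet
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        self.counts[y] += 1


def prequential_accuracy(stream, model, window=500):
    """Test-then-train loop returning the accuracy of each full window."""
    accuracies, correct, seen = [], 0, 0
    for x, y in stream:
        correct += int(model.predict_one(x) == y)  # predict first
        model.learn_one(x, y)  # then learn from the same sample
        seen += 1
        if seen == window:
            accuracies.append(correct / window)
            correct, seen = 0, 0
    return accuracies


window_accuracies = prequential_accuracy(
    synthetic_drift_stream(), MajorityClassBaseline()
)
print(window_accuracies)
```

Because the baseline never forgets old labels, its windowed accuracy should fall sharply after the drift point and stay low, which is the kind of behavior a drift-aware method is meant to avoid.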

Code Reference

Source Location

Signature

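from river import stream
from river.datasets import base


class Insects(base.RemoteDataset):
    def __init__(self, variant="abrupt_balanced"):
        # variant_configs maps each variant name to its metadata; it is defined
        # elsewhere on the class and is not shown in this excerpt.
        if variant not in self.variant_configs: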
            variants = "\n".join(f"- {v}" for v in self.variant_configs)
            raise ValueError(f"Unknown variant, possible choices are:\n{variants}")

        config = self.variant_configs[variant]
        n_samples = config["n_samples"]
        size = config["size"]
        url = config["url"]
        filename = config["filename"]
        n_classes = 24 if variant == "out-of-control" else 6

        super().__init__(
            n_classes=n_classes,
            n_samples=n_samples,
            n_features=33,
            task=base.MULTI_CLF,
            url=url,
            size=size,
            unpack=False,
            filename=filename,
        )
        self.variant = variant

    def _iter(self):
        cols = [f"f{i}" for i in range(1, 34)] + ["class"]
        return stream.iter_csv(self.path, target="class", fieldnames=cols)
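To make the behavior of _iter easier to picture, here is a rough standard-library approximation of reading a headerless CSV with the same column names and splitting off the target. It is not River's stream.iter_csv; the helper name iter_rows and the two sample rows are hypothetical. One thing it shows is that, absent any converters, every feature value arrives as a string.

```python
import csv
import io

FEATURE_NAMES = [f"f{i}" for i in range(1, 34)]  # f1 .. f33, as in _iter
COLUMNS = FEATURE_NAMES + ["class"]


def iter_rows(buffer, target="class", fieldnames=COLUMNS):
    """Yield (features, target) pairs from a headerless CSV buffer."""
    for row in csv.DictReader(buffer, fieldnames=fieldnames):
        y = row.pop(target)  # separate the label from the features
        yield row, y


# Hypothetical two-row sample; the values are made up for illustration.
sample = io.StringIO(
    ",".join(["0.5"] * 33) + ",class_a\n"
    + ",".join(["1.5"] * 33) + ",class_b\n"
)
rows = list(iter_rows(sample))
x, y = rows[0]
print(len(x), y, type(x["f1"]).__name__)

# Cast to float before feeding the features to a numeric model.
x_numeric = {name: float(value) for name, value in x.items()}
```

If the real stream also yields string values, a similar cast (or a converters mapping passed to the CSV reader) would be needed before numeric models can use the features.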

Import

from river import datasets
dataset = datasets.Insects()  # default: abrupt_balanced
# Or specify a variant:
dataset = datasets.Insects(variant="gradual_balanced")

I/O Contract

Inputs

Name     Type  Required  Description
variant  str   No        Variant name (default: "abrupt_balanced")

Outputs

Name           Type              Description
iter(dataset)  tuple(dict, str)  Yields (features_dict, target) pairs; the feature keys are f1 through f33

Dataset Properties

Property            Value (default variant)
Number of samples   52,848 (varies by variant)
Number of features  33
Number of classes   6 (24 for "out-of-control")
Task                Multi-class classification
Format              CSV

Variants

Variant                           Samples  Size (bytes)  Description
abrupt_balanced                   52,848   14,151,769    Balanced classes with abrupt drift
abrupt_imbalanced                 355,275  94,893,622    Imbalanced classes with abrupt drift
gradual_balanced                  24,150   6,474,831     Balanced classes with gradual drift
gradual_imbalanced                143,323  38,339,554    Imbalanced classes with gradual drift
incremental_abrupt_balanced       79,986   21,421,452    Balanced with incremental abrupt drift
incremental_reoccurring_balanced  79,986   21,433,047    Balanced with reoccurring drift
incremental_balanced              57,018   15,258,997    Balanced with incremental drift
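When choosing a variant for an experiment, it can help to filter the table programmatically. The following sketch copies the sample counts and sizes from the table above into a plain dictionary; the names VARIANTS, variants_by_balance, and smallest_variant are hypothetical helpers for illustration and do not exist in River.

```python
# (samples, size in bytes) copied from the Variants table on this page.
VARIANTS = {
    "abrupt_balanced": (52_848, 14_151_769),
    "abrupt_imbalanced": (355_275, 94_893_622),
    "gradual_balanced": (24_150, 6_474_831),
    "gradual_imbalanced": (143_323, 38_339_554),
    "incremental_abrupt_balanced": (79_986, 21_421_452),
    "incremental_reoccurring_balanced": (79_986, 21_433_047),
    "incremental_balanced": (57_018, 15_258_997),
}


def variants_by_balance(balance):
    """Return the variant names ending in '_balanced' or '_imbalanced'."""
    if balance not in ("balanced", "imbalanced"):
        raise ValueError("balance must be 'balanced' or 'imbalanced'")
    suffix = f"_{balance}"
    return [name for name in VARIANTS if name.endswith(suffix)]


def smallest_variant(names):
    """Return the name with the fewest samples among `names`."""
    return min(names, key=lambda name: VARIANTS[name][0])


imbalanced = variants_by_balance("imbalanced")
quickest = smallest_variant(variants_by_balance("balanced"))
print(imbalanced, quickest)
```

Starting with the smallest balanced variant is one reasonable way to smoke-test a pipeline before moving to the larger imbalanced files.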

Usage Examples

from river import datasets

# Use default variant
dataset = datasets.Insects()
for x, y in dataset:
    print(x, y)
    break

# Use specific variant for gradual drift
dataset = datasets.Insects(variant="gradual_balanced")
for x, y in dataset:
    print(x, y)
    break
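A common first step after loading a variant is to check how the labels are distributed, for instance to compare a balanced variant with an imbalanced one. The sketch below counts labels over the first n items of any (features, target) iterable. The helper label_counts and the toy stream are hypothetical; in practice a dataset object such as datasets.Insects(variant="abrupt_imbalanced") could presumably be passed in place of the toy stream, which would trigger the download.

```python
from collections import Counter
from itertools import islice


def label_counts(stream, n=1000):
    """Count target labels over the first `n` items of an (x, y) stream."""
    return Counter(y for _, y in islice(stream, n))


# Hypothetical stand-in for a dataset: any iterable of (features, label) pairs.
toy_stream = [
    ({"f1": 0.1}, "class_a"),
    ({"f1": 0.2}, "class_b"),
    ({"f1": 0.3}, "class_a"),
]
counts = label_counts(toy_stream, n=2)
print(counts)
```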

References

  • USP DS repository
  • Souza, V., Reis, D.M.D., Maletzke, A.G. and Batista, G.E., 2020. Challenges in Benchmarking Stream Learning Algorithms with Real-world Data. arXiv preprint arXiv:2005.00113.
