Implementation:Scikit learn contrib Imbalanced learn fetch datasets

Knowledge Sources	imbalanced-learn, imbalanced-learn Docs
Domains	Machine_Learning, Benchmarking, Imbalanced_Learning
Last Updated	2026-02-09 03:00 GMT

Overview

Concrete tool, provided by the imbalanced-learn library, for downloading and caching benchmark imbalanced datasets hosted on Zenodo.

Description

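The fetch_datasets function downloads a collection of 27 benchmark imbalanced datasets from Zenodo and caches them locally (by default under ~/scikit_learn_data), so later calls can read from disk instead of downloading again. It returns an OrderedDict keyed by dataset name; each value is a Bunch object with .data, .target, and .DESCR attributes. The filter_data parameter restricts loading to selected datasets, identified by name or by integer ID.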

Usage

Import this function when you need standardized imbalanced datasets for benchmarking resampling or classification methods.
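
A benchmark over these datasets usually reduces to one loop: apply the same evaluation to every dataset and collect one score per name. The sketch below illustrates that loop under stated assumptions. Neither `run_benchmark` nor `evaluate` is part of imbalanced-learn; both are hypothetical names, and `evaluate(X, y)` is assumed to return per-fold scores.

```python
from statistics import mean


def run_benchmark(datasets, evaluate):
    """Hypothetical harness: mean score per dataset.

    datasets: mapping of name -> object with .data and .target,
              such as the result of fetch_datasets().
    evaluate: callable (X, y) -> iterable of per-fold scores.
    """
    results = {}
    for name, bunch in datasets.items():
        # Score the same method on every dataset so results are comparable.
        fold_scores = list(evaluate(bunch.data, bunch.target))
        results[name] = mean(fold_scores)
    return results
```

In practice `evaluate` might wrap scikit-learn's `cross_val_score` around a pipeline that combines a resampler with a classifier, so that every method is scored on identical data.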

Code Reference

Source Location

Repository: imbalanced-learn
File: imblearn/datasets/_zenodo.py
Lines: L111-301

Signature

def fetch_datasets(
    *,
    data_home=None,
    filter_data=None,
    download_if_missing=True,
    random_state=None,
    shuffle=False,
    verbose=False,
):
    """
    Args:
        data_home: str or None - Cache directory (default: ~/scikit_learn_data).
        filter_data: tuple of str/int or None - Dataset names or IDs to load.
        download_if_missing: bool - Auto-download if not cached (default: True).
        random_state: int, RandomState, or None - Shuffle seed.
        shuffle: bool - Shuffle data (default: False).
        verbose: bool - Print fetch info (default: False).
    Returns:
        OrderedDict of Bunch objects with .data, .target, .DESCR.
    """

Import

from imblearn.datasets import fetch_datasets

I/O Contract

Inputs

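Name	Type	Required	Description
data_home	str or None	No	Cache directory path (default: ~/scikit_learn_data)
filter_data	tuple of str/int or None	No	Dataset names or IDs to load; None loads all datasets
download_if_missing	bool	No	Download if not cached (default: True)
random_state	int, RandomState, or None	No	Seed used when shuffling
shuffle	bool	No	Shuffle data (default: False)
verbose	bool	No	Print fetch info (default: False)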

Outputs

Name	Type	Description
datasets	OrderedDict of Bunch	Keyed by dataset name; each Bunch has .data (ndarray), .target (ndarray), .DESCR (str)
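
The output contract above can be checked mechanically before a long benchmark run. The following is a minimal sketch; `check_contract` is a hypothetical helper, not an imbalanced-learn function. It relies only on the attributes listed in the table, so it works on any mapping of objects that expose `.data`, `.target`, and `.DESCR`.

```python
def check_contract(datasets):
    """Hypothetical check: verify each entry matches the documented outputs.

    Returns the dataset names in the order they were returned.
    Raises ValueError on the first violation found.
    """
    for name, bunch in datasets.items():
        for attr in ("data", "target", "DESCR"):
            if not hasattr(bunch, attr):
                raise ValueError(f"{name}: missing attribute {attr!r}")
        # One target label is expected per sample row.
        if len(bunch.data) != len(bunch.target):
            raise ValueError(
                f"{name}: {len(bunch.data)} samples but "
                f"{len(bunch.target)} targets"
            )
        if not isinstance(bunch.DESCR, str):
            raise ValueError(f"{name}: DESCR is not a string")
    return list(datasets)
```

Calling `check_contract(fetch_datasets(filter_data=("ecoli",)))` would be one way to fail fast if a cached file is incomplete or the return structure differs from what this page describes.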

Usage Examples

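from collections import Counter

from imblearn.datasets import fetch_datasets

# Load specific datasets (downloaded from Zenodo on first use, then cached)
datasets = fetch_datasets(filter_data=("ecoli", "satimage"))
for name, ds in datasets.items():
    # Imbalance ratio = majority-class count / minority-class count
    counts = Counter(ds.target)
    ratio = max(counts.values()) / min(counts.values())
    print(f"{name}: {ds.data.shape}, imbalance ratio: {ratio:.1f}")

# Load all 27 datasets
all_datasets = fetch_datasets()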

Related Pages

Implements Principle

Principle:Scikit_learn_contrib_Imbalanced_learn_Benchmark_Dataset_Loading

Requires Environment

Environment:Scikit_learn_contrib_Imbalanced_learn_Python_Scikit_learn
