

Implementation:Scikit learn contrib Imbalanced learn fetch datasets

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Benchmarking, Imbalanced_Learning
Last Updated: 2026-02-09 03:00 GMT

Overview

Concrete tool, provided by the imbalanced-learn library, for downloading and caching benchmark imbalanced datasets hosted on Zenodo.

Description

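The fetch_datasets function downloads a collection of 27 benchmark imbalanced datasets from Zenodo and caches them locally under data_home, so later calls load from disk instead of downloading again. Each dataset is returned as a Bunch object with .data, .target, and .DESCR attributes. The filter_data argument restricts loading to a subset, selected by dataset name (for example "ecoli") or by numeric ID (1 to 27).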
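To make the name-or-ID filtering concrete, here is a minimal sketch of how a filter tuple mixing names and integer IDs could be resolved into an ordered selection. The three-entry ID_TO_NAME table is a hypothetical excerpt assumed to match the first rows of the library's dataset table (verify against the imbalanced-learn documentation; the real function covers 27 datasets), and resolve_filter is an illustrative helper, not part of the library.

```python
from collections import OrderedDict

# Hypothetical excerpt of an ID-to-name table; the real library maps 27 IDs.
ID_TO_NAME = OrderedDict([(1, "ecoli"), (2, "optical_digits"), (3, "satimage")])
NAME_TO_ID = {name: idx for idx, name in ID_TO_NAME.items()}


def resolve_filter(filter_data=None):
    """Return dataset names in the order requested (all names if None).

    Illustrative helper: accepts a mix of str names and int IDs.
    """
    if filter_data is None:
        return list(ID_TO_NAME.values())
    names = []
    for item in filter_data:
        if isinstance(item, int):
            if item not in ID_TO_NAME:
                raise ValueError(f"unknown dataset ID: {item}")
            names.append(ID_TO_NAME[item])
        elif isinstance(item, str):
            if item not in NAME_TO_ID:
                raise ValueError(f"unknown dataset name: {item!r}")
            names.append(item)
        else:
            raise TypeError(f"expected str or int, got {type(item).__name__}")
    return names


print(resolve_filter(("satimage", 1)))  # ['satimage', 'ecoli']
```

Preserving the caller's order matters because the returned OrderedDict is documented to follow the order of filter_data.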

Usage

Import this function when you need standardized imbalanced datasets for benchmarking resampling or classification methods.
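A typical benchmarking loop iterates over the returned datasets and compares class counts before and after resampling. The sketch below uses a hand-rolled random undersampler in NumPy as a stand-in for imbalanced-learn's own samplers (such as RandomUnderSampler), and synthetic arrays in place of downloaded data; random_undersample and the toy dataset names are hypothetical.

```python
from collections import Counter

import numpy as np


def random_undersample(X, y, random_state=None):
    """Keep a random subset of each class equal in size to the smallest class."""
    rng = np.random.RandomState(random_state)
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    # Sample n_min indices per class without replacement.
    keep = np.concatenate(
        [rng.choice(np.flatnonzero(y == c), size=n_min, replace=False) for c in classes]
    )
    keep.sort()  # preserve the original sample order
    return X[keep], y[keep]


# Synthetic stand-ins for fetched datasets: name -> (X, y)
rng = np.random.RandomState(42)
synthetic = {
    "toy_a": (rng.rand(30, 4), np.array([1] * 5 + [-1] * 25)),
    "toy_b": (rng.rand(12, 2), np.array([1] * 3 + [-1] * 9)),
}
results = {}
for name, (X, y) in synthetic.items():
    X_res, y_res = random_undersample(X, y, random_state=0)
    results[name] = (Counter(y.tolist()), Counter(y_res.tolist()))
    print(name, dict(results[name][0]), "->", dict(results[name][1]))
```

With real data, the synthetic dict would be replaced by iterating over the OrderedDict returned by fetch_datasets and reading each Bunch's .data and .target.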

Code Reference

Source Location

Signature

def fetch_datasets(
    *,
    data_home=None,
    filter_data=None,
    download_if_missing=True,
    random_state=None,
    shuffle=False,
    verbose=False,
):
    """
    Args:
        data_home: str or None - Cache directory (default: ~/scikit_learn_data).
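        filter_data: tuple of str/int or None - Dataset names or IDs to load (default: all 27).
        download_if_missing: bool - Download from Zenodo if not cached; if False, raise an error instead (default: True).
        random_state: int, RandomState, or None - Seed for shuffling; only used when shuffle=True.
        shuffle: bool - Shuffle the samples of each dataset (default: False).
        verbose: bool - Print information while fetching (default: False).
    Returns:
        OrderedDict of Bunch objects, keyed by dataset name in the order given by filter_data, each with .data, .target, .DESCR.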
    """

Import

from imblearn.datasets import fetch_datasets

I/O Contract

Inputs

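| Name | Type | Required | Description |
|---|---|---|---|
| data_home | str or None | No | Cache directory path (default: ~/scikit_learn_data) |
| filter_data | tuple of str/int or None | No | Dataset names or IDs to load (default: all 27) |
| download_if_missing | bool | No | Download from Zenodo if not cached (default: True) |
| random_state | int, RandomState, or None | No | Seed for shuffling; only used when shuffle=True |
| shuffle | bool | No | Shuffle the samples of each dataset (default: False) |
| verbose | bool | No | Print information while fetching (default: False) |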
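The data_home and download_if_missing inputs describe a standard download-on-miss cache. The sketch below illustrates that pattern under stated assumptions: the JSON file layout and the downloader callable are invented for illustration and do not reflect how imbalanced-learn stores its files, and load_cached and fake_download are hypothetical helpers.

```python
import json
import os
import tempfile
from pathlib import Path


def load_cached(name, data_home=None, download_if_missing=True, downloader=None):
    """Return a dataset from a local cache, downloading it on a cache miss.

    Hypothetical sketch: file layout and `downloader` are illustrative only.
    """
    # Mirror the documented default cache root, ~/scikit_learn_data.
    base = Path(data_home or os.path.join("~", "scikit_learn_data")).expanduser()
    path = base / f"{name}.json"
    if not path.exists():
        if not download_if_missing:
            raise OSError(f"{name!r} is not cached in {base} and download_if_missing=False")
        base.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(downloader(name)))
    return json.loads(path.read_text())


# Demo with a fake downloader that records each simulated network fetch.
calls = []


def fake_download(name):
    calls.append(name)
    return {"data": [[0.1, 0.2], [0.3, 0.4]], "target": [1, -1]}


with tempfile.TemporaryDirectory() as tmp:
    first = load_cached("ecoli", data_home=tmp, downloader=fake_download)
    second = load_cached("ecoli", data_home=tmp, downloader=fake_download)  # cache hit
    try:
        load_cached("satimage", data_home=tmp, download_if_missing=False)
        missing_raised = False
    except OSError:
        missing_raised = True
```

Setting download_if_missing=False in this pattern is useful on offline machines: a cache miss fails fast instead of attempting a network request.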

Outputs

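| Name | Type | Description |
|---|---|---|
| datasets | OrderedDict of Bunch | Keyed by dataset name, in the order given by filter_data; each Bunch has .data (ndarray of shape (n_samples, n_features)), .target (ndarray of shape (n_samples,)), .DESCR (str) |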
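The output contract can be checked mechanically. The sketch below validates the documented shapes on synthetic stand-ins; check_contract is a hypothetical helper, and types.SimpleNamespace substitutes for scikit-learn's Bunch (which also supports attribute access) so the example runs without downloading anything.

```python
from collections import OrderedDict
from types import SimpleNamespace

import numpy as np


def check_contract(datasets):
    """Validate each entry against the documented contract; return {name: shape}."""
    shapes = {}
    for name, ds in datasets.items():
        if ds.data.ndim != 2:
            raise ValueError(f"{name}: .data should be 2-D")
        if ds.target.shape != (ds.data.shape[0],):
            raise ValueError(f"{name}: .target length does not match .data rows")
        if not isinstance(ds.DESCR, str):
            raise ValueError(f"{name}: .DESCR should be a str")
        shapes[name] = ds.data.shape
    return shapes


# Synthetic stand-ins (random features, made-up labels); no download needed.
rng = np.random.RandomState(0)
fake = OrderedDict(
    (name, SimpleNamespace(data=rng.rand(n, d), target=np.where(np.arange(n) < k, 1, -1), DESCR=name))
    for name, n, d, k in [("satimage", 20, 4, 2), ("ecoli", 10, 3, 1)]
)
print(check_contract(fake))  # {'satimage': (20, 4), 'ecoli': (10, 3)}
```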

Usage Examples

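from collections import Counter

from imblearn.datasets import fetch_datasets

# Load specific datasets by name (integer IDs also work); the first call downloads from Zenodo
datasets = fetch_datasets(filter_data=("ecoli", "satimage"))
for name, ds in datasets.items():
    # Imbalance ratio: majority-class count divided by minority-class count
    counts = Counter(ds.target)
    ratio = max(counts.values()) / min(counts.values())
    print(f"{name}: {ds.data.shape}, imbalance ratio: {ratio:.2f}")

# Load all 27 datasets (anything not yet cached is downloaded)
all_datasets = fetch_datasets()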

Related Pages

Implements Principle

Requires Environment
