Implementation:Scikit learn contrib Imbalanced learn fetch datasets
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Benchmarking, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Concrete tool, provided by the imbalanced-learn library, for downloading and caching benchmark imbalanced datasets from Zenodo.
Description
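The fetch_datasets function downloads a collection of 27 benchmark imbalanced datasets from Zenodo and caches them locally, so later calls read from disk instead of downloading again. It returns an OrderedDict keyed by dataset name, where each value is a Bunch object with .data, .target, and .DESCR attributes. Pass filter_data to load only a subset, selected by dataset name or by integer ID (1 to 27).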
Usage
Import this function when you need standardized imbalanced datasets for benchmarking resampling or classification methods.
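As a rough illustration of such a benchmark, the sketch below applies one evaluation function to every dataset in the returned mapping. The helper names run_benchmark and evaluate_with_smote are hypothetical, and the choice of SMOTE, logistic regression, and balanced accuracy is only an assumption made for the example. The heavy imports are deferred into the function body so the loop itself can be tried without imbalanced-learn installed.

```python
from types import SimpleNamespace


def evaluate_with_smote(X, y, random_state=0):
    """Cross-validated balanced accuracy of SMOTE + logistic regression.

    Illustrative choice of resampler, model, and metric (an assumption).
    """
    # Deferred imports: only needed when this evaluator is actually called.
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    model = make_pipeline(
        SMOTE(random_state=random_state),
        LogisticRegression(max_iter=1000),
    )
    scores = cross_val_score(model, X, y, scoring="balanced_accuracy", cv=5)
    return scores.mean()


def run_benchmark(datasets, evaluate):
    """Apply evaluate(X, y) to each dataset and collect the results by name."""
    return {name: evaluate(ds.data, ds.target) for name, ds in datasets.items()}


# Toy stand-in for the fetch_datasets() result, used only to show the call shape.
toy = {"toy": SimpleNamespace(data=[[0.0], [1.0], [2.0]], target=[-1, -1, 1])}
sizes = run_benchmark(toy, evaluate=lambda X, y: len(y))
print(sizes)
```

With the real data, the call would look like run_benchmark(fetch_datasets(filter_data=("ecoli",)), evaluate_with_smote).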
Code Reference
Source Location
- Repository: imbalanced-learn
- File: imblearn/datasets/_zenodo.py
- Lines: L111-301
Signature
def fetch_datasets(
    *,
    data_home=None,
    filter_data=None,
    download_if_missing=True,
    random_state=None,
    shuffle=False,
    verbose=False,
):
    """
    Args:
        data_home: str or None - Cache directory (default: ~/scikit_learn_data).
        filter_data: tuple of str/int or None - Dataset names or IDs to load.
        download_if_missing: bool - Auto-download if not cached (default: True).
        random_state: int, RandomState, or None - Shuffle seed.
        shuffle: bool - Shuffle data (default: False).
        verbose: bool - Print fetch info (default: False).
    Returns:
        OrderedDict of Bunch objects with .data, .target, .DESCR.
    """
Import
from imblearn.datasets import fetch_datasets
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| data_home | str or None | No | Cache directory path |
| filter_data | tuple of str/int or None | No | Dataset names or IDs to load |
| download_if_missing | bool | No | Download if not cached (default: True) |
| random_state | int, RandomState, or None | No | Seed used when shuffling |
| shuffle | bool | No | Shuffle data (default: False) |
| verbose | bool | No | Print fetch info (default: False) |
Outputs
| Name | Type | Description |
|---|---|---|
| datasets | OrderedDict of Bunch | Keyed by dataset name; each Bunch has .data (ndarray), .target (ndarray), .DESCR (str) |
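The download_if_missing input makes it possible to load strictly from the local cache, for example on a machine without network access. The sketch below wraps that pattern in a hypothetical helper, load_cached. It assumes a missing cache is reported as an OSError (IOError), which should be checked against the installed imbalanced-learn version. The fetch argument exists only so the helper can be exercised with a stand-in function.

```python
def load_cached(names, fetch=None):
    """Return cached datasets, or None when the local cache is incomplete."""
    if fetch is None:
        # Deferred import so the helper can be tried without imbalanced-learn.
        from imblearn.datasets import fetch_datasets as fetch
    try:
        return fetch(filter_data=tuple(names), download_if_missing=False)
    except OSError:
        # Assumption: a dataset missing from the cache raises OSError/IOError.
        return None


def _empty_cache(**kwargs):
    """Stand-in for fetch_datasets that behaves as if nothing is cached."""
    raise OSError("Data not found and `download_if_missing` is False")


result = load_cached(("ecoli",), fetch=_empty_cache)
print(result)
```

In normal use, load_cached(("ecoli", "satimage")) would call fetch_datasets directly and return the OrderedDict of Bunch objects when both datasets are already on disk.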
Usage Examples
from imblearn.datasets import fetch_datasets

# Load specific datasets
datasets = fetch_datasets(filter_data=("ecoli", "satimage"))
for name, ds in datasets.items():
    print(f"{name}: {ds.data.shape}, imbalance ratio: ...")

# Load all datasets
all_datasets = fetch_datasets()
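The imbalance ratio that the example above leaves as a placeholder can be derived from the target vector alone. A minimal sketch follows, assuming the ratio is defined as the majority-class count divided by the minority-class count. The helper name imbalance_ratio and the toy labels are illustrative and not part of the library.

```python
from collections import Counter


def imbalance_ratio(target):
    """Return the majority-class count divided by the minority-class count."""
    counts = Counter(target)
    return max(counts.values()) / min(counts.values())


# Made-up labels standing in for a Bunch's .target array: 8 negatives, 2 positives.
toy_target = [-1] * 8 + [1] * 2
ratio = imbalance_ratio(toy_target)
print(f"imbalance ratio: {ratio:.1f}")
```

With a fetched dataset, the same helper can be called as imbalance_ratio(ds.target.tolist()) inside the loop.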
Related Pages
Implements Principle
Requires Environment