Heuristic:Scikit learn contrib Imbalanced learn Sampling Strategy Selection

From Leeroopedia



Knowledge Sources
Domains Imbalanced_Classification, Parameter_Selection
Last Updated 2026-02-09 03:00 GMT

Overview

The `sampling_strategy` parameter accepts strings, dicts, floats, lists, or callables, each with type-specific validation rules that depend on the sampling method.

Description

The `sampling_strategy` parameter is the primary control for how resampling targets are computed in imbalanced-learn. Its behavior depends on both its type (string, dict, float, list, callable) and the sampling context (over-sampling, under-sampling, or clean-sampling). A value that does not fit the context raises a `ValueError` with a message describing the mismatch. Knowing which types each sampling method accepts prevents common configuration errors.

Usage

Apply this heuristic when configuring any sampler's `sampling_strategy` parameter. It is especially important when switching between over-sampling and under-sampling methods, because the valid string values differ between them.
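As a rough illustration of why a string that works for one sampler category may fail for another, the sketch below lists the string strategies and filters out the ones this page documents as rejected. `STRING_STRATEGIES`, `REJECTED`, and `usable_strings` are hypothetical names, not part of imbalanced-learn, and the list of six strings (including `"all"`) is an assumption about the library's accepted keys. Clean-sampling is deliberately left out of the table.

```python
# Assumed set of string strategies accepted by the validator.
STRING_STRATEGIES = (
    "minority", "majority", "not minority", "not majority", "all", "auto",
)

# Strings this page documents as rejected, per sampling context
# (hypothetical summary table, not library code).
REJECTED = {
    "over-sampling": {"majority"},
    "under-sampling": {"minority"},
}


def usable_strings(sampling_type):
    """Return the string strategies that remain usable for a context."""
    rejected = REJECTED[sampling_type]  # KeyError for contexts not covered here
    return [name for name in STRING_STRATEGIES if name not in rejected]


print(usable_strings("over-sampling"))
print(usable_strings("under-sampling"))
```

A configuration that hard-codes `"minority"` or `"majority"` is therefore the one to re-check when swapping an over-sampler for an under-sampler or the reverse; `"auto"` carries over in both directions.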

The Insight (Rule of Thumb)

  • Action: Match `sampling_strategy` type to sampling method:
    • Strings: `"auto"` maps to `"not majority"` for over-sampling, `"not minority"` for under-sampling
    • Float: Only valid for binary classification with over-sampling or under-sampling (not clean-sampling); must be in range (0, 1]
    • Dict: Keys must be existing class labels; values must be >= current count (over-sampling) or <= current count (under-sampling)
    • List: Only valid for clean-sampling methods (e.g., `TomekLinks`, `EditedNearestNeighbours`)
    • Callable: Must accept `y` and return a dict
  • Value: Default `"auto"` is correct for most use cases
  • Trade-off: More specific strategies (dict, float) give finer control but require knowledge of dataset class distribution
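The callable and dict rules above can be sketched together. In this minimal example, `at_least` is a hypothetical callable strategy that accepts `y` and returns a dict of desired per-class counts, and `violations_for_over_sampling` is a hypothetical stand-in for the dict checks (it is not the library's validator). Only the standard library is used.

```python
from collections import Counter


def at_least(y, floor=50):
    """Callable strategy: bring every class up to at least `floor` samples."""
    counts = Counter(y)
    return {label: max(n, floor) for label, n in counts.items()}


def violations_for_over_sampling(targets, y):
    """Hypothetical check mirroring the dict rules for over-sampling."""
    counts = Counter(y)
    problems = []
    for label, n_target in targets.items():
        if label not in counts:
            # Keys must be existing class labels.
            problems.append(f"class {label!r} is not present in y")
        elif n_target < counts[label]:
            # Over-sampling cannot reduce a class.
            problems.append(
                f"class {label!r}: target {n_target} < current {counts[label]}"
            )
    return problems


y = [0] * 100 + [1] * 20 + [2] * 5
targets = at_least(y)
print(targets)                                          # per-class desired counts
print(violations_for_over_sampling(targets, y))         # no violations expected
print(violations_for_over_sampling({1: 10, 3: 30}, y))  # two violations expected
```

With imbalanced-learn installed, a callable of this shape would typically be passed as `sampling_strategy=at_least` to an over-sampler, which calls it with `y` and then validates the returned dict.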

Reasoning

The validation logic in `imblearn/utils/_validation.py` enforces strict type-method compatibility. Key constraints:

  • Float strategy enforces binary classification because the ratio is defined between exactly two classes
  • Over-sampling dict values must be >= current counts (you cannot over-sample to fewer samples)
  • Under-sampling dict values must be <= current counts (you cannot under-sample to more samples)
  • String `"majority"` is invalid for over-sampling (cannot over-sample the majority class)
  • String `"minority"` is invalid for under-sampling and clean-sampling (cannot under-sample the minority class)
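To make the float constraint concrete, the sketch below converts a ratio into final class counts for a binary target. It is a simplified illustration and not the library's code: `float_to_counts` is a hypothetical name, and it returns the resulting class sizes, whereas the library internally works with the number of samples to add or remove. It assumes the ratio means minority size divided by majority size after resampling.

```python
from collections import Counter


def float_to_counts(y, ratio, sampling_type):
    """Translate a float strategy into final class counts (illustrative only)."""
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("a float sampling_strategy needs a binary target")
    if not 0 < ratio <= 1:
        raise ValueError("ratio must be in the range (0, 1]")
    minority, majority = sorted(counts, key=counts.get)
    if sampling_type == "over-sampling":
        # Grow the minority class until minority / majority == ratio.
        target = int(counts[majority] * ratio)
        if target < counts[minority]:
            raise ValueError("ratio would shrink the minority class")
        return {majority: counts[majority], minority: target}
    if sampling_type == "under-sampling":
        # Shrink the majority class until minority / majority == ratio.
        target = int(counts[minority] / ratio)
        if target > counts[majority]:
            raise ValueError("ratio would grow the majority class")
        return {majority: target, minority: counts[minority]}
    raise ValueError("a float is not meaningful for this sampling context")


y = [0] * 100 + [1] * 10
print(float_to_counts(y, 0.5, "over-sampling"))   # {0: 100, 1: 50}
print(float_to_counts(y, 0.5, "under-sampling"))  # {0: 20, 1: 10}
```

The same ratio yields very different dataset sizes depending on the direction, and with three or more classes there is no single minority/majority pair to apply it to, which is why a dict is the usual alternative for multi-class targets.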

Code Evidence

Float validation from `imblearn/utils/_validation.py:559-564`:

elif isinstance(sampling_strategy, Real):
    if sampling_strategy <= 0 or sampling_strategy > 1:
        raise ValueError(
            "When 'sampling_strategy' is a float, it should be "
            f"in the range (0, 1]. Got {sampling_strategy} instead."
        )
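The excerpt above is a fragment of a larger `if`/`elif` chain. A standalone re-creation of just the range check, under a hypothetical function name, might look like the following; the error message is copied from the excerpt and everything else is an assumption made for illustration.

```python
from numbers import Real


def check_float_range(sampling_strategy):
    """Re-creation of the quoted range check (hypothetical helper)."""
    if not isinstance(sampling_strategy, Real):
        raise TypeError("expected a real number")
    if sampling_strategy <= 0 or sampling_strategy > 1:
        raise ValueError(
            "When 'sampling_strategy' is a float, it should be "
            f"in the range (0, 1]. Got {sampling_strategy} instead."
        )
    return sampling_strategy


check_float_range(0.5)  # inside the half-open interval (0, 1]
check_float_range(1)    # the upper bound is included

for bad in (0, 1.5):
    try:
        check_float_range(bad)
    except ValueError as exc:
        print(exc)
```

Because the test is against `numbers.Real`, the integer `1` also passes the `isinstance` check in plain Python, while `0` fails because the lower bound is excluded.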

Single-class rejection from `imblearn/utils/_validation.py:532-536`:

if np.unique(y).size <= 1:
    raise ValueError(
        "The target 'y' needs to have more than 1 class. "
        f"Got {np.unique(y).size} class instead"
    )

Majority rejection for over-sampling from `imblearn/utils/_validation.py:206-211`:

def _sampling_strategy_majority(y, sampling_type):
    if sampling_type == "over-sampling":
        raise ValueError(
            "'sampling_strategy'='majority' cannot be used with over-sampler."
        )

String strategy dispatch from `imblearn/utils/_validation.py:541-550`:

if isinstance(sampling_strategy, str):
    if sampling_strategy not in SAMPLING_TARGET_KIND.keys():
        raise ValueError(
            "When 'sampling_strategy' is a string, it needs"
            f" to be one of {SAMPLING_TARGET_KIND}. Got '{sampling_strategy}' "
            "instead."
        )
    return OrderedDict(
        sorted(SAMPLING_TARGET_KIND[sampling_strategy](y, sampling_type).items())
    )
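The dispatch pattern in this excerpt, a dict that maps each string to a function of `(y, sampling_type)`, can be imitated in a few lines. The sketch below is a simplified stand-in, not the library's implementation: `TARGET_KIND`, `resolve`, and the helper functions are hypothetical names, only the over-sampling branch is filled in, and the returned values are assumed to be the number of samples to generate per class.

```python
from collections import Counter, OrderedDict


def _not_majority(y, sampling_type):
    counts = Counter(y)
    n_majority = max(counts.values())
    majority_class = max(counts, key=counts.get)
    if sampling_type == "over-sampling":
        # Samples to generate so every other class matches the majority.
        return {k: n_majority - n for k, n in counts.items() if k != majority_class}
    raise NotImplementedError("only over-sampling is sketched here")


def _majority(y, sampling_type):
    if sampling_type == "over-sampling":
        raise ValueError(
            "'sampling_strategy'='majority' cannot be used with over-sampler."
        )
    raise NotImplementedError("only over-sampling is sketched here")


def _auto(y, sampling_type):
    if sampling_type == "over-sampling":
        return _not_majority(y, sampling_type)
    raise NotImplementedError("only over-sampling is sketched here")


# Hypothetical, reduced version of the string-to-function table.
TARGET_KIND = {"not majority": _not_majority, "majority": _majority, "auto": _auto}


def resolve(sampling_strategy, y, sampling_type):
    """Look up the string, call its handler, and sort the result by class label."""
    if sampling_strategy not in TARGET_KIND:
        raise ValueError(f"unknown string strategy: {sampling_strategy!r}")
    handler = TARGET_KIND[sampling_strategy]
    return OrderedDict(sorted(handler(y, sampling_type).items()))


y = ["b"] * 100 + ["a"] * 20 + ["c"] * 5
print(resolve("auto", y, "over-sampling"))
```

Keeping the per-string logic in separate functions means a context-specific rejection, such as `"majority"` with an over-sampler, lives next to the computation it guards rather than in the dispatcher.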
