Heuristic:Scikit learn contrib Imbalanced learn Sampling Strategy Selection
| Knowledge Sources | |
|---|---|
| Domains | Imbalanced_Classification, Parameter_Selection |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
The `sampling_strategy` parameter accepts strings, dicts, floats, lists, or callables, each with type-specific validation rules that depend on the sampling method.
Description
The `sampling_strategy` parameter is the primary control for how resampling targets are computed in imbalanced-learn. Its behavior varies significantly with its type (string, dict, float, list, callable) and with the sampling context (over-sampling, under-sampling, or clean-sampling). If a sampler's context does not support the type or value passed, it raises a `ValueError` with a descriptive message. Understanding these type constraints, and how they interact with each sampling method, prevents common configuration errors.
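
The sketch below shows how type and sampling context interact. It encodes, in one standalone function, the type compatibility rules described on this page. `check_strategy_kind` and `SAMPLING_TYPES` are hypothetical names, not part of imbalanced-learn. The sketch checks only types, the class count, and the float range. It does not check which string values are allowed, and it does not compute resampling targets.

```python
from numbers import Real

# Hypothetical helper, not part of imbalanced-learn: it encodes the
# type/sampling-type rules described on this page.
SAMPLING_TYPES = ("over-sampling", "under-sampling", "clean-sampling")


def check_strategy_kind(sampling_strategy, sampling_type, n_classes):
    """Return the strategy kind, or raise ValueError if it is not allowed."""
    if sampling_type not in SAMPLING_TYPES:
        raise ValueError(f"Unknown sampling type: {sampling_type!r}")
    # Every sampler needs at least two classes in the target.
    if n_classes <= 1:
        raise ValueError("The target needs to have more than 1 class.")
    if isinstance(sampling_strategy, str):
        return "str"
    if isinstance(sampling_strategy, dict):
        # Dicts give target class sizes, which cleaning methods do not accept.
        if sampling_type == "clean-sampling":
            raise ValueError("A dict is not supported by cleaning methods.")
        return "dict"
    if isinstance(sampling_strategy, list):
        # Lists only name the classes to clean.
        if sampling_type != "clean-sampling":
            raise ValueError("A list is only supported by cleaning methods.")
        return "list"
    if isinstance(sampling_strategy, Real):
        if not 0 < sampling_strategy <= 1:
            raise ValueError("A float must lie in the range (0, 1].")
        # The ratio is defined between exactly two classes.
        if n_classes != 2:
            raise ValueError("A float is only supported for binary targets.")
        if sampling_type == "clean-sampling":
            raise ValueError("A float is not supported by cleaning methods.")
        return "float"
    if callable(sampling_strategy):
        return "callable"
    raise ValueError(f"Unsupported type: {type(sampling_strategy).__name__}")
```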
Usage
Apply this heuristic when configuring any sampler's `sampling_strategy` parameter. It is especially important when switching between over-sampling and under-sampling methods, because the valid string values differ between them.
The Insight (Rule of Thumb)
- Action: Match `sampling_strategy` type to sampling method:
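  - Strings: `"auto"` maps to `"not majority"` for over-sampling and to `"not minority"` for under-sampling and clean-sampling; the other accepted strings are `"minority"`, `"majority"`, `"not minority"`, `"not majority"`, and `"all"`
  - Float: Only valid for binary classification with over- or under-sampling (not clean-sampling); must be in the range (0, 1] and gives the desired ratio of minority to majority samples after resampling
  - Dict: Keys must be existing class labels; values are the desired class sizes after resampling and must be >= the current count (over-sampling) or <= the current count (under-sampling); not accepted by clean-sampling methods
  - List: Only valid for clean-sampling methods (e.g., `TomekLinks`, `EditedNearestNeighbours`)
  - Callable: Must accept `y` and return a dict, which is then validated like a dict strategy
- Value: The default `"auto"` suits most use cases. It resamples every class except the majority (over-sampling) or every class except the minority (under-sampling)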
- Trade-off: More specific strategies (dict, float) give finer control but require knowledge of dataset class distribution
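
A callable is useful when the targets should adapt to whatever class distribution the sampler sees at fit time, so you do not need to know the counts in advance. The function below is a hypothetical example; the name `half_of_majority` is illustrative, not a library function. It returns the desired per-class sizes after resampling. Each class is raised to at least half of the majority count, and no class is asked for fewer samples than it already has, because over-samplers reject that.

```python
from collections import Counter


def half_of_majority(y):
    """Hypothetical callable strategy for an over-sampler.

    Returns the desired number of samples per class after resampling:
    each class is raised to at least half of the majority-class count,
    and no class is asked to shrink (over-samplers reject that).
    """
    counts = Counter(y)
    floor_size = max(counts.values()) // 2
    return {label: max(n, floor_size) for label, n in counts.items()}
```

With an over-sampler such as `SMOTE`, you would pass it as `SMOTE(sampling_strategy=half_of_majority)`. The sampler calls it with the training `y` and validates the returned dict like any other dict strategy.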
Reasoning
The validation logic in `imblearn/utils/_validation.py` (the `check_sampling_strategy` function) checks that each strategy type is compatible with the sampler's sampling type. Key constraints:
- Float strategy enforces binary classification because the ratio is defined between exactly two classes
- Over-sampling dict values must be >= current counts (you cannot over-sample to fewer samples)
- Under-sampling dict values must be <= current counts (you cannot under-sample to more samples)
- String `"majority"` is invalid for over-sampling (cannot over-sample the majority class)
- String `"minority"` is invalid for under-sampling and clean-sampling (the minority class alone cannot be targeted for removal)
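
The dict constraints above can be sketched in plain Python. `resolve_dict_strategy` is a hypothetical re-implementation for illustration only, not imbalanced-learn's code. It assumes, as the library documents, that dict values are the desired class sizes after resampling. It converts them into the number of samples to generate (over-sampling) or to keep (under-sampling).

```python
from collections import Counter


def resolve_dict_strategy(sampling_strategy, y, sampling_type):
    """Hypothetical re-implementation of the dict rules, for illustration.

    Dict values are the desired class sizes after resampling. For
    over-sampling the result is the number of samples to generate per
    class; for under-sampling it is the number of samples to keep.
    """
    counts = Counter(y)
    # Keys must be labels that actually occur in y.
    unknown = set(sampling_strategy) - set(counts)
    if unknown:
        raise ValueError(f"Classes {unknown} are not present in y.")
    if any(n < 0 for n in sampling_strategy.values()):
        raise ValueError("Requested class sizes cannot be negative.")
    if sampling_type == "over-sampling":
        # Over-sampling can only grow a class.
        for label, n in sampling_strategy.items():
            if n < counts[label]:
                raise ValueError(
                    f"Class {label!r} has {counts[label]} samples; "
                    f"cannot over-sample it to {n}."
                )
        return {label: n - counts[label] for label, n in sampling_strategy.items()}
    if sampling_type == "under-sampling":
        # Under-sampling can only shrink a class.
        for label, n in sampling_strategy.items():
            if n > counts[label]:
                raise ValueError(
                    f"Class {label!r} has {counts[label]} samples; "
                    f"cannot under-sample it to {n}."
                )
        return dict(sampling_strategy)
    raise ValueError("A dict is not supported by cleaning methods.")
```

For example, with 100 samples of class `"a"` and 10 of class `"b"`, `{"b": 60}` asks an over-sampler to generate 50 new samples. `{"b": 5}` is rejected because it would shrink the class.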
Code Evidence
Float validation from `imblearn/utils/_validation.py:559-564`:
```python
elif isinstance(sampling_strategy, Real):
    if sampling_strategy <= 0 or sampling_strategy > 1:
        raise ValueError(
            "When 'sampling_strategy' is a float, it should be "
            f"in the range (0, 1]. Got {sampling_strategy} instead."
        )
```
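
The float check above only bounds the value. A float means the desired ratio of minority to majority samples after resampling. Under that reading, over-sampling raises the minority class to int(N_majority × ratio) samples, and under-sampling reduces the majority class to int(N_minority / ratio) samples. The hypothetical sketch below works this out. `float_ratio_targets` is an illustrative name, and rounding may differ in detail from the library.

```python
from collections import Counter


def float_ratio_targets(y, ratio, sampling_type):
    """Hypothetical sketch: turn a float ratio into per-class sample counts.

    `ratio` is the desired N_minority / N_majority after resampling.
    Over-sampling returns samples to generate for the minority class;
    under-sampling returns samples to keep for the majority class.
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("A float ratio is only defined for binary targets.")
    if not 0 < ratio <= 1:
        raise ValueError(f"Ratio must be in (0, 1], got {ratio}.")
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    if sampling_type == "over-sampling":
        # Minority target size is a fraction of the majority size.
        n_new = int(counts[majority] * ratio) - counts[minority]
        if n_new < 0:
            raise ValueError("Ratio too small: it would remove minority samples.")
        return {minority: n_new}
    if sampling_type == "under-sampling":
        # Majority target size is chosen so that minority / majority == ratio.
        n_keep = int(counts[minority] / ratio)
        if n_keep > counts[majority]:
            raise ValueError("Ratio too small: it would add majority samples.")
        return {majority: n_keep}
    raise ValueError("A float ratio is not supported by cleaning methods.")
```

Take 100 majority and 10 minority samples. A ratio of 0.5 asks an over-sampler to generate 40 minority samples (reaching 50) and asks an under-sampler to keep 20 majority samples. A ratio of 0.05 is rejected in both cases, because it would shrink the minority class or grow the majority class.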
Single-class rejection from `imblearn/utils/_validation.py:532-536`:
```python
if np.unique(y).size <= 1:
    raise ValueError(
        "The target 'y' needs to have more than 1 class. "
        f"Got {np.unique(y).size} class instead"
    )
```
Majority rejection for over-sampling from `imblearn/utils/_validation.py:206-211`:
```python
def _sampling_strategy_majority(y, sampling_type):
    if sampling_type == "over-sampling":
        raise ValueError(
            "'sampling_strategy'='majority' cannot be used with over-sampler."
        )
```
String strategy dispatch from `imblearn/utils/_validation.py:541-550`:
```python
if isinstance(sampling_strategy, str):
    if sampling_strategy not in SAMPLING_TARGET_KIND.keys():
        raise ValueError(
            "When 'sampling_strategy' is a string, it needs"
            f" to be one of {SAMPLING_TARGET_KIND}. Got '{sampling_strategy}' "
            "instead."
        )
    return OrderedDict(
        sorted(SAMPLING_TARGET_KIND[sampling_strategy](y, sampling_type).items())
    )
```
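
The sketch below makes the string dispatch concrete. It mimics how each accepted string selects classes and what targets it implies, including the `"auto"` mapping and the `"majority"`/`"minority"` rejections. `string_targets` is a hypothetical helper written for this page, not the library's implementation. For over-sampling it returns the number of samples to generate so that each selected class reaches the majority count. For under- and clean-sampling it returns the minority count for each selected class; cleaning methods largely use only the class selection.

```python
from collections import Counter

# The string values accepted for sampling_strategy.
STRING_STRATEGIES = {"minority", "majority", "not minority", "not majority", "all", "auto"}
SAMPLING_TYPES = ("over-sampling", "under-sampling", "clean-sampling")


def string_targets(y, strategy, sampling_type):
    """Hypothetical sketch of how a string strategy selects classes."""
    if strategy not in STRING_STRATEGIES:
        raise ValueError(f"Unknown string strategy: {strategy!r}")
    if sampling_type not in SAMPLING_TYPES:
        raise ValueError(f"Unknown sampling type: {sampling_type!r}")
    over = sampling_type == "over-sampling"
    # "auto" resolves differently depending on the sampling type.
    if strategy == "auto":
        strategy = "not majority" if over else "not minority"
    if strategy == "majority" and over:
        raise ValueError("'majority' cannot be used with an over-sampler.")
    if strategy == "minority" and not over:
        raise ValueError("'minority' cannot be used with under- or clean-samplers.")
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    selected = {
        "minority": [minority],
        "majority": [majority],
        "not minority": [c for c in counts if c != minority],
        "not majority": [c for c in counts if c != majority],
        "all": list(counts),
    }[strategy]
    if over:
        # Samples to generate so each selected class reaches the majority size.
        return {c: counts[majority] - counts[c] for c in selected}
    # Under-/clean-sampling: target the minority size for each selected class.
    return {c: counts[minority] for c in selected}
```

Take class counts of 50, 20, and 5. `"auto"` asks an over-sampler to generate 30 and 45 samples for the two smaller classes, while an under-sampler would reduce the two larger classes to 5 samples each.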
Related Pages
- Implementation:Scikit_learn_contrib_Imbalanced_learn_SMOTE
- Implementation:Scikit_learn_contrib_Imbalanced_learn_ADASYN
- Implementation:Scikit_learn_contrib_Imbalanced_learn_BorderlineSMOTE
- Implementation:Scikit_learn_contrib_Imbalanced_learn_SVMSMOTE
- Implementation:Scikit_learn_contrib_Imbalanced_learn_KMeansSMOTE
- Implementation:Scikit_learn_contrib_Imbalanced_learn_SMOTEENN
- Implementation:Scikit_learn_contrib_Imbalanced_learn_SMOTETomek
- Implementation:Scikit_learn_contrib_Imbalanced_learn_make_imbalance
- Implementation:Scikit_learn_contrib_Imbalanced_learn_BalancedRandomForestClassifier
- Implementation:Scikit_learn_contrib_Imbalanced_learn_BalancedBaggingClassifier
- Implementation:Scikit_learn_contrib_Imbalanced_learn_EasyEnsembleClassifier
- Implementation:Scikit_learn_contrib_Imbalanced_learn_RUSBoostClassifier