Implementation:Scikit learn contrib Imbalanced learn BaseUnderSampler
Implementation: BaseUnderSampler
BaseUnderSampler and BaseCleaningSampler are abstract base classes in the imbalanced-learn library that provide the foundation for all under-sampling algorithms. They extend BaseSampler and define the sampling type, parameter constraints, and documentation templates for their respective categories of under-sampling.
Overview
| Property | Value |
|---|---|
| Classes | BaseUnderSampler(BaseSampler), BaseCleaningSampler(BaseSampler) |
| Source | imblearn/under_sampling/base.py (lines 1-115) |
| Import | from imblearn.under_sampling.base import BaseUnderSampler, BaseCleaningSampler |
Purpose
Under-sampling algorithms in imbalanced-learn fall into two distinct categories, each with different semantics for the sampling_strategy parameter:
- Controlled under-samplers (BaseUnderSampler): Reduce each targeted class (by default, every class except the minority) to a specific number of samples. The user can request an exact ratio or explicit per-class counts.
- Cleaning samplers (BaseCleaningSampler): Remove noisy or borderline samples without guaranteeing a specific target count. The cleaning criteria determine which samples are removed.
These base classes ensure that all under-sampling implementations share a consistent API and parameter validation scheme.
BaseUnderSampler
Key Attributes
| Attribute | Value | Description |
|---|---|---|
| _sampling_type | "under-sampling" | Identifies the sampling category. |
| _sampling_strategy_docstring | (see below) | Documentation template for the sampling_strategy parameter. |
| _parameter_constraints | dict | Validation rules for sampling_strategy. |
sampling_strategy Parameter
BaseUnderSampler accepts the following types for sampling_strategy:
| Type | Description |
|---|---|
| float | Desired ratio alpha_us = N_m / N_rM, where N_m is the number of minority-class samples and N_rM is the number of majority-class samples after resampling. Binary classification only. |
| str | One of 'majority', 'not minority', 'not majority', 'all', 'auto' (equivalent to 'not minority'). |
| dict | Keys are targeted classes; values are the desired number of samples for each class. |
| callable | A function taking y and returning a dict mapping classes to desired sample counts. |
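The float form can be worked through by rearranging the ratio to N_rM = N_m / alpha_us. The following is a minimal sketch of that arithmetic only; `target_counts_from_ratio` is a hypothetical helper written for illustration, not a function exposed by imbalanced-learn, and truncating to an integer is an assumption of this sketch.

```python
from collections import Counter


def target_counts_from_ratio(y, alpha_us):
    """Hypothetical helper: turn a float ratio into a majority-class target.

    Uses alpha_us = N_m / N_rM, so N_rM = N_m / alpha_us.
    """
    counts = Counter(y)
    if len(counts) != 2:
        raise ValueError("A float ratio is only meaningful for binary targets.")
    if not 0 < alpha_us <= 1:
        raise ValueError("alpha_us must lie in the interval (0, 1].")

    minority = min(counts, key=counts.get)
    # The other label is the majority class (this also covers a tie).
    majority = next(label for label in counts if label != minority)

    # Assumption of this sketch: truncate to an integer sample count.
    n_majority_target = int(counts[minority] / alpha_us)
    if n_majority_target > counts[majority]:
        raise ValueError("The ratio would require adding majority samples.")
    return {majority: n_majority_target}


y = [0] * 90 + [1] * 10
print(target_counts_from_ratio(y, 0.5))  # {0: 20}
```

With 10 minority samples and a ratio of 0.5, the majority class is reduced from 90 to 20 samples; a ratio of 1.0 would balance the classes at 10 each.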
Parameter Constraints
_parameter_constraints: dict = {
    "sampling_strategy": [
        Interval(numbers.Real, 0, 1, closed="right"),
        StrOptions({"auto", "majority", "not minority", "not majority", "all"}),
        Mapping,
        callable,
    ],
}
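To make the constraint list concrete, here is a rough plain-Python approximation of what it accepts. `is_valid_under_sampling_strategy` is a hypothetical stand-in that does not use the library's own validation machinery; it only mirrors the four listed constraints, reading `closed="right"` as the half-open interval (0, 1].

```python
import numbers
from collections.abc import Mapping

# Mirrors the StrOptions set shown above.
ALLOWED_STRINGS = {"auto", "majority", "not minority", "not majority", "all"}


def is_valid_under_sampling_strategy(value):
    """Hypothetical check approximating the four constraints above."""
    if isinstance(value, str):
        return value in ALLOWED_STRINGS
    if isinstance(value, numbers.Real):
        # closed="right": 0 is excluded, 1 is included.
        return 0 < value <= 1
    return isinstance(value, Mapping) or callable(value)


for candidate in (0.5, 0, 1, "auto", "minority", {0: 10}, [0, 1]):
    print(repr(candidate), is_valid_under_sampling_strategy(candidate))
```

A value passes if it satisfies any one of the constraints, which is why a dict and a callable are both accepted while a plain list is not.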
BaseCleaningSampler
Key Attributes
| Attribute | Value | Description |
|---|---|---|
| _sampling_type | "clean-sampling" | Identifies the sampling category. |
| _sampling_strategy_docstring | (see below) | Documentation template for the sampling_strategy parameter. |
| _parameter_constraints | dict | Validation rules for sampling_strategy. |
sampling_strategy Parameter
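BaseCleaningSampler documents the following types for sampling_strategy. No float ratio is listed, since cleaning does not aim at a target count, although the constraint list further down still includes a real-valued interval: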
| Type | Description |
|---|---|
| str | One of 'majority', 'not minority', 'not majority', 'all', 'auto' (equivalent to 'not minority'). The number of samples will not be equalized. |
| list | A list of classes targeted by the resampling. |
| callable | A function taking y and returning a dict mapping classes to desired sample counts. |
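For a cleaning sampler, the string and list forms only select which classes are eligible for cleaning. The sketch below shows one way those options could be resolved to a list of class labels; `targeted_classes` is a hypothetical helper for illustration and is not part of imbalanced-learn.

```python
from collections import Counter


def targeted_classes(y, sampling_strategy="auto"):
    """Hypothetical helper: resolve a str or list strategy to class labels."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)

    # A list names the targeted classes directly.
    if isinstance(sampling_strategy, list):
        unknown = set(sampling_strategy) - set(counts)
        if unknown:
            raise ValueError(f"Classes not present in y: {sorted(unknown)}")
        return sorted(sampling_strategy)

    rules = {
        "majority": lambda label: label == majority,
        "not minority": lambda label: label != minority,
        "not majority": lambda label: label != majority,
        "all": lambda label: True,
    }
    rules["auto"] = rules["not minority"]  # 'auto' is an alias
    if sampling_strategy not in rules:
        raise ValueError(f"Unknown sampling_strategy: {sampling_strategy!r}")
    return sorted(label for label in counts if rules[sampling_strategy](label))


y = [0] * 100 + [1] * 30 + [2] * 10
print(targeted_classes(y, "auto"))          # [0, 1]
print(targeted_classes(y, "not majority"))  # [1, 2]
```

Only the selection of classes is shown here; how many samples are removed from each selected class depends on the cleaning criterion of the concrete sampler.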
Parameter Constraints
_parameter_constraints: dict = {
    "sampling_strategy": [
        Interval(numbers.Real, 0, 1, closed="right"),
        StrOptions({"auto", "majority", "not minority", "not majority", "all"}),
        list,
        callable,
    ],
}
Inheritance Hierarchy
BaseSampler                          # imblearn.base
|
+-- BaseUnderSampler                 # imblearn.under_sampling.base
|   |
|   +-- RandomUnderSampler
|   +-- NearMiss
|   +-- InstanceHardnessThreshold
|   +-- ...
|
+-- BaseCleaningSampler              # imblearn.under_sampling.base
    |
    +-- TomekLinks
    +-- EditedNearestNeighbours
    +-- NeighbourhoodCleaningRule
    +-- ...
Usage Note
These classes should not be instantiated directly. They are intended to be subclassed by concrete under-sampling algorithms. Subclasses must implement the _fit_resample method inherited from BaseSampler.
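The subclassing contract can be illustrated with a self-contained sketch of the same pattern. Every class below (`ToyBaseSampler`, `ToyBaseUnderSampler`, `KeepFirstUnderSampler`) is a hypothetical stand-in written for this example; none of it is imbalanced-learn code, and the real base classes do considerably more (input checks, strategy resolution, parameter validation).

```python
from abc import ABC, abstractmethod
from collections import Counter


class ToyBaseSampler(ABC):
    """Stand-in for the role of BaseSampler: public entry point plus hook."""

    def __init__(self, sampling_strategy="auto"):
        self.sampling_strategy = sampling_strategy

    def fit_resample(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length.")
        return self._fit_resample(X, y)

    @abstractmethod
    def _fit_resample(self, X, y):
        """Concrete samplers put their resampling logic here."""


class ToyBaseUnderSampler(ToyBaseSampler):
    """Stand-in for an intermediate base class: only tags the category."""

    _sampling_type = "under-sampling"


class KeepFirstUnderSampler(ToyBaseUnderSampler):
    """Toy sampler: keep the first n rows per class, n = minority count."""

    def _fit_resample(self, X, y):
        n_keep = min(Counter(y).values())
        seen = Counter()
        X_res, y_res = [], []
        for row, label in zip(X, y):
            if seen[label] < n_keep:
                seen[label] += 1
                X_res.append(row)
                y_res.append(label)
        return X_res, y_res


X = [[i] for i in range(6)]
y = [0, 0, 0, 0, 1, 1]
print(KeepFirstUnderSampler().fit_resample(X, y))
# ([[0], [1], [4], [5]], [0, 0, 1, 1])
```

In this sketch, instantiating `ToyBaseUnderSampler` directly raises a `TypeError` because the abstract hook is missing, which mirrors why the intermediate base classes are meant only for subclassing.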