Implementation:Scikit learn contrib Imbalanced learn BaseUnderSampler

Implementation: BaseUnderSampler

BaseUnderSampler and BaseCleaningSampler are abstract base classes in the imbalanced-learn library that provide the foundation for all under-sampling algorithms. They extend BaseSampler and define the sampling type, parameter constraints, and documentation templates for their respective categories of under-sampling.

Overview

| Property | Value |
| --- | --- |
| Classes | BaseUnderSampler(BaseSampler), BaseCleaningSampler(BaseSampler) |
| Source | imblearn/under_sampling/base.py (lines 1-115) |
| Import | from imblearn.under_sampling.base import BaseUnderSampler, BaseCleaningSampler |

Purpose

Under-sampling algorithms in imbalanced-learn fall into two distinct categories, each with different semantics for the sampling_strategy parameter:

  1. Controlled under-samplers (BaseUnderSampler): Reduce the targeted classes (by default, every class except the minority) to a specific target count. The user can specify a ratio or an exact per-class target size.
  2. Cleaning samplers (BaseCleaningSampler): Remove noisy or borderline samples without guaranteeing a specific target count. The cleaning criteria determine which samples are removed.

These base classes ensure that all under-sampling implementations share a consistent API and parameter validation scheme.
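
The difference between the two categories can be illustrated without the library. The following is a minimal sketch on 1-D toy data; the helper names `controlled_undersample` and `clean_majority`, and the simplified nearest-neighbour cleaning rule, are illustrative assumptions and not imbalanced-learn code.

```python
import random

# Toy 1-D data: (value, label) pairs; "maj" is the majority class.
data = [(0.0, "maj"), (1.0, "maj"), (2.0, "maj"), (3.0, "maj"),
        (4.9, "maj"), (7.0, "maj"), (5.0, "min"), (6.0, "min")]


def controlled_undersample(points, target, seed=0):
    """Keep exactly `target` randomly chosen majority samples (hypothetical helper)."""
    majority = [p for p in points if p[1] == "maj"]
    minority = [p for p in points if p[1] == "min"]
    rng = random.Random(seed)
    return rng.sample(majority, target) + minority


def clean_majority(points):
    """Drop majority samples whose nearest neighbour is a minority sample.

    A deliberately simplified cleaning rule, used only for illustration.
    """
    kept = []
    for i, (x, label) in enumerate(points):
        others = [p for j, p in enumerate(points) if j != i]
        nearest = min(others, key=lambda p: abs(p[0] - x))
        if label == "maj" and nearest[1] == "min":
            continue  # borderline majority sample: removed
        kept.append((x, label))
    return kept


controlled = controlled_undersample(data, target=2)
cleaned = clean_majority(data)

n_controlled = sum(1 for _, label in controlled if label == "maj")
n_cleaned = sum(1 for _, label in cleaned if label == "maj")
print(n_controlled)  # 2: always equals the requested target
print(n_cleaned)     # 4: determined by the data, not by a requested count
```

In this toy data only the majority points at 4.9 and 7.0 sit next to minority points, so the cleaning rule removes those two and leaves the class sizes unequal.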

BaseUnderSampler

Key Attributes

| Attribute | Value | Description |
| --- | --- | --- |
| _sampling_type | "under-sampling" | Identifies the sampling category. |
| _sampling_strategy_docstring | (see below) | Documentation template for the sampling_strategy parameter. |
| _parameter_constraints | dict | Validation rules for sampling_strategy. |

sampling_strategy Parameter

BaseUnderSampler accepts the following types for sampling_strategy:

| Type | Description |
| --- | --- |
| float | Desired ratio of minority to majority samples after resampling: alpha_us = N_m / N_rM, where N_m is the number of minority samples and N_rM is the number of majority samples after resampling. Binary classification only. |
| str | One of 'majority', 'not minority', 'not majority', 'all', 'auto' (equivalent to 'not minority'). |
| dict | Keys are targeted classes; values are the desired number of samples for each class. |
| callable | A function taking y and returning a dict mapping classes to desired sample counts. |
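
As a rough illustration of how a float or string strategy could translate into per-class targets, here is a library-free sketch. The function name `resolve_under_sampling_targets` is hypothetical, truncation with `int()` is an assumption about rounding, and the sketch assumes that string strategies reduce each targeted class to the minority class size; the library's own resolution logic is not reproduced here.

```python
from collections import Counter


def resolve_under_sampling_targets(y, sampling_strategy="auto"):
    """Return {class: target count} for the classes to be under-sampled."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)
    n_minority = counts[minority]

    if isinstance(sampling_strategy, float):
        if len(counts) != 2:
            raise ValueError("a float strategy applies to binary problems only")
        # alpha_us = N_m / N_rM  =>  N_rM = N_m / alpha_us
        return {majority: int(n_minority / sampling_strategy)}

    if sampling_strategy in ("auto", "not minority"):
        targeted = [c for c in counts if c != minority]
    elif sampling_strategy == "majority":
        targeted = [majority]
    elif sampling_strategy == "not majority":
        targeted = [c for c in counts if c != majority]
    elif sampling_strategy == "all":
        targeted = list(counts)
    else:
        raise ValueError(f"unknown strategy: {sampling_strategy!r}")
    # Assumption: targeted classes are reduced to the minority class size.
    return {c: n_minority for c in targeted}


y = ["a"] * 900 + ["b"] * 100
print(resolve_under_sampling_targets(y, 0.5))         # {'a': 200}
print(resolve_under_sampling_targets(y, "majority"))  # {'a': 100}
```

With 100 minority samples, a ratio of 0.5 asks for 100 / 0.5 = 200 majority samples, so 700 of the original 900 would be discarded.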

Parameter Constraints

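# A value is valid if it satisfies at least one entry in the list.
_parameter_constraints: dict = {
    "sampling_strategy": [
        Interval(numbers.Real, 0, 1, closed="right"),  # real number in (0, 1]
        StrOptions({"auto", "majority", "not minority", "not majority", "all"}),  # named strategy
        Mapping,  # dict-like: class -> desired sample count
        callable,  # function of y returning such a mapping
    ],
}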
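
To make the constraint list concrete, the sketch below checks a candidate value against the same four alternatives using only the standard library. It mirrors the intent of the declaration (a real number in the half-open interval (0, 1], one of the five strings, a mapping, or a callable). The function `satisfies_under_sampling_constraints` is a hypothetical helper, not part of scikit-learn or imbalanced-learn.

```python
import numbers
from collections.abc import Mapping

ALLOWED_STRINGS = {"auto", "majority", "not minority", "not majority", "all"}


def satisfies_under_sampling_constraints(value):
    """Return True if `value` matches at least one of the four constraints."""
    if isinstance(value, str):
        return value in ALLOWED_STRINGS
    if isinstance(value, numbers.Real):
        # closed="right": 0 is excluded, 1 is included.
        return 0 < value <= 1
    return isinstance(value, Mapping) or callable(value)


print(satisfies_under_sampling_constraints(0.5))         # True
print(satisfies_under_sampling_constraints(0.0))         # False
print(satisfies_under_sampling_constraints("minority"))  # False
print(satisfies_under_sampling_constraints({0: 50}))     # True
```

Note that a type check of this kind only filters out obviously invalid inputs; whether a given dict or callable yields sensible targets for a particular y can only be decided once the data is seen.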

BaseCleaningSampler

Key Attributes

| Attribute | Value | Description |
| --- | --- | --- |
| _sampling_type | "clean-sampling" | Identifies the sampling category. |
| _sampling_strategy_docstring | (see below) | Documentation template for the sampling_strategy parameter. |
| _parameter_constraints | dict | Validation rules for sampling_strategy. |

sampling_strategy Parameter

BaseCleaningSampler accepts the following types for sampling_strategy:

| Type | Description |
| --- | --- |
| str | One of 'majority', 'not minority', 'not majority', 'all', 'auto' (equivalent to 'not minority'). The number of samples will not be equalized. |
| list | A list of classes targeted by the resampling. |
| callable | A function taking y and returning a dict mapping classes to desired sample counts. |
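
For cleaning samplers, a string or list strategy only selects which classes are eligible for cleaning; it does not fix how many samples remain. A library-free sketch of that selection follows, using the hypothetical helper `resolve_cleaning_classes` (the error handling for unknown classes is an assumption made for the example).

```python
from collections import Counter


def resolve_cleaning_classes(y, sampling_strategy="auto"):
    """Return the classes a cleaning rule is allowed to remove samples from."""
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    majority = max(counts, key=counts.get)

    if isinstance(sampling_strategy, list):
        missing = set(sampling_strategy) - set(counts)
        if missing:
            raise ValueError(f"classes not present in y: {sorted(missing)}")
        return sorted(sampling_strategy)

    by_name = {
        "auto": [c for c in counts if c != minority],
        "not minority": [c for c in counts if c != minority],
        "majority": [majority],
        "not majority": [c for c in counts if c != majority],
        "all": list(counts),
    }
    if not isinstance(sampling_strategy, str) or sampling_strategy not in by_name:
        raise ValueError(f"unsupported strategy: {sampling_strategy!r}")
    return sorted(by_name[sampling_strategy])


y = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(resolve_cleaning_classes(y))         # ['a', 'b']
print(resolve_cleaning_classes(y, ["b"]))  # ['b']
```

The result is a set of class labels rather than a mapping to counts, which reflects the statement above that the number of samples is not equalized.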

Parameter Constraints

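# A value is valid if it satisfies at least one entry in the list.
_parameter_constraints: dict = {
    "sampling_strategy": [
        Interval(numbers.Real, 0, 1, closed="right"),  # passes this type check, although float is not among the documented types above
        StrOptions({"auto", "majority", "not minority", "not majority", "all"}),  # named strategy
        list,  # explicit list of classes to clean
        callable,  # function of y
    ],
}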

Inheritance Hierarchy

BaseSampler                         # imblearn.base
    |
    +-- BaseUnderSampler            # imblearn.under_sampling.base
    |       |
    |       +-- RandomUnderSampler
    |       +-- NearMiss
    |       +-- InstanceHardnessThreshold
    |       +-- ...
    |
    +-- BaseCleaningSampler         # imblearn.under_sampling.base
            |
            +-- TomekLinks
            +-- EditedNearestNeighbours
            +-- NeighbourhoodCleaningRule
            +-- ...

Usage Note

These classes should not be instantiated directly. They are intended to be subclassed by concrete under-sampling algorithms. Subclasses must implement the abstract _fit_resample method declared in BaseSampler, which the public fit_resample method calls.
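
The subclassing contract follows the template-method pattern. The sketch below reproduces that shape with hypothetical stand-in classes (`SketchBaseSampler`, `SketchUnderSampler`, `KeepFirstUnderSampler`) so that it runs without the library; it omits the validation and strategy resolution that the real base classes perform, and the "keep first n" rule is a toy choice made for the example.

```python
from abc import ABC, abstractmethod
from collections import Counter


class SketchBaseSampler(ABC):
    """Hypothetical stand-in for the base class: owns the public entry point."""

    _sampling_type = None

    def fit_resample(self, X, y):
        if len(X) != len(y):
            raise ValueError("X and y must have the same length")
        return self._fit_resample(X, y)

    @abstractmethod
    def _fit_resample(self, X, y):
        """Concrete samplers implement the actual resampling here."""


class SketchUnderSampler(SketchBaseSampler):
    """Hypothetical stand-in for the intermediate base: only sets the category."""

    _sampling_type = "under-sampling"


class KeepFirstUnderSampler(SketchUnderSampler):
    """Toy concrete sampler: keep the first n samples of each class,
    where n is the size of the smallest class."""

    def _fit_resample(self, X, y):
        n_keep = min(Counter(y).values())
        seen = Counter()
        X_res, y_res = [], []
        for xi, yi in zip(X, y):
            if seen[yi] < n_keep:
                seen[yi] += 1
                X_res.append(xi)
                y_res.append(yi)
        return X_res, y_res


X = [[0], [1], [2], [3], [4]]
y = [0, 0, 0, 1, 1]
X_res, y_res = KeepFirstUnderSampler().fit_resample(X, y)
print(y_res)  # [0, 0, 1, 1]
```

Because the intermediate stand-in does not implement `_fit_resample`, Python refuses to instantiate it, which mirrors the rule that the base classes are not meant to be used directly.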
