Principle:Scikit learn contrib Imbalanced learn Borderline Oversampling
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
A focused oversampling technique that generates synthetic samples only from minority instances near the decision boundary (borderline samples), where classification is most uncertain.
Description
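Borderline SMOTE refines standard SMOTE by first identifying minority samples that lie on the borderline between classes. A minority sample is classified as borderline (or "in danger") if at least half, but not all, of its m nearest neighbors belong to the majority class. A sample whose m nearest neighbors all belong to the majority class is treated as noise, and a sample with mostly minority neighbors is considered safe; neither is used. Only the borderline samples serve as seeds for synthetic generation.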
Two variants exist:
- Borderline-1: Generates synthetic samples only between the borderline minority sample and its minority nearest neighbors.
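- Borderline-2: Additionally allows interpolation toward majority class nearest neighbors, pushing the boundary further into majority territory. In the original formulation the step toward a majority neighbor is limited to at most half the distance, so the new point stays closer to the minority sample.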
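The difference between the two variants can be illustrated with a minimal sketch. The helper `interpolate` and its `max_gap` parameter are hypothetical names chosen for this example; the 0.5 cap on steps toward majority neighbors is an assumption taken from the original Borderline-SMOTE formulation, not a statement about any particular library's internals.

```python
import numpy as np

def interpolate(x, neighbor, max_gap, rng):
    """Return a point on the segment from x toward neighbor.

    The step is drawn uniformly from [0, max_gap): max_gap=1.0 can reach
    almost all the way to the neighbor, max_gap=0.5 stops before the midpoint.
    """
    gap = rng.uniform(0.0, max_gap)
    return x + gap * (neighbor - x)

rng = np.random.default_rng(0)
x_danger = np.array([0.0, 0.0])           # a borderline minority sample
minority_neighbor = np.array([1.0, 0.0])  # one of its minority neighbors
majority_neighbor = np.array([0.0, 1.0])  # one of its majority neighbors

# Borderline-1: interpolate toward minority neighbors only.
b1_sample = interpolate(x_danger, minority_neighbor, max_gap=1.0, rng=rng)

# Borderline-2: may also interpolate toward a majority neighbor, with the
# step capped at 0.5 (assumption, see the lead-in above).
b2_sample = interpolate(x_danger, majority_neighbor, max_gap=0.5, rng=rng)
```

In this sketch the Borderline-1 point lies on the segment between two minority samples, while the Borderline-2 point moves off that segment toward the majority neighbor but stays in the half nearer to the minority sample.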
This targeted approach can be more effective than standard SMOTE, which oversamples all minority samples alike, because misclassifications tend to concentrate near the decision boundary, where the classifier struggles most.
Usage
Use this principle when:
- The minority class has a clear borderline region with the majority class
- Standard SMOTE generates too many samples in safe, interior regions
- The goal is to strengthen the decision boundary specifically
- A choice is needed on how far to expand the minority region: Borderline-1 is preferred for conservative expansion; Borderline-2 for aggressive expansion
Theoretical Basis
The algorithm operates in two phases:
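Phase 1 - Borderline Detection: For each minority sample x_i, find its m nearest neighbors among all classes and count how many of them belong to the majority class. If that count m' satisfies m/2 <= m' < m, mark x_i as a borderline ("danger") sample. If m' = m, x_i is treated as noise; if m' < m/2, it is considered safe. Noise and safe samples are skipped.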
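Phase 1 can be sketched with a brute-force neighbor search in NumPy. The function name `find_danger_samples` and the toy data are hypothetical, introduced only for illustration; a practical implementation would normally use a dedicated nearest-neighbor index rather than computing all distances.

```python
import numpy as np

def find_danger_samples(X, y, minority_label, m):
    """Return indices of minority samples with m/2 <= majority neighbors < m."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    danger = []
    for i in np.flatnonzero(y == minority_label):
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf                  # exclude the sample itself
        neighbors = np.argsort(dist)[:m]  # m nearest neighbors, all classes
        majority_count = int(np.sum(y[neighbors] != minority_label))
        if m / 2 <= majority_count < m:
            danger.append(int(i))
    return danger

# Toy 1-D data: label 0 is the majority class, label 1 the minority class.
X = np.array([[0.0], [1.0], [3.1], [3.6], [4.0], [4.5], [5.0], [5.5], [0.5]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])

danger = find_danger_samples(X, y, minority_label=1, m=3)
print(danger)  # [4]
```

Here the sample at 4.0 has two majority neighbors out of three and is flagged. The sample at 0.5 has only majority neighbors and counts as noise, and the remaining minority samples are safe.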
Phase 2 - Synthetic Generation: Apply SMOTE interpolation only to the set of borderline minority samples.
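The interpolation step can be written compactly as follows. The symbols \lambda and x_{nn} are notation introduced here for illustration: x_{nn} is a neighbor drawn at random from the k nearest minority neighbors of the borderline sample x_i.

```latex
x_{\text{new}} = x_i + \lambda \,(x_{nn} - x_i), \qquad \lambda \sim \mathcal{U}(0, 1)
```

Every synthetic point therefore lies on the line segment between a borderline sample and one of its minority neighbors.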
# Abstract Borderline-SMOTE algorithm (NOT real implementation)
DANGER = set()
for each minority_sample x_i:
    m_neighbors = m_nearest_neighbors(x_i, m, all_classes=True)
    majority_count = count_majority(m_neighbors)
    if m/2 <= majority_count < m:
        DANGER.add(x_i)

# Only oversample borderline samples
for x_i in DANGER:
    apply_smote_interpolation(x_i, k_neighbors)