Principle:Scikit learn contrib Imbalanced learn Combined Over Under Sampling

Knowledge Sources	SMOTE: Synthetic Minority Over-sampling Technique; Balancing Training Data for Automated Annotation of Keywords
Domains	Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated	2026-02-09 03:00 GMT

Overview

A two-stage resampling strategy that first oversamples the minority class with SMOTE, then cleans the resulting data by removing noisy or ambiguous samples using an under-sampling technique.

Description

Combined over-and-under-sampling addresses the noise introduced by SMOTE by applying a cleaning step after oversampling. SMOTE can generate synthetic samples that overlap with majority class instances, creating ambiguous regions. By following SMOTE with a cleaning method such as Edited Nearest Neighbours (ENN) or Tomek Links, these noisy samples are removed.
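To make the overlap problem concrete, here is a minimal sketch of the interpolation step that SMOTE uses to create a synthetic sample, written with NumPy only. The helper name `smote_point` and the toy coordinates are illustrative assumptions and are not part of imbalanced-learn. A synthetic point always lies on the segment between two minority samples, so if a majority sample sits near that segment, the new point can land right next to it.

```python
import numpy as np

def smote_point(x, neighbor, gap):
    """Return a synthetic point on the segment from x to neighbor.

    gap is the interpolation factor in [0, 1]; SMOTE draws it at random.
    (Hypothetical helper for illustration only.)
    """
    x = np.asarray(x, dtype=float)
    neighbor = np.asarray(neighbor, dtype=float)
    return x + gap * (neighbor - x)

# Two minority samples with a majority sample lying between them.
minority_a = np.array([0.0, 0.0])
minority_b = np.array([2.0, 0.0])
majority = np.array([1.0, 0.1])

synthetic = smote_point(minority_a, minority_b, gap=0.5)

# Distance from the synthetic point to the majority sample and to its
# closest minority parent.
dist_to_majority = np.linalg.norm(synthetic - majority)
dist_to_minority = min(
    np.linalg.norm(synthetic - minority_a),
    np.linalg.norm(synthetic - minority_b),
)
```

In this toy setup the synthetic minority point ends up closer to the majority sample than to either of its parents, which is the kind of ambiguity the cleaning stage is meant to remove.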

Two primary variants exist:

SMOTE + ENN: Removes any sample (from either class) whose class label differs from the majority of its nearest neighbors. This is a more aggressive cleaning approach.
SMOTE + Tomek Links: Removes only samples that form Tomek links, that is, pairs of samples from different classes that are each other's nearest neighbor. This is a gentler cleaning approach that targets only the most ambiguous borderline pairs.

Usage

Use this principle when:

SMOTE alone introduces too much noise near the decision boundary
Cleaner class boundaries are needed after oversampling
A balance between oversampling and noise reduction is desired
The cleaning strength needs to be chosen: SMOTE+ENN for aggressive cleaning, SMOTE+Tomek for conservative cleaning

Theoretical Basis

Stage 1: Apply SMOTE to oversample the minority class.

Stage 2: Apply a cleaning rule:

ENN cleaning: For each sample, find its k nearest neighbors. If the sample's class differs from the most common class among those neighbors, remove it.
Tomek Links: Find every pair of samples from different classes that are each other's nearest neighbor, then remove one or both members of the pair to clean the boundary.

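# Illustrative sketch of combined resampling (NOT the library's real implementation)
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import NearestNeighbors

# Stage 1: Oversample (X, y are assumed to be NumPy arrays; y holds non-negative integer labels)
X_over, y_over = SMOTE().fit_resample(X, y)

# Stage 2: Clean
# ENN variant - remove samples whose label disagrees with most of their k=3 neighbors
nn = NearestNeighbors(n_neighbors=3).fit(X_over)
neighbor_idx = nn.kneighbors(return_distance=False)  # no query given: a sample is not its own neighbor
keep = np.array([
    np.bincount(y_over[idx]).argmax() == label  # most common neighbor label vs. own label
    for idx, label in zip(neighbor_idx, y_over)
])
X_clean, y_clean = X_over[keep], y_over[keep]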
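The Tomek link rule can be sketched in a similar spirit. The following is a brute-force illustration using NumPy only; the helper name `find_tomek_links` and the toy data are assumptions for demonstration, and the policy of dropping both members of each link is just one of the options mentioned above.

```python
import numpy as np

def find_tomek_links(X, y):
    """Return index pairs (i, j), i < j, that form Tomek links.

    A pair is a Tomek link when the two samples have different labels
    and each one is the other's nearest neighbor.
    (Hypothetical helper; brute-force O(n^2) distances, fine for small data.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    # Pairwise Euclidean distances, with the diagonal masked out so a
    # sample is never its own nearest neighbor.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    nearest = dists.argmin(axis=1)

    links = []
    for i, j in enumerate(nearest):
        if i < j and nearest[j] == i and y[i] != y[j]:
            links.append((i, int(j)))
    return links

# Toy 1-D data: samples 1 and 2 sit on the class boundary.
X = np.array([[0.0], [1.0], [1.2], [3.0], [3.1]])
y = np.array([0, 0, 1, 1, 1])

links = find_tomek_links(X, y)

# Remove both members of every link (one possible policy; removing only
# the majority-class member is another).
drop = {idx for pair in links for idx in pair}
keep = np.array([i not in drop for i in range(len(y))])
X_clean, y_clean = X[keep], y[keep]
```

Samples 3 and 4 are also mutual nearest neighbors, but they share a label, so they are not a Tomek link and are kept.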

Related Pages

Implemented By

Implementation:Scikit_learn_contrib_Imbalanced_learn_SMOTEENN
