
Principle:Scikit learn contrib Imbalanced learn Combined Over Under Sampling

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated 2026-02-09 03:00 GMT

Overview

A two-stage resampling strategy that first oversamples the minority class with SMOTE, then cleans the resulting data by removing noisy or ambiguous samples using an under-sampling technique.

Description

Combined over-and-under-sampling addresses the noise introduced by SMOTE by applying a cleaning step after oversampling. SMOTE can generate synthetic samples that overlap with majority-class instances, creating ambiguous regions near the decision boundary. Following SMOTE with a cleaning method such as Edited Nearest Neighbours (ENN) or Tomek Links removes these noisy samples.

Two primary variants exist:

  • SMOTE + ENN: Removes any sample (from either class) whose class label differs from the majority of its nearest neighbors. This is a more aggressive cleaning approach.
  • SMOTE + Tomek Links: Removes only Tomek link pairs (nearest-neighbor pairs from different classes), which is a gentler cleaning approach that removes borderline ambiguity.

Usage

Use this principle when:

  • SMOTE alone introduces too much noise near the decision boundary
  • Cleaner class boundaries are needed after oversampling
  • A balance between oversampling and noise reduction is desired
  • Choosing between variants: SMOTE+ENN for aggressive cleaning, SMOTE+Tomek for conservative cleaning

Theoretical Basis

Stage 1: Apply SMOTE to oversample the minority class.

Stage 2: Apply a cleaning rule:

  • ENN cleaning: For each sample, find its k nearest neighbors. If the sample's class differs from the majority class of its neighbors, remove it.
  • Tomek Links: For each pair of mutual nearest neighbors from different classes (a Tomek link), remove the majority-class sample, or both, to clean the boundary.
# Combined resampling with imbalanced-learn
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import EditedNearestNeighbours

# Stage 1: Oversample the minority class
X_over, y_over = SMOTE().fit_resample(X, y)

# Stage 2: Clean - ENN removes samples (from either class) whose
# label disagrees with the majority of their k nearest neighbors
enn = EditedNearestNeighbours(sampling_strategy="all", n_neighbors=3)
X_clean, y_clean = enn.fit_resample(X_over, y_over)

Related Pages

Implemented By
