Principle:Scikit learn contrib Imbalanced learn Combined Over Under Sampling Tomek

From Leeroopedia


Knowledge Sources
Domains: Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated: 2026-02-09 03:00 GMT

Overview

A conservative two-stage resampling strategy that oversamples the minority class with SMOTE, then removes Tomek link pairs so that only the most ambiguous boundary samples are cleaned.

Description

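SMOTE + Tomek Links first applies SMOTE (Synthetic Minority Over-sampling Technique) to balance the dataset, then identifies Tomek links: pairs of samples from different classes that are each other's nearest neighbors. Removing these pairs eliminates the most ambiguous boundary cases. Compared to SMOTE + ENN (Edited Nearest Neighbours), this approach is more conservative: ENN discards any sample whose class disagrees with its nearest neighbors, whereas a Tomek link requires a mutual nearest-neighbor relationship, so fewer samples are removed and more of the oversampled data is preserved.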

Usage

Use this principle when a gentle cleaning step after SMOTE is preferred. Tomek link removal targets only mutual nearest-neighbor pairs from different classes, making it suitable when aggressive data removal is undesirable.

Theoretical Basis

A Tomek link exists between samples a and b, under a distance function d, if both of the following hold:

  • They belong to different classes
  • There is no sample c such that d(a,c) < d(a,b) or d(b,c) < d(a,b)

In other words, a and b are each other's nearest neighbor despite being from different classes.
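The two conditions can be checked directly on a toy dataset. The sketch below is a brute-force illustration using 1-D points with absolute difference as the distance; the helper names `nearest_neighbor` and `tomek_links` and the sample values are hypothetical, chosen only to make the definition concrete.

```python
def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    others = [j for j in range(len(points)) if j != i]
    return min(others, key=lambda j: abs(points[i] - points[j]))


def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    links = []
    for i in range(len(points)):
        j = nearest_neighbor(i, points)
        # Mutual nearest neighbors with different class labels.
        if i < j and nearest_neighbor(j, points) == i and labels[i] != labels[j]:
            links.append((i, j))
    return links


# Hypothetical toy data: class 0 on the left, class 1 on the right.
points = [0.0, 1.0, 2.9, 3.1, 5.0, 6.0]
labels = [0, 0, 0, 1, 1, 1]

print(tomek_links(points, labels))  # [(2, 3)]
```

Only the pair at 2.9 and 3.1 qualifies: the two points are mutual nearest neighbors and carry different labels. The pairs (0.0, 1.0) and (5.0, 6.0) are also mutual nearest neighbors, but each pair shares a class, so neither is a Tomek link.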

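# Sketch of SMOTE + Tomek link removal (illustrative; NOT the library's real implementation)
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.neighbors import NearestNeighbors

X_over, y_over = SMOTE().fit_resample(X, y)  # X, y: NumPy arrays supplied by the caller
# Nearest neighbor of each sample; column 0 is the sample itself (assumes no duplicate points)
nn = NearestNeighbors(n_neighbors=2).fit(X_over).kneighbors(X_over, return_distance=False)[:, 1]
in_link = (nn[nn] == np.arange(len(nn))) & (y_over != y_over[nn])  # mutual neighbors, different classes
X_res, y_res = X_over[~in_link], y_over[~in_link]  # drop both members of every Tomek link pair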

Related Pages

Implemented By
