Implementation:Scikit learn contrib Imbalanced learn SMOTETomek

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated 2026-02-09 03:00 GMT

Overview

A concrete tool, provided by the imbalanced-learn library, that combines SMOTE oversampling with Tomek-link cleaning.

Description

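The SMOTETomek class combines SMOTE oversampling with Tomek-link under-sampling. It first applies SMOTE to generate synthetic minority samples, then finds Tomek links (pairs of samples from different classes that are each other's nearest neighbors) and removes them, cleaning the class boundary. Because only these mutual-nearest-neighbor pairs are removed, this is a gentler cleaning step than SMOTEENN, whose Edited Nearest Neighbours stage typically discards more samples.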

Usage

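Import this class when you want oversampling plus cleaning with minimal data removal. SMOTETomek removes only direct boundary ambiguities (Tomek links), whereas the Edited Nearest Neighbours step used by SMOTEENN removes any sample whose neighborhood disagrees with its label, which usually removes more data.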
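To make the contrast with ENN-style cleaning concrete, the following dependency-free sketch applies both rules to a hand-made 1-D dataset. It is an illustration, not imbalanced-learn's code: `knn`, `tomek_removed` and `enn_removed` are hypothetical helpers, the ENN rule shown is the classic majority vote over k=3 neighbours, and distance ties are broken by index.

```python
import math

# Hand-made 1-D toy data; each point is a 1-tuple so math.dist can measure distances.
X = [(x,) for x in (0.0, 0.8, 2.0, 2.1, 2.5, 4.0, 4.3, 6.8, 7.1, 7.8)]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def knn(X, i, k):
    """Hypothetical helper: indices of the k points nearest to X[i], excluding i (brute force)."""
    others = [j for j in range(len(X)) if j != i]
    # list.sort is stable, so distance ties are broken by index.
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def tomek_removed(X, y):
    """Indices in a Tomek link: mutual nearest neighbours with different labels (both ends dropped)."""
    nn = [knn(X, i, 1)[0] for i in range(len(X))]
    return {i for i in range(len(X)) if nn[nn[i]] == i and y[i] != y[nn[i]]}


def enn_removed(X, y, k=3):
    """Indices whose label loses a majority vote among their k nearest neighbours (classic ENN rule)."""
    removed = set()
    for i in range(len(X)):
        disagree = sum(y[j] != y[i] for j in knn(X, i, k))
        if disagree > k // 2:
            removed.add(i)
    return removed


print("Tomek removes:", sorted(tomek_removed(X, y)))  # [5, 6]
print("ENN removes:  ", sorted(enn_removed(X, y)))    # [4, 5, 6]
```

Here the Tomek rule drops only the boundary pair at 4.0 and 4.3. ENN also drops the class-1 point at 2.5: its nearest neighbour (2.1) has a closer neighbour of its own (2.0), so no Tomek link forms, yet all three of its nearest neighbours vote for class 0.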

Code Reference

Source Location

  • Repository: imbalanced-learn
  • File: imblearn/combine/_smote_tomek.py
  • Lines: L26-157

Signature

class SMOTETomek(BaseSampler):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        random_state=None,
        smote=None,
        tomek=None,
        n_jobs=None,
    ):
        """
        Args:
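            sampling_strategy: str, float, dict, or callable - Resampling target for the default SMOTE step.
            random_state: int, RandomState instance, or None - Seed for the default SMOTE step.
            smote: SMOTE or None - SMOTE instance to use (default: a SMOTE built from sampling_strategy and random_state).
            tomek: TomekLinks or None - TomekLinks instance to use (default: TomekLinks(sampling_strategy="all")).
            n_jobs: int or None - Number of parallel jobs.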
        """

Import

from imblearn.combine import SMOTETomek

I/O Contract

Inputs

Name | Type | Required | Description
X | {array-like, sparse matrix} of shape (n_samples, n_features) | Yes | Feature matrix passed to fit_resample
y | array-like of shape (n_samples,) | Yes | Target labels passed to fit_resample
smote | SMOTE or None | No | Constructor parameter: custom SMOTE instance
tomek | TomekLinks or None | No | Constructor parameter: custom TomekLinks instance

Outputs

Name | Type | Description
X_resampled | {array-like, sparse matrix} of shape (n_samples_new, n_features) | Oversampled and Tomek-cleaned feature matrix
y_resampled | array-like of shape (n_samples_new,) | Target labels matching X_resampled

Usage Examples

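from collections import Counter
from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek

# Binary problem with roughly 10% minority samples.
X, y = make_classification(
    n_classes=2, weights=[0.1, 0.9], n_samples=1000, random_state=10
)
print(f"Original: {Counter(y)}")

# SMOTE balances the classes; Tomek-link cleaning then removes a few boundary samples.
smote_tomek = SMOTETomek(random_state=42)
X_res, y_res = smote_tomek.fit_resample(X, y)
print(f"Resampled: {Counter(y_res)}")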

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
