Implementation:Scikit learn contrib Imbalanced learn SMOTETomek
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Concrete tool for combined SMOTE oversampling and Tomek Links cleaning provided by the imbalanced-learn library.
Description
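The SMOTETomek class combines SMOTE over-sampling with Tomek links under-sampling. It first applies SMOTE, which generates synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. It then identifies Tomek links (pairs of samples from different classes that are each other's nearest neighbor) and removes them to clean the class boundary. By default, both samples of each link are removed. This cleaning step is gentler than the one in SMOTEENN.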
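The two building blocks described above can be illustrated with a minimal, standard-library-only sketch. This is not the library's implementation: the helper names `smote_interpolate`, `nearest_neighbor`, and `find_tomek_links` are hypothetical, the toy data is made up for illustration, and the brute-force neighbor search only suits tiny inputs.

```python
import math


def smote_interpolate(x, neighbor, gap):
    """Return a synthetic point on the segment from x to neighbor (0 <= gap <= 1)."""
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbor)]


def nearest_neighbor(i, X):
    """Return the index of the point closest to X[i], excluding i itself."""
    others = (j for j in range(len(X)) if j != i)
    return min(others, key=lambda j: math.dist(X[i], X[j]))


def find_tomek_links(X, y):
    """Return pairs (i, j), i < j, that are mutual nearest neighbors with different labels."""
    nn = [nearest_neighbor(i, X) for i in range(len(X))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


# Toy 1-D data: class 0 on the left, class 1 on the right,
# with one close cross-class pair at 2.0 and 2.4.
X = [[0.0], [0.8], [2.0], [2.4], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

# SMOTE-style interpolation between two samples of the same class.
synthetic = smote_interpolate(X[4], X[5], gap=0.25)

# Tomek-style cleaning: drop both members of every link.
links = find_tomek_links(X, y)
removed = {index for pair in links for index in pair}
kept = [i for i in range(len(X)) if i not in removed]

print(synthetic, links, kept)
```

Only the pair straddling the class boundary is flagged; pairs of mutual nearest neighbors that share a label are left alone, which is why this cleaning step removes relatively little data.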
Usage
Import this class when you want combined over-sampling and cleaning with minimal data removal. SMOTETomek removes only direct boundary ambiguities (Tomek links), whereas SMOTEENN's Edited Nearest Neighbours step removes every sample whose class disagrees with its nearest neighbors.
Code Reference
Source Location
- Repository: imbalanced-learn
- File: imblearn/combine/_smote_tomek.py
- Lines: L26-157
Signature
class SMOTETomek(BaseSampler):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        random_state=None,
        smote=None,
        tomek=None,
        n_jobs=None,
    ):
        """
        Args:
            sampling_strategy: float, str, dict, or callable - Resampling ratio.
            random_state: int, RandomState, or None - Seed.
            smote: SMOTE or None - SMOTE instance (default: SMOTE()).
            tomek: TomekLinks or None - TomekLinks instance (default: TomekLinks()).
            n_jobs: int or None - Parallel jobs.
        """
Import
from imblearn.combine import SMOTETomek
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | {array-like, sparse matrix} of shape (n_samples, n_features) | Yes | Feature matrix |
| y | array-like of shape (n_samples,) | Yes | Target labels |
| smote | SMOTE or None | No | Custom SMOTE instance, passed to the constructor |
| tomek | TomekLinks or None | No | Custom TomekLinks instance, passed to the constructor |
Outputs
| Name | Type | Description |
|---|---|---|
| X_resampled | {ndarray, sparse matrix} of shape (n_samples_new, n_features) | Over-sampled and Tomek-cleaned feature matrix |
| y_resampled | ndarray of shape (n_samples_new,) | Cleaned target array |
Usage Examples
from collections import Counter

from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek

# Build a binary dataset with roughly a 10% / 90% class split.
X, y = make_classification(
    n_classes=2, weights=[0.1, 0.9], n_samples=1000, random_state=10
)
print(f"Original: {Counter(y)}")

# Over-sample with SMOTE, then remove Tomek links.
smote_tomek = SMOTETomek(random_state=42)
X_res, y_res = smote_tomek.fit_resample(X, y)
print(f"Resampled: {Counter(y_res)}")