Implementation: scikit-learn-contrib / imbalanced-learn SMOTE
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Concrete tool for generating synthetic minority class samples via nearest-neighbor interpolation provided by the imbalanced-learn library.
Description
The SMOTE class implements the Synthetic Minority Over-sampling Technique. It extends BaseSMOTE and generates new minority samples by interpolating between each minority instance and its k-nearest neighbors. The class integrates with scikit-learn's estimator API, supporting pipeline composition, parameter validation, and metadata routing.
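The interpolation step described above can be sketched in a few lines of plain Python. This is a simplified illustration of the idea, not the library's actual implementation; the helper name `smote_interpolate` is invented for this sketch.

```python
import random

random.seed(0)

def smote_interpolate(x_i, x_nn):
    # A synthetic sample lies at a random point on the line segment
    # joining a minority sample x_i and one of its k nearest minority
    # neighbors x_nn. A single gap factor lam is drawn per pair and
    # applied to every feature, so the new point stays on the segment.
    lam = random.random()
    return [a + lam * (b - a) for a, b in zip(x_i, x_nn)]

x_new = smote_interpolate([1.0, 2.0], [3.0, 4.0])
# each coordinate of x_new lies between the corresponding parent coordinates
```

Because the same gap factor is used for all features, every synthetic point falls on a segment between two real minority samples, which is why SMOTE assumes continuous numeric features.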
Usage
Import this class when you need to balance a dataset with continuous numeric features before training a classifier. Use it as a standalone resampler via fit_resample() or as a step in an imblearn.pipeline.Pipeline.
Code Reference
Source Location
- Repository: imbalanced-learn
- File: imblearn/over_sampling/_smote/base.py
- Lines: L242-380
Signature
class SMOTE(BaseSMOTE):
def __init__(
self,
*,
sampling_strategy="auto",
random_state=None,
k_neighbors=5,
):
"""
Args:
        sampling_strategy: float, str, dict, or callable - Sampling target;
            'auto' (the default) oversamples every class except the majority
            up to the majority count.
random_state: int, RandomState, or None - Seed for reproducibility.
k_neighbors: int or NearestNeighbors instance - Number of nearest
neighbors used to generate synthetic samples (default: 5).
"""
Import
from imblearn.over_sampling import SMOTE
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | {array-like, sparse matrix, dataframe} of shape (n_samples, n_features) | Yes | Feature matrix of training data |
| y | array-like of shape (n_samples,) | Yes | Target labels indicating class membership |
| sampling_strategy | float, str, dict, or callable | No | Resampling target; 'auto' oversamples all classes but the majority up to the majority count |
| k_neighbors | int or NearestNeighbors | No | Nearest neighbors for interpolation (default: 5) |
| random_state | int, RandomState, or None | No | Random seed for reproducibility |
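The `'auto'` strategy (equivalent to `'not majority'` for over-samplers) can be made concrete with a small sketch of how per-class synthetic-sample targets are derived from label counts. The helper `auto_targets` is illustrative, not part of the imbalanced-learn API.

```python
from collections import Counter

def auto_targets(y):
    # 'auto' for an over-sampler brings every non-majority class up to
    # the majority class count; the returned dict gives the number of
    # synthetic samples to generate per class (simplified sketch).
    counts = Counter(y)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items() if n < majority}

y = [0] * 10 + [1] * 90
# class 0 needs 80 synthetic samples to match the 90 majority samples
print(auto_targets(y))  # {0: 80}
```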
Outputs
| Name | Type | Description |
|---|---|---|
| X_resampled | {ndarray, sparse matrix, dataframe} of shape (n_samples_new, n_features) | Feature matrix with synthetic minority samples added |
| y_resampled | ndarray of shape (n_samples_new,) | Target array with corresponding labels for synthetic samples |
Usage Examples
Basic Oversampling
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
# 1. Create an imbalanced dataset
X, y = make_classification(
n_classes=2, class_sep=2, weights=[0.1, 0.9],
n_informative=3, n_redundant=1, flip_y=0,
n_features=20, n_clusters_per_class=1,
n_samples=1000, random_state=10,
)
print(f"Original: {Counter(y)}")
# 2. Apply SMOTE
smote = SMOTE(random_state=42)
X_resampled, y_resampled = smote.fit_resample(X, y)
print(f"Resampled: {Counter(y_resampled)}")
Inside a Pipeline
from imblearn.pipeline import make_pipeline
from imblearn.over_sampling import SMOTE
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_validate
# Build pipeline with SMOTE + classifier
pipeline = make_pipeline(SMOTE(random_state=42), LinearSVC())
# Cross-validate (SMOTE applied only to training folds)
scores = cross_validate(pipeline, X, y, scoring="balanced_accuracy", cv=5)
print(f"Mean balanced accuracy: {scores['test_score'].mean():.3f}")
Custom Sampling Strategy
from imblearn.over_sampling import SMOTE
# Specify exact number of samples per class
smote = SMOTE(
    sampling_strategy={0: 500},  # Class 0 will contain 500 samples after resampling
k_neighbors=3,
random_state=42,
)
X_res, y_res = smote.fit_resample(X, y)  # X, y from the basic example above
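For binary problems, `sampling_strategy` also accepts a float giving the desired ratio of minority to majority samples after resampling. A quick sketch of how such a ratio maps to a target count; the helper `minority_target` is invented for illustration and is not part of the library.

```python
from collections import Counter
import math

def minority_target(y, ratio):
    # A float sampling_strategy means the desired n_minority / n_majority
    # ratio after resampling (binary classification only, per the
    # imbalanced-learn API); this sketch derives the implied target count.
    counts = Counter(y)
    (_, n_maj), (min_cls, _) = counts.most_common()
    return {min_cls: math.ceil(ratio * n_maj)}

y = [1] * 90 + [0] * 10
print(minority_target(y, 0.5))  # {0: 45}
```

So `SMOTE(sampling_strategy=0.5)` on this dataset would grow class 0 from 10 to 45 samples; a dict like `{0: 500}` instead fixes the post-resampling count directly.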