Principle:Scikit learn contrib Imbalanced learn Synthetic Minority Oversampling

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated 2026-02-09 03:00 GMT

Overview

A synthetic data generation technique that creates new minority class samples by interpolating between existing minority class instances and their nearest neighbors.

Description

Synthetic Minority Oversampling Technique (SMOTE) addresses class imbalance by generating synthetic samples rather than duplicating existing minority instances. The core idea is to select a minority sample, find its k-nearest minority neighbors, and create new samples along the line segments connecting them. This approach avoids the overfitting problem associated with simple random oversampling, which merely replicates existing data points.

SMOTE was introduced by Chawla et al. (2002) and has since become one of the most widely adopted techniques for handling imbalanced datasets. The algorithm operates in feature space rather than data space, producing plausible new instances that expand the decision region of the minority class.

The technique supports multi-class problems through a one-vs-rest decomposition, where each minority class is oversampled independently against the remaining classes.

Usage

Use this principle when working with classification tasks where the minority class has significantly fewer samples than the majority class. SMOTE is appropriate when:

  • The minority class decision boundary needs to be expanded rather than simply reinforced
  • Random oversampling leads to overfitting on minority class instances
  • The feature space is continuous and numeric (for standard SMOTE; variants exist for categorical and mixed features)
  • The dataset is not extremely small (SMOTE needs enough minority samples to find meaningful neighbors)

Theoretical Basis

The SMOTE algorithm generates synthetic samples through linear interpolation in feature space:

For each minority class sample x_i:

  1. Find its k nearest minority class neighbors
  2. Randomly select one neighbor x_nn
  3. Generate a synthetic sample: x_new = x_i + λ (x_nn − x_i), where λ ~ U(0, 1)
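The interpolation step can be checked numerically. A small sketch in NumPy with hypothetical feature vectors (not from the original text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical minority sample and one of its nearest minority neighbors.
x_i = np.array([1.0, 2.0])
x_nn = np.array([3.0, 6.0])

# Draw lambda ~ U(0, 1) and interpolate along the segment x_i -> x_nn.
lam = rng.uniform(0.0, 1.0)
x_new = x_i + lam * (x_nn - x_i)

# By convexity, the synthetic point lies on the segment between the parents.
assert np.all(x_new >= np.minimum(x_i, x_nn))
assert np.all(x_new <= np.maximum(x_i, x_nn))
```

Because λ is drawn uniformly from (0, 1), every synthetic point is a convex combination of the two parents, which is why SMOTE fills in the region between minority samples rather than duplicating them.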

Pseudo-code:

# Abstract SMOTE algorithm (NOT real implementation)
for each minority_sample x_i:
    neighbors = k_nearest_minority_neighbors(x_i, k=5)
    for j in range(num_synthetic_needed):
        x_nn = random_choice(neighbors)
        lam = random_uniform(0, 1)
        x_new = x_i + lam * (x_nn - x_i)
        add_to_dataset(x_new)

The number of synthetic samples generated per minority instance is determined by the sampling_strategy parameter, which defines the desired class distribution after resampling.
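The pseudo-code above can be made concrete in plain NumPy. This is a minimal brute-force sketch assuming purely numeric features, not the imbalanced-learn implementation; the function name `smote_sample` is hypothetical:

```python
import numpy as np

def smote_sample(X_min, n_synthetic, k=5, seed=0):
    """Generate n_synthetic points by interpolating randomly chosen minority
    samples with one of their k nearest minority neighbors (sketch only)."""
    rng = np.random.default_rng(seed)
    n = len(X_min)
    k = min(k, n - 1)
    # Brute-force pairwise Euclidean distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    # Columns 1..k of the argsort skip each point's zero-distance self-match.
    nn_idx = np.argsort(d, axis=1)[:, 1:k + 1]
    out = np.empty((n_synthetic, X_min.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(n)                  # pick a minority sample
        x_nn = X_min[rng.choice(nn_idx[i])]  # pick one of its k neighbors
        lam = rng.uniform()                  # lambda ~ U(0, 1)
        out[j] = X_min[i] + lam * (x_nn - X_min[i])
    return out

# Toy minority class: the four corners of the unit square.
X_min = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_sample(X_min, n_synthetic=10, k=3)
```

Since interpolation is convex, every synthetic point here stays inside the unit square spanned by the four parents. In imbalanced-learn, `n_synthetic` is not passed directly; it is derived from the `sampling_strategy` target distribution.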
