Principle: Scikit-learn-contrib Imbalanced-learn Cluster-Based Oversampling
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
An oversampling strategy that first clusters the input space, then applies SMOTE selectively within clusters that contain a sufficient share of minority samples.
Description
Cluster-based oversampling combines clustering with synthetic minority oversampling. The algorithm first partitions the data using KMeans, then evaluates each cluster's class balance. SMOTE is applied only within clusters that meet a balance threshold, ensuring synthetic samples are generated in meaningful regions of the feature space rather than in sparse or already-balanced areas.
This prevents the generation of noisy synthetic samples in regions dominated by majority-class instances and focuses oversampling on clusters where minority samples genuinely exist.
Usage
Use this principle when:
- The minority class is distributed across distinct sub-clusters
- Standard SMOTE generates noisy samples between distant minority regions
- The data has a natural cluster structure
- Reducing synthetic noise is more important than maximizing minority coverage
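The second point above can be shown with a toy calculation (the data layout here is purely illustrative): when SMOTE interpolates between minority points drawn from two distant sub-clusters, the synthetic sample lands in the gap between them, which may be majority territory.

```python
import numpy as np

rng = np.random.default_rng(0)
# Two distant minority sub-clusters; assume the majority class occupies
# the region between them (illustrative setup, not real data).
blob_a = rng.normal(-8.0, 0.5, size=(20, 2))
blob_b = rng.normal(8.0, 0.5, size=(20, 2))

# If SMOTE pairs a point from blob_a with a "neighbour" from blob_b
# (possible when k_neighbors spans sub-clusters), the interpolation
# crosses the gap between the two regions.
p, q = blob_a[0], blob_b[0]
synthetic = p + 0.5 * (q - p)  # midpoint of the segment
print(synthetic)  # near the origin, far from either minority blob
```

Clustering first confines each interpolation to a single sub-cluster, so this failure mode cannot occur.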
Theoretical Basis
- Cluster all data using KMeans
- Filter clusters: keep only those where minority density exceeds a threshold
- Distribute synthetic sample generation across filtered clusters proportional to their minority density
- Apply SMOTE within each selected cluster
```python
# Abstract KMeans-SMOTE algorithm (NOT a real implementation)
clusters = KMeans(n_clusters=k).fit_predict(X)
for cluster_id in unique(clusters):
    minority_ratio = count_minority(cluster_id) / count_all(cluster_id)
    if minority_ratio >= cluster_balance_threshold:
        n_synthetic = proportional_allocation(cluster_id)
        apply_smote_in_cluster(cluster_id, n_synthetic)
```
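The abstract steps above can be fleshed out into a runnable sketch. All function and parameter names below are illustrative, not the imbalanced-learn API; the allocation step is done by sampling eligible clusters in proportion to their minority ratio.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_smote_sketch(X, y, minority_label, k=5, balance_threshold=0.5,
                        n_synthetic=100, n_neighbors=3, seed=0):
    """Toy cluster-based oversampling (hypothetical helper, for illustration)."""
    rng = np.random.default_rng(seed)
    clusters = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)

    # Filter: keep clusters whose minority ratio clears the threshold and
    # that hold enough minority points to interpolate between.
    eligible, weights = [], []
    for c in np.unique(clusters):
        in_c = clusters == c
        ratio = np.mean(y[in_c] == minority_label)
        if ratio >= balance_threshold and \
                np.sum(in_c & (y == minority_label)) > n_neighbors:
            eligible.append(c)
            weights.append(ratio)
    if not eligible:
        raise ValueError("no cluster meets the balance threshold")
    weights = np.asarray(weights) / np.sum(weights)

    # Allocate generation across clusters in proportion to minority density,
    # then apply SMOTE-style interpolation within each selected cluster.
    synthetic = []
    for c in rng.choice(eligible, size=n_synthetic, p=weights):
        pts = X[(clusters == c) & (y == minority_label)]
        i = rng.integers(len(pts))
        d = np.linalg.norm(pts - pts[i], axis=1)
        nn = np.argsort(d)[1:n_neighbors + 1]  # nearest minority neighbours
        j = rng.choice(nn)
        gap = rng.random()
        synthetic.append(pts[i] + gap * (pts[j] - pts[i]))
    return np.vstack(synthetic)

# Usage on toy data: a diffuse majority blob and a tight, distant minority blob.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)),
               rng.normal(8.0, 0.5, (30, 2))])
y = np.array([0] * 200 + [1] * 30)
X_new = kmeans_smote_sketch(X, y, minority_label=1, k=4, n_synthetic=60)
```

Because interpolation stays inside a single cluster, every synthetic point lies within the convex hull of that cluster's minority samples, near the minority blob.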