

Implementation:Scikit learn contrib Imbalanced learn KMeansSMOTE

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated 2026-02-09 03:00 GMT

Overview

Concrete tool for cluster-aware synthetic oversampling provided by the imbalanced-learn library.

Description

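The KMeansSMOTE class clusters the input with KMeans before oversampling with SMOTE. It extends BaseSMOTE and, for scalability, uses a MiniBatchKMeans estimator by default. A cluster receives synthetic samples only if two conditions hold. Its minority-class proportion must reach cluster_balance_threshold, and it must hold enough minority samples for the SMOTE neighbor search. SMOTE then interpolates between minority samples inside each selected cluster. The density_exponent parameter controls how the synthetic samples are allocated across the selected clusters, and sparser clusters receive a larger share.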
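The filter-then-weight step can be sketched in NumPy as below. This is a simplified illustration, not the library's code. `allocate_by_sparsity` is a hypothetical helper, and the cluster labels are assumed to come from a KMeans fit on the full X. The sparsity formula is assumed to follow the imbalanced-learn source: the mean pairwise distance between a cluster's minority samples, raised to `exponent`, divided by the number of those samples. The 'auto' settings of both parameters are not reproduced, so plain numbers are passed instead.

```python
import numpy as np


def allocate_by_sparsity(X, y, labels, minority, threshold, exponent, k_neighbors=2):
    """Return {cluster id: share of synthetic samples} for eligible clusters.

    Hypothetical helper sketching the KMeans-SMOTE allocation step (assumed
    formulas, not imbalanced-learn's implementation).
    """
    sparsity = {}
    for c in np.unique(labels):
        in_cluster = labels == c
        is_min = in_cluster & (y == minority)
        n_min = int(is_min.sum())
        # Filter 1: the minority share of the cluster must reach the threshold.
        if n_min / in_cluster.sum() < threshold:
            continue
        # Filter 2: SMOTE needs at least k_neighbors + 1 minority points.
        if n_min < k_neighbors + 1:
            continue
        X_min = X[is_min]
        # Mean pairwise Euclidean distance; the zero diagonal is excluded.
        d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
        mean_dist = d.sum() / (n_min * n_min - n_min)
        # Spread-out clusters with few minority points get a larger weight.
        sparsity[int(c)] = mean_dist ** exponent / n_min
    total = sum(sparsity.values())
    return {c: s / total for c, s in sparsity.items()}
```

Consider threshold=0.5 and exponent=1.0, with two clusters that are each 75% minority. If the second cluster's minority points are three times further apart, the two clusters receive shares of 0.25 and 0.75. A cluster with no minority samples receives nothing.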

Usage

Use this class when the data has natural cluster structure and you want to avoid generating synthetic samples in majority-dominated regions, while directing more of them to sparse minority clusters.
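Restricting SMOTE to one cluster's minority samples keeps synthetic points away from majority-dominated regions. Each new point is drawn as x + u * (x_nn - x) with u in [0, 1), where x and its neighbor x_nn both belong to the same cluster. The sketch below shows the standard SMOTE interpolation applied to a single cluster. `smote_in_cluster` is a hypothetical helper. The library itself uses a NearestNeighbors estimator rather than this brute-force distance matrix.

```python
import numpy as np


def smote_in_cluster(X_min, n_new, k_neighbors=2, rng=None):
    """Generate n_new synthetic points by SMOTE-style interpolation.

    X_min holds only the minority samples of ONE cluster, so every synthetic
    point lies on a segment between two minority samples of that cluster.
    """
    rng = np.random.default_rng(rng)
    n = X_min.shape[0]
    if n < k_neighbors + 1:
        raise ValueError("need at least k_neighbors + 1 minority samples")
    # Pairwise distances; mask the diagonal so a point is not its own neighbor.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)
    neighbours = np.argsort(dists, axis=1)[:, :k_neighbors]

    base = rng.integers(0, n, size=n_new)  # seed sample for each new point
    pick = neighbours[base, rng.integers(0, k_neighbors, size=n_new)]
    gap = rng.random((n_new, 1))  # u ~ U[0, 1), one per new point
    return X_min[base] + gap * (X_min[pick] - X_min[base])
```

On the four corners of a unit square with k_neighbors=2, each corner's neighbors are the two adjacent corners. Every synthetic point therefore lands on an edge of the square.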

Code Reference

Source Location

  • Repository: imbalanced-learn
  • File: imblearn/over_sampling/_smote/cluster.py
  • Lines: L30-308

Signature

class KMeansSMOTE(BaseSMOTE):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        random_state=None,
        k_neighbors=2,
        n_jobs=None,
        kmeans_estimator=None,
        cluster_balance_threshold="auto",
        density_exponent="auto",
    ):
        """
        Args:
            sampling_strategy: float, str, dict, or callable - Resampling ratio.
            random_state: int, RandomState, or None - Seed.
            k_neighbors: int or NearestNeighbors - SMOTE neighbors (default: 2).
            n_jobs: int or None - Parallel jobs.
            kmeans_estimator: int or KMeans - Number of clusters or a
                KMeans-style estimator (default: MiniBatchKMeans).
            cluster_balance_threshold: 'auto' or float - Minimum minority
                proportion for a cluster to be oversampled.
            density_exponent: 'auto' or float - Exponent used to compute
                cluster density; 'auto' derives it from the number of features.
        """

Import

from imblearn.over_sampling import KMeansSMOTE

I/O Contract

Inputs

Name | Type | Required | Description
X | {array-like, sparse matrix} of shape (n_samples, n_features) | Yes | Feature matrix passed to fit_resample
y | array-like of shape (n_samples,) | Yes | Target labels passed to fit_resample
kmeans_estimator | int or KMeans | No | Constructor parameter: number of clusters or clustering estimator
cluster_balance_threshold | 'auto' or float | No | Constructor parameter: minimum minority proportion for a cluster to be oversampled

Outputs

Name | Type | Description
X_resampled | {array-like, sparse matrix} of shape (n_samples_new, n_features) | Original samples plus the cluster-guided synthetic samples
y_resampled | array-like of shape (n_samples_new,) | Target labels for X_resampled

Usage Examples

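from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE

# make_classification requires n_classes * n_clusters_per_class <= 2**n_informative,
# so 2 classes x 3 clusters each needs at least 3 informative features.
# class_sep=2.0 spreads the clusters apart so KMeans can isolate minority regions.
X, y = make_classification(
    n_classes=2, weights=[0.1, 0.9], n_samples=1000, n_informative=3,
    n_clusters_per_class=3, class_sep=2.0, random_state=10
)
print(f"Original: {Counter(y)}")  # class 0 is the ~10% minority

# An int for kmeans_estimator means MiniBatchKMeans with that many clusters.
kmeans_smote = KMeansSMOTE(random_state=42, kmeans_estimator=10)
X_res, y_res = kmeans_smote.fit_resample(X, y)
print(f"Resampled: {Counter(y_res)}")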

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
