Implementation:Scikit learn contrib Imbalanced learn KMeansSMOTE
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
Concrete tool for cluster-aware synthetic oversampling provided by the imbalanced-learn library.
Description
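The KMeansSMOTE class first clusters the whole input space with KMeans and then applies SMOTE only inside selected clusters. It extends BaseSMOTE and, by default, uses a MiniBatchKMeans estimator, which scales better to large numbers of samples. The cluster_balance_threshold parameter sets the minimum proportion of the class being oversampled that a cluster must contain before it receives synthetic samples. The density_exponent parameter enters each selected cluster's density estimate, and that estimate decides how the synthetic samples are split among the clusters: sparser clusters receive more of them.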
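The filter-then-weight step can be sketched in NumPy. This is a minimal illustration under stated assumptions, not imbalanced-learn's code: the cluster labels are assumed to be precomputed (in the library they come from the KMeans estimator), the threshold and exponent are plain numbers rather than 'auto', and each surviving cluster is scored by its mean pairwise minority distance raised to the exponent, divided by its minority count, which is our reading of how KMeans-SMOTE measures sparsity. The helper name allocate_synthetic_samples is hypothetical.

```python
import numpy as np


def allocate_synthetic_samples(X, y, clusters, minority_label, n_to_generate,
                               balance_threshold, density_exponent, k_neighbors=2):
    """Return {cluster_id: n_synthetic} for the clusters that get oversampled.

    Hypothetical helper illustrating the filter-then-weight idea; it is not
    part of imbalanced-learn's public API.
    """
    sparsity = {}
    for c in np.unique(clusters):
        in_cluster = clusters == c
        is_minority = y[in_cluster] == minority_label
        n_min = int(is_minority.sum())
        # Filter 1: skip clusters whose minority proportion is below the threshold
        if is_minority.mean() < balance_threshold:
            continue
        # Filter 2: SMOTE needs at least k_neighbors + 1 minority points
        if n_min < k_neighbors + 1:
            continue
        X_min = X[in_cluster][is_minority]
        # Mean Euclidean distance over all ordered pairs of distinct minority points
        diffs = X_min[:, None, :] - X_min[None, :, :]
        dists = np.sqrt((diffs ** 2).sum(axis=-1))
        mean_dist = dists.sum() / (n_min * n_min - n_min)
        # Spread-out clusters with few minority points get a high sparsity score
        sparsity[c] = mean_dist ** density_exponent / n_min
    if not sparsity:
        raise RuntimeError("no cluster passed the filter")

    ids = list(sparsity)
    weights = np.array([sparsity[c] for c in ids])
    weights = weights / weights.sum()
    # Largest-remainder rounding so the counts sum exactly to n_to_generate
    raw = weights * n_to_generate
    counts = np.floor(raw).astype(int)
    leftover = n_to_generate - counts.sum()
    for i in np.argsort(raw - counts)[::-1][:leftover]:
        counts[i] += 1
    return {int(c): int(n) for c, n in zip(ids, counts)}


# Three hand-made clusters of four points each; label 1 is the minority class.
X = np.array([[0, 0], [0, 1], [1, 0], [0.5, 0.5],
              [10, 10], [10, 13], [13, 10], [11, 11],
              [20, 20], [20, 21], [21, 20], [21, 21]], dtype=float)
y = np.array([1, 1, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0])
clusters = np.repeat([0, 1, 2], 4)

alloc = allocate_synthetic_samples(X, y, clusters, minority_label=1,
                                   n_to_generate=8, balance_threshold=0.5,
                                   density_exponent=1.0)
print(alloc)  # {0: 2, 1: 6}
```

Cluster 2 is dropped because only 25% of its points are minority. Cluster 1's minority points are three times as far apart as cluster 0's, so with an exponent of 1 it receives three times as many synthetic samples; a larger exponent would amplify that distance ratio further.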
Usage
Import this class when the data has natural cluster structure and you want to avoid generating synthetic samples in sparse or majority-dominated regions.
Code Reference
Source Location
- Repository: imbalanced-learn
- File: imblearn/over_sampling/_smote/cluster.py
- Lines: L30-308
Signature
class KMeansSMOTE(BaseSMOTE):
    def __init__(
        self,
        *,
        sampling_strategy="auto",
        random_state=None,
        k_neighbors=2,
        n_jobs=None,
        kmeans_estimator=None,
        cluster_balance_threshold="auto",
        density_exponent="auto",
    ):
        """
        Args:
            sampling_strategy: float, str, dict, or callable - Resampling ratio.
            random_state: int, RandomState, or None - Seed.
            k_neighbors: int or nearest-neighbors estimator - SMOTE neighbors
                (default: 2).
            n_jobs: int or None - Parallel jobs.
            kmeans_estimator: int, KMeans estimator, or None - Number of clusters
                or clustering estimator (default: MiniBatchKMeans).
            cluster_balance_threshold: 'auto' or float - Minimum proportion of
                the oversampled class a cluster must contain to be oversampled.
            density_exponent: 'auto' or float - Exponent used when computing
                cluster density; 'auto' derives it from the number of features.
        """
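The k_neighbors argument governs the interpolation that happens inside each selected cluster. Below is a minimal NumPy sketch of that step, assuming a small dense array that holds one cluster's minority samples: each synthetic point lies on the segment between a randomly chosen minority sample and one of its k_neighbors nearest minority neighbours in the same cluster. The function smote_in_cluster and its seed argument are hypothetical; the library delegates neighbour search to a nearest-neighbors estimator rather than building a full distance matrix.

```python
import numpy as np


def smote_in_cluster(X_min, n_new, k_neighbors=2, seed=0):
    """Generate n_new synthetic points from one cluster's minority samples.

    Hypothetical helper; assumes len(X_min) > k_neighbors.
    """
    rng = np.random.default_rng(seed)
    # Full pairwise distance matrix; a point must not be its own neighbour
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    # Indices of the k nearest minority neighbours of every minority point
    neighbours = np.argsort(d, axis=1)[:, :k_neighbors]
    # Pick a base sample and one of its neighbours for every new point
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k_neighbors, size=n_new)]
    # Interpolate with a uniform gap in [0, 1)
    gap = rng.random((n_new, 1))
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Corners of the unit square: each corner's two nearest neighbours are the
# adjacent corners, so every synthetic point lands on an edge of the square.
square = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
new_points = smote_in_cluster(square, n_new=5, k_neighbors=2, seed=0)
print(new_points.shape)  # (5, 2)
```

Because both endpoints belong to the same k-means cell, and k-means cells are convex, each interpolated point stays inside the cluster that was selected for oversampling.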
Import
from imblearn.over_sampling import KMeansSMOTE
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| X | {array-like, sparse matrix} of shape (n_samples, n_features) | Yes | Feature matrix |
| y | array-like of shape (n_samples,) | Yes | Target labels |
| kmeans_estimator | int or KMeans | No | Clustering estimator or number of clusters |
| cluster_balance_threshold | 'auto' or float | No | Minimum minority ratio to oversample a cluster |
Outputs
| Name | Type | Description |
|---|---|---|
| X_resampled | {ndarray, sparse matrix} of shape (n_samples_new, n_features) | Original samples plus the cluster-guided synthetic samples |
| y_resampled | ndarray of shape (n_samples_new,) | Labels corresponding to X_resampled |
Usage Examples
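from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import KMeansSMOTE
# n_classes * n_clusters_per_class (6) must not exceed 2 ** n_informative,
# so n_informative must be at least 3 here
X, y = make_classification(
    n_classes=2, weights=[0.1, 0.9], n_samples=1000,
    n_informative=3, n_clusters_per_class=3, random_state=10
)
print(f"Original: {Counter(y)}")
# 10 clusters; a cluster is oversampled if at least 10% of its samples are minority.
# With the 'auto' threshold, overlapping data can leave no eligible cluster,
# in which case fit_resample raises a RuntimeError.
kmeans_smote = KMeansSMOTE(
    random_state=42, kmeans_estimator=10, cluster_balance_threshold=0.1
)
X_res, y_res = kmeans_smote.fit_resample(X, y)
print(f"Resampled: {Counter(y_res)}")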