Principle:Scikit learn contrib Imbalanced learn Cluster Based Oversampling

From Leeroopedia


Knowledge Sources
Domains Machine_Learning, Data_Preprocessing, Imbalanced_Learning
Last Updated 2026-02-09 03:00 GMT

Overview

An oversampling strategy that first clusters the input space, then applies SMOTE only within clusters that contain enough minority samples, as measured by a per-cluster balance threshold.

Description

Cluster-based oversampling combines clustering with synthetic minority oversampling. The algorithm first partitions the data using KMeans, then measures the share of minority samples in each cluster. SMOTE is applied only within clusters whose minority share meets a balance threshold, so synthetic samples are generated in regions where the minority class is well represented rather than in regions with few minority samples.

This reduces the number of noisy synthetic samples created in regions dominated by majority-class instances and concentrates oversampling in clusters where minority samples genuinely exist.
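The cluster filter described above can be sketched in a few lines. This is a minimal illustration that assumes the cluster assignments have already been computed; `select_clusters`, `minority_label`, and `threshold` are hypothetical names chosen for this example and are not part of any library API.

```python
from collections import Counter

def select_clusters(cluster_ids, labels, minority_label, threshold):
    """Return ids of clusters whose minority share is at least `threshold`.

    cluster_ids[i] is the cluster of sample i and labels[i] is its class.
    """
    totals = Counter(cluster_ids)
    minority = Counter(
        c for c, lab in zip(cluster_ids, labels) if lab == minority_label
    )
    return sorted(c for c in totals if minority[c] / totals[c] >= threshold)

# Toy data: cluster 0 is mostly majority (1 of 4 minority),
# cluster 1 is mostly minority (3 of 4 minority).
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 1]
labels      = [0, 0, 0, 1, 1, 1, 1, 0]

selected = select_clusters(cluster_ids, labels, minority_label=1, threshold=0.5)
# selected == [1]
```

Only cluster 1 passes the filter, so oversampling would be restricted to that region of the feature space.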

Usage

Use this principle when:

  • The minority class is distributed across distinct sub-clusters
  • Standard SMOTE generates noisy samples between distant minority regions
  • The data has a natural cluster structure
  • Reducing synthetic noise is more important than maximizing minority coverage
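The second condition can be illustrated with a toy calculation. The sketch below assumes two small minority sub-clusters that are far apart, and it assumes the neighbor search pairs points from different sub-clusters, which can happen when a sub-cluster has fewer points than the neighbor count. The data and the `interpolate` helper are made up for illustration.

```python
def interpolate(a, b, t):
    """Return a + t * (b - a), the basic SMOTE interpolation step."""
    return [ai + t * (bi - ai) for ai, bi in zip(a, b)]

# Two minority sub-clusters separated along the first feature (toy data).
left = [[0.0, 0.0], [1.0, 0.5]]
right = [[10.0, 0.0], [11.0, 0.5]]

# Pairing points across sub-clusters places the synthetic point in the gap
# between them, where no minority sample was observed.
across = interpolate(left[1], right[0], 0.5)
# across == [5.5, 0.25]

# Pairing points within one sub-cluster keeps the synthetic point inside it.
within = interpolate(left[0], left[1], 0.5)
# within == [0.5, 0.25]
```

Restricting interpolation to pairs from the same cluster rules out the first case by construction.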

Theoretical Basis

  1. Cluster all data using KMeans
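  2. Filter clusters: keep only those where the proportion of minority samples meets or exceeds a threshold
  3. Distribute synthetic sample generation across the filtered clusters according to a sparsity weight, so that clusters whose minority samples are more spread out receive more synthetic samples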
  4. Apply SMOTE within each selected cluster
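
# Abstract KMeans-SMOTE outline (pseudocode, NOT the real implementation).
# count_minority, count_all, proportional_allocation and apply_smote_in_cluster
# are placeholder helpers; X, k and cluster_balance_threshold are assumed given.
import numpy as np
from sklearn.cluster import KMeans

clusters = KMeans(n_clusters=k).fit_predict(X)
for cluster_id in np.unique(clusters):
    # Share of minority samples among all samples in this cluster
    minority_ratio = count_minority(cluster_id) / count_all(cluster_id)
    if minority_ratio >= cluster_balance_threshold:
        n_synthetic = proportional_allocation(cluster_id)
        apply_smote_in_cluster(cluster_id, n_synthetic)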
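Steps 3 and 4 can be made concrete with a small standard-library sketch. It rests on several simplifying assumptions: the allocation weight is taken to be a sparsity measure (mean pairwise distance divided by the minority count), so more spread-out clusters receive more samples; interpolation uses random pairs within a cluster instead of a k-nearest-neighbor search; and rounding means the allocated counts need not always sum exactly to the requested total. All function names and the toy data are hypothetical.

```python
import math
import random

def mean_pairwise_distance(points):
    """Average Euclidean distance over all distinct pairs of points."""
    pairs = [(p, q) for i, p in enumerate(points) for q in points[i + 1:]]
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def allocate(minority_by_cluster, n_total):
    """Split n_total synthetic samples across clusters by sparsity weight."""
    # Assumed sparsity measure: spread of the minority points per minority sample.
    sparsity = {
        c: mean_pairwise_distance(pts) / len(pts)
        for c, pts in minority_by_cluster.items()
    }
    total = sum(sparsity.values())
    return {c: round(n_total * s / total) for c, s in sparsity.items()}

def smote_in_cluster(points, n_new, rng):
    """Create n_new points by interpolating between random pairs in one cluster."""
    new_points = []
    for _ in range(n_new):
        a, b = rng.sample(points, 2)   # two distinct minority points
        t = rng.random()               # interpolation factor in [0, 1)
        new_points.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return new_points

# Minority samples of two clusters that passed the filter (toy data).
# "loose" has the same shape as "tight" but is four times as spread out.
minority_by_cluster = {
    "tight": [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]],
    "loose": [[10.0, 10.0], [10.0, 14.0], [14.0, 10.0], [14.0, 14.0]],
}

rng = random.Random(0)
counts = allocate(minority_by_cluster, n_total=10)
synthetic = {
    c: smote_in_cluster(pts, counts[c], rng)
    for c, pts in minority_by_cluster.items()
}
```

With this toy data the sparser cluster receives 8 of the 10 synthetic samples and the tighter one receives 2, and every synthetic point stays inside the bounding box of its own cluster.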

Related Pages

Implemented By

Uses Heuristic
