Principle:Scikit learn contrib Imbalanced learn Adaptive Synthetic Sampling
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
An adaptive oversampling technique that generates more synthetic samples for minority instances that are harder to learn, based on the density of majority neighbors.
Description
Adaptive Synthetic Sampling (ADASYN) extends the SMOTE approach by adaptively adjusting the number of synthetic samples generated for each minority instance based on its local difficulty. Minority samples surrounded by more majority class neighbors (i.e., harder to learn) receive more synthetic samples, while those in safer regions receive fewer.
This adaptive weighting shifts the classification boundary toward the difficult examples, focusing the learning effort where it matters most. ADASYN was proposed by He et al. (2008) and addresses a key limitation of standard SMOTE, which treats all minority samples equally regardless of their learning difficulty.
Usage
Use this principle when:
- Standard SMOTE produces too many synthetic samples in easy-to-classify regions
- The focus should be on minority samples near the decision boundary
- The dataset has varying difficulty across minority class regions
- A density-adaptive approach is preferred over uniform oversampling
Theoretical Basis
ADASYN computes a density ratio for each minority sample:
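- For each minority sample $x_i$, compute $r_i = \Delta_i / k$, where $\Delta_i$ is the number of majority-class neighbors among its $k$ nearest neighbors
- Normalize: $\hat{r}_i = r_i / \sum_j r_j$, so that $\sum_i \hat{r}_i = 1$
- Generate $g_i = \hat{r}_i \times G$ synthetic samples for $x_i$ (rounded to an integer), where $G$ is the total number of synthetic samples needed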
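As a small worked example of the three steps above, the snippet below runs the arithmetic for four minority samples. The neighbor counts, `k`, and `G` are made-up values for illustration only.

```python
# Hypothetical inputs: majority-neighbor counts for four minority samples.
k = 5                           # neighbors inspected per minority sample
G = 10                          # total synthetic samples to generate
majority_counts = [3, 1, 0, 1]  # majority neighbors found among the k nearest

# Step 1: density ratio per minority sample.
r = [c / k for c in majority_counts]

# Step 2: normalize so the ratios sum to 1.
r_hat = [r_i / sum(r) for r_i in r]

# Step 3: allocate synthetic samples in proportion to the normalized ratio.
g = [round(w * G) for w in r_hat]
print(g)
```

The sample with three majority neighbors receives six of the ten synthetic points, while the sample with no majority neighbors receives none.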
Pseudo-code:
    # Abstract ADASYN algorithm (NOT real implementation)
    G = total_synthetic_samples_needed

    # Pass 1: measure how hard each minority sample is
    for each minority_sample x_i:
        majority_neighbors = count_majority_in_k_neighbors(x_i, k)
        r_values[i] = majority_neighbors / k

    # Divide each r_i by sum(r_values) so the weights sum to 1
    r_normalized = normalize(r_values)

    # Pass 2: allocate and generate synthetic samples
    for each minority_sample x_i:
        g_i = round(r_normalized[i] * G)
        generate g_i synthetic samples near x_i using SMOTE interpolation
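The pseudo-code above can be turned into a small runnable sketch using only the Python standard library. This is an illustration under stated assumptions, not the imbalanced-learn implementation: the function name `adasyn_sketch` is hypothetical, distances are Euclidean, interpolation partners are drawn from the nearest minority neighbors, and ties are not handled specially.

```python
import math
import random


def adasyn_sketch(X_min, X_maj, n_synthetic, k=5, seed=0):
    """Illustrative ADASYN; returns (synthetic_points, per_sample_counts)."""
    if len(X_min) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    # Minority points come first, so index i below matches X_min[i].
    labelled = [(p, 1) for p in X_min] + [(p, 0) for p in X_maj]

    # Step 1: r_i = (majority points among the k nearest neighbors) / k.
    r = []
    for i, x in enumerate(X_min):
        others = [
            (math.dist(x, p), label)
            for j, (p, label) in enumerate(labelled)
            if j != i
        ]
        others.sort(key=lambda t: t[0])
        r.append(sum(1 for _, label in others[:k] if label == 0) / k)

    total = sum(r)
    if total == 0:
        raise ValueError("no minority sample has a majority neighbor")

    # Steps 2 and 3: normalize, then allocate g_i = round(r_hat_i * G).
    counts = [round(r_i / total * n_synthetic) for r_i in r]

    # Step 4: SMOTE-style interpolation toward a random minority neighbor.
    synthetic = []
    for i, (x, g_i) in enumerate(zip(X_min, counts)):
        if g_i == 0:
            continue
        neighbors = sorted(
            (p for j, p in enumerate(X_min) if j != i),
            key=lambda p: math.dist(x, p),
        )[:k]
        for _ in range(g_i):
            z = rng.choice(neighbors)
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(tuple(a + lam * (b - a) for a, b in zip(x, z)))
    return synthetic, counts


# Toy data (made up): one minority point sits inside a majority cluster.
X_min = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0)]
X_maj = [(1.1, 1.0), (0.9, 1.1), (1.0, 0.8), (3.0, 3.0), (3.1, 2.9)]
synthetic, counts = adasyn_sketch(X_min, X_maj, n_synthetic=6, k=3)
print(counts, len(synthetic))
```

Because each `g_i` is rounded independently, the number of generated points can differ slightly from the requested total in general.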