Principle: Scikit-learn-contrib Imbalanced-learn SVM Borderline Oversampling
| Knowledge Sources | |
|---|---|
| Domains | Machine_Learning, Data_Preprocessing, Imbalanced_Learning |
| Last Updated | 2026-02-09 03:00 GMT |
Overview
An oversampling variant that uses a Support Vector Machine to identify the borderline region and generates synthetic samples near the SVM decision boundary with controlled perturbation.
Description
SVM-SMOTE uses a trained SVM classifier to identify the support vectors of the minority class, which naturally lie near the decision boundary. Synthetic samples are then generated near these support vectors. An additional out_step parameter controls how far synthetic samples can extend beyond the support vector toward the majority class side, providing fine-grained control over boundary expansion.
This approach leverages the SVM's inherent ability to find the optimal separating hyperplane, making the borderline detection theoretically grounded rather than heuristic.
Usage
Use this principle when:
- SVM-based boundary identification is preferred over the k-NN heuristics used by Borderline-SMOTE
- The decision boundary is complex and non-linear
- Fine control over synthetic sample placement relative to the boundary is needed
Theoretical Basis
- Train an SVM on the imbalanced data
- Extract minority class support vectors (the borderline instances)
- For each support vector, apply SMOTE interpolation with nearby minority samples
- Use out_step to extend synthetic samples slightly beyond the support vector
```python
# Abstract SVM-SMOTE sketch (illustrative, not the imbalanced-learn implementation).
# Assumes X, y, minority_class, k, and out_step are already defined.
import numpy as np
from sklearn.svm import SVC

svm = SVC().fit(X, y)
sv_mask = y[svm.support_] == minority_class
support_vectors_minority = X[svm.support_[sv_mask]]
X_min = X[y == minority_class]
for sv in support_vectors_minority:
    dists = np.linalg.norm(X_min - sv, axis=1)
    neighbors = X_min[np.argsort(dists)[1:k + 1]]    # k nearest minority neighbors
    x_nn = neighbors[np.random.randint(len(neighbors))]
    lam = np.random.uniform(0, 1)
    x_new = sv + lam * (x_nn - sv)                   # interpolate toward the neighbor
    # Extrapolation variant: x_new = sv - out_step * lam * (x_nn - sv)
```
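To make the out_step geometry concrete, the following self-contained NumPy check (with illustrative points, not real data) contrasts the two placement rules: interpolation lands the synthetic sample on the segment between the support vector and its minority neighbor, while extrapolation with out_step pushes it past the support vector on the opposite side, toward the majority region.

```python
import numpy as np

rng = np.random.default_rng(0)
sv = np.array([0.0, 0.0])    # a minority support vector (illustrative)
x_nn = np.array([1.0, 0.0])  # a nearby minority neighbor (illustrative)
lam = rng.uniform(0, 1)
out_step = 0.5

# Interpolation: lands on the segment between sv and x_nn.
x_interp = sv + lam * (x_nn - sv)

# Extrapolation: steps past sv, away from the minority neighbor, scaled by out_step.
x_extra = sv - out_step * lam * (x_nn - sv)

print(x_interp, x_extra)  # x_interp between the two points, x_extra past sv
```

A smaller out_step keeps synthetic samples closer to the support vector, limiting how far the minority region expands into majority territory.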