Principle:Scikit learn Scikit learn Support Vector Machines

Knowledge Sources	Scikit_learn Scikit-learn Docs
Domains	Supervised Learning, Classification
Last Updated	2026-02-08 15:00 GMT

Overview

Support vector machines find the optimal hyperplane that maximizes the margin between classes, optionally mapping data into higher-dimensional spaces via kernel functions.

Description

Support Vector Machines (SVMs) are powerful supervised learning models for classification and regression that seek the decision boundary with the largest possible margin of separation between classes. The key insight is that only a subset of training points (the support vectors) determine the decision boundary, making SVMs memory-efficient. The kernel trick allows SVMs to learn non-linear decision boundaries by implicitly operating in high-dimensional feature spaces without explicit transformation. SVMs occupy a central position in machine learning, known for strong theoretical guarantees based on statistical learning theory and structural risk minimization.

Usage

Use SVC (Support Vector Classification) for binary and multiclass classification tasks where high accuracy is needed and the dataset is small to moderate in size. Use SVR (Support Vector Regression) for regression tasks where an epsilon-insensitive tube around the prediction is appropriate. Use NuSVC/NuSVR when you prefer to control the fraction of support vectors through the parameter nu: in NuSVC, nu replaces the penalty parameter C, while in NuSVR, nu replaces epsilon and C is retained. Choose the RBF kernel as a default for non-linear problems, the linear kernel when the data is high-dimensional relative to the number of samples, and the polynomial kernel when interaction effects between features are expected. Kernel SVMs do not scale well to very large datasets because training cost grows faster than linearly with the number of samples; for large-scale linear problems, consider LinearSVC or SGD-based approaches such as SGDClassifier instead.
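The following is a minimal sketch of the default choice described above: an RBF-kernel SVC on a small synthetic dataset. The dataset, the train/test split, and the feature scaling step are assumptions made for illustration and are not part of the original text; scaling is included because SVMs are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, moderately sized dataset (illustrative assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize features, then fit an RBF-kernel SVC with default-style settings.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.3f}")
```

Swapping `SVC(kernel="rbf", ...)` for `SVC(kernel="linear")` or `SVC(kernel="poly", degree=3)` in the same pipeline is one way to compare the kernel choices listed above.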

Theoretical Basis

Linear SVM (hard margin) solves:

$\min_{w, b} \frac{1}{2} ‖ w ‖^{2} \quad \text{s.t.} \quad y_{i} (w^{T} x_{i} + b) \geq 1, \ \forall i$

The margin is $2 / ‖ w ‖$, and maximizing the margin is equivalent to minimizing $‖ w ‖^{2}$.
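As a rough numerical check of the margin formula, the sketch below fits a linear SVC on four hand-picked, linearly separable points and computes $2 / ‖ w ‖$ from the fitted weights. The toy points are an assumption for illustration, and a very large `C` is used only to approximate the hard-margin problem, since `SVC` solves the soft-margin form.

```python
import numpy as np
from sklearn.svm import SVC

# Two classes separated along the first axis; the widest separating
# slab lies between x = 0 and x = 2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

# A very large C approximates the hard-margin SVM (assumption of this sketch).
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]
margin = 2.0 / np.linalg.norm(w)

# Functional margins y_i (w^T x_i + b); the hard-margin constraint asks for >= 1.
functional_margins = y * (X @ w + b)

print("w =", w, "b =", b)
print("margin width =", margin)
print("functional margins =", functional_margins)
```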

Soft-margin SVM introduces slack variables $ξ_{i} \geq 0$ to allow misclassifications:

$\min_{w, b, ξ} \frac{1}{2} ‖ w ‖^{2} + C \sum_{i = 1}^{n} ξ_{i} \quad \text{s.t.} \quad y_{i} (w^{T} x_{i} + b) \geq 1 - ξ_{i}, \ ξ_{i} \geq 0, \ \forall i$

The parameter $C$ controls the trade-off between margin width and training error.
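To make the trade-off concrete, the sketch below fits a linear SVC with a small and a large `C` on overlapping synthetic blobs and reports the margin width $2 / ‖ w ‖$ and the number of support vectors. The data and the two `C` values are illustrative assumptions; the exact numbers depend on them.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two overlapping Gaussian blobs, so some slack is unavoidable.
X, y = make_blobs(
    n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.5, random_state=0
)

results = {}
for C in (0.01, 10.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])
    n_sv = int(clf.n_support_.sum())
    results[C] = (margin, n_sv)
    print(f"C={C:<6} margin width={margin:.3f} support vectors={n_sv}")
```

A smaller `C` penalizes slack less, so the optimizer accepts more margin violations in exchange for a wider margin.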

Dual formulation:

$\max_{α} \sum_{i = 1}^{n} α_{i} - \frac{1}{2} \sum_{i, j} α_{i} α_{j} y_{i} y_{j} x_{i}^{T} x_{j} \quad \text{s.t.} \quad 0 \leq α_{i} \leq C, \ \sum_{i} α_{i} y_{i} = 0$
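The dual constraints can be inspected on a fitted model. In scikit-learn, `SVC.dual_coef_` stores the products $y_{i} α_{i}$ for the support vectors only (all other $α_{i}$ are zero). The sketch below, on assumed synthetic data, checks the box constraint, the equality constraint, and the standard primal-dual relation $w = \sum_{i} α_{i} y_{i} x_{i}$ for the linear kernel.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(
    n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.5, random_state=0
)
C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

# Signed dual coefficients y_i * alpha_i, one per support vector.
signed_alpha = clf.dual_coef_[0]

# Box constraint: 0 <= alpha_i <= C, so |y_i alpha_i| <= C.
max_abs_alpha = np.abs(signed_alpha).max()

# Equality constraint: sum_i alpha_i y_i = 0.
constraint_sum = signed_alpha.sum()

# Primal weights recovered from the dual solution.
w_from_dual = clf.dual_coef_ @ clf.support_vectors_

print("support vectors:", len(signed_alpha), "of", len(X))
print("max |alpha_i|:", max_abs_alpha)
print("sum alpha_i y_i:", constraint_sum)
```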

Kernel trick: Replace the inner product $x_{i}^{T} x_{j}$ with a kernel function $k (x_{i}, x_{j}) = ϕ (x_{i})^{T} ϕ (x_{j})$:

RBF kernel: $k (x, y) = \exp (- γ ‖ x - y ‖^{2})$
Polynomial kernel: $k (x, y) = (γ x^{T} y + r)^{d}$
Sigmoid kernel: $k (x, y) = \tanh (γ x^{T} y + r)$
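The three kernel formulas above can be written directly in NumPy and compared against the helpers in `sklearn.metrics.pairwise`. The random inputs and the values chosen for $γ$, $r$, and $d$ are assumptions for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel, rbf_kernel, sigmoid_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Y = rng.normal(size=(4, 3))
gamma, r, d = 0.5, 1.0, 3

# Pairwise inner products and squared Euclidean distances.
inner = X @ Y.T
sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

# Kernels computed from the formulas.
K_rbf = np.exp(-gamma * sq_dists)
K_poly = (gamma * inner + r) ** d
K_sig = np.tanh(gamma * inner + r)

# Reference values from scikit-learn.
K_rbf_ref = rbf_kernel(X, Y, gamma=gamma)
K_poly_ref = polynomial_kernel(X, Y, degree=d, gamma=gamma, coef0=r)
K_sig_ref = sigmoid_kernel(X, Y, gamma=gamma, coef0=r)

print("RBF matches:", np.allclose(K_rbf, K_rbf_ref))
print("polynomial matches:", np.allclose(K_poly, K_poly_ref))
print("sigmoid matches:", np.allclose(K_sig, K_sig_ref))
```

In `SVC` and `SVR`, the same quantities are set through the `gamma`, `coef0` (the $r$ term), and `degree` parameters.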

Support Vector Regression (SVR) uses an $ε$-insensitive loss:

$\min_{w, b, ξ, ξ^{*}} \frac{1}{2} ‖ w ‖^{2} + C \sum_{i = 1}^{n} (ξ_{i} + ξ_{i}^{*}) \quad \text{s.t.} \quad y_{i} - (w^{T} x_{i} + b) \leq ε + ξ_{i}, \ (w^{T} x_{i} + b) - y_{i} \leq ε + ξ_{i}^{*}, \ ξ_{i}, ξ_{i}^{*} \geq 0$

Errors within the $ε$-tube incur no penalty; only errors exceeding it contribute to the loss.
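One observable consequence of the tube is that training points lying strictly inside it do not become support vectors. The sketch below fits an RBF-kernel SVR to an assumed noisy sine curve and compares the absolute residuals of support vectors and non-support vectors against $ε$. The data, `C`, and `epsilon` values are illustrative choices.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

epsilon = 0.2
reg = SVR(kernel="rbf", C=10.0, epsilon=epsilon).fit(X, y)

# Absolute training residuals |y_i - f(x_i)|.
residuals = np.abs(y - reg.predict(X))

# Boolean mask marking which training points are support vectors.
is_sv = np.zeros(len(X), dtype=bool)
is_sv[reg.support_] = True

print("support vectors:", int(is_sv.sum()), "of", len(X))
if (~is_sv).any():
    print("largest residual among non-support vectors:", residuals[~is_sv].max())
```

Up to solver tolerance, the non-support vectors all have residuals no larger than `epsilon`, so they contribute nothing to the loss.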

Nu-SVM replaces $C$ with a parameter $ν \in (0, 1]$ that provides an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, offering a more interpretable parameterization.
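The two bounds can be observed empirically with `NuSVC`. The sketch below uses assumed synthetic blobs and an arbitrary choice of `nu`, then reports the fraction of support vectors and the training error rate. Misclassified training points are a subset of the margin errors that $ν$ bounds, so the training error rate is used here as a conservative stand-in.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import NuSVC

X, y = make_blobs(
    n_samples=200, centers=[[-1, -1], [1, 1]], cluster_std=1.0, random_state=0
)

nu = 0.3
clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, y)

# Fraction of training points that are support vectors (bounded below by nu).
frac_sv = clf.support_.size / len(X)

# Training misclassification rate (bounded above by nu).
train_error = 1.0 - clf.score(X, y)

print(f"nu={nu} support-vector fraction={frac_sv:.3f} training error={train_error:.3f}")
```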
