Principle:Scikit learn Scikit learn Support Vector Machines

From Leeroopedia


Knowledge Sources
Domains Supervised Learning, Classification
Last Updated 2026-02-08 15:00 GMT

Overview

Support vector machines find the optimal hyperplane that maximizes the margin between classes, optionally mapping data into higher-dimensional spaces via kernel functions.

Description

Support Vector Machines (SVMs) are supervised learning models for classification and regression that seek the decision boundary with the largest possible margin of separation between classes. The key insight is that only a subset of the training points (the support vectors) determines the decision boundary, so the fitted model only needs to store those points, which makes SVMs memory-efficient. The kernel trick allows SVMs to learn non-linear decision boundaries by implicitly operating in high-dimensional feature spaces without computing the transformation explicitly. SVMs occupy a central position in machine learning and are known for strong theoretical guarantees based on statistical learning theory and structural risk minimization.

Usage

Use SVC (Support Vector Classification) for binary and multiclass classification tasks where high accuracy is needed and the dataset is small to moderate in size. Use SVR (Support Vector Regression) for regression tasks where an epsilon-insensitive tube around the prediction is appropriate. Use NuSVC/NuSVR when you prefer to control the model through nu, which bounds the fraction of support vectors; nu replaces the penalty parameter C in NuSVC and the tube width epsilon in NuSVR. Choose the RBF kernel as a default for non-linear problems, the linear kernel when the number of features is large relative to the number of samples, and the polynomial kernel when interaction effects between features are expected. Kernel SVMs do not scale well to very large datasets, because training time grows faster than linearly with the number of samples; for large-scale linear problems, consider SGD-based approaches instead.
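As a minimal sketch of the estimators named above, the following fits an RBF-kernel SVC and a linear-kernel SVR with scikit-learn. The toy data, the choice to standardize features first, and the hyperparameter values are assumptions made for illustration, not recommendations from this page.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, SVR

# Hand-made toy data (illustrative assumption): two well-separated clusters.
X = [[0, 0], [0, 1], [1, 0], [1, 1], [3, 3], [3, 4], [4, 3], [4, 4]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# RBF-kernel classifier. Features are standardized first because
# kernel values depend on distances between samples.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
pred = clf.predict([[0.5, 0.5], [3.5, 3.5]])

# Epsilon-insensitive regression on noiseless points from y = 2x.
X_reg = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y_reg = [0.0, 2.0, 4.0, 6.0, 8.0]
reg = SVR(kernel="linear", C=10.0, epsilon=0.1).fit(X_reg, y_reg)
reg_pred = reg.predict([[2.5]])
```

Swapping `kernel="rbf"` for `"linear"` or `"poly"` follows the kernel guidance above; in practice C, gamma, and epsilon would usually be chosen by cross-validation.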

Theoretical Basis

Linear SVM (hard margin) solves:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \quad \forall i$$

The margin is $2/\|w\|$, and maximizing the margin is equivalent to minimizing $\|w\|^2$.
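To make the hard-margin constraint and the margin formula concrete, here is a small pure-Python sketch. The four points and the hyperplane (w, b) are hypothetical values chosen by hand so the arithmetic is easy to follow; nothing is being optimized here.

```python
import math

def functional_margins(w, b, X, y):
    """Return y_i * (w . x_i + b) for every training point."""
    return [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]

def margin_width(w):
    """Geometric margin 2 / ||w|| between the two supporting hyperplanes."""
    return 2.0 / math.sqrt(sum(wj * wj for wj in w))

# Hypothetical separable data: negatives on x1 = 0, positives on x1 = 2.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
y = [-1, -1, 1, 1]

# Candidate hyperplane x1 = 1, i.e. w = (1, 0), b = -1 (chosen by hand).
w, b = (1.0, 0.0), -1.0

margins = functional_margins(w, b, X, y)
feasible = all(m >= 1.0 for m in margins)  # hard-margin constraint holds
width = margin_width(w)                    # distance between the two classes
```

Every point here sits exactly on the margin, so all four constraints are active and the width equals the gap between the two columns of points.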

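Soft-margin SVM introduces slack variables $\xi_i \ge 0$ to allow misclassifications:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} \xi_i \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$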

The parameter C controls the trade-off between margin width and training error.
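The role of C can be sketched by evaluating the soft-margin objective for a fixed hyperplane. For a given (w, b), the smallest slack that satisfies the constraints is max(0, 1 - y_i(w^T x_i + b)), which is what the hypothetical helper below computes. The data, including the single point placed on the wrong side, are made up for illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, X, y, C):
    """Return (objective, slacks) for a fixed hyperplane (w, b)."""
    # Smallest feasible slack per point: zero when the margin constraint holds.
    slacks = [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]
    objective = 0.5 * dot(w, w) + C * sum(slacks)
    return objective, slacks

# Four separable points plus one negative point on the positive side.
X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0), (1.5, 0.5)]
y = [-1, -1, 1, 1, -1]
w, b = (1.0, 0.0), -1.0

obj_small_c, slacks = soft_margin_objective(w, b, X, y, C=1.0)
obj_large_c, _ = soft_margin_objective(w, b, X, y, C=10.0)
```

The same violation costs 1.5 in the objective when C = 1 and 15 when C = 10, so a larger C pushes the optimizer toward fitting the training data at the expense of margin width.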

Dual formulation:

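$$\max_{\alpha} \; \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j x_i^T x_j \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i y_i = 0$$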
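The dual objective and its constraints can be checked numerically on a tiny problem. The sketch below is pure Python; the data and the multipliers alpha are hypothetical values chosen by hand, and the weight vector is recovered with the standard relation w = sum_i alpha_i y_i x_i, which is an addition not stated on this page.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 * sum_{i,j} alpha_i alpha_j y_i y_j x_i^T x_j."""
    n = len(alpha)
    quad = sum(
        alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
        for i in range(n)
        for j in range(n)
    )
    return sum(alpha) - 0.5 * quad

def primal_weights(alpha, X, y):
    """Recover w = sum_i alpha_i y_i x_i from the dual variables."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X)) for k in range(dim)]

X = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
y = [-1, -1, 1, 1]
C = 1.0
alpha = [0.25, 0.25, 0.25, 0.25]  # hand-picked multipliers for this toy problem

box_ok = all(0.0 <= a <= C for a in alpha)               # 0 <= alpha_i <= C
equality = sum(a * yi for a, yi in zip(alpha, y))        # should be 0
dual_value = dual_objective(alpha, X, y)
w = primal_weights(alpha, X, y)
primal_value = 0.5 * dot(w, w)
```

For these values the dual and primal objectives coincide, which is consistent with strong duality for this convex problem.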

Kernel trick: Replace the inner product $x_i^T x_j$ with a kernel function $k(x_i, x_j) = \phi(x_i)^T \phi(x_j)$:

  • RBF kernel: $k(x, y) = \exp(-\gamma \|x - y\|^2)$
  • Polynomial kernel: $k(x, y) = (\gamma x^T y + r)^d$
  • Sigmoid kernel: $k(x, y) = \tanh(\gamma x^T y + r)$
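The three kernels listed above translate directly into code. The sketch below uses plain Python, with hypothetical function names (`rbf`, `poly`, `sigmoid`, `phi_quadratic`), and also checks the kernel trick for one case: for two-dimensional inputs, the polynomial kernel with gamma = 1, r = 0, d = 2 equals an inner product under an explicit quadratic feature map.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def rbf(x, z, gamma):
    """exp(-gamma * ||x - z||^2)"""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def poly(x, z, gamma, r, d):
    """(gamma * x^T z + r)^d"""
    return (gamma * dot(x, z) + r) ** d

def sigmoid(x, z, gamma, r):
    """tanh(gamma * x^T z + r)"""
    return math.tanh(gamma * dot(x, z) + r)

def phi_quadratic(x):
    """Explicit feature map whose inner product equals (x^T z)^2 in 2-D."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, z = (1.0, 2.0), (3.0, 4.0)

# Implicit: one inner product in the original 2-D space, then square it.
implicit = poly(x, z, gamma=1.0, r=0.0, d=2)
# Explicit: map both points to 3-D first, then take the inner product.
explicit = dot(phi_quadratic(x), phi_quadratic(z))
```

Both routes give the same value up to floating-point rounding, but the implicit one never builds the higher-dimensional vectors. For the RBF kernel the corresponding feature space is infinite-dimensional, so only the implicit route is available.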

Support Vector Regression (SVR) uses an ε-insensitive loss:

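$$\min_{w,b,\xi,\xi^*} \; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n} (\xi_i + \xi_i^*) \quad \text{s.t.} \quad |y_i - (w^T x_i + b)| \le \varepsilon + \xi_i + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0$$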

Errors within the ε-tube incur no penalty; only errors exceeding it contribute to the loss.
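The ε-insensitive loss described above is short enough to write out directly. This is a minimal sketch with a hypothetical helper name, not scikit-learn's internal implementation.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the epsilon-tube, linear in the excess outside it."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

# Residual 0.05 is inside a tube of half-width 0.1: no penalty.
inside = epsilon_insensitive_loss(1.0, 1.05, epsilon=0.1)
# Residual 0.5 exceeds the tube by about 0.4: only the excess is penalized.
outside = epsilon_insensitive_loss(1.0, 1.5, epsilon=0.1)
```

Training points whose residual stays inside the tube contribute nothing to the objective, which is why they do not become support vectors.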

Nu-SVM replaces C with a parameter $\nu \in (0, 1]$ that provides an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors, offering a more interpretable parameterization.
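The lower bound on the fraction of support vectors can be observed with scikit-learn's NuSVC. The sketch below uses synthetic blobs; the dataset, the value nu = 0.2, and the RBF kernel are assumptions chosen for illustration.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import NuSVC

# Synthetic, balanced two-class data (illustrative assumption).
X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

nu = 0.2
clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, y)

# Fraction of training points kept as support vectors; expected to be >= nu.
frac_sv = clf.support_.size / len(X)
# Fraction of misclassified training points, for comparison with nu.
frac_err = float((clf.predict(X) != y).mean())
```

Raising nu forces more training points to become support vectors and tolerates more margin violations, which plays a role similar to lowering C in the standard formulation.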
