Principle: Online ML, River Online One-Class SVM
| Knowledge Sources | River Docs; Schölkopf et al., "Estimating the Support of a High-Dimensional Distribution" |
|---|---|
| Domains | Online Machine Learning, Anomaly Detection, Support Vector Machines |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Online variant of the One-Class Support Vector Machine for unsupervised anomaly detection, using stochastic gradient descent to learn a hyperplane that separates normal data from the origin in feature space.
Description
The One-Class SVM is an unsupervised anomaly detection algorithm originally proposed by Schölkopf et al. (2001). The classical batch algorithm finds a hyperplane that maximally separates the training data from the origin in a high-dimensional feature space (often induced by a kernel). Data points that fall on the "wrong" side of the hyperplane -- closer to the origin -- are considered anomalies.
In the online (streaming) variant implemented in River, the optimization is performed via Stochastic Gradient Descent (SGD) rather than solving a quadratic program over the full dataset. Each incoming observation triggers a single gradient step, making it suitable for data streams.
The key parameter is nu (the Greek letter ν), which serves a dual purpose:
- It is an upper bound on the fraction of training errors (outliers).
- It is a lower bound on the fraction of support vectors.
In practice, nu can be interpreted as the expected fraction of anomalies in the data.
The online One-Class SVM uses the hinge loss and L2 regularization (with weight proportional to nu/2). The decision function value for an observation indicates how far it is from the separating hyperplane: lower values indicate more anomalous observations. This is different from ensemble-based methods like Half-Space Trees, where higher scores indicate anomalies.
For non-linear anomaly detection, the online One-Class SVM can be combined with feature_extraction.RBFSampler to approximate kernel mappings.
Usage
Use the online One-Class SVM when:
- You need a principled unsupervised anomaly detector based on the SVM framework
- You can estimate the expected fraction of anomalies (to set nu)
- You want to combine the detector with non-linear feature mappings (e.g., RBFSampler)
- You are comfortable with raw decision function scores (not bounded to [0, 1])
- Data arrives as a stream and batch SVM training is not feasible
Theoretical Basis
Reference: Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. "Estimating the Support of a High-Dimensional Distribution." Neural Computation, 13(7), pp.1443-1471.
Objective (batch formulation):
min_{w, rho, xi} (1/2)||w||^2 + (1/(nu*n)) * sum(xi_i) - rho
subject to:
w . phi(x_i) >= rho - xi_i for all i
xi_i >= 0 for all i
Where:
- w is the weight vector
- rho is the offset (intercept)
- xi_i are slack variables
- nu controls the tradeoff between maximizing the margin and allowing training errors
- phi(x) is the feature map (identity in the linear case)
Online SGD formulation:
In River's implementation, the optimization is performed via SGD with:
- Loss: Hinge loss
- Regularization: L2 with coefficient nu / 2
- Target: all observations are treated as positive (y = 1)
- Default optimizer: SGD with learning rate 0.01
```
INIT:
    w = zeros          # weight vector
    intercept = 1.0    # initial intercept (plays the role of rho)

LEARN_ONE(x):
    # Treat x as a positive example (y = 1)
    score = w . x - intercept
    g = hinge_loss_gradient(score, y=1)        # -1 if score < 1, else 0
    w = w - lr * (g * x + nu * w)              # SGD step with L2 regularization
    intercept = intercept + lr_intercept * g   # score decreases in the intercept

SCORE_ONE(x):
    return w . x - intercept                   # lower values = more anomalous
```
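The update rule above can be sketched as a self-contained Python class. This is a minimal illustration under the stated assumptions, not River's exact implementation; the class and parameter names are hypothetical:

```python
import random


class OnlineOneClassSVM:
    """Minimal sketch of the online One-Class SVM SGD update."""

    def __init__(self, nu=0.1, lr=0.01, intercept_lr=0.01):
        self.nu = nu
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.w = {}            # sparse weight vector, one entry per feature
        self.intercept = 1.0   # initial intercept (plays the role of rho)

    def score_one(self, x):
        # Raw decision function: lower values = more anomalous.
        return sum(self.w.get(k, 0.0) * v for k, v in x.items()) - self.intercept

    def learn_one(self, x):
        # Hinge loss on (score, y=1): max(0, 1 - score),
        # so d(loss)/d(score) = -1 if score < 1, else 0.
        g = -1.0 if self.score_one(x) < 1.0 else 0.0
        for k, v in x.items():
            # Gradient w.r.t. w_k is g * x_k; the L2 penalty adds nu * w_k.
            self.w[k] = self.w.get(k, 0.0) - self.lr * (g * v + self.nu * self.w.get(k, 0.0))
        # The score decreases in the intercept, so its gradient has opposite sign.
        self.intercept += self.intercept_lr * g


# Toy demo: stream points clustered around (1, 1), then score two queries.
rng = random.Random(42)
model = OnlineOneClassSVM(nu=0.1)
for _ in range(500):
    model.learn_one({"a": 1.0 + rng.gauss(0, 0.1), "b": 1.0 + rng.gauss(0, 0.1)})

normal_score = model.score_one({"a": 1.0, "b": 1.0})    # near the training data
outlier_score = model.score_one({"a": -5.0, "b": -5.0})  # far from the data
```

After training, the point near the cluster receives a higher decision value than the far-away point, matching the "lower = more anomalous" convention described above.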
Decision function interpretation:
- Positive values: observation is on the "normal" side of the hyperplane
- Negative or low values: observation is anomalous
- The raw score is not bounded to [0, 1]; use a QuantileFilter or ThresholdFilter to convert to binary decisions
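River ships filters for this conversion; as a self-contained illustration of the quantile idea, the hypothetical `QuantileThreshold` class below flags the lowest-scoring fraction q of observations seen so far (lowest because, per the convention above, lower scores are more anomalous):

```python
import bisect


class QuantileThreshold:
    """Flag scores falling in the lowest q-quantile of the history."""

    def __init__(self, q=0.05):
        self.q = q         # expected fraction of anomalies, akin to nu
        self.scores = []   # sorted history of raw decision values

    def classify(self, score):
        # Insert the new score while keeping the history sorted.
        bisect.insort(self.scores, score)
        # Empirical q-quantile of everything seen so far.
        cutoff = self.scores[int(self.q * (len(self.scores) - 1))]
        return score <= cutoff  # True = anomalous


qt = QuantileThreshold(q=0.05)
flags = [qt.classify(float(s)) for s in range(10)]  # only the lowest score is flagged
```

A fixed-threshold variant would simply compare the raw score against a constant; the quantile version instead adapts to the observed score distribution, at the cost of storing the score history.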