
Principle:Online ml River Online One Class SVM

From Leeroopedia


Knowledge Sources: River Docs; "Estimating the Support of a High-Dimensional Distribution"
Domains: Online Machine Learning, Anomaly Detection, Support Vector Machines
Last Updated: 2026-02-08 16:00 GMT

Overview

Online variant of the One-Class Support Vector Machine for unsupervised anomaly detection, using stochastic gradient descent to learn a hyperplane that separates normal data from the origin in feature space.

Description

The One-Class SVM is an unsupervised anomaly detection algorithm originally proposed by Schölkopf et al. (2001). The classical batch algorithm finds a hyperplane that maximally separates the training data from the origin in a high-dimensional feature space (often induced by a kernel). Data points that fall on the "wrong" side of the hyperplane -- closer to the origin -- are considered anomalies.

In the online (streaming) variant implemented in River, the optimization is performed via Stochastic Gradient Descent (SGD) rather than solving a quadratic program over the full dataset. Each incoming observation triggers a single gradient step, making it suitable for data streams.

The key parameter is nu (the Greek letter ν), which serves a dual purpose:

  • It is an upper bound on the fraction of training errors (outliers).
  • It is a lower bound on the fraction of support vectors.

In practice, nu can be interpreted as the expected fraction of anomalies in the data.

The online One-Class SVM uses the hinge loss and L2 regularization (with weight proportional to nu/2). The decision function value for an observation indicates how far it is from the separating hyperplane: lower values indicate more anomalous observations. This is different from ensemble-based methods like Half-Space Trees, where higher scores indicate anomalies.

For non-linear anomaly detection, the online One-Class SVM can be combined with feature_extraction.RBFSampler to approximate kernel mappings.

Usage

Use the online One-Class SVM when:

  • You need a principled unsupervised anomaly detector based on the SVM framework
  • You can estimate the expected fraction of anomalies (to set nu)
  • You want to combine the detector with non-linear feature mappings (e.g., RBFSampler)
  • You are comfortable with raw decision function scores (not bounded to [0, 1])
  • Data arrives as a stream and batch SVM training is not feasible

Theoretical Basis

Reference: Schölkopf, B., Platt, J.C., Shawe-Taylor, J., Smola, A.J. and Williamson, R.C., 2001. "Estimating the Support of a High-Dimensional Distribution." Neural Computation, 13(7), pp.1443-1471.

Objective (batch formulation):

min_{w, rho, xi}  (1/2)||w||^2 + (1/(nu*n)) * sum(xi_i) - rho

subject to:
    w . phi(x_i) >= rho - xi_i    for all i
    xi_i >= 0                      for all i

Where:

  • w is the weight vector
  • rho is the offset (intercept)
  • xi_i are slack variables
  • nu controls the tradeoff between maximizing the margin and allowing training errors
  • phi(x) is the feature map (identity in the linear case)

Online SGD formulation:

In River's implementation, the optimization is performed via SGD with:

  • Loss: Hinge loss
  • Regularization: L2 with coefficient nu / 2
  • Target: All observations are treated as positive (y=1)
  • Default optimizer: SGD with learning rate 0.01

Pseudocode of the update:

INIT:
    w = zeros       # weight vector
    intercept = 1.0 # initial intercept

LEARN_ONE(x):
    # Treat x as a positive example (y=1)
    g = hinge_loss_gradient(w . x - intercept, y=1)  # scalar: -1 if the margin is violated, else 0
    w = w - lr * (g * x + nu * w)                    # SGD step with L2 regularization
    intercept = intercept + lr_intercept * g         # offset gradient has the opposite sign of g

SCORE_ONE(x):
    return w . x - intercept
    # Lower values = more anomalous
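The pseudocode above can be sketched as self-contained Python, using plain dicts for sparse feature vectors in River's style. The class name, learning rates, and training loop below are illustrative, not River's actual internals:

```python
class TinyOneClassSVM:
    """Toy online One-Class SVM: hinge loss + L2 regularization via SGD."""

    def __init__(self, nu=0.1, lr=0.01, intercept_lr=0.01):
        self.nu = nu
        self.lr = lr
        self.intercept_lr = intercept_lr
        self.w = {}            # weight vector, lazily initialized to zeros
        self.intercept = 1.0   # initial offset (rho)

    def score_one(self, x):
        # Decision function w . x - intercept; lower = more anomalous.
        return sum(self.w.get(k, 0.0) * v for k, v in x.items()) - self.intercept

    def learn_one(self, x):
        # Hinge loss on (w . x - intercept) with target y = 1:
        # the scalar gradient is -1 when the margin is violated, else 0.
        g = -1.0 if self.score_one(x) < 1.0 else 0.0
        for k, v in x.items():
            # SGD step with L2 regularization of strength nu.
            self.w[k] = self.w.get(k, 0.0) - self.lr * (g * v + self.nu * self.w.get(k, 0.0))
        # The offset's gradient has the opposite sign of g.
        self.intercept += self.intercept_lr * g


model = TinyOneClassSVM(nu=0.1)
for _ in range(100):
    for x in ({"a": 1.0, "b": 0.5}, {"a": 0.9, "b": 0.6}):
        model.learn_one(x)

inlier = model.score_one({"a": 1.0, "b": 0.5})    # near the training data
outlier = model.score_one({"a": -5.0, "b": -5.0}) # far on the origin side
```

After training, the inlier's score exceeds the outlier's, matching the "lower = more anomalous" convention above.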

Decision function interpretation:

  • Positive values: observation is on the "normal" side of the hyperplane
  • Negative or low values: observation is anomalous
  • The raw score is not bounded to [0, 1]; use a QuantileFilter or ThresholdFilter to convert to binary decisions
