Principle:Online ml River Prediction Anomaly Detection

Knowledge Sources	Machine Learning Time Series Anomaly Detection
Domains	Online_Learning Anomaly_Detection Time_Series
Last Updated	2026-02-08 18:00 GMT

Overview

Prediction-error-based anomaly detection identifies anomalies by measuring the discrepancy between a predictive model's expected output and the actual observed value. Large prediction errors signal that an observation does not conform to the learned pattern, making it a candidate anomaly.

Description

This approach uses a supervised wrapper around any predictive model (classifier or regressor) to detect anomalies. The core idea is simple: if a model that has learned the normal behavior of a system produces a large error on a new observation, that observation is likely anomalous.

Two common variants exist:

Predictive Anomaly Detection (PAD): Uses the absolute or squared prediction error as the anomaly score. Applicable to both classification and regression tasks.
Standard Absolute Deviation (SAD): Scores a value by its absolute deviation from a running mean or median, normalized by a running standard deviation. Applied to a stream of prediction errors, this yields a scale-free score.

The key advantage of this approach is its modularity: any online learning model can be wrapped to become an anomaly detector, leveraging the model's existing knowledge of normal patterns.
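The modularity described above can be sketched in a few lines of plain Python. This is a minimal illustration, not River's own implementation: `PredictionErrorDetector` and `RunningMeanRegressor` are hypothetical names, and the only assumption about the wrapped model is that it exposes River-style `predict_one` and `learn_one` methods.

```python
from typing import Protocol


class OnlineRegressor(Protocol):
    """Interface assumed by this sketch: River-style predict_one / learn_one."""

    def predict_one(self, x: dict) -> float: ...
    def learn_one(self, x: dict, y: float) -> None: ...


class RunningMeanRegressor:
    """Toy model (hypothetical): predicts the mean of all targets seen so far."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x: dict) -> float:
        return self.mean

    def learn_one(self, x: dict, y: float) -> None:
        self.n += 1
        self.mean += (y - self.mean) / self.n


class PredictionErrorDetector:
    """Hypothetical wrapper: the anomaly score is the wrapped model's absolute error."""

    def __init__(self, model: OnlineRegressor) -> None:
        self.model = model

    def score_one(self, x: dict, y: float) -> float:
        return abs(y - self.model.predict_one(x))

    def learn_one(self, x: dict, y: float) -> None:
        self.model.learn_one(x, y)


detector = PredictionErrorDetector(RunningMeanRegressor())
stream = [10.0, 10.0, 10.0, 10.0, 50.0]
scores = []
for y in stream:
    x = {}  # features are unused by the toy model
    scores.append(detector.score_one(x, y))
    detector.learn_one(x, y)
```

In this sketch the final value (50.0) receives a score of 40.0, while the three preceding values score 0.0. The first score (10.0) is a cold-start artifact of the untrained model, which is one reason practical detectors often ignore scores during a warm-up period.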

Usage

Use prediction-error-based anomaly detection when:

You have a supervised or semi-supervised signal (features and targets) and want to detect anomalous target values.
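You also want a signal for concept drift or distribution shift: a sustained rise in prediction error, rather than an isolated spike, suggests that the underlying pattern has changed.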
You have an existing predictive model and want to add anomaly detection without building a separate detector.
The notion of "anomaly" corresponds to unexpected prediction errors.

Theoretical Basis

Given a predictive model $f$ trained on a stream, the anomaly score for observation $(x_{t}, y_{t})$ is:

Absolute error scoring:

$\mathrm{score}(x_{t}, y_{t}) = |y_{t} - f(x_{t})|$

Normalized scoring: To make scores comparable across different scales, the error can be normalized by a running estimate of the error distribution:

$z_{t} = \frac{|y_{t} - f(x_{t})| - \mu_{t}}{\sigma_{t}}$

where $\mu_{t}$ and $\sigma_{t}$ are the running mean and standard deviation of recent prediction errors.
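As a rough sketch of the normalized score, the running mean and standard deviation can be maintained with Welford's online algorithm. The names `RunningErrorStats` and `normalized_score` and the `eps` guard are assumptions made for illustration. This version accumulates statistics over the whole stream; a sliding-window or exponentially weighted estimate would follow "recent" errors more closely.

```python
import math


class RunningErrorStats:
    """Welford's online algorithm for the mean and standard deviation of a stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, value: float) -> None:
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Sample standard deviation; 0.0 until at least two values have been seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def normalized_score(error: float, stats: RunningErrorStats, eps: float = 1e-9) -> float:
    """Compute (error - mean) / std, with eps guarding against a zero std."""
    return (error - stats.mean) / max(stats.std, eps)


stats = RunningErrorStats()
for past_error in [1.0, 2.0, 3.0]:  # illustrative absolute errors
    stats.update(past_error)

z = normalized_score(5.0, stats)  # past errors have mean 2.0 and std 1.0
stats.update(5.0)  # fold the new error in only after it has been scored
```

Here an absolute error of 5.0 lies three standard deviations above the running mean, so `z` is 3.0. Updating the statistics only after scoring keeps the current error from diluting its own score.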

For classification: The anomaly score can be derived from the predicted probability of the true class:

$\mathrm{score}(x_{t}, y_{t}) = 1 - P(y_{t} \mid x_{t})$

A perfectly predicted instance scores 0, while an instance whose true class was assigned near-zero probability scores close to 1.
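The classification score can be illustrated with a small helper. It assumes the classifier returns a dictionary mapping each class label to a probability, the shape River classifiers return from `predict_proba_one`. The helper name `classification_score` and the probability values are made up for this example.

```python
def classification_score(proba: dict, y_true) -> float:
    """Return 1 - P(y_true | x); labels missing from proba count as probability 0."""
    return 1.0 - proba.get(y_true, 0.0)


proba = {"normal": 0.9, "fault": 0.1}  # illustrative values, not real model output

score_expected = classification_score(proba, "normal")    # close to 0.1
score_surprising = classification_score(proba, "fault")   # close to 0.9
score_unseen = classification_score(proba, "overload")    # label never seen by the model
```

Treating an unseen label as probability 0 is a design choice of this sketch: it gives such observations the maximum score of 1.0, which matches the intuition that a class the model has never encountered is maximally surprising.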

Update rule: After scoring, the model is updated with the new observation:

1. score_t = error(f(x_t), y_t)
2. f.learn_one(x_t, y_t)

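Scoring before learning ensures that the anomaly score reflects genuine surprise. If the model learned from the observation first, it would already have partly fit that observation, and the measured error would be understated.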
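The effect of the ordering can be seen with a deliberately extreme toy model. `LastValueRegressor` and `run` are hypothetical names used only for this sketch; the model memorizes the last target it learned, so learning before scoring erases every error.

```python
class LastValueRegressor:
    """Toy model (hypothetical): predicts the most recent target it has learned."""

    def __init__(self) -> None:
        self.last = 0.0

    def predict_one(self, x: dict) -> float:
        return self.last

    def learn_one(self, x: dict, y: float) -> None:
        self.last = y


def run(stream, score_first: bool) -> list:
    """Score a stream with absolute error, either before or after learning."""
    model = LastValueRegressor()
    scores = []
    for x, y in stream:
        if score_first:
            scores.append(abs(y - model.predict_one(x)))  # 1. score
            model.learn_one(x, y)                         # 2. learn
        else:
            model.learn_one(x, y)                         # reversed order
            scores.append(abs(y - model.predict_one(x)))
    return scores


stream = [({}, 5.0), ({}, 5.0), ({}, 40.0), ({}, 5.0)]
test_then_train = run(stream, score_first=True)
train_then_test = run(stream, score_first=False)
```

With scoring first, the spike at 40.0 scores 35.0. With the reversed order, every score is 0.0 and the spike goes unnoticed. The value after the spike also scores 35.0 under the correct ordering, because this toy model absorbed the anomaly when it learned from it; more robust models are less affected by a single outlier.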
