Principle:Online ml River Prediction Anomaly Detection
| Knowledge Sources | Machine Learning Time Series Anomaly Detection |
|---|---|
| Domains | Online_Learning Anomaly_Detection Time_Series |
| Last Updated | 2026-02-08 18:00 GMT |
Overview
Prediction-error-based anomaly detection identifies anomalies by measuring the discrepancy between a predictive model's expected output and the actual observed value. Large prediction errors signal that an observation does not conform to the learned pattern, making it a candidate anomaly.
Description
This approach wraps any supervised predictive model (classifier or regressor) so that it also acts as an anomaly detector. The core idea is simple: if a model that has learned the normal behavior of a system produces a large error on a new observation, that observation is likely anomalous.
Two common variants exist:
- Prediction-based Anomaly Detection (PAD): Uses the absolute or squared prediction error as the anomaly score. Applicable to both classification and regression tasks.
- Streaming Anomaly Detection (SAD): Uses the absolute difference between predicted and actual values, often combined with a running statistic (e.g., mean or standard deviation) of recent errors to normalize the score.
The key advantage of this approach is its modularity: any online learning model can be wrapped to become an anomaly detector, leveraging the model's existing knowledge of normal patterns.
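The wrapping idea can be sketched in plain Python. This is a minimal illustration, not River's implementation: `PredictionErrorDetector` and `RunningMeanRegressor` are hypothetical names, and the only assumption about the wrapped model is that it exposes `predict_one(x)` and `learn_one(x, y)` methods in the style of River estimators.

```python
class RunningMeanRegressor:
    """Hypothetical toy model: predicts the running mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


class PredictionErrorDetector:
    """Hypothetical wrapper that turns any online model into an anomaly scorer."""

    def __init__(self, model, error_fn=None):
        self.model = model
        # Default to the absolute error; a squared error can be passed instead.
        self.error_fn = error_fn or (lambda y_true, y_pred: abs(y_true - y_pred))

    def score_one(self, x, y):
        # The score is the error of the wrapped model on this observation.
        return self.error_fn(y, self.model.predict_one(x))

    def learn_one(self, x, y):
        self.model.learn_one(x, y)


detector = PredictionErrorDetector(RunningMeanRegressor())
# Made-up stream of (features, target) pairs with one spike at t = 2.
stream = [({"t": 0}, 10.0), ({"t": 1}, 10.5), ({"t": 2}, 50.0), ({"t": 3}, 10.2)]

scores = []
for x, y in stream:
    scores.append(detector.score_one(x, y))  # score before learning
    detector.learn_one(x, y)
```

Because the wrapper only relies on the two methods above, swapping in a different model does not change the detection logic. Note that the very first score is large simply because the toy model has not seen any data yet, which is why a warm-up period is often useful in practice.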
Usage
Use prediction-error-based anomaly detection when:
- You have a supervised or semi-supervised signal (features and targets) and want to detect anomalous target values.
- You want a signal for concept drift or distribution shift as a side effect: a sustained rise in prediction error often indicates that the data distribution has changed.
- You have an existing predictive model and want to add anomaly detection without building a separate detector.
- The notion of "anomaly" corresponds to unexpected prediction errors.
Theoretical Basis
Given a predictive model f trained on a stream, the anomaly score for an observation (x_t, y_t) is:
Absolute error scoring:
score(x_t, y_t) = |y_t - f(x_t)|
Normalized scoring: To make scores comparable across different scales, the error can be normalized by a running estimate of the error distribution:
z_t = (|y_t - f(x_t)| - mu_t) / sigma_t
Where mu_t and sigma_t are the running mean and standard deviation of recent prediction errors.
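The normalized score can be sketched with Welford's online algorithm for the running mean and variance. The names `RunningStats` and `normalized_score` are hypothetical, the error values are made up for illustration, and two simplifying assumptions are made: statistics cover the whole error history rather than a recent window, and the score is 0 until enough history exists to estimate sigma_t.

```python
import math


class RunningStats:
    """Welford's online algorithm for the mean and variance of a stream."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the mean

    def update(self, value):
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)

    @property
    def std(self):
        # Sample standard deviation; 0.0 until at least two values are seen.
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def normalized_score(error, stats, eps=1e-9):
    """z_t = (|y_t - f(x_t)| - mu_t) / sigma_t, using statistics of past errors."""
    if stats.n < 2 or stats.std < eps:
        return 0.0  # assumption: not enough history to normalize yet
    return (error - stats.mean) / stats.std


stats = RunningStats()
errors = [1.0, 2.0, 3.0, 2.0, 12.0]  # hypothetical absolute errors |y_t - f(x_t)|

z_scores = []
for e in errors:
    z_scores.append(normalized_score(e, stats))  # normalize against past errors only
    stats.update(e)
```

Updating the statistics after scoring keeps the current error out of its own baseline, so a single large error is not dampened by its own contribution to mu_t and sigma_t.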
For classification: The anomaly score can be derived from the predicted probability of the true class:
score(x_t, y_t) = 1 - P(y_t | x_t)
A perfectly predicted instance scores 0, while an instance whose true class was assigned near-zero probability scores close to 1.
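As a small worked example of the classification score, the sketch below assumes the classifier returns a dictionary mapping each class label to its predicted probability (the shape River classifiers return from `predict_proba_one`). The function name `classification_score` and the probability values are hypothetical.

```python
def classification_score(proba, y_true):
    """score = 1 - P(y_true | x); classes missing from `proba` get probability 0."""
    return 1.0 - proba.get(y_true, 0.0)


# Hypothetical predicted class probabilities for one observation.
proba = {"normal": 0.9, "warning": 0.08, "failure": 0.02}

expected_class_score = classification_score(proba, "normal")   # low score
rare_class_score = classification_score(proba, "failure")      # high score
unseen_class_score = classification_score(proba, "unknown")    # maximal score
```

Treating a label the model has never seen as probability 0 is a design choice in this sketch: it gives such observations the maximum score of 1.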
Update rule: After scoring, the model is updated with the new observation:
1. score_t = error(f(x_t), y_t)
2. f.learn_one(x_t, y_t)
This ordering ensures the anomaly score reflects genuine surprise rather than post-hoc fitting.
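The effect of the ordering can be illustrated by running the same stream through both orderings. This is a sketch under stated assumptions: `RunningMeanRegressor` is a hypothetical toy model with River-style `predict_one` and `learn_one` methods, and the target values are made up.

```python
class RunningMeanRegressor:
    """Hypothetical toy model: predicts the running mean of the targets seen so far."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x):
        return self.mean

    def learn_one(self, x, y):
        self.n += 1
        self.mean += (y - self.mean) / self.n


def score_then_learn(model, x, y):
    # Recommended order: the model has not yet seen (x, y) when it is scored.
    score = abs(y - model.predict_one(x))
    model.learn_one(x, y)
    return score


def learn_then_score(model, x, y):
    # Reversed order: the model has already adapted to (x, y) when it is scored.
    model.learn_one(x, y)
    return abs(y - model.predict_one(x))


targets = [10.0, 10.0, 10.0, 40.0]  # the last value is the anomaly
model_a, model_b = RunningMeanRegressor(), RunningMeanRegressor()

correct = [score_then_learn(model_a, {}, y) for y in targets]
leaky = [learn_then_score(model_b, {}, y) for y in targets]
```

With the reversed order the anomalous observation has already pulled the model toward itself, so its score is smaller than under score-then-learn. A more flexible model could absorb the anomaly even more strongly and hide it further.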