Principle:Online ml River Anomaly Quantile Filtering
| Knowledge Sources | River, River Docs |
|---|---|
| Domains | Online Machine Learning, Anomaly Detection, Adaptive Thresholding |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Adaptive binary classification wrapper that converts continuous anomaly scores to binary anomaly/normal labels using a dynamically computed quantile threshold, automatically adapting to changing score distributions.
Description
Anomaly Quantile Filtering addresses a fundamental limitation of fixed-threshold anomaly filtering: the score distribution of an anomaly detector may change over time as the detector learns, as the data distribution drifts, or as the nature of anomalies evolves. A fixed threshold that works well initially may become too aggressive or too lenient over time.
The quantile filter solves this by maintaining a streaming quantile estimate of the anomaly score distribution. Instead of comparing against a fixed score value, it classifies an observation as anomalous if its score is at or above the q-th quantile of all scores observed so far. The threshold therefore adapts automatically as the score distribution evolves.
For example, with q=0.95, the filter flags approximately the top 5% of observations (by anomaly score) as anomalous, regardless of the absolute score values. The user chooses a target proportion rather than a score value, so the same setting carries over between detectors whose scores lie on different scales.
Like the fixed-threshold filter, the quantile filter supports anomaly detector protection: when enabled, the wrapped detector is not updated with observations classified as anomalous, preventing contamination of the normal model.
One key difference from the fixed-threshold filter lies in the learning step: the quantile statistic is always updated with the score, even when protection causes the wrapped detector's update to be skipped. The threshold therefore keeps tracking the full score distribution, anomalous observations included.
Usage
Use anomaly quantile filtering when:
- You do not know the appropriate fixed threshold in advance
- The anomaly score distribution may shift over time
- You want to flag a fixed proportion (e.g., top 5%) of observations as anomalous
- You need an adaptive threshold that self-calibrates
- You want to combine the filter with any anomaly detector (Half-Space Trees, OneClassSVM, etc.)
Theoretical Basis
Streaming quantile estimation:
The quantile filter uses River's stats.Quantile to maintain an online estimate of the q-th quantile of the score distribution. The estimate is updated incrementally, one score at a time, so the full history of observed scores never has to be stored (River's documentation describes the estimator as the P² algorithm).
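To give a feel for constant-memory quantile tracking, here is a minimal sketch that uses a simple stochastic-approximation update. It is a swapped-in technique chosen for brevity, not the algorithm inside stats.Quantile. The class name `StreamingQuantile`, the `step` parameter, and the uniform test stream are illustrative assumptions; only the `update`/`get` method names mirror the calls used in this section.

```python
import random


class StreamingQuantile:
    """Constant-memory quantile tracker (stochastic approximation).

    Illustrative stand-in only; this is NOT the algorithm used by River.
    """

    def __init__(self, q: float, step: float = 0.002):
        if not 0.0 < q < 1.0:
            raise ValueError("q must be in (0, 1)")
        self.q = q
        self.step = step        # hypothetical fixed step size
        self._estimate = None   # no data seen yet

    def update(self, x: float) -> None:
        if self._estimate is None:
            # Initialise with the first observation.
            self._estimate = x
        elif x > self._estimate:
            # Observation above the estimate: nudge up, weighted by q.
            self._estimate += self.step * self.q
        else:
            # Observation at or below the estimate: nudge down, weighted by 1 - q.
            self._estimate -= self.step * (1.0 - self.q)

    def get(self):
        # Returns None until at least one value has been observed.
        return self._estimate


rng = random.Random(42)
quantile = StreamingQuantile(q=0.95)
for _ in range(20_000):
    quantile.update(rng.random())  # scores drawn uniformly from [0, 1)

estimate = quantile.get()
print(f"estimated 0.95 quantile: {estimate:.3f}")
```

The up and down nudges balance when a fraction q of the stream falls at or below the estimate, which is why the value settles near the true quantile. The fixed step trades accuracy for responsiveness: a larger step follows drift faster but gives a noisier threshold.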
Classification rule:
    classify(score):
        quantile_threshold = quantile_estimator.get()
        if quantile_threshold is None:
            return False  # no data yet; cannot classify
        if score >= quantile_threshold:
            return True   # anomaly
        else:
            return False  # normal
Note: Before any score has been observed, quantile.get() returns None. The filter then substitutes math.inf for the threshold, which has the same effect as the explicit None check in the rule above: no finite score can be classified as anomalous until the estimator has seen data.
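The rule and its cold-start fallback can be written as one small function. This is a standalone sketch: the function signature and the `quantile_estimate` argument are assumptions made for illustration, and the estimate is passed in directly instead of being read from an estimator object.

```python
import math
from typing import Optional


def classify(score: float, quantile_estimate: Optional[float]) -> bool:
    """Return True when the score is at or above the current threshold."""
    # With no estimate available, fall back to +inf so that no finite
    # score can reach the threshold and nothing is flagged.
    threshold = math.inf if quantile_estimate is None else quantile_estimate
    return score >= threshold


print(classify(0.9, None))  # False: no estimate yet
print(classify(0.9, 0.7))   # True: above the threshold
print(classify(0.7, 0.7))   # True: the comparison is inclusive
print(classify(0.3, 0.7))   # False: below the threshold
```

The explicit `is None` check is used here so that a legitimate estimate of 0.0 is still treated as a real threshold.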
Protected learning with quantile update:
    LEARN_ONE(x):
        score = anomaly_detector.score_one(x)
        if protect_anomaly_detector AND classify(score) == True:
            # Do NOT update the detector
            pass
        else:
            anomaly_detector.learn_one(x)
        # Always update the quantile estimate
        quantile_estimator.update(score)
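The learning procedure above can be exercised end to end with toy components. Everything in this sketch is hypothetical and written only for illustration: `ExactQuantile` stores every score (unlike a streaming estimator), `DistanceFromMean` is a made-up detector, and `QuantileFilterSketch` follows the pseudocode in this section, not River's source.

```python
import math


class ExactQuantile:
    """Toy quantile that stores every score (illustrative; not streaming)."""

    def __init__(self, q):
        self.q = q
        self.scores = []

    def update(self, score):
        self.scores.append(score)

    def get(self):
        if not self.scores:
            return None
        ordered = sorted(self.scores)
        return ordered[min(int(self.q * len(ordered)), len(ordered) - 1)]


class DistanceFromMean:
    """Hypothetical detector: score is the distance from the running mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def score_one(self, x):
        return abs(x - self.mean)

    def learn_one(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n


class QuantileFilterSketch:
    def __init__(self, detector, q, protect_anomaly_detector=True):
        self.detector = detector
        self.quantile = ExactQuantile(q)
        self.protect_anomaly_detector = protect_anomaly_detector

    def classify(self, score):
        threshold = self.quantile.get()
        return score >= (math.inf if threshold is None else threshold)

    def learn_one(self, x):
        score = self.detector.score_one(x)
        # Skip the detector update only when protection is on and x is flagged.
        if not (self.protect_anomaly_detector and self.classify(score)):
            self.detector.learn_one(x)
        # The quantile estimate sees every score, flagged or not.
        self.quantile.update(score)


flt = QuantileFilterSketch(DistanceFromMean(), q=0.9)
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 50.0, 1.0]
for x in stream:
    flt.learn_one(x)

# 7 detector updates (the outlier 50.0 was skipped), 8 quantile updates.
print(flt.detector.n, len(flt.quantile.scores))
```

Because the outlier never reaches `learn_one`, the detector's running mean stays close to 1.0, while the outlier's score still enters the quantile estimate and raises the threshold for later observations.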
Adaptive behavior:
- With q = 0.95, approximately 5% of observations will be classified as anomalous
- The threshold automatically adjusts as the score distribution changes
- Early observations may have an unstable threshold until enough scores are collected
- The quantile estimate converges as more data is observed
Comparison with fixed threshold:
| Property | Fixed Threshold | Quantile Threshold |
|---|---|---|
| Threshold value | Constant | Adapts to score distribution |
| Domain knowledge required | Yes (must know score range) | No (just specify desired quantile) |
| Adapts to drift | No | Yes |
| Anomaly fraction | Unpredictable | Approximately 1-q |
| Warm-up period | None | Needs some observations |
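The "domain knowledge" and "anomaly fraction" rows can be checked with a small simulation. The sketch below makes several assumptions for the sake of illustration: the scores are synthetic uniform draws, the second stream is the first one multiplied by 100 to mimic a detector with a different score scale, and the quantile is computed exactly from a sorted list (not with a streaming estimator). The helper names are hypothetical.

```python
import bisect
import random


def flagged_fraction_fixed(scores, threshold):
    """Fraction of scores at or above a constant threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def flagged_fraction_quantile(scores, q):
    """Fraction flagged when the threshold is the running q-th quantile."""
    seen = []  # sorted scores observed so far (exact, for clarity only)
    flagged = 0
    for s in scores:
        if seen:  # nothing is flagged before the first score is recorded
            threshold = seen[min(int(q * len(seen)), len(seen) - 1)]
            flagged += s >= threshold
        bisect.insort(seen, s)
    return flagged / len(scores)


rng = random.Random(0)
unit_scores = [rng.random() for _ in range(5_000)]  # scores in [0, 1)
wide_scores = [100.0 * s for s in unit_scores]      # same shape, scores in [0, 100)

results = {}
for name, scores in [("unit", unit_scores), ("wide", wide_scores)]:
    fixed = flagged_fraction_fixed(scores, threshold=0.95)
    adaptive = flagged_fraction_quantile(scores, q=0.95)
    results[name] = (fixed, adaptive)
    print(f"{name}: fixed={fixed:.3f} quantile={adaptive:.3f}")
```

On the rescaled stream the constant threshold of 0.95 flags nearly everything, because it was tuned for a different score range. The quantile rule depends only on the ordering of the scores, so it flags roughly 1-q of the observations on both streams.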