Principle:Online ml River Anomaly Threshold Filtering
| Knowledge Sources | River, River Docs |
|---|---|
| Domains | Online Machine Learning, Anomaly Detection, Binary Classification |
| Last Updated | 2026-02-08 16:00 GMT |
Overview
Binary classification wrapper that converts continuous anomaly scores into binary anomaly/normal labels using a fixed score threshold, with optional protection of the underlying detector from learning on flagged anomalies.
Description
Anomaly Threshold Filtering is a simple yet fundamental pattern in streaming anomaly detection. Anomaly detectors produce continuous scores, but downstream systems typically need binary decisions: is this observation anomalous or not? A threshold filter provides this conversion by comparing the anomaly score against a fixed, user-defined threshold.
The threshold filter wraps any anomaly detector that implements the score_one interface. Given an anomaly score s and a threshold t:
- If s >= t, the observation is classified as anomalous (True).
- If s < t, the observation is classified as normal (False).
A key design feature is the protect_anomaly_detector option. When enabled (the default), the wrapped anomaly detector is not updated with observations that are classified as anomalous. This prevents the detector from "learning" anomalous patterns and gradually treating them as normal. This protection is crucial in production scenarios where sporadic anomalies should not corrupt the model's understanding of normal behavior.
However, in cases where concept drift is expected and the detector should adapt to changing distributions (including what was previously anomalous becoming normal), protection can be disabled.
Usage
Use anomaly threshold filtering when:
- You need to convert continuous anomaly scores to binary labels
- You have a known, fixed threshold that defines the anomaly boundary
- You want to protect the anomaly detector from learning on flagged anomalies
- You need a simple, interpretable decision rule for anomaly classification
- You want to use an anomaly filter as part of a pipeline (e.g., to filter anomalous observations before feeding them to a supervised model)
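The pipeline use case in the last bullet can be sketched in plain Python. This is a minimal illustration, not River's pipeline API: `drop_anomalies` and `temperature_score` are hypothetical names, and the scorer is a fixed stand-in for a real detector's `score_one`.

```python
from typing import Callable, Iterable, Iterator

Observation = dict[str, float]


def drop_anomalies(
    stream: Iterable[tuple[Observation, float]],
    score_one: Callable[[Observation], float],
    threshold: float,
) -> Iterator[tuple[Observation, float]]:
    """Yield only the (x, y) pairs whose anomaly score is below the threshold."""
    for x, y in stream:
        if score_one(x) >= threshold:
            continue  # flagged as anomalous: never reaches the supervised model
        yield x, y


# Hypothetical scorer: distance of a temperature reading from a nominal 20 degrees.
def temperature_score(x: Observation) -> float:
    return abs(x["temp"] - 20.0)


stream = [({"temp": 19.0}, 1.0), ({"temp": 85.0}, 9.0), ({"temp": 21.0}, 1.2)]
kept = list(drop_anomalies(stream, temperature_score, threshold=10.0))
print([x["temp"] for x, _ in kept])  # [19.0, 21.0]
```

A supervised model would then call its own `learn_one` on each pair in `kept`, so the 85-degree reading never influences it.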
Theoretical Basis
Binary classification rule:
classify(score, threshold):
    if score >= threshold:
        return True   # anomaly
    else:
        return False  # normal
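As a runnable counterpart to the rule above, the following standalone Python function applies the same comparison. The example scores and the threshold of 0.95 are made up for illustration.

```python
def classify(score: float, threshold: float) -> bool:
    """Return True (anomaly) when the score reaches or exceeds the threshold."""
    return score >= threshold


scores = [0.10, 0.80, 0.95, 0.99]
labels = [classify(s, threshold=0.95) for s in scores]
print(labels)  # [False, False, True, True]
```

Note that the comparison is inclusive: a score exactly equal to the threshold is classified as anomalous.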
Protected learning:
LEARN_ONE(x):
    score = anomaly_detector.score_one(x)
    if protect_anomaly_detector AND classify(score, threshold) == True:
        # Do NOT update the detector -- observation is anomalous
        pass
    else:
        anomaly_detector.learn_one(x)
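The effect of protection can be seen in a small self-contained sketch. Both classes below are hypothetical stand-ins written for this example (they are not River's implementation): `RunningMeanDetector` scores an observation by its distance from the running mean, and `ThresholdFilterSketch` mirrors the LEARN_ONE logic above.

```python
class RunningMeanDetector:
    """Toy detector (hypothetical): score is the distance from the running mean."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def score_one(self, x: float) -> float:
        # With no data yet there is no notion of "normal", so score 0.
        return abs(x - self.mean) if self.n else 0.0

    def learn_one(self, x: float) -> None:
        self.n += 1
        self.mean += (x - self.mean) / self.n


class ThresholdFilterSketch:
    """Wraps a detector and applies a fixed threshold, as in LEARN_ONE above."""

    def __init__(self, anomaly_detector, threshold: float,
                 protect_anomaly_detector: bool = True) -> None:
        self.anomaly_detector = anomaly_detector
        self.threshold = threshold
        self.protect_anomaly_detector = protect_anomaly_detector

    def score_one(self, x: float) -> float:
        return self.anomaly_detector.score_one(x)

    def classify(self, score: float) -> bool:
        return score >= self.threshold

    def learn_one(self, x: float) -> None:
        if self.protect_anomaly_detector and self.classify(self.score_one(x)):
            return  # anomalous: leave the detector untouched
        self.anomaly_detector.learn_one(x)


stream = [10.0, 10.0, 10.0, 100.0]  # the last value is an outlier

protected = ThresholdFilterSketch(RunningMeanDetector(), threshold=5.0)
unprotected = ThresholdFilterSketch(
    RunningMeanDetector(), threshold=5.0, protect_anomaly_detector=False
)
for x in stream:
    protected.learn_one(x)
    unprotected.learn_one(x)

print(protected.anomaly_detector.mean)    # 10.0
print(unprotected.anomaly_detector.mean)  # 32.5
```

With protection the outlier is skipped and the mean stays at 10.0; without it, a single outlier pulls the detector's notion of normal to 32.5.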
Deployment pattern:
1. score = model.score_one(x) # Get anomaly score
2. is_anomaly = filter.classify(score) # Apply threshold
3. if is_anomaly: ALERT # Take action
4. filter.learn_one(x) # Update (with protection)
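The four steps above translate into a short streaming loop. The sketch below assumes only that the filter object exposes `score_one`, `classify`, and `learn_one`; the `AnomalyFilter` protocol, the `detect` helper, and the `LastValueFilter` stand-in are hypothetical names introduced for illustration.

```python
from typing import Iterable, Iterator, Optional, Protocol


class AnomalyFilter(Protocol):
    def score_one(self, x: float) -> float: ...
    def classify(self, score: float) -> bool: ...
    def learn_one(self, x: float) -> None: ...


def detect(stream: Iterable[float], flt: AnomalyFilter) -> Iterator[tuple[float, float]]:
    """Yield (observation, score) for every observation flagged as anomalous."""
    for x in stream:
        score = flt.score_one(x)   # 1. get the anomaly score
        if flt.classify(score):    # 2. apply the threshold
            yield x, score         # 3. alert: hand the anomaly to the caller
        flt.learn_one(x)           # 4. update (the filter may skip anomalies)


class LastValueFilter:
    """Hypothetical stand-in: scores by distance from the last accepted value."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.last: Optional[float] = None

    def score_one(self, x: float) -> float:
        return 0.0 if self.last is None else abs(x - self.last)

    def classify(self, score: float) -> bool:
        return score >= self.threshold

    def learn_one(self, x: float) -> None:
        if self.classify(self.score_one(x)):
            return  # protected: anomalies do not become the new reference
        self.last = x


alerts = list(detect([1.0, 2.0, 9.0, 1.0], LastValueFilter(threshold=5.0)))
print(alerts)  # [(9.0, 7.0)]
```

Scoring before learning matters here: each observation is judged against what the detector knew before seeing it.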
Choosing the threshold:
The threshold must be chosen based on domain knowledge or by analyzing the score distribution on a validation set. Key considerations:
- Higher threshold = fewer false positives, more false negatives (conservative)
- Lower threshold = more false positives, fewer false negatives (aggressive)
- The threshold is fixed -- it does not adapt to the score distribution, unlike a quantile-based filter, whose threshold tracks a running quantile of the observed scores
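The trade-off between false positives and false negatives can be checked on a labeled validation set before fixing the threshold. The sketch below uses invented scores and labels; `confusion_at` is a hypothetical helper, not part of any library.

```python
def confusion_at(threshold: float, scored: list[tuple[float, bool]]) -> tuple[int, int]:
    """Return (false_positives, false_negatives) for a given threshold."""
    fp = sum(1 for s, is_anomaly in scored if s >= threshold and not is_anomaly)
    fn = sum(1 for s, is_anomaly in scored if s < threshold and is_anomaly)
    return fp, fn


# Made-up validation data: (anomaly score, true label).
validation = [
    (0.10, False), (0.30, False), (0.55, False), (0.60, True),
    (0.70, False), (0.85, True), (0.95, True),
]

results = {t: confusion_at(t, validation) for t in (0.5, 0.65, 0.9)}
for t, (fp, fn) in results.items():
    print(f"threshold={t}: false positives={fp}, false negatives={fn}")
```

On this data, raising the threshold from 0.5 to 0.9 moves the counts from 2 false positives and 0 false negatives to 0 false positives and 2 false negatives, matching the first two considerations above.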