Principle:Online ml River Anomaly Threshold Filtering

Knowledge Sources	River River Docs
Domains	Online Machine Learning, Anomaly Detection, Binary Classification
Last Updated	2026-02-08 16:00 GMT

Overview

A threshold filter is a binary classification wrapper that converts continuous anomaly scores into binary anomaly/normal labels using a fixed score threshold. It can optionally protect the underlying detector from learning on observations it flags as anomalous.

Description

Anomaly Threshold Filtering is a simple yet fundamental pattern in streaming anomaly detection. Anomaly detectors produce continuous scores, but downstream systems typically need binary decisions: is this observation anomalous or not? A threshold filter provides this conversion by comparing the anomaly score against a fixed, user-defined threshold.

The threshold filter wraps any anomaly detector that implements the score_one interface, plus learn_one, which the filter calls to update the detector. Given an anomaly score s and a threshold t:

If s >= t, the observation is classified as anomalous (True).
If s < t, the observation is classified as normal (False).

A key design feature is the protect_anomaly_detector option. When enabled (the default), the wrapped anomaly detector is not updated with observations that are classified as anomalous. This prevents the detector from "learning" anomalous patterns and gradually treating them as normal. This protection is crucial in production scenarios where sporadic anomalies should not corrupt the model's understanding of normal behavior.

However, in cases where concept drift is expected and the detector should adapt to changing distributions (including what was previously anomalous becoming normal), protection can be disabled.
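The effect of protection can be seen in a minimal sketch. It does not use River itself: RunningMeanDetector and ThresholdFilterSketch are hypothetical stand-ins written for this illustration, and only the method names score_one, learn_one and classify and the protect_anomaly_detector option come from the description above.

```python
class RunningMeanDetector:
    """Toy detector (hypothetical): score is the distance from the running mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def score_one(self, x):
        # Before any data has been seen, nothing can look anomalous.
        return abs(x - self.mean) if self.n else 0.0

    def learn_one(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n


class ThresholdFilterSketch:
    """Illustrative wrapper: flags scores at or above a fixed threshold."""

    def __init__(self, anomaly_detector, threshold, protect_anomaly_detector=True):
        self.anomaly_detector = anomaly_detector
        self.threshold = threshold
        self.protect_anomaly_detector = protect_anomaly_detector

    def score_one(self, x):
        return self.anomaly_detector.score_one(x)

    def classify(self, score):
        return score >= self.threshold

    def learn_one(self, x):
        if self.protect_anomaly_detector and self.classify(self.score_one(x)):
            return  # flagged as anomalous: leave the detector untouched
        self.anomaly_detector.learn_one(x)


# Three normal readings followed by a burst of three anomalous ones.
stream = [10.0, 10.0, 10.0, 50.0, 50.0, 50.0]

protected = ThresholdFilterSketch(RunningMeanDetector(), threshold=5.0)
unprotected = ThresholdFilterSketch(
    RunningMeanDetector(), threshold=5.0, protect_anomaly_detector=False
)

for x in stream:
    protected.learn_one(x)
    unprotected.learn_one(x)

print(protected.anomaly_detector.mean)    # mean of the normal readings only
print(unprotected.anomaly_detector.mean)  # mean pulled toward the burst
```

With protection, the detector's notion of normal stays at the level of the first three readings. Without it, the running mean moves toward the burst, so the same reading of 50.0 receives a lower score afterwards. That is the gradual absorption of anomalies that protection is meant to prevent, and also the adaptation that is wanted under concept drift.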

Usage

Use anomaly threshold filtering when:

You need to convert continuous anomaly scores to binary labels
You have a known, fixed threshold that defines the anomaly boundary
You want to protect the anomaly detector from learning on flagged anomalies
You need a simple, interpretable decision rule for anomaly classification
You want to use an anomaly filter as a pipeline step (e.g., to drop anomalous observations before they reach a supervised model)
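The pipeline use case in the last item can be sketched as a generator that passes along only the observations scored below the threshold. This is an assumption-laden illustration, not River's pipeline API: drop_anomalies, the fixed score_one function and the sample data are all hypothetical.

```python
def drop_anomalies(stream, score_one, threshold):
    """Yield only the (x, y) pairs whose anomaly score is below the threshold."""
    for x, y in stream:
        if score_one(x) < threshold:
            yield x, y


def score_one(x):
    # Hypothetical fixed scorer: distance of a temperature reading from 20 degrees.
    return abs(x["temp"] - 20.0)


labelled_stream = [
    ({"temp": 19.5}, 0.4),
    ({"temp": 95.0}, 9.9),  # sensor glitch, scored far above the threshold
    ({"temp": 21.0}, 0.5),
]

clean = list(drop_anomalies(labelled_stream, score_one, threshold=10.0))
# A downstream supervised model would then be trained on `clean` only.
```

Because the filter sits in front of the supervised model, a single glitch never contributes to that model's updates.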

Theoretical Basis

Binary classification rule:

classify(score, threshold):
    if score >= threshold:
        return True     # anomaly
    else:
        return False    # normal

Protected learning:

learn_one(x):
    score = anomaly_detector.score_one(x)
    if protect_anomaly_detector and classify(score, threshold):
        # Do NOT update the detector -- observation is anomalous
        pass
    else:
        anomaly_detector.learn_one(x)

Deployment pattern:

1. score = filter.score_one(x)         # Get anomaly score from the wrapped detector
2. is_anomaly = filter.classify(score)  # Apply threshold
3. if is_anomaly: ALERT                 # Take action
4. filter.learn_one(x)                  # Update (with protection)
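The four steps above can be written as a streaming loop. The sketch below is self-contained and makes several assumptions: ZScoreDetector is a hypothetical detector (absolute z-score from Welford running statistics), run is an illustrative helper, and the score is computed once and reused for both the alert and the protection check rather than being recomputed inside learn_one.

```python
import math


class ZScoreDetector:
    """Hypothetical detector: absolute z-score against running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford)

    def score_one(self, x):
        if self.n < 2:
            return 0.0  # not enough data to estimate a spread
        std = math.sqrt(self.m2 / (self.n - 1))
        return abs(x - self.mean) / std if std > 0 else 0.0

    def learn_one(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)


def run(stream, detector, threshold, protect=True):
    """Score, classify, alert, then learn; returns the indices that raised an alert."""
    alerts = []
    for i, x in enumerate(stream):
        score = detector.score_one(x)    # 1. get anomaly score
        is_anomaly = score >= threshold  # 2. apply threshold
        if is_anomaly:
            alerts.append(i)             # 3. take action (here: record the index)
        if not (protect and is_anomaly):
            detector.learn_one(x)        # 4. update, skipping flagged observations
    return alerts


readings = [10.0, 11.0, 9.0, 10.0, 30.0, 10.5, 9.5]
alerts = run(readings, ZScoreDetector(), threshold=3.0)
print(alerts)
```

Scoring before learning matters: each observation is judged against what the detector knew before seeing it, so an outlier cannot lower its own score.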

Choosing the threshold:

The threshold must be chosen based on domain knowledge or by analyzing the score distribution on a validation set. Key considerations:

Higher threshold = fewer false positives, more false negatives (conservative)
Lower threshold = more false positives, fewer false negatives (aggressive)
The threshold is fixed -- it does not adapt to the score distribution (unlike a quantile-based filter, which derives its cutoff from a running quantile of the observed scores)
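One way to derive a fixed threshold from a validation set is to pick the score that would flag a chosen fraction of the validation observations. The sketch below assumes an unlabelled set of validation scores and a target flag rate. Both helper functions and the sample scores are hypothetical, and if labels are available, choosing the threshold from precision and recall is a reasonable alternative.

```python
def threshold_from_scores(validation_scores, target_flag_rate):
    """Return a threshold that flags roughly `target_flag_rate` of the scores."""
    if not 0.0 < target_flag_rate < 1.0:
        raise ValueError("target_flag_rate must be strictly between 0 and 1")
    ranked = sorted(validation_scores)
    # Number of top-ranked scores that should end up at or above the threshold.
    n_flagged = max(1, round(len(ranked) * target_flag_rate))
    return ranked[len(ranked) - n_flagged]


def flag_rate(scores, threshold):
    """Fraction of scores classified as anomalous (score >= threshold)."""
    return sum(s >= threshold for s in scores) / len(scores)


validation_scores = [0.05, 0.10, 0.12, 0.15, 0.20, 0.22, 0.30, 0.35, 0.80, 0.95]

conservative = threshold_from_scores(validation_scores, target_flag_rate=0.1)
aggressive = threshold_from_scores(validation_scores, target_flag_rate=0.3)

print(conservative, flag_rate(validation_scores, conservative))
print(aggressive, flag_rate(validation_scores, aggressive))
```

The higher threshold flags fewer observations and the lower one flags more, which matches the trade-off listed above. Tied scores can push the realised flag rate above the target, and the chosen value stays fixed once deployed.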
