
Principle:Online ml River Anomaly Quantile Filtering

From Leeroopedia


Knowledge Sources: River, River Docs
Domains: Online Machine Learning, Anomaly Detection, Adaptive Thresholding
Last Updated: 2026-02-08 16:00 GMT

Overview

Adaptive binary classification wrapper that converts continuous anomaly scores to binary anomaly/normal labels using a dynamically computed quantile threshold, automatically adapting to changing score distributions.

Description

Anomaly Quantile Filtering addresses a fundamental limitation of fixed-threshold anomaly filtering: the score distribution of an anomaly detector may change over time as the detector learns, as the data distribution drifts, or as the nature of anomalies evolves. A fixed threshold that works well initially may become too aggressive or too lenient over time.

The quantile filter solves this by maintaining a streaming quantile estimate of the anomaly score distribution. Instead of classifying based on a fixed score value, it classifies an observation as anomalous if its score is greater than or equal to the current estimate of the q-th quantile of all scores observed so far. The threshold therefore adapts automatically to the evolving score distribution.

For example, with q=0.95, the filter flags approximately the top 5% of observations (by anomaly score) as anomalous, regardless of the absolute score values, so the detector's score range does not need to be known in advance.
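The independence from absolute score values can be illustrated with a minimal sketch. It assumes an exact quantile over all past scores (kept in a sorted list) as a stand-in for a streaming estimator; the helper `flagged_fraction` and the synthetic score streams are hypothetical and are not part of River.

```python
import bisect
import random


def flagged_fraction(scores, q):
    """Fraction of scores at or above the running q-th quantile of earlier scores."""
    seen = []      # sorted history of scores (exact stand-in for a streaming estimator)
    flagged = 0
    for score in scores:
        if seen:
            # Empirical q-th quantile of everything observed so far.
            threshold = seen[min(int(q * len(seen)), len(seen) - 1)]
            if score >= threshold:
                flagged += 1
        bisect.insort(seen, score)
    return flagged / len(scores)


rng = random.Random(0)
small = [rng.random() for _ in range(5000)]   # scores in [0, 1)
large = [1000.0 * s for s in small]           # same stream on a 1000x larger scale

frac_small = flagged_fraction(small, q=0.95)
frac_large = flagged_fraction(large, q=0.95)
```

Both streams yield a flagged fraction close to 0.05, because the decision depends only on how a score ranks among earlier scores. A fixed threshold tuned for the first stream would flag almost every observation in the second.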

Like the fixed-threshold filter, the quantile filter supports anomaly detector protection: when enabled, the wrapped detector is not updated with observations classified as anomalous, preventing contamination of the normal model.

One key point about the quantile filter's learning behavior is that the quantile statistic itself is always updated with the score, even when the wrapped detector is protected. The quantile estimate therefore reflects every score observed, including the scores of observations flagged as anomalous.

Usage

Use anomaly quantile filtering when:

  • You do not know the appropriate fixed threshold in advance
  • The anomaly score distribution may shift over time
  • You want to flag a fixed proportion (e.g., top 5%) of observations as anomalous
  • You need an adaptive threshold that self-calibrates
  • You want to combine the filter with any anomaly detector (Half-Space Trees, OneClassSVM, etc.)

Theoretical Basis

Streaming quantile estimation:

The quantile filter uses River's stats.Quantile to maintain an online estimate of the q-th quantile of the score distribution. This uses an incremental quantile estimation algorithm that does not require storing all observed scores.
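To illustrate how a quantile can be tracked without storing past scores, here is a minimal sketch of one simple incremental estimator: a stochastic-approximation update with a constant step size. This is a swapped-in technique chosen for brevity and is not claimed to be the algorithm behind stats.Quantile. The class name `StepQuantile` and the `step` parameter are hypothetical; only the `update` / `get` interface mirrors the pseudocode in this section.

```python
import random


class StepQuantile:
    """Constant-memory running quantile estimate (stochastic approximation)."""

    def __init__(self, q, step=0.005):
        self.q = q
        self.step = step          # hypothetical tuning knob: size of each correction
        self._estimate = None     # no estimate until the first score arrives

    def update(self, score):
        if self._estimate is None:
            self._estimate = score                        # start from the first observation
        elif score >= self._estimate:
            self._estimate += self.step * self.q          # nudge the estimate up
        else:
            self._estimate -= self.step * (1.0 - self.q)  # nudge the estimate down

    def get(self):
        return self._estimate     # None before any update


rng = random.Random(42)
estimator = StepQuantile(q=0.9)
for _ in range(20000):
    estimator.update(rng.random())   # uniform scores on [0, 1); true 0.9-quantile is 0.9

estimate = estimator.get()
```

The upward and downward nudges balance on average when a fraction q of scores falls below the estimate, which is what pulls it toward the q-th quantile. In this sketch the step size has to be chosen relative to the scale of the scores.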

Classification rule:

def classify(score):
    quantile_threshold = quantile_estimator.get()
    if quantile_threshold is None:
        return False        # no data yet; cannot classify
    # True = anomaly, False = normal; a score equal to the threshold is an anomaly
    return score >= quantile_threshold

Note: Before any score has been observed, quantile.get() returns None and the filter uses math.inf as the threshold. This is equivalent to the explicit None check above: no observation is classified as anomalous until the estimator returns a value.
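The explicit None check and the math.inf fallback lead to the same decision. A minimal sketch, assuming a hypothetical standalone function `classify_score` that receives the current quantile estimate (or None) directly:

```python
import math


def classify_score(score, quantile_estimate):
    """Return True (anomaly) if score is at or above the current threshold."""
    # With no estimate yet, fall back to an infinite threshold.
    threshold = math.inf if quantile_estimate is None else quantile_estimate
    return score >= threshold


before_data = classify_score(1e9, None)   # any finite score is below infinity
at_threshold = classify_score(0.8, 0.8)   # ">=" makes a tie count as anomalous
below = classify_score(0.5, 0.8)
```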

Protected learning with quantile update:

def learn_one(x):
    score = anomaly_detector.score_one(x)
    if not (protect_anomaly_detector and classify(score)):
        # Skipped only when protection is on and x is classified as anomalous
        anomaly_detector.learn_one(x)
    # Always update the quantile estimate
    quantile_estimator.update(score)
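The procedure above can be sketched end to end with toy components. Everything here is hypothetical and for illustration only: `MeanDistanceDetector` is a deliberately simple detector, `ExactQuantile` stores all scores as a stand-in for a streaming estimator, and `ProtectedQuantileFilter` is not River's class.

```python
import bisect


class MeanDistanceDetector:
    """Toy detector (hypothetical): score is the distance from the running mean."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0

    def score_one(self, x):
        return abs(x - self.mean)

    def learn_one(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n


class ExactQuantile:
    """Exact quantile over all scores; a stand-in for a streaming estimator."""

    def __init__(self, q):
        self.q = q
        self.scores = []

    def update(self, score):
        bisect.insort(self.scores, score)

    def get(self):
        if not self.scores:
            return None
        return self.scores[min(int(self.q * len(self.scores)), len(self.scores) - 1)]


class ProtectedQuantileFilter:
    def __init__(self, detector, q, protect_anomaly_detector=True):
        self.detector = detector
        self.quantile = ExactQuantile(q)
        self.protect_anomaly_detector = protect_anomaly_detector

    def classify(self, score):
        threshold = self.quantile.get()
        return threshold is not None and score >= threshold

    def learn_one(self, x):
        score = self.detector.score_one(x)
        if not (self.protect_anomaly_detector and self.classify(score)):
            self.detector.learn_one(x)      # skipped only for protected anomalies
        self.quantile.update(score)         # always updated


filt = ProtectedQuantileFilter(MeanDistanceDetector(), q=0.9)
for x in [1.0, 1.2, 0.9, 1.1, 1.0]:
    filt.learn_one(x)
learned_before = filt.detector.n
filt.learn_one(50.0)                        # extreme value, flagged as anomalous
```

Here the extreme value 50.0 is flagged, so the toy detector's observation count stays at 5 and its mean is not pulled toward the outlier, while the quantile history still grows to 6 scores.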

Adaptive behavior:

  • With q = 0.95, approximately 5% of observations will be classified as anomalous
  • The threshold automatically adjusts as the score distribution changes
  • Early observations may have an unstable threshold until enough scores are collected
  • The quantile estimate converges as more data is observed
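The warm-up and convergence behavior can be seen in a small experiment. This sketch assumes an exact empirical quantile as a stand-in for the streaming estimate and uses synthetic uniform scores whose true 0.95-quantile is 0.95; the helper `running_thresholds` is hypothetical.

```python
import bisect
import random


def running_thresholds(scores, q, checkpoints):
    """Record the empirical q-th quantile after each checkpoint count of scores."""
    seen = []
    recorded = {}
    for count, score in enumerate(scores, start=1):
        bisect.insort(seen, score)
        if count in checkpoints:
            recorded[count] = seen[min(int(q * count), count - 1)]
    return recorded


rng = random.Random(7)
scores = [rng.random() for _ in range(5000)]   # uniform on [0, 1)
thresholds = running_thresholds(scores, q=0.95, checkpoints={5, 50, 5000})

# With only 5 scores the "0.95-quantile" is simply the largest score seen so far,
# so it can sit far from 0.95; with 5000 scores it is typically within about 0.01.
late_error = abs(thresholds[5000] - 0.95)
```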

Comparison with fixed threshold:

Property                  | Fixed Threshold             | Quantile Threshold
Threshold value           | Constant                    | Adapts to score distribution
Domain knowledge required | Yes (must know score range) | No (just specify desired quantile)
Adapts to drift           | No                          | Yes
Anomaly fraction          | Unpredictable               | Approximately 1-q
Warm-up period            | None                        | Needs some observations
