Principle:Online ml River Page Hinkley Drift Detection
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs, Continuous Inspection Schemes | Online Machine Learning, Concept Drift Detection, Sequential Analysis | 2026-02-08 16:00 GMT |
Overview
The Page-Hinkley test is a sequential analysis method for detecting changes in the mean of a signal using cumulative sum monitoring with a threshold.
Description
The Page-Hinkley test is a classic sequential hypothesis testing method originally proposed for quality control in manufacturing (Page, 1954). In the context of online machine learning, it monitors a stream of values (typically prediction errors or loss values) and detects when the mean of the signal has shifted significantly.
The test maintains a cumulative sum of deviations between observed values and their running mean. A concept drift is flagged when this cumulative sum rises more than a predefined threshold above the lowest value it has reached. For decreases, the test instead checks whether the sum falls more than the threshold below its highest value. The implementation can detect upward shifts, downward shifts, or both at once.
A forgetting factor (alpha) applies exponential weighting to the cumulative sums, giving more importance to recent observations. This prevents the detector from being overly influenced by very old data that may no longer be relevant.
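Unrolling the recursion shows what the forgetting factor does. An increment observed k steps ago contributes with weight alpha**k, so the weights of an infinitely long history sum to 1 / (1 - alpha). For the default 0.9999 this is about 10,000 observations, and an increment 10,000 steps old keeps roughly e^-1 (about 0.37) of its weight. The sketch below checks the unrolled form numerically. The increment values and the small alpha of 0.9 are arbitrary illustrative choices, and the function names are hypothetical.

```python
def recursive_sum(increments, alpha):
    """Apply S_t = alpha * S_{t-1} + d_t, starting from S_0 = 0."""
    s = 0.0
    for d in increments:
        s = alpha * s + d
    return s


def unrolled_sum(increments, alpha):
    """The same value written as the sum over k of alpha**k * d_{t-k}."""
    return sum(alpha ** k * d for k, d in enumerate(reversed(increments)))


# Illustrative per-step increments d_t = dev_t - delta. A small alpha is used
# so the decay is visible after only a few steps.
increments = [0.3, -0.1, 0.4, 0.2, -0.5]
alpha = 0.9
print(recursive_sum(increments, alpha), unrolled_sum(increments, alpha))

# With alpha = 0.9999, print the weight of an increment k steps in the past
# and the total weight 1 / (1 - alpha) of an infinitely long history.
default_alpha = 0.9999
for k in (1, 100, 10_000):
    print(k, default_alpha ** k)
print("effective memory:", 1 / (1 - default_alpha))
```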
Unlike detectors that expose a warning zone, such as DDM, the Page-Hinkley test has no intermediate warning state. It signals only full drift detections.
Usage
Use Page-Hinkley drift detection when:
- You need a simple, computationally efficient sequential change detection method.
- You want to detect changes in the mean of a monitored signal (e.g., error rate, loss).
- You need explicit control over detection sensitivity via the threshold and delta parameters.
- You want to monitor for specific directional changes (upward only, downward only, or both).
Theoretical Basis
The Page-Hinkley test monitors the cumulative difference between observed values and their running mean. The implementation uses a two-sided CUSUM (cumulative sum) variant with exponential forgetting.
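Given a stream of values $x_1, x_2, \ldots$, the test maintains the following quantities at step $t$:
Running mean:
$$\bar{x}_t = \frac{1}{t}\sum_{i=1}^{t} x_i$$
Cumulative sums for increase and decrease detection, starting from $S_0^+ = S_0^- = 0$:
$$S_t^+ = \alpha S_{t-1}^+ + \left(x_t - \bar{x}_t - \delta\right), \qquad S_t^- = \alpha S_{t-1}^- + \left(x_t - \bar{x}_t + \delta\right)$$
where $\alpha$ is the forgetting factor (default 0.9999) and $\delta$ is the magnitude allowance (default 0.005).
Tracking minimum and maximum cumulative sums:
$$m_t^+ = \min\left(m_{t-1}^+, S_t^+\right), \qquad M_t^- = \max\left(M_{t-1}^-, S_t^-\right)$$
Drift detection tests:
$$T_t^+ = S_t^+ - m_t^+, \qquad T_t^- = M_t^- - S_t^-$$
Drift is detected when $T_t^+ > \lambda$ (for upward shifts), $T_t^- > \lambda$ (for downward shifts), or either (for both), where $\lambda$ is the detection threshold. The tests are evaluated only once $t$ reaches `min_instances`.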
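As a worked check of these definitions, consider the first observation. Here $m_t^+$ and $M_t^-$ denote the tracked minimum and maximum, and $S_0^+ = S_0^- = 0$. The running mean is updated before the deviation is computed, so the first deviation is always zero and the first step does not depend on the value of $x_1$:

$$
\begin{aligned}
\bar{x}_1 &= x_1, & x_1 - \bar{x}_1 &= 0,\\
S_1^+ &= \alpha \cdot 0 + 0 - \delta = -\delta, & S_1^- &= \alpha \cdot 0 + 0 + \delta = \delta,\\
m_1^+ &= \min\left(m_0^+, -\delta\right) = -\delta, & M_1^- &= \max\left(M_0^-, \delta\right) = \delta,\\
T_1^+ &= S_1^+ - m_1^+ = 0, & T_1^- &= M_1^- - S_1^- = 0.
\end{aligned}
$$

The third line holds for any starting minimum $m_0^+ \ge -\delta$ (such as $+\infty$) and any starting maximum $M_0^- \le \delta$ (such as $-1$ or $-\infty$). Both test statistics therefore start from zero and grow only as later deviations accumulate.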
Page-Hinkley Test:
Initialize: S+ = 0, S- = 0, min+ = +inf, max- = -inf, x_mean = Mean(), n = 0
Parameters: min_instances, delta, threshold (lambda), alpha, mode
For each new value x:
    1. Update running mean and count: x_mean.update(x); n = n + 1
    2. dev = x - x_mean
    3. S+ = alpha * S+ + dev - delta
    4. S- = alpha * S- + dev + delta
    5. min+ = min(min+, S+)
    6. max- = max(max-, S-)
    7. If n >= min_instances:
           T+ = S+ - min+
           T- = max- - S-
           If mode == "up":   drift = (T+ > threshold)
           If mode == "down": drift = (T- > threshold)
           If mode == "both": drift = (T+ > threshold) OR (T- > threshold)
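The pseudocode above translates into a small, dependency-free Python class. This is a sketch that mirrors the listed steps and parameter names. It is not River's implementation. The following are all assumptions of this illustration rather than River's API: the class name `PageHinkley`, the `reset` method, returning the drift flag from `update`, the incremental mean update, and resetting after each detection. The default `threshold=50.0` and `min_instances=30` are illustrative values. The input stream is synthetic: Gaussian noise with an upward mean shift at index 1,000.

```python
import math
import random


class PageHinkley:
    """Minimal sketch of the Page-Hinkley steps listed above (hypothetical class)."""

    def __init__(self, min_instances=30, delta=0.005, threshold=50.0,
                 alpha=0.9999, mode="both"):
        if mode not in ("up", "down", "both"):
            raise ValueError("mode must be 'up', 'down' or 'both'")
        self.min_instances = min_instances
        self.delta = delta
        self.threshold = threshold
        self.alpha = alpha
        self.mode = mode
        self.reset()

    def reset(self):
        self.n = 0                 # number of values seen
        self.mean = 0.0            # running mean x_mean
        self.sum_up = 0.0          # S+
        self.sum_down = 0.0        # S-
        self.min_up = math.inf     # min+
        self.max_down = -math.inf  # max-
        self.drift_detected = False

    def update(self, x):
        # Step 1: incremental running mean, including the current value.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Step 2: deviation of the new value from the running mean.
        dev = x - self.mean
        # Steps 3-4: exponentially discounted cumulative sums.
        self.sum_up = self.alpha * self.sum_up + dev - self.delta
        self.sum_down = self.alpha * self.sum_down + dev + self.delta
        # Steps 5-6: track the extreme value of each sum.
        self.min_up = min(self.min_up, self.sum_up)
        self.max_down = max(self.max_down, self.sum_down)
        # Step 7: run the threshold tests only after the warm-up period.
        self.drift_detected = False
        if self.n >= self.min_instances:
            t_up = self.sum_up - self.min_up
            t_down = self.max_down - self.sum_down
            if self.mode == "up":
                self.drift_detected = t_up > self.threshold
            elif self.mode == "down":
                self.drift_detected = t_down > self.threshold
            else:
                self.drift_detected = (t_up > self.threshold
                                       or t_down > self.threshold)
        return self.drift_detected


# Synthetic stream: mean 0 for 1,000 steps, then mean 1 (an upward shift).
rng = random.Random(42)
stream = ([rng.gauss(0.0, 0.1) for _ in range(1000)]
          + [rng.gauss(1.0, 0.1) for _ in range(1000)])

detector = PageHinkley()
detections = []
for i, x in enumerate(stream):
    if detector.update(x):
        detections.append(i)
        detector.reset()  # restarting after a detection is a choice of this sketch
print(detections)
```

The same stream fed to `PageHinkley(mode="down")` should produce no detections. An upward shift pushes S- upward, so S- stays close to its running maximum max- and T- remains near zero.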
Properties:
- Memory: O(1); only a fixed set of running statistics is stored, regardless of stream length.
- Directional control: Can be configured to detect increases, decreases, or both.
- No warning zone: Only signals drift detections, not warnings.
- Forgetting factor: The `alpha` parameter applies exponential decay to the cumulative sums, preventing stale data from dominating.