Implementation:Online ml River Drift PageHinkley
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River, River Docs, Continuous Inspection Schemes | Online Machine Learning, Concept Drift Detection, Sequential Analysis | 2026-02-08 16:00 GMT |
Overview
Concrete tool for detecting concept drift using the Page-Hinkley sequential analysis test, which monitors cumulative deviations from the running mean and signals drift when a threshold is exceeded.
Description
The drift.PageHinkley class implements a two-sided CUSUM control chart for change detection in streaming data. It monitors the cumulative sum of deviations between observed values and their running mean (tracked via stats.Mean). The forgetting factor alpha applies exponential weighting to prevent old observations from dominating. Drift is detected when the difference between the cumulative sum and its historical minimum (for upward shifts) or maximum (for downward shifts) exceeds the threshold parameter.
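Written out, the update described above takes roughly the following form. This is one common way to state the test, using δ for delta, λ for threshold, and α for alpha; the superscripts and the symbols m and M are notation introduced here, and the exact order of operations in River's source may differ, so read it as a sketch rather than the library's definition.

```latex
\begin{aligned}
\bar{x}_t &= \frac{1}{t} \sum_{i=1}^{t} x_i \\
m_t^{\mathrm{up}} &= \alpha \, m_{t-1}^{\mathrm{up}} + \left(x_t - \bar{x}_t - \delta\right),
  \qquad M_t^{\mathrm{up}} = \min_{i \le t} m_i^{\mathrm{up}} \\
m_t^{\mathrm{down}} &= \alpha \, m_{t-1}^{\mathrm{down}} + \left(x_t - \bar{x}_t + \delta\right),
  \qquad M_t^{\mathrm{down}} = \max_{i \le t} m_i^{\mathrm{down}} \\
\text{upward drift if } & m_t^{\mathrm{up}} - M_t^{\mathrm{up}} > \lambda \\
\text{downward drift if } & M_t^{\mathrm{down}} - m_t^{\mathrm{down}} > \lambda
\end{aligned}
```

Both sums start at zero. Under this formulation a deviation observed k steps ago enters the sum with weight α^k, so with α = 0.9999 the effective memory is on the order of 1/(1 − α) = 10,000 observations.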
The detector supports three modes: "up" for detecting increases only, "down" for decreases only, and "both" for either direction. Unlike detectors such as DDM, Page-Hinkley has no warning state; it emits only drift signals.
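To make the mechanics concrete, here is a minimal pure-Python sketch of the test as this section describes it. The class `PageHinkleySketch` and the helper `first_drift` are hypothetical names used only for illustration; this is not River's source code, and details such as resetting the state on the update after a detection are assumptions.

```python
import math


class PageHinkleySketch:
    """Illustrative Page-Hinkley test (hypothetical, not River's implementation)."""

    def __init__(self, min_instances=30, delta=0.005, threshold=50.0,
                 alpha=0.9999, mode="both"):
        if mode not in ("up", "down", "both"):
            raise ValueError("mode must be 'up', 'down', or 'both'")
        self.min_instances = min_instances
        self.delta = delta
        self.threshold = threshold
        self.alpha = alpha
        self.mode = mode
        self.reset()

    def reset(self):
        self.n = 0
        self.mean = 0.0
        self.sum_up = 0.0          # cumulative deviation, penalised by delta
        self.sum_down = 0.0        # cumulative deviation, boosted by delta
        self.min_up = math.inf     # historical minimum of sum_up
        self.max_down = -math.inf  # historical maximum of sum_down
        self.drift_detected = False

    def update(self, x):
        # Assumption: start from a clean state after a detection.
        if self.drift_detected:
            self.reset()

        # Running mean of everything seen since the last reset.
        self.n += 1
        self.mean += (x - self.mean) / self.n
        dev = x - self.mean

        # Exponentially weighted cumulative sums for each direction.
        self.sum_up = self.alpha * self.sum_up + dev - self.delta
        self.sum_down = self.alpha * self.sum_down + dev + self.delta
        self.min_up = min(self.min_up, self.sum_up)
        self.max_down = max(self.max_down, self.sum_down)

        if self.n >= self.min_instances:
            up = self.sum_up - self.min_up > self.threshold
            down = self.max_down - self.sum_down > self.threshold
            if self.mode == "up":
                self.drift_detected = up
            elif self.mode == "down":
                self.drift_detected = down
            else:
                self.drift_detected = up or down
        return self.drift_detected


def first_drift(stream, **kwargs):
    """Return the index of the first detection in `stream`, or None."""
    detector = PageHinkleySketch(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


# A step from 0 to 10 at index 200: visible to "up", invisible to "down".
stream = [0.0] * 200 + [10.0] * 200
print("up:", first_drift(stream, mode="up"))
print("down:", first_drift(stream, mode="down"))
```

Keeping only a running mean, two sums, and two extrema is what gives the detector its constant memory footprint regardless of stream length.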
Usage
Import drift.PageHinkley when you need a lightweight, constant-memory drift detector for monitoring mean shifts in a scalar signal. It is particularly useful when you need directional control over drift detection.
Code Reference
Source Location
river/drift/page_hinkley.py:L7-L128
Signature
class PageHinkley(DriftDetector):
    def __init__(
        self,
        min_instances: int = 30,
        delta: float = 0.005,
        threshold: float = 50.0,
        alpha: float = 1 - 0.0001,  # 0.9999
        mode: str = "both",
    )
Import
from river import drift
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| min_instances | int | 30 | Minimum number of observations before drift detection begins |
| delta | float | 0.005 | Magnitude allowance parameter. Controls the minimum detectable change size |
| threshold | float | 50.0 | Detection threshold (lambda). Drift is flagged when the cumulative test statistic exceeds this value |
| alpha | float | 0.9999 | Forgetting factor for exponential weighting. Values closer to 1 give more weight to historical data |
| mode | str | "both" | Direction of change to detect: "up" (increases), "down" (decreases), or "both" |
I/O Contract
Inputs
| Method | Parameter | Type | Description |
|---|---|---|---|
| update | x | float | A single numeric value (e.g., classification error, loss value) |
Outputs
| Property/Method | Return Type | Description |
|---|---|---|
| drift_detected | bool | True if drift was detected on the most recent update call |
Usage Examples
Basic Drift Detection
import random
from river import drift
rng = random.Random(12345)
ph = drift.PageHinkley()
# Simulate a data stream with a distribution change at index 1000
data_stream = rng.choices([0, 1], k=1000) + rng.choices(range(4, 8), k=1000)
for i, val in enumerate(data_stream):
    ph.update(val)
    if ph.drift_detected:
        print(f"Change detected at index {i}, input value: {val}")
# Change detected at index 1006, input value: 5
Detecting Only Upward Shifts
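import random
from river import drift
# Same synthetic stream as in the basic example: the mean jumps up at index 1000
rng = random.Random(12345)
data_stream = rng.choices([0, 1], k=1000) + rng.choices(range(4, 8), k=1000)
ph_up = drift.PageHinkley(mode="up", threshold=30.0)
for i, val in enumerate(data_stream):
    ph_up.update(val)
    if ph_up.drift_detected:
        print(f"Upward shift detected at step {i}")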
Tuning Sensitivity
from river import drift
# More sensitive (lower threshold, smaller delta)
sensitive_ph = drift.PageHinkley(threshold=20.0, delta=0.001, min_instances=10)
# Less sensitive (higher threshold, larger delta)
conservative_ph = drift.PageHinkley(threshold=100.0, delta=0.01, min_instances=50)
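The effect of these settings on detection delay can be explored without River using a small simulation. The sketch below is a hypothetical stand-alone function, `detection_delay`, that applies a one-sided (upward) version of the test to a stream that is 0 during a warm-up and then jumps by `step`. It ignores `min_instances` and is an illustration under those assumptions, not a model of River's exact behaviour.

```python
def detection_delay(threshold, delta, step=1.0, warmup=500,
                    alpha=0.9999, max_wait=10_000):
    """Count samples after an upward step until the statistic exceeds threshold.

    Hypothetical helper: the stream is 0.0 for `warmup` samples, then `step`.
    Returns None if nothing is detected within `max_wait` samples.
    """
    n = 0
    mean = 0.0
    cum = 0.0                # exponentially weighted cumulative deviation
    cum_min = float("inf")   # historical minimum of `cum`
    for t in range(warmup + max_wait):
        x = 0.0 if t < warmup else step
        n += 1
        mean += (x - mean) / n
        cum = alpha * cum + (x - mean) - delta
        cum_min = min(cum_min, cum)
        if t >= warmup and cum - cum_min > threshold:
            return t - warmup + 1
    return None


# Same settings as the two detectors above.
fast = detection_delay(threshold=20.0, delta=0.001)
slow = detection_delay(threshold=100.0, delta=0.01)
print(f"sensitive settings: {fast} samples, conservative settings: {slow} samples")
```

In this simulation the lower threshold reacts after fewer samples, at the price of being easier to trigger by noise on a real stream; the higher threshold trades reaction time for fewer false alarms.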
Monitoring Classification Errors
from river import datasets, drift, tree
model = tree.HoeffdingTreeClassifier()
ph = drift.PageHinkley()
# Test-then-train: predict first, feed the 0/1 error to the detector, then learn
for x, y in datasets.Elec2().take(5000):
    y_pred = model.predict_one(x)
    if y_pred is not None:  # predict_one can return None before any labels are seen
        error = int(y_pred != y)
        ph.update(error)
        if ph.drift_detected:
            print("Drift detected, consider resetting the model")
    model.learn_one(x, y)