Principle:Online ml River ADWIN Drift Detection

Knowledge Sources	Domains	Last Updated
River River Docs Learning from Time-Changing Data with Adaptive Windowing	Online Machine Learning, Concept Drift Detection, Statistical Testing	2026-02-08 16:00 GMT

Overview

ADWIN (ADaptive WINdowing) is an adaptive windowing algorithm that detects concept drift by maintaining a variable-length window and comparing sub-window means using statistical bounds.

Description

ADWIN is one of the most widely used drift detection methods in online machine learning. It works by maintaining a window $W$ of recent data values (typically error indicators) and efficiently testing whether any partition of this window into two consecutive sub-windows reveals a statistically significant difference in their means. When such a difference is found, the oldest portion of the window is dropped, effectively "forgetting" data from before the detected drift point.

The key advantage of ADWIN over fixed-window approaches is that its window size adapts automatically: it grows during stable periods to accumulate statistical power, and shrinks when drift is detected to quickly adapt to the new distribution. This gives ADWIN mathematical guarantees on both false positive and false negative rates.

ADWIN uses a bucket-based compression scheme (from ADWIN2, the improved version presented in the original paper) that keeps memory usage logarithmic in the window size, making it practical for long-running streams.

Usage

Use ADWIN drift detection when:

You need a drift detector with formal statistical guarantees on detection accuracy.
You want a parameter-light method (primarily the confidence parameter $δ$ ).
You need to monitor error rates, accuracy, or any scalar metric for distributional changes.
You are building drift-adaptive models (e.g., Hoeffding Adaptive Tree, Adaptive Random Forest) that use ADWIN internally.

Theoretical Basis

ADWIN maintains a sliding window $W$ of the most recent $n$ values. At each step, it checks whether any partition of $W$ into two consecutive sub-windows $W_{0}$ (older) and $W_{1}$ (newer) exhibits a statistically significant difference in means.

For sub-windows of sizes $n_{0}$ and $n_{1}$ with sample means ${\hat{μ}}_{0}$ and ${\hat{μ}}_{1}$ , the test checks:

$| {\hat{μ}}_{0} - {\hat{μ}}_{1} | \geq ϵ_{cut}$

where the threshold $ϵ_{cut}$ is derived from the Hoeffding bound:

$ϵ_{cut} = \sqrt{\frac{1}{2 m} \cdot \ln \frac{4}{δ^{'}}}$

where $m = \frac{1}{1 / n_{0} + 1 / n_{1}}$ is the harmonic mean of the sub-window sizes, and $δ^{'}$ is a corrected confidence parameter derived from the user-specified $δ$ .

The algorithm pseudocode:

ADWIN(delta, clock, max_buckets):
    Initialize window W as empty
    For each new value x:
        1. Add x to W (insert into appropriate bucket)
        2. Update running statistics (total, variance, width)
        3. Every 'clock' steps:
           For each possible partition of W into (W0, W1):
               Compute epsilon_cut from Hoeffding bound
               If |mean(W0) - mean(W1)| >= epsilon_cut:
                   Remove oldest elements (shrink W to W1)
                   Signal drift_detected = True
                   Break
        4. Compress buckets if count exceeds max_buckets

Bucket compression: Instead of storing every individual value, ADWIN groups values into exponentially growing buckets. Each bucket stores the count, sum, and variance of its contained values. Adjacent buckets of the same size are merged when the number of same-size buckets exceeds max_buckets. This yields $O (\log W)$ memory complexity.

Properties:

False positive rate: Bounded by $δ$ at any time step.
False negative rate: If the true means differ by more than $ϵ_{cut}$ , drift will be detected with high probability.
Memory: $O (\log W)$ where $W$ is the window length.
Time per update: Amortized $O (1)$ with the clock parameter controlling check frequency.

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.

Principle

Implementation

Heuristic

Environment