Principle:Online ml River KSWIN Drift Detection
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River River Docs KSWIN - Kolmogorov-Smirnov Windowing Method for Drift Detection | Online Machine Learning, Concept Drift Detection, Non-Parametric Testing | 2026-02-08 16:00 GMT |
Overview
KSWIN is a non-parametric drift detection method that uses the Kolmogorov-Smirnov two-sample test to compare recent and reference data windows for distributional changes.
Description
KSWIN (Kolmogorov-Smirnov Windowing) detects concept drift by maintaining a fixed-size sliding window that is split into two parts: a larger reference part and a smaller recent (statistic) window. Once the sliding window is full, each update draws a random sample from the reference part, equal in size to the recent window, and applies the Kolmogorov-Smirnov two-sample test to decide whether that sample and the recent window plausibly come from the same distribution.
Unlike ADWIN, which primarily detects changes in the mean, KSWIN compares entire empirical distributions, so it can also respond to changes in variance, skewness, or modality. The underlying test is distribution-free: it makes no assumption about the form of the data distribution. This makes KSWIN particularly useful when drift manifests as more than a simple mean shift.
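To make the contrast with mean-based detection concrete, the following minimal sketch compares two deterministic samples that share a mean of zero but differ in spread. The helper `ks_statistic` is a hypothetical pure-Python stand-in for the two-sample KS statistic (a real pipeline would more likely rely on scipy.stats.ks_2samp), and the data are made up purely for illustration.

```python
from bisect import bisect_right
from statistics import fmean


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; the gap between two step
    functions can only change at observed values, so checking those suffices.
    """
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


# Two samples with the same mean but different spread (illustrative data).
narrow = [-1 + 2 * i / 99 for i in range(100)]  # evenly spaced on [-1, 1]
wide = [3 * x for x in narrow]                  # evenly spaced on [-3, 3]

mean_gap = abs(fmean(narrow) - fmean(wide))
ks_gap = ks_statistic(narrow, wide)

print(f"difference in means: {mean_gap:.3g}")
print(f"KS statistic:        {ks_gap:.3f}")
```

The means are indistinguishable, while the ECDFs differ by roughly a third of their range. Whether a gap of that size is flagged in practice still depends on the significance level and the size of the recent window.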
Usage
Use KSWIN drift detection when:
- You need to detect distributional changes that go beyond mean shifts (e.g., variance changes, shape changes).
- You want a non-parametric test with no assumptions about the underlying data distribution.
- You prefer a fixed-memory approach with a bounded sliding window size.
- You are monitoring a scalar signal where the full distribution matters, not just the first moment.
Theoretical Basis
KSWIN maintains a sliding window $\Psi$ of fixed size $n$ (window_size). The window is partitioned as follows:
- Reference window: The first $n - r$ elements of $\Psi$, from which $r$ elements are uniformly sampled to form window $W$.
- Recent window $R$: The last $r$ elements of $\Psi$ ($r$ = stat_size).
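The partition described above can be sketched in a few lines. This is only an illustration under stated assumptions: `split_window` is a hypothetical helper name, the window contents are placeholder integers, and the sizes 100 and 30 are example values.

```python
import random
from collections import deque
from itertools import islice


def split_window(window, stat_size, rng):
    """Return (W, R): a random reference sample and the recent window.

    Hypothetical helper. Requires stat_size <= len(window) - stat_size,
    otherwise rng.sample cannot draw enough distinct reference elements.
    """
    n = len(window)
    r = stat_size
    # R: the last r elements of the sliding window.
    recent = list(islice(window, n - r, n))
    # W: r elements drawn uniformly, without replacement, from the first n - r.
    indices = rng.sample(range(n - r), r)
    reference_sample = [window[i] for i in indices]
    return reference_sample, recent


# Placeholder sliding window with n = 100; real data would be a scalar stream.
window = deque(range(100), maxlen=100)
W, R = split_window(window, stat_size=30, rng=random.Random(42))

print(len(W), len(R))   # both windows have stat_size elements
print(R[0], R[-1])      # recent window spans the newest 30 values
```

Sampling the reference part down to the size of the recent window keeps both inputs to the test the same size, at the cost of making each test outcome depend on the random draw.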
The Kolmogorov-Smirnov two-sample test is applied to windows $R$ and $W$ (both of size $r$ = stat_size). The KS test statistic measures the maximum distance between the empirical cumulative distribution functions (ECDFs):
$$D_r = \sup_x \left| F_R(x) - F_W(x) \right|$$
where $F_R$ and $F_W$ are the empirical CDFs of the recent and reference windows respectively.
A concept drift is detected when:
$$D_r > \sqrt{-\frac{\ln \alpha}{r}}$$
where $\alpha$ is the significance level. In practice, the implementation uses scipy.stats.ks_2samp, which computes the exact p-value, and drift is flagged when:
p_value <= alpha AND D_r > 0.1
The additional threshold $D_r > 0.1$ prevents triggering on negligibly small distributional differences that happen to be statistically significant.
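As a worked example of the closed-form detection threshold $\sqrt{-\ln(\alpha)/r}$, where $\alpha$ is the significance level and $r$ is stat_size, the arithmetic below assumes the illustrative values $\alpha = 0.005$ and $r = 30$. These numbers are chosen only to show the scale of the threshold.

```latex
% Assumed illustrative values: \alpha = 0.005, r = 30
\sqrt{-\frac{\ln \alpha}{r}}
  = \sqrt{-\frac{\ln 0.005}{30}}
  \approx \sqrt{\frac{5.2983}{30}}
  \approx \sqrt{0.1766}
  \approx 0.420
```

Under this closed-form rule the threshold sits well above the 0.1 floor, so the floor would only become the binding condition for much larger recent windows (roughly $r > -\ln(\alpha) / 0.01 \approx 530$ for $\alpha = 0.005$). A decision based on an exact p-value may differ slightly from this approximation.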
Algorithm pseudocode:
    KSWIN(alpha, window_size=n, stat_size=r):
        Initialize sliding window Psi of max size n
        For each new value x:
            1. Append x to Psi
            2. If len(Psi) >= n:
                a. R = last r elements of Psi (recent window)
                b. W = random sample of r elements from first (n-r) elements of Psi
                c. (statistic, p_value) = KS_two_sample_test(R, W)
                d. If p_value <= alpha AND statistic > 0.1:
                       drift_detected = True
                       Psi = R (keep only recent window)
                   Else:
                       drift_detected = False
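The pseudocode above can be turned into a small standard-library sketch. Two assumptions are worth stating loudly: the class name `SimpleKSWIN` is hypothetical and is not River's implementation, and the p-value test is replaced by the closed-form threshold sqrt(-ln(alpha) / r), so its decisions may differ from a version that computes exact p-values with scipy.stats.ks_2samp. The stream at the end is synthetic.

```python
import math
import random
from bisect import bisect_right
from collections import deque


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of a and b (hypothetical helper)."""
    sa, sb = sorted(a), sorted(b)
    return max(
        abs(bisect_right(sa, x) / len(sa) - bisect_right(sb, x) / len(sb))
        for x in sa + sb
    )


class SimpleKSWIN:
    """Illustrative KSWIN-style detector; not River's implementation."""

    def __init__(self, alpha=0.005, window_size=100, stat_size=30, seed=None):
        if not 0 < alpha < 1:
            raise ValueError("alpha must lie in (0, 1)")
        if 2 * stat_size > window_size:
            raise ValueError("stat_size must be at most window_size / 2")
        self.alpha = alpha
        self.n = window_size
        self.r = stat_size
        self.window = deque(maxlen=window_size)
        self.rng = random.Random(seed)
        # ASSUMPTION: closed-form threshold used in place of an exact p-value.
        self.threshold = math.sqrt(-math.log(alpha) / stat_size)
        self.drift_detected = False

    def update(self, x):
        """Add one value and return True if drift is flagged at this step."""
        self.window.append(x)  # the deque drops the oldest value when full
        self.drift_detected = False
        if len(self.window) < self.n:
            return False
        values = list(self.window)
        recent = values[self.n - self.r:]                        # R
        reference = self.rng.sample(values[:self.n - self.r], self.r)  # W
        d = ks_statistic(recent, reference)
        if d > self.threshold and d > 0.1:
            self.drift_detected = True
            # Keep only the recent window after a detection.
            self.window = deque(recent, maxlen=self.n)
        return self.drift_detected


# Synthetic stream: the mean jumps from 0 to 4 at index 500.
rng = random.Random(0)
stream = [rng.gauss(0, 1) for _ in range(500)]
stream += [rng.gauss(4, 1) for _ in range(500)]

detector = SimpleKSWIN(alpha=0.005, window_size=100, stat_size=30, seed=1)
drift_points = [i for i, x in enumerate(stream) if detector.update(x)]
print("drift flagged at indices:", drift_points)
```

Because the reference sample is redrawn at every step, the flagged indices depend on the seed, and occasional flags before the true change point are possible at looser significance levels.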
Properties:
- Distribution-free: No assumption about the underlying data distribution.
- Memory: Fixed $O(n)$, where $n$ is the window size.
- Detects: Changes that shift the empirical CDF of the monitored signal (mean, variance, shape, modality); sensitivity depends on stat_size and alpha.
- External dependency: Requires scipy.stats for the KS test computation.