Implementation:Online ml River Drift KSWIN
| Knowledge Sources | Domains | Last Updated |
|---|---|---|
| River; River Docs: KSWIN - Kolmogorov-Smirnov Windowing Method for Drift Detection | Online Machine Learning, Concept Drift Detection, Non-Parametric Testing | 2026-02-08 16:00 GMT |
Overview
Concrete tool for detecting concept drift using the Kolmogorov-Smirnov two-sample test on a fixed-size sliding window, capable of detecting any type of distributional change without parametric assumptions.
Description
The drift.KSWIN class implements the Kolmogorov-Smirnov Windowing method for concept drift detection. It maintains a sliding window of fixed size (window_size) implemented as a collections.deque. The last stat_size elements form the recent window, and a random sample of stat_size elements is drawn from the remaining older elements to form the reference window. The two-sample KS test from scipy.stats.ks_2samp is applied to these sub-windows. Drift is flagged when the p-value falls below alpha and the KS statistic exceeds 0.1. Upon drift detection, the window is reset to contain only the recent sub-window.
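The update rule described above can be sketched with the standard library alone. `MiniKSWIN`, `ks_statistic` and `ks_pvalue_asymptotic` are hypothetical names for illustration, not part of River. The sketch makes two assumptions. First, the reference sample is drawn without replacement using `random.Random.sample`. Second, instead of calling `scipy.stats.ks_2samp`, it computes the p-value from the asymptotic Kolmogorov distribution, so its detection indices can differ from River's.

```python
import collections
import itertools
import math
import random


def ks_statistic(a, b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step both empirical CDFs past every copy of x (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue_asymptotic(d, n, m):
    """Asymptotic p-value Q(lam) = 2 * sum_k (-1)^(k-1) exp(-2 k^2 lam^2).

    Stdlib stand-in for scipy's p-value, not scipy's own method.
    """
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        # Q(lam) is indistinguishable from 1 here and the series converges poorly.
        return 1.0
    total = sum(2 * (-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam) for k in range(1, 101))
    return min(1.0, max(0.0, total))


class MiniKSWIN:
    """Hypothetical stdlib-only sketch of the KSWIN update rule (not River's class)."""

    def __init__(self, alpha=0.005, window_size=100, stat_size=30, seed=None):
        # Sampling without replacement needs at least stat_size older elements.
        if window_size - stat_size < stat_size:
            raise ValueError("window_size - stat_size must be >= stat_size in this sketch")
        self.alpha = alpha
        self.window_size = window_size
        self.stat_size = stat_size
        self.window = collections.deque(maxlen=window_size)
        self.rng = random.Random(seed)
        self.p_value = 0.0
        self.drift_detected = False

    def update(self, x):
        self.window.append(x)
        self.drift_detected = False
        if len(self.window) < self.window_size:
            return  # window not full yet: no test is run
        older = self.window_size - self.stat_size
        # Reference: stat_size values sampled from the older part of the window.
        reference = [self.window[i] for i in self.rng.sample(range(older), self.stat_size)]
        # Recent: the last stat_size values.
        recent = list(itertools.islice(self.window, older, self.window_size))
        d = ks_statistic(reference, recent)
        self.p_value = ks_pvalue_asymptotic(d, len(reference), len(recent))
        if self.p_value < self.alpha and d > 0.1:
            self.drift_detected = True
            # After a detection, keep only the recent sub-window.
            self.window = collections.deque(recent, maxlen=self.window_size)


# Deterministic demo: a constant stream that jumps from 0.0 to 5.0 at index 200.
detector = MiniKSWIN(alpha=0.005, window_size=100, stat_size=30, seed=1)
stream = [0.0] * 200 + [5.0] * 200
detections = []
for t, value in enumerate(stream):
    detector.update(value)
    if detector.drift_detected:
        detections.append(t)
print(detections[0])  # 213 with this sketch's asymptotic p-value
```

The older part of the window holds only 0.0 until well after the jump, so the reference sample is the same whatever the seed. The first detection comes at index 213. At that point 14 of the 30 recent values are new, which gives D = 14/30 and an asymptotic p-value of about 0.003. The `d > 0.1` guard also explains why a constant stream never triggers: D stays at 0.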
Usage
Import drift.KSWIN when you need a non-parametric drift detector that can identify any type of distributional change (not just mean shifts). It requires scipy as an external dependency.
Code Reference
Source Location
river/drift/kswin.py:L14-L162
Signature
```python
class KSWIN(DriftDetector):
    def __init__(
        self,
        alpha: float = 0.005,
        window_size: int = 100,
        stat_size: int = 30,
        seed: int | None = None,
        window: typing.Iterable | None = None,
    ): ...
```
Import
```python
from river import drift
```
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| alpha | float | 0.005 | Significance level for the KS test. Must be between 0 and 1. Values below 0.01 are recommended for best results. |
| window_size | int | 100 | Total size of the sliding window. |
| stat_size | int | 30 | Size of the recent sub-window, and of the random reference sample, compared by the KS test. Must be smaller than window_size. |
| seed | int or None | None | Random seed that makes the random sampling of the reference sub-window reproducible. |
| window | Iterable or None | None | Pre-collected data used to initialize the window (avoids a cold start). |
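The constraints in the parameter table can be checked before constructing a detector. The function below is a hypothetical helper and is not part of River. It assumes "between 0 and 1" means the closed interval, and it does not enforce the recommendation that alpha be below 0.01. With the defaults, the recent sub-window takes the last 30 slots, so the reference sample is drawn from the remaining 100 - 30 = 70 older elements.

```python
def kswin_param_errors(alpha: float, window_size: int, stat_size: int) -> list[str]:
    """Hypothetical checker for the documented KSWIN parameter constraints.

    Returns a list of problems; an empty list means the settings are valid.
    """
    errors = []
    # Documented: alpha must be between 0 and 1 (closed interval assumed here).
    if not 0 <= alpha <= 1:
        errors.append("alpha must be between 0 and 1")
    # Documented: stat_size must be smaller than window_size.
    if stat_size >= window_size:
        errors.append("stat_size must be smaller than window_size")
    return errors


print(kswin_param_errors(0.005, 100, 30))   # []
print(kswin_param_errors(0.005, 100, 100))  # ['stat_size must be smaller than window_size']
```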
External Dependencies
scipy.stats: provides ks_2samp, the two-sample Kolmogorov-Smirnov test.
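For reference, the two-sample KS statistic is the largest gap between the empirical CDFs of the two samples. Here $R$ (the reference sample) and $W$ (the recent sub-window) are symbols introduced for illustration. The drift rule from the Description then combines this statistic with the test's p-value:

$$
D = \sup_x \left| \hat F_R(x) - \hat F_W(x) \right|, \qquad
\hat F_S(x) = \frac{\left|\{\, s \in S : s \le x \,\}\right|}{|S|}
$$

$$
\text{drift} \iff p < \alpha \ \text{ and } \ D > 0.1
$$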
I/O Contract
Inputs
| Method | Parameter | Type | Description |
|---|---|---|---|
| update | x | float | A single numeric value to add to the sliding window |
Outputs
| Property/Method | Return Type | Description |
|---|---|---|
| drift_detected | bool | True if drift was detected on the most recent update call |
| p_value | float | The p-value from the most recent KS test (0 if insufficient data) |
| n | int | Total number of samples processed |
Usage Examples
Basic Drift Detection
```python
import random

from river import drift

rng = random.Random(12345)
kswin = drift.KSWIN(alpha=0.0001, seed=42)

# Simulate a data stream with a distribution change at index 1000
data_stream = rng.choices([0, 1], k=1000) + rng.choices(range(4, 8), k=1000)

for i, val in enumerate(data_stream):
    kswin.update(val)
    if kswin.drift_detected:
        print(f"Change detected at index {i}, input value: {val}")

# Output:
# Change detected at index 1016, input value: 6
```
Detecting Variance Changes
```python
import random

from river import drift

rng = random.Random(42)
kswin = drift.KSWIN(alpha=0.001, window_size=200, stat_size=50, seed=42)

# Same mean but different variance
data_stream = [rng.gauss(5, 1) for _ in range(1000)] + [rng.gauss(5, 5) for _ in range(1000)]

for i, val in enumerate(data_stream):
    kswin.update(val)
    if kswin.drift_detected:
        print(f"Variance change detected at index {i}")
```
Pre-Initialized Window
```python
from river import drift

# Start with pre-collected data to avoid a cold start
# (5 values * 20 = 100 values, enough to fill the whole window)
initial_data = [0.5, 0.3, 0.7, 0.4, 0.6] * 20
kswin = drift.KSWIN(alpha=0.005, window_size=100, stat_size=30, window=initial_data)
```