Principle:Evidentlyai Evidently Data Drift Detection
| Knowledge Sources | |
|---|---|
| Domains | ML_Monitoring, Statistical_Testing, Data_Quality |
| Last Updated | 2026-02-14 12:00 GMT |
Overview
A statistical testing mechanism that detects distribution changes between reference and current datasets at the column level.
Description
Data Drift Detection identifies when the statistical distribution of a feature column has changed significantly between a reference (baseline) dataset and a current (production) dataset. This is critical for ML monitoring because model performance often degrades when input data distributions shift away from training data.
Evidently supports multiple drift detection methods depending on column type:
- Numerical columns: Kolmogorov-Smirnov test, Wasserstein distance, Population Stability Index (PSI), Jensen-Shannon divergence
- Categorical columns: Chi-squared test, PSI, Jensen-Shannon divergence
- Text columns: Domain classifier, model-based approaches
The method is auto-selected based on column type and dataset size, or can be explicitly specified. A drift score is computed and compared against a threshold to determine if drift is detected.
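The auto-selection logic can be sketched as a simple dispatcher. The dataset-size cutoff (1,000 rows) and the small-vs-large method pairs below follow Evidently's documented defaults at the time of writing, but they are assumptions here; verify against the version you run.

```python
def pick_drift_method(column_type: str, n_rows: int) -> str:
    """Sketch of the auto-selection described above: statistical tests for
    small samples, distance metrics for large ones. Cutoff of 1000 rows
    mirrors Evidently's documented default and may differ by version."""
    small = n_rows <= 1000
    if column_type == "numerical":
        return "ks" if small else "wasserstein"
    if column_type == "categorical":
        return "chisquare" if small else "jensenshannon"
    if column_type == "text":
        return "domain_classifier"
    raise ValueError(f"unknown column type: {column_type}")

assert pick_drift_method("numerical", 500) == "ks"
assert pick_drift_method("numerical", 50_000) == "wasserstein"
assert pick_drift_method("categorical", 50_000) == "jensenshannon"
```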
Usage
Use this principle when monitoring ML pipelines in production to detect data distribution shifts. Apply it to individual columns (ValueDrift) or across all columns simultaneously (DriftedColumnsCount). It requires a reference dataset for comparison.
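The two usage modes can be illustrated with a minimal pure-Python sketch: a per-column drift score (ValueDrift-style) and a sweep that counts drifted columns (DriftedColumnsCount-style). The KS statistic below is the underlying math, not Evidently's implementation, and the 0.2 threshold is an illustrative assumption.

```python
import bisect

def ks_statistic(ref, cur):
    """Two-sample KS statistic: max absolute difference between empirical CDFs."""
    ref, cur = sorted(ref), sorted(cur)
    def cdf(sample, x):
        return bisect.bisect_right(sample, x) / len(sample)  # P(sample <= x)
    return max(abs(cdf(ref, x) - cdf(cur, x)) for x in set(ref) | set(cur))

def drifted_columns(reference, current, threshold=0.2):
    """DriftedColumnsCount-style sweep over dict-of-lists datasets:
    score every column, keep those over the (illustrative) threshold."""
    scores = {col: ks_statistic(reference[col], current[col]) for col in reference}
    return {col: s for col, s in scores.items() if s > threshold}

ref = {"age": list(range(100)), "income": list(range(100))}
cur = {"age": list(range(100)), "income": [x + 50 for x in range(100)]}
assert ks_statistic(ref["age"], cur["age"]) == 0.0    # identical column: no drift
assert drifted_columns(ref, cur) == {"income": 0.5}   # only the shifted column flags
```

In Evidently itself the reference dataset is passed once when the report runs, and each metric pulls the matching column from both datasets, as the per-column loop above suggests.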
Theoretical Basis
Column-level drift detection compares the empirical distributions of a feature across two datasets:
$$\text{drift\_score} = D(P_{\text{ref}}, P_{\text{cur}})$$
where $D$ is a divergence measure between the reference distribution $P_{\text{ref}}$ and the current distribution $P_{\text{cur}}$. Common choices:
- KS test: $D = \sup_x |F_{\text{ref}}(x) - F_{\text{cur}}(x)|$ (max CDF difference)
- PSI: $D = \sum_i (p_i - q_i) \ln(p_i / q_i)$ (binned distribution comparison)
- Wasserstein: Earth mover's distance between the two distributions
- Chi-squared: $D = \sum_i (O_i - E_i)^2 / E_i$ (categorical frequency comparison)
Drift is flagged when drift_score exceeds the method-specific threshold.
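The threshold comparison can be made concrete with PSI, whose binned formula appears above. This is a minimal sketch of the statistic itself, not Evidently's code; the 0.1 "no drift" and 0.25 "strong drift" cutoffs are the conventional PSI rules of thumb, assumed here for illustration.

```python
import math

def psi(reference, current, n_bins=10):
    """Population Stability Index: sum of (p_i - q_i) * ln(p_i / q_i)
    over bins derived from the reference sample. Illustrative sketch."""
    lo, hi = min(reference), max(reference)
    edges = [lo + (hi - lo) * i / n_bins for i in range(1, n_bins)]

    def bin_fracs(sample):
        counts = [0] * n_bins
        for x in sample:
            counts[sum(1 for e in edges if x >= e)] += 1  # bin index for x
        # small floor avoids log(0) when a bin is empty
        return [max(c / len(sample), 1e-6) for c in counts]

    p, q = bin_fracs(reference), bin_fracs(current)
    return sum((pi - qi) * math.log(pi / qi) for pi, qi in zip(p, q))

reference = [float(i % 50) for i in range(500)]         # stable baseline
shifted   = [float(i % 50) + 20.0 for i in range(500)]  # mean shift of +20
assert psi(reference, reference) < 0.1   # identical data: below threshold
assert psi(reference, shifted) > 0.25    # shifted data: drift flagged
```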