Principle:SeldonIO Seldon core Drift And Outlier Detection Training

Property	Value
Principle Name	Drift And Outlier Detection Training
Overview	Statistical methods for training drift detectors and outlier detectors to monitor production ML model inputs
Domains	MLOps, Statistical_Testing, Anomaly_Detection
Related Implementation	SeldonIO_Seldon_core_Alibi_Detect_Training
Knowledge Sources	Paper (alibi-detect: https://arxiv.org/abs/2311.01096), Doc (https://docs.seldon.io/projects/alibi-detect)
Last Updated	2026-02-13 00:00 GMT

Description

Production ML systems require monitoring for data drift (distribution shift between training and production data) and outliers (anomalous inputs). The alibi-detect library provides two key detector types for this purpose:

TabularDrift for feature-wise drift testing on tabular data, using chi-squared tests for categorical features and Kolmogorov-Smirnov tests for continuous features
OutlierVAE for reconstruction-based outlier detection using variational autoencoders

These detectors are trained on reference data from the training distribution and then deployed alongside the production classifier to continuously monitor incoming data quality.

Theoretical Basis

Drift Detection

Drift detection uses statistical hypothesis testing. TabularDrift applies per-feature tests against a reference distribution:

Chi-squared tests for categorical features
Kolmogorov-Smirnov tests for continuous features
Bonferroni correction for multiple testing across features

For each feature, the null hypothesis is that the reference and test distributions are equal. When the Bonferroni-corrected p-value of at least one feature falls below the threshold, drift is declared for the batch.
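
The decision rule can be sketched in a few lines of standard-library Python. This is an illustration rather than alibi-detect's own code: the function name bonferroni_drift and the example p-values are hypothetical. It uses the equivalent form of the Bonferroni correction, comparing each raw p-value against the threshold divided by the number of features.

```python
from typing import Dict, List, Tuple


def bonferroni_drift(
    p_vals: Dict[str, float], p_val_threshold: float = 0.05
) -> Tuple[bool, float, List[str]]:
    """Flag drift if any per-feature p-value is below threshold / n_features."""
    if not p_vals:
        raise ValueError("need at least one per-feature p-value")
    # Dividing the threshold by n_features is equivalent to multiplying
    # every p-value by n_features and comparing against the raw threshold.
    corrected_threshold = p_val_threshold / len(p_vals)
    drifted = [name for name, p in p_vals.items() if p < corrected_threshold]
    return bool(drifted), corrected_threshold, drifted


# Hypothetical per-feature p-values (illustrative numbers, not real results).
example_p_vals = {"age": 0.30, "income": 0.004, "occupation": 0.04}
is_drift, corrected_threshold, drifted_features = bonferroni_drift(example_p_vals)
print(is_drift, drifted_features)
```

In this example "occupation" would be flagged at an uncorrected 0.05 level but not after correction, which is the intended effect: the correction keeps the overall false-alarm rate near the chosen threshold when many features are tested at once.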

Outlier Detection

OutlierVAE trains a Variational Autoencoder (VAE) to reconstruct normal data. The encoder compresses input features into a low-dimensional latent representation, and the decoder reconstructs the input from it. Outliers produce a reconstruction error (MSE) that exceeds the detector's threshold, which makes reconstruction error a practical anomaly score.

The VAE learns the manifold of normal data during training. At inference time, inputs that lie far from this manifold cannot be faithfully reconstructed, resulting in elevated reconstruction error.
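
The text above does not say how the threshold is chosen. One common option, shown here as an assumption rather than as the library's method, is to take a high percentile of the outlier scores observed on reference data, so that only a small share of normal inputs is flagged. The function name percentile_threshold and the scores are hypothetical.

```python
import math
from typing import Sequence


def percentile_threshold(ref_scores: Sequence[float], perc: float = 95.0) -> float:
    """Nearest-rank percentile of the outlier scores computed on reference data."""
    if len(ref_scores) == 0:
        raise ValueError("need at least one reference score")
    if not 0 < perc <= 100:
        raise ValueError("perc must be in (0, 100]")
    ordered = sorted(ref_scores)
    # Nearest-rank method: the smallest score with at least perc% of scores <= it.
    rank = math.ceil(perc * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical reconstruction errors measured on reference (normal) data.
ref_scores = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.05, 0.06, 0.30]
threshold = percentile_threshold(ref_scores, perc=90.0)
print(threshold)
```

A higher percentile flags fewer normal inputs but also lets more borderline anomalies through, so the choice depends on the tolerated false-alarm rate.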

Mathematical Formulation

TabularDrift

H0: P_ref = P_test
p_val threshold = 0.05

Per-feature test:
  - Categorical: chi-squared test statistic
  - Continuous: KS test statistic D = sup|F_ref(x) - F_test(x)|

Multiple testing correction:
  - Bonferroni: adjusted p_val = min(1, p_val * n_features)
    (equivalently, compare each raw p_val against threshold / n_features)
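
To make the two test statistics concrete, the following sketch computes them from scratch with the standard library. It is not how alibi-detect implements them internally; the function names and sample values are hypothetical. It covers only the statistics, not their p-values, and the chi-squared version applies no continuity correction.

```python
from bisect import bisect_right
from collections import Counter
from typing import Hashable, Sequence


def ks_statistic(ref: Sequence[float], test: Sequence[float]) -> float:
    """Two-sample KS statistic D = sup_x |F_ref(x) - F_test(x)| on empirical CDFs."""
    if len(ref) == 0 or len(test) == 0:
        raise ValueError("both samples must be non-empty")
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    d = 0.0
    # Empirical CDFs are step functions, so the supremum is reached at an observed value.
    for x in set(ref_sorted) | set(test_sorted):
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_test = bisect_right(test_sorted, x) / len(test_sorted)
        d = max(d, abs(f_ref - f_test))
    return d


def chi2_statistic(ref: Sequence[Hashable], test: Sequence[Hashable]) -> float:
    """Pearson chi-squared statistic for the 2 x K table of category counts."""
    if len(ref) == 0 or len(test) == 0:
        raise ValueError("both samples must be non-empty")
    ref_counts, test_counts = Counter(ref), Counter(test)
    n_ref, n_test = len(ref), len(test)
    total = n_ref + n_test
    stat = 0.0
    for cat in set(ref_counts) | set(test_counts):
        col_total = ref_counts[cat] + test_counts[cat]
        for observed, row_total in ((ref_counts[cat], n_ref), (test_counts[cat], n_test)):
            # Expected count if both samples shared one category distribution.
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical samples for one continuous and one categorical feature.
d_age = ks_statistic([23, 31, 35, 42], [35, 42, 51, 60])
chi2_colour = chi2_statistic(["red", "red", "blue"], ["blue", "blue", "red"])
print(d_age, chi2_colour)
```

Larger values of either statistic indicate a bigger gap between the reference and test samples; converting them to p-values requires the corresponding null distributions, which a statistics library would normally supply.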

OutlierVAE

outlier_score = MSE(x, VAE(x))
is_outlier = (outlier_score > threshold)

Where:
  VAE(x) = decoder(z), z ~ q(z|x)
  q(z|x) = encoder output (approximate posterior)
  MSE = (1/d) * sum((x_i - x_hat_i)^2)
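
The scoring rule above can be sketched as follows. Training a real VAE is out of scope for a short example, so toy_reconstruct is a hypothetical stand-in for VAE(x): it clips every feature to [0, 1], mimicking a model that can only reproduce values seen in normal data. The threshold of 0.1 is likewise an arbitrary illustrative value.

```python
from typing import Callable, List, Sequence, Tuple


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error (1/d) * sum((x_i - x_hat_i)^2) over d features."""
    if len(x) == 0 or len(x) != len(x_hat):
        raise ValueError("x and x_hat must be non-empty and the same length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def score_instance(
    x: Sequence[float],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    threshold: float,
) -> Tuple[float, bool]:
    """Return (outlier_score, is_outlier) for a single instance."""
    outlier_score = mse(x, reconstruct(x))
    return outlier_score, outlier_score > threshold


def toy_reconstruct(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for VAE(x): clip each feature to [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in x]


normal_score, normal_flag = score_instance([0.2, 0.5, 0.9], toy_reconstruct, threshold=0.1)
odd_score, odd_flag = score_instance([0.2, 0.5, 4.0], toy_reconstruct, threshold=0.1)
print(normal_score, normal_flag, odd_score, odd_flag)
```

The in-range instance is reconstructed exactly and scores zero, while the instance with one out-of-range feature cannot be reproduced and its error exceeds the threshold.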

Usage

Use this principle when building a production monitoring pipeline that needs to detect distribution shift or anomalous inputs before they degrade model performance. The trained detectors are serialized and deployed as independent model components in the Seldon Core 2 pipeline.

The typical workflow is:

Train the classifier on reference data
Train TabularDrift using the same reference data as the baseline distribution
Train OutlierVAE on preprocessed reference features so that it learns the manifold of normal data
Save all detectors using alibi-detect's save_detector utility
Deploy detectors alongside the classifier in a monitoring pipeline

Related Pages

SeldonIO_Seldon_core_Alibi_Detect_Training (implements this principle) - Concrete tools for training drift and outlier detectors using alibi-detect
SeldonIO_Seldon_core_Monitoring_Component_Deployment (next step) - Deploying trained detectors as model components
SeldonIO_Seldon_core_Monitoring_Pipeline_Definition (uses detectors) - Composing detectors into a unified monitoring pipeline
SeldonIO_Seldon_core_Production_Traffic_Monitoring (end goal) - Sending production traffic through the monitoring pipeline
