Principle:SeldonIO Seldon core Drift And Outlier Detection Training

From Leeroopedia
Revision as of 17:16, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/SeldonIO_Seldon_core_Drift_And_Outlier_Detection_Training.md)
Property | Value
Principle Name | Drift And Outlier Detection Training
Overview | Statistical methods for training drift detectors and outlier detectors to monitor production ML model inputs
Domains | MLOps, Statistical_Testing, Anomaly_Detection
Related Implementation | SeldonIO_Seldon_core_Alibi_Detect_Training
Knowledge Sources | Paper (alibi-detect: https://arxiv.org/abs/2311.01096), Doc (https://docs.seldon.io/projects/alibi-detect)
Last Updated | 2026-02-13 00:00 GMT

Description

Production ML systems require monitoring for data drift (distribution shift between training and production data) and outliers (anomalous inputs). The alibi-detect library provides two key detector types for this purpose:

  • TabularDrift for feature-wise drift testing on mixed-type tabular data, using chi-squared statistics for categorical features and Kolmogorov-Smirnov statistics for continuous features
  • OutlierVAE for reconstruction-based outlier detection using variational autoencoders

These detectors are trained on reference data from the training distribution and then deployed alongside the production classifier to continuously monitor incoming data quality.

Theoretical Basis

Drift Detection

Drift detection uses statistical hypothesis testing. TabularDrift applies per-feature tests against a reference distribution:

  • Chi-squared tests for categorical features
  • Kolmogorov-Smirnov tests for continuous features
  • Bonferroni correction for multiple testing across features

The null hypothesis for each feature is that the reference and test samples are drawn from the same distribution. Drift is declared when any feature's corrected p-value falls below the threshold.
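The decision rule can be sketched in a few lines of pure Python. This is an illustration of Bonferroni correction applied to per-feature p-values, not alibi-detect's implementation; the function name `bonferroni_drift` and the example p-values are assumptions made up for this sketch.

```python
def bonferroni_drift(p_vals, threshold=0.05):
    """Return (is_drift, corrected p-values) for a list of per-feature p-values."""
    n_features = len(p_vals)
    # Bonferroni: multiply each p-value by the number of tests, capped at 1.
    corrected = [min(1.0, p * n_features) for p in p_vals]
    # Drift is declared if any corrected p-value falls below the threshold.
    is_drift = any(p < threshold for p in corrected)
    return is_drift, corrected


# Illustrative p-values for three features (invented for this example).
is_drift, corrected = bonferroni_drift([0.40, 0.01, 0.75])
print(is_drift, corrected)

# A raw p-value of 0.02 is below 0.05 on its own, but not after correction
# across three features (0.02 * 3 = 0.06), so no drift is declared here.
print(bonferroni_drift([0.02, 0.50, 0.90])[0])
```

The second call shows why the correction matters: testing many features independently at 0.05 would otherwise raise false alarms more often as the feature count grows.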

Outlier Detection

OutlierVAE trains a Variational Autoencoder (VAE) to reconstruct normal data. The VAE's encoder compresses input features into a low-dimensional latent representation, and the decoder reconstructs the input from it. Outliers produce a high reconstruction error (MSE) that exceeds a threshold, which is set by the user or inferred from reference data, so the reconstruction error serves as the outlier score.

The VAE learns the manifold of normal data during training. At inference time, inputs that lie far from this manifold cannot be faithfully reconstructed, resulting in elevated reconstruction error.
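A threshold on reconstruction error can be obtained in several ways. One common choice, assumed here rather than taken from this page, is a high percentile of the outlier scores computed on reference data, so that roughly the top few percent of normal-looking inputs would be flagged. The sketch below uses a nearest-rank percentile; `percentile_threshold` and the synthetic scores are hypothetical.

```python
import math


def percentile_threshold(ref_scores, perc=95.0):
    """Nearest-rank percentile of outlier scores computed on reference data."""
    ordered = sorted(ref_scores)
    # Nearest-rank method: smallest score with at least perc% of scores <= it.
    rank = max(1, math.ceil(perc * len(ordered) / 100.0))
    return ordered[rank - 1]


# Synthetic reference scores 0.01, 0.02, ..., 1.00 (invented for this example).
ref_scores = [i / 100 for i in range(1, 101)]
threshold = percentile_threshold(ref_scores, perc=95.0)
print(threshold)  # 0.95
```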

Mathematical Formulation

TabularDrift

H0: P_ref = P_test
p_val threshold = 0.05

Per-feature test:
  - Categorical: chi-squared test statistic
  - Continuous: KS test statistic D = sup_x |F_ref(x) - F_test(x)|, where F_ref and F_test are the empirical CDFs

Multiple testing correction:
  - Bonferroni: adjusted p_val = min(1, p_val * n_features)
    (equivalently, compare each raw p_val against threshold / n_features)
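The two per-feature statistics can be computed directly from their definitions. The following is a minimal pure-Python sketch for small samples, not the library's implementation; the function names and the toy inputs are assumptions. The chi-squared version treats the reference and test category counts as the two rows of a 2 x K contingency table and assumes both samples are non-empty.

```python
from bisect import bisect_right


def ks_statistic(ref, test):
    """Two-sample KS statistic D = sup_x |F_ref(x) - F_test(x)|."""
    ref_sorted, test_sorted = sorted(ref), sorted(test)
    d = 0.0
    # The supremum of the ECDF difference is attained at an observed point.
    for x in ref_sorted + test_sorted:
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_test = bisect_right(test_sorted, x) / len(test_sorted)
        d = max(d, abs(f_ref - f_test))
    return d


def chi2_statistic(ref_counts, test_counts):
    """Chi-squared statistic for a 2 x K table of per-category counts."""
    n_ref, n_test = sum(ref_counts), sum(test_counts)
    total = n_ref + n_test
    stat = 0.0
    for r, t in zip(ref_counts, test_counts):
        col = r + t
        if col == 0:
            continue  # category absent from both samples
        for observed, row_total in ((r, n_ref), (t, n_test)):
            expected = row_total * col / total
            stat += (observed - expected) ** 2 / expected
    return stat


print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
print(chi2_statistic([30, 70], [70, 30]))        # 32.0
```

Converting either statistic to a p-value requires the corresponding reference distribution (Kolmogorov or chi-squared with K - 1 degrees of freedom), which is omitted here.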

OutlierVAE

outlier_score = MSE(x, VAE(x))
is_outlier = (outlier_score > threshold)

Where:
  VAE(x) = decoder(z), z ~ q(z|x)
  q(z|x) = encoder output (approximate posterior)
  MSE = (1/d) * sum((x_i - x_hat_i)^2)
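The scoring rule above can be illustrated without a real VAE. In this sketch `toy_reconstruct` is a hypothetical stand-in for encoder plus decoder: it clips each feature to [-1, 1], so inputs inside that range are reproduced exactly and inputs far outside it are reconstructed poorly. The threshold value of 0.5 is likewise an arbitrary assumption.

```python
def mse(x, x_hat):
    """Mean squared error over the d features of a single instance."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def score_instance(x, reconstruct, threshold):
    """Return (is_outlier, outlier_score) for one instance."""
    outlier_score = mse(x, reconstruct(x))
    return outlier_score > threshold, outlier_score


def toy_reconstruct(x):
    # Hypothetical stand-in for VAE(x): clip every feature to [-1, 1].
    return [max(-1.0, min(1.0, v)) for v in x]


normal_flag, normal_score = score_instance([0.2, -0.5, 0.9], toy_reconstruct, 0.5)
odd_flag, odd_score = score_instance([4.0, 0.0, -3.0], toy_reconstruct, 0.5)
print(normal_flag, normal_score)
print(odd_flag, odd_score)
```

For the second instance the squared errors are 9, 0, and 4, giving a score of 13/3, well above the threshold.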

Usage

Use this principle when building a production monitoring pipeline that needs to detect distribution shift or anomalous inputs before they degrade model performance. The trained detectors are serialized and deployed as independent model components in the Seldon Core 2 pipeline.

The typical workflow is:

  1. Train the classifier on reference data
  2. Train TabularDrift using the same reference data as the baseline distribution
  3. Train OutlierVAE on preprocessed reference features to learn the normal reconstruction manifold
  4. Save all detectors using alibi-detect's save_detector utility
  5. Deploy detectors alongside the classifier in a monitoring pipeline
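Steps 2 and 4 of the workflow above might look roughly like the following for the drift detector. This is a sketch under assumptions: it presumes a recent alibi-detect release where `save_detector` is importable from `alibi_detect.saving` (older releases exposed it under `alibi_detect.utils.saving`), and the helper names, `out_dir`, and the argument layout are illustrative. OutlierVAE is left out because it additionally needs encoder and decoder networks that this page does not specify.

```python
def train_and_save_drift_detector(x_ref, categories_per_feature, out_dir, p_val=0.05):
    """Fit TabularDrift on reference data and serialize it (steps 2 and 4)."""
    # Imported lazily so this module loads even where alibi-detect is absent.
    from alibi_detect.cd import TabularDrift
    from alibi_detect.saving import save_detector

    # x_ref: 2-D NumPy array of reference rows.
    # categories_per_feature: {column index: None} marks a column as
    # categorical and lets the detector infer its categories from x_ref.
    detector = TabularDrift(
        x_ref, p_val=p_val, categories_per_feature=categories_per_feature
    )
    save_detector(detector, out_dir)
    return detector


def check_batch(detector, x_batch):
    """Return (is_drift, per-feature p-values) for a batch of production rows."""
    result = detector.predict(x_batch)
    return bool(result["data"]["is_drift"]), result["data"]["p_val"]
```

A call such as `train_and_save_drift_detector(x_ref, {0: None}, "./drift_detector")` would treat column 0 as categorical and every other column as continuous; the path is a placeholder.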

Related Pages

Implementation:SeldonIO_Seldon_core_Alibi_Detect_Training
