Principle:SeldonIO Seldon core Monitoring Pipeline Definition

From Leeroopedia
Revision as of 18:17, 16 February 2026 by Admin (talk | contribs) (Auto-imported from principles/SeldonIO_Seldon_core_Monitoring_Pipeline_Definition.md)
Property | Value
Principle Name | Monitoring Pipeline Definition
Overview | Composing classifier, preprocessor, drift detector, and outlier detector into a unified monitoring pipeline with batch processing
Domains | MLOps, Data_Flow
Related Implementation | SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring
Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/)
Last Updated | 2026-02-13 00:00 GMT

Description

A monitoring pipeline chains multiple models with different roles:

  • Classifier (income) - Receives raw input directly and produces predictions
  • Preprocessor (income-preprocess) - Receives raw input and transforms features for downstream detectors
  • Outlier Detector (income-outlier) - Consumes preprocessed features and flags anomalous inputs per-request
  • Drift Detector (income-drift) - Receives raw input and aggregates requests in batches before running statistical tests

The pipeline outputs both predictions from the classifier and outlier flags from the outlier detector. The drift detector runs asynchronously in batches and reports results separately.

Theoretical Basis

Monitoring pipelines extend inference pipelines with batch aggregation for statistical tests. The key design considerations are:

Batch Aggregation for Drift Detection

Drift detection requires batches of samples (e.g., 20) because per-sample drift testing lacks statistical power. A single data point cannot meaningfully indicate whether the overall distribution has shifted. The Kolmogorov-Smirnov test and chi-squared test require sufficient sample sizes to achieve reliable p-values.
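The aggregation step can be sketched in a few lines of Python. This is a conceptual illustration only: the `BatchAggregator` class and its `on_batch` callback are hypothetical names, not part of Seldon Core, which performs batching inside its own data-flow layer.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


class BatchAggregator:
    """Buffers individual requests and releases them in fixed-size batches."""

    def __init__(self, batch_size: int, on_batch: Callable[[List[Row]], None]) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self.on_batch = on_batch  # hypothetical hook, e.g. a drift test
        self._buffer: List[Row] = []

    def add(self, row: Row) -> bool:
        """Store one request; return True if this call completed a batch."""
        self._buffer.append(row)
        if len(self._buffer) < self.batch_size:
            return False
        # Hand over the full batch and start a fresh buffer.
        batch, self._buffer = self._buffer, []
        self.on_batch(batch)
        return True

    @property
    def pending(self) -> int:
        """Number of requests still waiting for their batch to fill."""
        return len(self._buffer)


# 45 requests with a batch size of 20: two batches fire, five rows wait.
batches_seen: List[int] = []
aggregator = BatchAggregator(batch_size=20, on_batch=lambda b: batches_seen.append(len(b)))
for i in range(45):
    aggregator.add([float(i)])
```

The statistical test only runs when a batch fills, so individual requests never wait on it, and any rows left in the buffer are held until enough further traffic arrives.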

The batch size represents a trade-off:

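  • Smaller batches (e.g., 5-10): Faster detection, but each test has little statistical power, so results are noisy and less reliable
  • Larger batches (e.g., 50-100): More reliable detection, but the signal is delayed until enough requests have arrived
  • Moderate batches (e.g., 20): A reasonable starting point that balances the two; tune it against the traffic volume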
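To see how batch size changes the evidence a test needs, here is a minimal pure-Python sketch of the two-sample Kolmogorov-Smirnov statistic together with the standard large-sample approximation of its rejection threshold. The reference size of 1000 and the significance level of 0.05 are illustrative assumptions, the function names are hypothetical, and a production detector would normally rely on a tested statistics library rather than this code.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(reference: Sequence[float], batch: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(batch)
    gap = 0.0
    for x in ref + cur:  # the largest gap is attained at an observed value
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def ks_critical_value(n_ref: int, n_batch: int, alpha: float = 0.05) -> float:
    """Large-sample approximation of the threshold the statistic must exceed."""
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    return c_alpha * math.sqrt((n_ref + n_batch) / (n_ref * n_batch))


# Threshold for several batch sizes against an assumed reference set of 1000 rows.
for size in (5, 20, 100):
    print(size, round(ks_critical_value(1000, size), 3))
```

The printed threshold shrinks as the batch grows: a small batch must show a much larger gap between distributions before the test rejects, while a large batch can flag subtler shifts at the cost of waiting for more requests. The approximation is rough for very small batches and is used here only to show the trend.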

Dependency Chains

The outlier detector is chained after the preprocessor so that it operates on the same feature space it saw during training. Raw input features may include categorical encodings and un-normalized values, a representation the OutlierVAE was not trained on. The preprocessor applies the same transformations used during training (imputation, scaling, one-hot encoding), so the outlier detector receives consistent feature representations.
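The dependency chain can be expressed as a small graph and ordered with Python's standard `graphlib` module. This is a sketch for reasoning about the topology, not Seldon's scheduler; the step names come from the description above, and the dictionary layout is an assumption made for illustration.

```python
from graphlib import TopologicalSorter

# Step name -> upstream steps it reads from. An empty set means the step
# reads the pipeline's raw input directly.
step_inputs = {
    "income": set(),
    "income-preprocess": set(),
    "income-outlier": {"income-preprocess"},
    "income-drift": set(),
}

# static_order() yields every step after all of its upstream steps and
# raises graphlib.CycleError if the dependencies form a loop.
order = list(TopologicalSorter(step_inputs).static_order())

# Steps with no upstream dependency can start as soon as a request arrives.
raw_input_steps = sorted(name for name, deps in step_inputs.items() if not deps)
```

Only the outlier detector has to wait for another step; the classifier, preprocessor, and drift detector all read the raw input and are independent of one another.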

Multi-Output Design

The pipeline produces multiple outputs from different steps:

  • Classifier predictions - The primary inference result
  • Outlier flags - Per-request binary outlier indicator (is_outlier field)

This allows downstream consumers to receive both the prediction and its trustworthiness assessment in a single response.
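A minimal sketch of how such a combined response could be assembled is shown below. It assumes output references of the form `<step>` or `<step>.outputs.<tensor>`; the `select_outputs` helper, the tensor names `predict` and `instance_score`, and all values are hypothetical and exist only for illustration.

```python
from typing import Any, Dict, List

StepResults = Dict[str, Dict[str, Any]]  # step name -> tensor name -> value


def select_outputs(results: StepResults, references: List[str]) -> Dict[str, Any]:
    """Collect the tensors named by `references` into one response.

    A bare step name returns every tensor of that step; a reference of the
    form "<step>.outputs.<tensor>" returns a single tensor.
    """
    response: Dict[str, Any] = {}
    for ref in references:
        step, sep, tensor = ref.partition(".outputs.")
        if sep:
            response[tensor] = results[step][tensor]
        else:
            response.update(results[step])
    return response


# Made-up step results for a single request.
results: StepResults = {
    "income": {"predict": [0]},
    "income-outlier": {"is_outlier": [1], "instance_score": [0.42]},
}
response = select_outputs(results, ["income", "income-outlier.outputs.is_outlier"])
```

Selecting a single tensor from the outlier step keeps the response small: the caller gets the prediction and the `is_outlier` flag without the detector's other outputs.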

Usage

Use this principle when defining a pipeline that combines inference with per-request outlier detection and batched drift monitoring. The monitoring pipeline definition requires:

  1. All four component models are deployed and available
  2. The dependency graph is defined (outlier depends on preprocessor output)
  3. Batch sizes are configured for drift detection
  4. Output steps specify which results to return to the caller
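The four requirements can be sketched as a manifest builder plus a pre-flight check. The field names (`steps`, `inputs`, `batch.size`, `output.steps`) and the `apiVersion` reflect my understanding of the Seldon Core v2 Pipeline resource and should be verified against the linked implementation page and the official documentation; the pipeline name `income-monitoring` and both function names are hypothetical.

```python
import json
from typing import Any, Dict, List, Set


def monitoring_pipeline(name: str, drift_batch_size: int) -> Dict[str, Any]:
    """Build a Pipeline manifest as a plain dict (field names are assumptions)."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {
            "steps": [
                {"name": "income"},
                {"name": "income-preprocess"},
                {"name": "income-outlier", "inputs": ["income-preprocess"]},
                {"name": "income-drift", "batch": {"size": drift_batch_size}},
            ],
            "output": {"steps": ["income", "income-outlier.outputs.is_outlier"]},
        },
    }


def check_manifest(manifest: Dict[str, Any], deployed_models: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    problems: List[str] = []
    steps = manifest["spec"]["steps"]
    names = {step["name"] for step in steps}
    for step in steps:
        if step["name"] not in deployed_models:
            problems.append(f"model not deployed: {step['name']}")
        for upstream in step.get("inputs", []):
            if upstream.split(".")[0] not in names:
                problems.append(f"{step['name']} reads unknown step: {upstream}")
        if "batch" in step and step["batch"].get("size", 0) < 1:
            problems.append(f"{step['name']} has an invalid batch size")
    for ref in manifest["spec"]["output"]["steps"]:
        if ref.split(".")[0] not in names:
            problems.append(f"output references unknown step: {ref}")
    return problems


manifest = monitoring_pipeline("income-monitoring", drift_batch_size=20)
deployed = {"income", "income-preprocess", "income-outlier", "income-drift"}
problems = check_manifest(manifest, deployed)
print(json.dumps(manifest, indent=2))
```

Kubernetes accepts JSON as well as YAML, so the printed manifest could in principle be saved to a file and applied as is, provided the field names match the installed CRD version.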

Related Pages

Implementation:SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring
