Principle:SeldonIO Seldon core Monitoring Pipeline Definition
| Property | Value |
|---|---|
| Principle Name | Monitoring Pipeline Definition |
| Overview | Composing classifier, preprocessor, drift detector, and outlier detector into a unified monitoring pipeline with batch processing |
| Domains | MLOps, Data_Flow |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring |
| Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/) |
| Last Updated | 2026-02-13 00:00 GMT |
Description
A monitoring pipeline chains multiple models with different roles:
- Classifier (income) - Receives raw input directly and produces predictions
- Preprocessor (income-preprocess) - Receives raw input and transforms features for downstream detectors
- Outlier Detector (income-outlier) - Consumes preprocessed features and flags anomalous inputs per-request
- Drift Detector (income-drift) - Receives raw input and aggregates requests in batches before running statistical tests
The pipeline outputs both predictions from the classifier and outlier flags from the outlier detector. The drift detector runs asynchronously in batches and reports results separately.
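The topology above can be sketched as a plain Python dictionary shaped like a Seldon Core v2 Pipeline resource. The step names come from the list above. The pipeline name `income-production`, the batch size of 20, and the exact field layout (`steps`, `inputs`, `batch.size`, `output.steps`) are assumptions for illustration and should be checked against the related implementation page.

```python
# Sketch only: a dict shaped like a Seldon Core v2 Pipeline resource.
# The name "income-production" is hypothetical.
PIPELINE = {
    "apiVersion": "mlops.seldon.io/v1alpha1",
    "kind": "Pipeline",
    "metadata": {"name": "income-production"},
    "spec": {
        "steps": [
            {"name": "income"},
            {"name": "income-preprocess"},
            # The outlier detector consumes the preprocessor's output.
            {"name": "income-outlier", "inputs": ["income-preprocess"]},
            # The drift detector reads raw input, aggregated into batches.
            {"name": "income-drift", "batch": {"size": 20}},
        ],
        "output": {
            "steps": ["income", "income-outlier.outputs.is_outlier"],
        },
    },
}


def step_sources(pipeline):
    """Map each step name to the list of sources it reads from.

    A step without an explicit "inputs" list is treated as reading the
    pipeline's raw input.
    """
    sources = {}
    for step in pipeline["spec"]["steps"]:
        sources[step["name"]] = step.get("inputs", ["<pipeline input>"])
    return sources
```

Calling `step_sources(PIPELINE)` shows that only `income-outlier` has an upstream step; the other three steps read the raw request directly.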
Theoretical Basis
Monitoring pipelines extend inference pipelines with batch aggregation for statistical tests. The key design considerations are:
Batch Aggregation for Drift Detection
Drift detection operates on batches of samples (e.g., 20) because a per-sample test has no statistical power: a single data point cannot indicate whether the overall distribution has shifted. Tests such as the Kolmogorov-Smirnov test (for continuous features) and the chi-squared test (for categorical features) need enough samples to produce reliable p-values.
The batch size represents a trade-off:
- Smaller batches (e.g., 5-10): Faster detection, but noisier test results with less statistical power
- Larger batches (e.g., 50-100): More reliable detection, but the drift signal arrives later
- Moderate batch (e.g., 20): A reasonable starting point that balances detection latency against reliability
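The batching behaviour can be illustrated with a minimal, standard-library-only sketch. It buffers single values of one feature and computes a two-sample Kolmogorov-Smirnov statistic once a batch is full. The class name `BatchAggregator` and its interface are hypothetical, and the code computes only the statistic (no p-value); it is not the drift detector's actual implementation.

```python
from bisect import bisect_right


def ks_statistic(reference, batch):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    ref = sorted(reference)
    cur = sorted(batch)
    gap = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every distinct value in either sample.
    for value in sorted(set(ref) | set(cur)):
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


class BatchAggregator:
    """Buffer single requests and run a drift check once per full batch."""

    def __init__(self, reference, batch_size=20):
        self.reference = list(reference)
        self.batch_size = batch_size
        self._buffer = []

    def add(self, value):
        """Return the KS statistic when a batch completes, else None."""
        self._buffer.append(value)
        if len(self._buffer) < self.batch_size:
            return None
        # Hand off the full batch and start a fresh buffer.
        batch, self._buffer = self._buffer, []
        return ks_statistic(self.reference, batch)
```

With `batch_size=20`, the first 19 calls to `add` return `None` and the 20th returns a statistic, which mirrors how the drift signal lags individual requests.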
Dependency Chains
The outlier detector is chained after the preprocessor so that it operates in the same feature space it was trained on. Raw input may contain categorical encodings and un-normalized values that the outlier detector (an OutlierVAE in this example) never saw during training. The preprocessor applies the same transformations used at training time (imputation, scaling, one-hot encoding), so the outlier detector receives a consistent feature representation.
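A toy sketch of this chain, using only the standard library, is shown here. The feature names (`age`, `workclass`), the training statistics, and the threshold-based `is_outlier` stand-in are all hypothetical; a real deployment would use the fitted preprocessor and the trained outlier model.

```python
# Hypothetical training-time statistics; a real preprocessor would load
# fitted values rather than hard-code them.
AGE_MEAN, AGE_STD = 40.0, 10.0
WORKCLASS_CATEGORIES = ["private", "government", "self-employed"]


def preprocess(record):
    """Impute, scale, and one-hot encode one raw record."""
    age = record.get("age")
    if age is None:
        age = AGE_MEAN  # imputation: fall back to the training mean
    scaled_age = (age - AGE_MEAN) / AGE_STD  # standard scaling
    one_hot = [1.0 if record.get("workclass") == category else 0.0
               for category in WORKCLASS_CATEGORIES]
    return [scaled_age] + one_hot


def is_outlier(features, threshold=3.0):
    """Toy stand-in for the outlier detector: flag a scaled age more than
    `threshold` standard deviations from the mean. It expects features
    that have already been preprocessed."""
    return abs(features[0]) > threshold


def outlier_step(record):
    """Chain the steps in the same order as the pipeline."""
    return is_outlier(preprocess(record))
```

Feeding a raw age of 45 straight into `is_outlier` would flag it, because the unscaled value exceeds the threshold, whereas `outlier_step` scales it to 0.5 first and does not. This is the mismatch the dependency chain is meant to avoid.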
Multi-Output Design
The pipeline produces multiple outputs from different steps:
- Classifier predictions - The primary inference result
- Outlier flags - Per-request binary outlier indicator (is_outlier field)
This allows downstream consumers to receive both the prediction and its trustworthiness assessment in a single response.
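As a sketch of how a consumer might read both results from one response, the example below indexes output tensors by name. It assumes the response follows the Open Inference Protocol (V2) shape with an `outputs` list; the output name `predict` and all values are made up for illustration, while `is_outlier` is the field named above.

```python
# Hypothetical response body; the tensor values are invented.
RESPONSE = {
    "model_name": "",
    "outputs": [
        {"name": "predict", "datatype": "INT64", "shape": [1, 1], "data": [0]},
        {"name": "is_outlier", "datatype": "INT64", "shape": [1, 1], "data": [1]},
    ],
}


def outputs_by_name(response):
    """Index the response's output tensors by their name."""
    return {out["name"]: out["data"] for out in response["outputs"]}


def prediction_with_trust(response):
    """Pair the prediction with its outlier flag for a single request."""
    outputs = outputs_by_name(response)
    return {
        "prediction": outputs["predict"][0],
        # A flag of 0 means the input was not marked as an outlier.
        "trusted": outputs["is_outlier"][0] == 0,
    }
```

A caller can then decide, for example, to route untrusted predictions to manual review instead of acting on them automatically.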
Usage
Use this principle when defining a pipeline that combines inference with per-request outlier detection and batched drift detection. Before defining the monitoring pipeline, make sure that:
- All four component models are deployed and available
- The dependency graph is defined (outlier depends on preprocessor output)
- Batch sizes are configured for drift detection
- Output steps specify which results to return to the caller
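The checklist above can be turned into a small pre-flight check. The sketch below is a hypothetical helper, not part of Seldon Core; it assumes a spec dictionary with `steps`, optional `inputs` and `batch.size` per step, and `output.steps`, and a set of model names that are already loaded.

```python
# Hypothetical spec, reduced to the fields the check inspects.
SPEC = {
    "steps": [
        {"name": "income"},
        {"name": "income-preprocess"},
        {"name": "income-outlier", "inputs": ["income-preprocess"]},
        {"name": "income-drift", "batch": {"size": 20}},
    ],
    "output": {"steps": ["income", "income-outlier.outputs.is_outlier"]},
}


def check_pipeline(spec, loaded_models):
    """Return a list of problems found in a pipeline spec (empty if none)."""
    problems = []
    steps = {step["name"]: step for step in spec["steps"]}

    # 1. Every step must refer to a model that is already loaded.
    for name in steps:
        if name not in loaded_models:
            problems.append(f"model not loaded: {name}")

    # 2. Every declared input must point at another step in the pipeline.
    for name, step in steps.items():
        for source in step.get("inputs", []):
            if source.split(".")[0] not in steps:
                problems.append(f"{name}: unknown input {source}")

    # 3. At least one step should aggregate requests into batches.
    if not any(s.get("batch", {}).get("size", 0) > 0 for s in steps.values()):
        problems.append("no step configures a batch size")

    # 4. Output steps must be declared and must exist.
    targets = spec.get("output", {}).get("steps", [])
    if not targets:
        problems.append("no output steps declared")
    for target in targets:
        if target.split(".")[0] not in steps:
            problems.append(f"unknown output step: {target}")

    return problems
```

An empty result means the spec passes all four checks; otherwise each string names one unmet requirement.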
Related Pages
- SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring (implements this principle) - Concrete pattern for declaring monitoring pipelines
- SeldonIO_Seldon_core_Monitoring_Component_Deployment (prerequisite) - Deploying the four component models
- SeldonIO_Seldon_core_Seldon_Model_Load_For_Monitoring (prerequisite) - Loading component models via CLI
- SeldonIO_Seldon_core_Monitoring_Pipeline_Validation (next step) - Validating the deployed monitoring pipeline
- SeldonIO_Seldon_core_Production_Traffic_Monitoring (uses pipeline) - Sending live traffic through the monitoring pipeline
Implementation:SeldonIO_Seldon_core_Seldon_Pipeline_CRD_Monitoring