Workflow:SeldonIO Seldon core AB Testing Experiment
| Knowledge Sources | |
|---|---|
| Domains | MLOps, Experimentation, Kubernetes |
| Last Updated | 2026-02-13 14:00 GMT |
Overview
End-to-end process for running A/B tests and traffic mirroring experiments between model or pipeline candidates in Seldon Core 2.
Description
This workflow covers setting up experiments that split inference traffic between multiple model or pipeline candidates for comparison. Seldon Core 2's Experiment resource enables weight-based traffic distribution (A/B testing), shadow/mirror deployments (where traffic is duplicated to a secondary candidate for observation without affecting responses), and progressive rollouts. Experiments operate transparently at the inference layer, routing requests according to configured weights while maintaining sticky session support for consistent user experiences.
Usage
Execute this workflow when you need to compare the performance of multiple model versions in production, validate a new model before full deployment, or run shadow tests to evaluate a candidate model without impacting live traffic. Common triggers include deploying a retrained model, switching ML frameworks, or comparing different model architectures on the same task.
Execution Steps
Step 1: Deploy Candidate Models
Deploy all model or pipeline candidates that will participate in the experiment. Each candidate must be independently functional and able to serve predictions. Verify that all candidates accept the same input format and produce compatible output schemas.
Key considerations:
- Candidates can be individual Models or entire Pipelines
- All candidates must be in the Available/Ready state before starting the experiment
- Candidates should accept the same input schema to ensure fair comparison
- Both the current production model and the challenger model must be deployed
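The readiness check above can be sketched as a small helper that inspects resource status before the experiment is created. This is a minimal sketch, not Seldon's own tooling: it assumes each candidate is available as a parsed Kubernetes resource (for example from `kubectl get models -o json`) and that readiness is reported as a status condition of type `Ready`. The sample data and the model names `iris` and `iris2` are hypothetical.

```python
from typing import Any, Dict, List


def is_ready(resource: Dict[str, Any]) -> bool:
    """Return True if the resource reports a Ready condition with status "True".

    Assumption: readiness is exposed under status.conditions with type "Ready".
    """
    conditions = resource.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Ready" and c.get("status") == "True" for c in conditions
    )


def not_ready(resources: List[Dict[str, Any]]) -> List[str]:
    """Names of candidates that should block the experiment from starting."""
    return [r["metadata"]["name"] for r in resources if not is_ready(r)]


# Hypothetical sample data shaped like the items of `kubectl get models -o json`.
candidates = [
    {"metadata": {"name": "iris"},
     "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
    {"metadata": {"name": "iris2"},
     "status": {"conditions": [{"type": "Ready", "status": "False"}]}},
]

blocked = not_ready(candidates)
print(blocked)
```

An empty `blocked` list would indicate that every candidate is ready to participate.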
Step 2: Define Experiment Configuration
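Create an Experiment custom resource that specifies the candidates, their traffic weights, and optionally a default candidate. For A/B testing, assign each candidate a weight; traffic is split in proportion to the weights, so choosing values that sum to 100 lets them be read directly as percentages. For mirror mode, designate one candidate as the primary (receives requests and returns responses) and one as the mirror (receives a copy of the traffic, and its responses are not returned to the caller).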
Key considerations:
- Each candidate's share of traffic is its weight divided by the sum of all candidate weights
- The optional default candidate names the existing model or pipeline whose endpoint the experiment takes over, so requests sent to that name are split across the candidates
- Mirror mode duplicates requests to the mirror candidate without affecting the response
- Experiment names must be unique within the namespace
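As an illustration of the configuration described in this step, the sketch below builds Experiment manifests as plain Python dictionaries and prints one as JSON. The field names (`candidates`, `weight`, `default`, `mirror`, `percent`) and the `mlops.seldon.io/v1alpha1` API version are assumptions that should be checked against the Experiment CRD installed in the cluster. The helper names, the experiment names, and the models `iris` and `iris2` are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_experiment(
    name: str,
    weights: Dict[str, int],
    default: Optional[str] = None,
    mirror: Optional[str] = None,
    mirror_percent: int = 100,
) -> Dict[str, Any]:
    """Build an Experiment manifest (field names are assumptions, see lead-in)."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("need at least one candidate and non-negative weights")
    if sum(weights.values()) == 0:
        raise ValueError("at least one candidate needs a positive weight")
    spec: Dict[str, Any] = {
        "candidates": [{"name": n, "weight": w} for n, w in weights.items()]
    }
    if default is not None:
        # Assumption: the default must be one of the listed candidates.
        if default not in weights:
            raise ValueError(f"default {default!r} is not a candidate")
        spec["default"] = default
    if mirror is not None:
        spec["mirror"] = {"name": mirror, "percent": mirror_percent}
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Experiment",
        "metadata": {"name": name},
        "spec": spec,
    }


def traffic_shares(experiment: Dict[str, Any]) -> Dict[str, float]:
    """Fraction of traffic per candidate: weight divided by the sum of weights."""
    candidates = experiment["spec"]["candidates"]
    total = sum(c["weight"] for c in candidates)
    return {c["name"]: c["weight"] / total for c in candidates}


# A/B test: split traffic evenly between the current model and a challenger.
ab = build_experiment("experiment-ab", {"iris": 50, "iris2": 50}, default="iris")
shares = traffic_shares(ab)

# Mirror mode: iris serves every response, iris2 receives a copy of the traffic.
mirror_exp = build_experiment(
    "experiment-mirror", {"iris": 100}, default="iris", mirror="iris2"
)

print(json.dumps(ab, indent=2))
```

Because `kubectl apply -f` accepts JSON as well as YAML, the printed manifest could be saved to a file and applied directly once the field names have been verified.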
Step 3: Start Experiment
Apply the Experiment resource to begin traffic splitting. The scheduler modifies the routing layer to intercept requests to the target model or pipeline name and distribute them according to the experiment configuration. Existing inference endpoints remain unchanged; the experiment operates transparently.
Key considerations:
- The experiment targets an existing model or pipeline name and intercepts its traffic
- Starting an experiment does not disrupt existing connections
- Traffic splitting happens in the routing layer that the scheduler configures, not in the client application, so callers need no code changes
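To build intuition for weight-based distribution, the following toy simulation picks a candidate for each request with probability proportional to its weight. It is only a model of the behavior described in this step, not Seldon Core 2's routing implementation. The candidate names and weights are hypothetical.

```python
import random
from collections import Counter
from typing import Dict


def pick_candidate(weights: Dict[str, int], rng: random.Random) -> str:
    """Choose one candidate with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical 75/25 split between the production model and a challenger.
weights = {"iris": 75, "iris2": 25}
rng = random.Random(42)  # fixed seed so repeated runs give the same counts
total_requests = 10_000

counts = Counter(pick_candidate(weights, rng) for _ in range(total_requests))
share_iris2 = counts["iris2"] / total_requests

print(counts)
print(f"challenger share: {share_iris2:.3f}")
```

Over many requests the observed share converges on the configured proportion, while any short window of requests can deviate noticeably. This is worth keeping in mind when judging an experiment from a small sample.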
Step 4: Monitor Experiment Metrics
Send inference requests and observe routing behavior. Each response includes an x-seldon-route header indicating which candidate served the request. Collect prediction metrics, latency measurements, and business KPIs for each candidate to evaluate performance differences.
Key considerations:
- The x-seldon-route response header reveals which candidate handled each request
- Use the --show-headers flag with the seldon CLI to inspect routing information
- Sticky sessions can be enabled by sending the x-seldon-route value from an earlier response back as a request header on subsequent requests
- Prometheus metrics are labeled by candidate for comparison
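The header-based monitoring described above can be sketched with two small helpers: one tallies which candidate served each response, and one builds the request header for a sticky follow-up call. This is a minimal sketch that treats the x-seldon-route value as an opaque string. The sample header values and helper names are hypothetical, and no real HTTP calls are made.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional

ROUTE_HEADER = "x-seldon-route"


def get_route(headers: Mapping[str, str]) -> Optional[str]:
    """Return the route header value, matching the name case-insensitively."""
    for key, value in headers.items():
        if key.lower() == ROUTE_HEADER:
            return value
    return None


def tally_routes(responses: Iterable[Mapping[str, str]]) -> Counter:
    """Count responses per route value, skipping responses without the header."""
    routes = (get_route(h) for h in responses)
    return Counter(r for r in routes if r is not None)


def sticky_headers(previous: Mapping[str, str]) -> Dict[str, str]:
    """Request headers that replay the route from an earlier response."""
    route = get_route(previous)
    return {ROUTE_HEADER: route} if route is not None else {}


# Hypothetical response headers; real route values may be formatted differently.
observed = [
    {"X-Seldon-Route": ":iris_1:"},
    {"x-seldon-route": ":iris2_1:"},
    {"x-seldon-route": ":iris_1:"},
    {"content-type": "application/json"},
]

tally = tally_routes(observed)
follow_up = sticky_headers(observed[1])
print(tally)
print(follow_up)
```

Comparing the tally against the configured weights gives a quick sanity check that routing behaves as intended before deeper metric analysis.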
Step 5: Update or Conclude Experiment
Based on collected metrics, either promote the winning candidate, adjust traffic weights for further testing, or stop the experiment. Updating the experiment resource modifies weights in real time. Stopping the experiment restores normal routing to the default candidate.
Key considerations:
- Experiment weights can be updated without stopping the experiment
- Candidates can be added to or removed from a running experiment by updating the Experiment resource
- Stopping the experiment restores all traffic to the default candidate
- The losing candidate can be unloaded to free resources after the experiment concludes
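A staged weight update, as described in this step, can be sketched as a pure function that returns a copy of the Experiment manifest with new weights. Each stage would then be re-applied to the cluster once the metrics look healthy. The manifest layout, the stage values, and the names `iris` and `iris2` are assumptions for illustration; the manifest should be verified against the installed Experiment CRD.

```python
import copy
from typing import Any, Dict, List


def with_weights(experiment: Dict[str, Any], weights: Dict[str, int]) -> Dict[str, Any]:
    """Return a copy of the manifest with the given candidate weights replaced."""
    updated = copy.deepcopy(experiment)
    known = {c["name"] for c in updated["spec"]["candidates"]}
    unknown = set(weights) - known
    if unknown:
        raise ValueError(f"unknown candidates: {sorted(unknown)}")
    for candidate in updated["spec"]["candidates"]:
        if candidate["name"] in weights:
            candidate["weight"] = weights[candidate["name"]]
    return updated


# Hypothetical starting point: 90% production model, 10% challenger.
experiment = {
    "apiVersion": "mlops.seldon.io/v1alpha1",
    "kind": "Experiment",
    "metadata": {"name": "experiment-ab"},
    "spec": {
        "default": "iris",
        "candidates": [
            {"name": "iris", "weight": 90},
            {"name": "iris2", "weight": 10},
        ],
    },
}

# Hypothetical rollout schedule that shifts traffic toward the challenger.
stages: List[Dict[str, int]] = [
    {"iris": 90, "iris2": 10},
    {"iris": 50, "iris2": 50},
    {"iris": 10, "iris2": 90},
]

manifests = [with_weights(experiment, stage) for stage in stages]
for manifest in manifests:
    print(manifest["spec"]["candidates"])
```

Keeping the original manifest unmodified makes it straightforward to roll back to the starting weights if a later stage shows a regression.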