Workflow: SeldonIO Seldon Core A/B Testing Experiment

Knowledge Sources	Seldon Core 2, Seldon Core 2 Docs, Experiments Guide
Domains	MLOps, Experimentation, Kubernetes
Last Updated	2026-02-13 14:00 GMT

Overview

End-to-end process for running A/B tests and traffic mirroring experiments between model or pipeline candidates in Seldon Core 2.

Description

This workflow covers setting up experiments that split inference traffic between multiple model or pipeline candidates for comparison. Seldon Core 2's Experiment resource enables weight-based traffic distribution (A/B testing), shadow/mirror deployments (where traffic is duplicated to a secondary candidate for observation without affecting responses), and progressive rollouts. Experiments operate transparently at the inference layer, routing requests according to configured weights while maintaining sticky session support for consistent user experiences.
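As an illustration of the two modes described above, the following sketch simulates a weight-based split with an optional mirror. It is a toy model of the behaviour, not Seldon's routing code; the candidate names, the 75/25 weights, and the assumption that every request is mirrored are made up for the example.

```python
import random
from collections import Counter


def pick_candidate(weights, rng):
    """Pick one candidate with probability proportional to its weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def simulate(weights, mirror=None, n_requests=10_000, seed=0):
    """Count which candidate answers each request and how many copies go to the mirror."""
    rng = random.Random(seed)
    served = Counter()
    mirrored = 0
    for _ in range(n_requests):
        # Exactly one weighted candidate produces the response the caller sees.
        served[pick_candidate(weights, rng)] += 1
        if mirror is not None:
            # The mirror receives a copy; its response is never returned.
            mirrored += 1
    return served, mirrored


# Hypothetical candidates: 75/25 A/B split plus a shadow model.
served, mirrored = simulate({"model-a": 75, "model-b": 25}, mirror="model-c")
share_a = served["model-a"] / sum(served.values())
```

With enough requests, share_a should land close to 0.75, while the mirror sees every request without ever answering one.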

Usage

Execute this workflow when you need to compare the performance of multiple model versions in production, validate a new model before full deployment, or run shadow tests to evaluate a candidate model without impacting live traffic. Common triggers include deploying a retrained model, switching ML frameworks, or comparing different model architectures on the same task.

Execution Steps

Step 1: Deploy Candidate Models

Deploy all model or pipeline candidates that will participate in the experiment. Each candidate must be independently functional and able to serve predictions. Verify that all candidates accept the same input format and produce compatible output schemas.

Key considerations:

Candidates can be individual Models or entire Pipelines
All candidates must be in the Available/Ready state before starting the experiment
Candidates should accept the same input schema to ensure fair comparison
Both the current production model and the challenger model must be deployed
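The candidates for this step could be described as Model resources along the following lines. This is a minimal sketch: the model names, storage URIs, and the seldon-mesh namespace are placeholders, and the field names follow my understanding of the mlops.seldon.io/v1alpha1 Model schema, so check them against the CRD installed in your cluster.

```python
import json


def model_manifest(name, storage_uri, requirements, namespace="seldon-mesh"):
    """Build a Model manifest as a plain dict (field names assumed, see lead-in)."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"storageUri": storage_uri, "requirements": list(requirements)},
    }


# Hypothetical production model and challenger; both URIs are placeholders.
candidates = [
    model_manifest("iris-a", "gs://example-bucket/iris-a", ["sklearn"]),
    model_manifest("iris-b", "gs://example-bucket/iris-b", ["sklearn"]),
]

# kubectl accepts JSON as well as YAML, so a v1 List can carry both models.
manifest_json = json.dumps(
    {"apiVersion": "v1", "kind": "List", "items": candidates}, indent=2
)
```

The resulting JSON can be written to a file and applied with kubectl apply -f. The experiment should only be started once both models report a ready status.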

Step 2: Define Experiment Configuration

Create an Experiment custom resource that specifies the candidates, their traffic weights, and optionally a default candidate. For A/B testing, assign percentage weights that sum to 100. For mirror mode, designate one candidate as the primary (receives and responds) and one as the mirror (receives a copy of traffic).

Key considerations:

Weight values control the percentage of traffic routed to each candidate
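The default candidate is the model or pipeline whose existing endpoint the experiment takes over: requests addressed to it are split across the candidates while the experiment is active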
Mirror mode duplicates requests to the mirror candidate without affecting the response
Experiment names must be unique within the namespace
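The two configurations described in this step might look like the sketch below, which builds the Experiment resources as Python dicts and checks that the weights sum to 100 as stated above. The experiment and candidate names are hypothetical, and the spec fields (default, candidates, mirror, resourceType) reflect my reading of the v1alpha1 Experiment schema rather than a verified reference.

```python
def ab_experiment(name, weights, default=None, resource_type="model",
                  namespace="seldon-mesh"):
    """Build an A/B Experiment manifest from a {candidate: weight} mapping."""
    if sum(weights.values()) != 100:
        raise ValueError("candidate weights must sum to 100")
    spec = {"candidates": [{"name": n, "weight": w} for n, w in weights.items()]}
    if default is not None:
        spec["default"] = default
    if resource_type == "pipeline":
        # Assumed field for experiments over Pipelines instead of Models.
        spec["resourceType"] = "pipeline"
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Experiment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": spec,
    }


def mirror_experiment(name, primary, mirror, percent=100, namespace="seldon-mesh"):
    """Build a mirror Experiment: primary answers, mirror gets a copy of traffic."""
    manifest = ab_experiment(name, {primary: 100}, default=primary,
                             namespace=namespace)
    manifest["spec"]["mirror"] = {"name": mirror, "percent": percent}
    return manifest


# Hypothetical names throughout.
ab = ab_experiment("iris-ab", {"iris-a": 50, "iris-b": 50}, default="iris-a")
shadow = mirror_experiment("iris-shadow", primary="iris-a", mirror="iris-b")
```

Validating the weights before the resource is submitted keeps a typo from producing an unintended split.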

Step 3: Start Experiment

Apply the Experiment resource to begin traffic splitting. The scheduler modifies the routing layer to intercept requests to the target model or pipeline name and distribute them according to the experiment configuration. Existing inference endpoints remain unchanged; the experiment operates transparently.

Key considerations:

The experiment targets an existing model or pipeline name and intercepts its traffic
Starting an experiment does not disrupt existing connections
Traffic splitting happens in the routing layer that the scheduler configures, not in application code
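One way to apply the resource from a script is to pipe the manifest to kubectl, as sketched below. The helper names are hypothetical, and waiting on a ready condition for the experiment is an assumption about how the resource reports status; the run parameter exists only so the commands can be inspected without a cluster.

```python
import json
import subprocess


def kubectl_apply_cmd(namespace):
    """Command that applies a manifest read from stdin."""
    return ["kubectl", "apply", "-n", namespace, "-f", "-"]


def kubectl_wait_cmd(name, namespace, timeout_s=120):
    """Command that blocks until the experiment reports ready (assumed condition)."""
    return [
        "kubectl", "wait", "--for", "condition=ready",
        f"--timeout={timeout_s}s", f"experiment/{name}", "-n", namespace,
    ]


def start_experiment(manifest, run=subprocess.run):
    """Apply an Experiment manifest (a dict) and wait for it to become ready."""
    namespace = manifest["metadata"].get("namespace", "default")
    # kubectl accepts JSON on stdin, so no YAML library is needed.
    run(kubectl_apply_cmd(namespace), input=json.dumps(manifest),
        text=True, check=True)
    run(kubectl_wait_cmd(manifest["metadata"]["name"], namespace), check=True)
```

Because clients keep calling the same model or pipeline name, nothing on the client side needs to change when the experiment starts.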

Step 4: Monitor Experiment Metrics

Send inference requests and observe routing behavior. Each response includes an x-seldon-route header indicating which candidate served the request. Collect prediction metrics, latency measurements, and business KPIs for each candidate to evaluate performance differences.

Key considerations:

The x-seldon-route response header reveals which candidate handled each request
Use the --show-headers flag with the seldon CLI (for example, on seldon model infer) to inspect routing information
Sticky sessions can be enabled by sending the x-seldon-route value from an earlier response back as a request header on subsequent requests
Prometheus metrics are labeled by candidate for comparison
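The routing checks above can be scripted roughly as follows. This sketch assumes the inference endpoint speaks the Open Inference (V2) REST protocol at /v2/models/<name>/infer and selects the target with a Seldon-Model request header; the base URL, payload, and route values in the usage note are placeholders. Route values are treated as opaque strings.

```python
import json
import urllib.request
from collections import Counter

ROUTE_HEADER = "x-seldon-route"


def infer(base_url, model_name, payload, route=None):
    """Send one inference request; return (response body, route header value)."""
    headers = {"Content-Type": "application/json", "Seldon-Model": model_name}
    if route is not None:
        # Echoing a previous route value back requests a sticky session.
        headers[ROUTE_HEADER] = route
    request = urllib.request.Request(
        f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response), response.headers.get(ROUTE_HEADER)


def route_shares(routes):
    """Fraction of responses per route value, ignoring responses without one."""
    counts = Counter(r for r in routes if r)
    total = sum(counts.values())
    return {route: n / total for route, n in counts.items()} if total else {}
```

Collecting the second element returned by infer over a few hundred calls and passing the list to route_shares gives an observed split to compare with the configured weights. Passing a stored route value as the route argument should keep a client on the same candidate.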

Step 5: Update or Conclude Experiment

Based on collected metrics, either promote the winning candidate, adjust traffic weights for further testing, or stop the experiment. Updating the experiment resource modifies weights in real time. Stopping the experiment restores normal routing to the default candidate.

Key considerations:

Experiment weights can be updated without stopping the experiment
Candidates can be added or removed from a running experiment
Stopping the experiment restores all traffic to the default candidate
The losing candidate can be unloaded to free resources after the experiment concludes
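Adjusting weights over time can be sketched as below: one helper returns a copy of an Experiment manifest with new weights, and another generates a stepwise schedule that moves traffic from the current model to the challenger. The names, the 25-point step, and the manifest layout (spec.candidates entries with name and weight) are assumptions for illustration.

```python
import copy


def with_weights(experiment, weights):
    """Return a copy of the manifest whose candidates use the given weights."""
    if sum(weights.values()) != 100:
        raise ValueError("candidate weights must sum to 100")
    updated = copy.deepcopy(experiment)
    # Replacing the whole list also covers adding or removing candidates.
    updated["spec"]["candidates"] = [
        {"name": name, "weight": weight} for name, weight in weights.items()
    ]
    return updated


def rollout_steps(champion, challenger, step=25):
    """Yield weight maps that shift traffic to the challenger in equal steps."""
    for share in range(step, 101, step):
        yield {champion: 100 - share, challenger: share}


# Hypothetical running experiment at a 90/10 split.
running = {
    "kind": "Experiment",
    "metadata": {"name": "iris-ab"},
    "spec": {
        "default": "iris-a",
        "candidates": [
            {"name": "iris-a", "weight": 90},
            {"name": "iris-b", "weight": 10},
        ],
    },
}
schedule = [with_weights(running, w) for w in rollout_steps("iris-a", "iris-b")]
```

Each manifest in schedule would be re-applied only after the metrics from the previous step look acceptable. The last step sends all traffic to the challenger, at which point the experiment can be stopped and the challenger promoted.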
