Principle:SeldonIO Seldon core Experiment Lifecycle Management
| Field | Value |
|---|---|
| Overview | Operational procedures for updating experiment weights, adding candidates, and concluding experiments. |
| Domains | MLOps, Experimentation |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Experiment_Stop |
| Last Updated | 2026-02-13 00:00 GMT |
Description
Experiments can be updated in place by resubmitting an Experiment CRD with modified weights or candidates. Stopping an experiment reverts traffic routing to the default model. Successive updates to a running experiment allow a gradual rollout, in which weights are shifted progressively from the current production model to the new candidate.
The experiment lifecycle consists of the following phases:
- Creation: Define the Experiment CRD with initial candidates and weights.
- Activation: Start the experiment to engage traffic routing.
- Monitoring: Observe traffic distribution and candidate performance.
- Update: Modify weights, add/remove candidates, or change mirror configuration by resubmitting the CRD.
- Conclusion: Stop the experiment to revert routing, then optionally promote the winning candidate.
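The phases above can be read as a small state machine. The sketch below is illustrative only: the `Phase` enum and the `ALLOWED` transition table are assumptions made for this example, not part of Seldon Core. It assumes that monitoring and update may alternate any number of times before the experiment is concluded.

```python
from enum import Enum


class Phase(Enum):
    CREATION = "creation"
    ACTIVATION = "activation"
    MONITORING = "monitoring"
    UPDATE = "update"
    CONCLUSION = "conclusion"


# Assumed transition table (hypothetical): an experiment is created, started,
# then cycles between monitoring and update until it is concluded.
ALLOWED = {
    Phase.CREATION: {Phase.ACTIVATION},
    Phase.ACTIVATION: {Phase.MONITORING},
    Phase.MONITORING: {Phase.UPDATE, Phase.CONCLUSION},
    Phase.UPDATE: {Phase.MONITORING},
    Phase.CONCLUSION: set(),
}


def can_transition(current, target):
    """Return True if moving from `current` to `target` is in the table."""
    return target in ALLOWED[current]


def validate_path(phases):
    """Return True if every consecutive pair of phases is an allowed move."""
    return all(can_transition(a, b) for a, b in zip(phases, phases[1:]))


canary_path = [
    Phase.CREATION,
    Phase.ACTIVATION,
    Phase.MONITORING,
    Phase.UPDATE,
    Phase.MONITORING,
    Phase.CONCLUSION,
]
print(validate_path(canary_path))
```

A restart after a stop would be modelled as a new pass through the table starting from activation; this sketch deliberately leaves that out.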
In-Place Updates
Experiments support in-place updates through the same mechanism used to start them. Resubmitting an Experiment CRD with the same metadata.name but different parameters causes the scheduler to update the routing table without interrupting traffic. This enables:
- Weight shifting: Gradually increasing a candidate's weight (e.g., 10% to 25% to 50% to 100%) for canary-style rollouts
- Candidate addition: Adding new candidates to a running experiment
- Candidate removal: Removing underperforming candidates from the experiment
- Mode changes: Switching from A/B testing to mirroring or vice versa
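As a rough illustration of an in-place update, the sketch below builds an Experiment manifest as a plain Python dict and derives an updated version that keeps the same `metadata.name`. The field layout (`spec.default`, `spec.candidates` with `name` and `weight`) reflects the Seldon Core 2 Experiment resource as commonly documented, but should be checked against the installed version. The helper functions and the model names `model-a`, `model-b`, and `model-c` are hypothetical.

```python
import copy
import json


def _candidates(weights):
    """Turn a {model name: weight} mapping into a candidates list."""
    return [{"name": model, "weight": weight} for model, weight in weights.items()]


def make_experiment(name, default, weights):
    """Build an Experiment manifest as a dict (assumed field layout)."""
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Experiment",
        "metadata": {"name": name},
        "spec": {"default": default, "candidates": _candidates(weights)},
    }


def with_weights(experiment, weights):
    """Return a copy with new candidates/weights and the same metadata.name."""
    updated = copy.deepcopy(experiment)
    updated["spec"]["candidates"] = _candidates(weights)
    return updated


# Hypothetical model names, used only for illustration.
initial = make_experiment("experiment-sample", "model-a", {"model-a": 90, "model-b": 10})

# Weight shifting: same experiment name, different split.
shifted = with_weights(initial, {"model-a": 75, "model-b": 25})

# Candidate addition: a third model joins the running experiment.
expanded = with_weights(shifted, {"model-a": 70, "model-b": 20, "model-c": 10})

print(json.dumps(expanded, indent=2))
```

Because the name is unchanged, resubmitting the serialized manifest is treated as an update to the existing experiment rather than the creation of a new one.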
Experiment Conclusion
Stopping an experiment reverts all traffic routing to the default model. After stopping:
- The default model endpoint returns to normal (non-experiment) routing
- All candidates remain deployed but no longer receive experiment-routed traffic
- Candidates can be individually unloaded if no longer needed
Theoretical Basis
The experiment lifecycle follows the same declarative pattern as other Seldon resources: update the CRD to change behavior, and stop or delete the experiment to revert. This aligns with Kubernetes declarative management principles where the desired state is expressed as a resource specification and the system converges to that state.
Progressive weight shifting enables canary-style rollouts where traffic is gradually moved to the new candidate. This approach:
- Reduces risk: Small initial traffic percentages limit exposure to potential issues
- Enables early detection: Problems with the new candidate are caught before full rollout
- Supports rollback: Reverting to the previous weight distribution is a simple CRD update
- Provides continuous validation: Each weight increase can be validated with traffic analysis before proceeding
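The rollback property can be sketched with a small history of weight distributions. The `WeightHistory` class and `traffic_share` helper below are hypothetical and exist only for this example. The share calculation assumes weights are relative, that is, each model's traffic fraction is its weight divided by the sum of all weights.

```python
class WeightHistory:
    """Track successive weight distributions so the last step can be undone."""

    def __init__(self, initial):
        self._history = [dict(initial)]

    @property
    def current(self):
        return dict(self._history[-1])

    def advance(self, new_weights):
        """Record a new distribution; reject negative or all-zero weights."""
        if any(weight < 0 for weight in new_weights.values()):
            raise ValueError("weights must be non-negative")
        if sum(new_weights.values()) <= 0:
            raise ValueError("at least one weight must be positive")
        self._history.append(dict(new_weights))
        return self.current

    def rollback(self):
        """Drop the latest distribution and return the previous one."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current


def traffic_share(weights):
    """Fraction of traffic per model, assuming weights are relative."""
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


history = WeightHistory({"model-a": 90, "model-b": 10})
history.advance({"model-a": 75, "model-b": 25})
history.advance({"model-a": 50, "model-b": 50})

# A regression is seen at 50/50, so the previous split is restored.
restored = history.rollback()
print(restored, traffic_share(restored))
```

In practice the restored distribution would be written back into the Experiment CRD and resubmitted, which is the "simple CRD update" mentioned above.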
The lifecycle model treats experiments as mutable resources that can transition through multiple configurations before being concluded. This differs from immutable experiment designs where each configuration change creates a new experiment.
Usage
This principle applies when concluding an A/B test, updating experiment parameters, or rolling out a winning candidate. Key scenarios:
Progressive Canary Rollout
- Start experiment with 90/10 split (production/candidate)
- Monitor for errors and performance regressions
- If candidate performs well, update to 70/30
- Continue monitoring; update to 50/50
- If still healthy, update to 10/90
- Finally, stop the experiment and promote the candidate as the new default
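The steps above amount to a gated loop: apply a split, check health, and only then move on. The sketch below shows that control flow with the schedule from this scenario. The `apply_weights` and `is_healthy` callbacks are placeholders; how weights are actually submitted and how health is measured are outside the scope of this example.

```python
# (production weight, candidate weight) pairs from the scenario above.
SCHEDULE = [(90, 10), (70, 30), (50, 50), (10, 90)]


def run_rollout(schedule, apply_weights, is_healthy):
    """Walk the schedule, continuing only while the candidate stays healthy.

    Returns ("promote", last_split) when every step passes, or
    ("stop", failing_split) at the first unhealthy step.
    """
    for split in schedule:
        apply_weights(split)  # e.g. resubmit the Experiment CRD with this split
        if not is_healthy(split):
            return "stop", split
    return "promote", schedule[-1]


# Stub callbacks for illustration: record applied splits, and pretend the
# candidate starts failing once it receives half of the traffic.
applied = []
outcome = run_rollout(
    SCHEDULE,
    apply_weights=applied.append,
    is_healthy=lambda split: split[1] < 50,
)
print(outcome, applied)
```

A `"stop"` outcome corresponds to the immediate rollback scenario, while `"promote"` corresponds to stopping the experiment and making the candidate the new default.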
Immediate Rollback
- Detect issues with a candidate during an active experiment
- Stop the experiment immediately to revert all traffic to the default model
- Investigate the issue with the candidate model
- Optionally restart the experiment after fixing the issue
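A rollback trigger can be as simple as a threshold check that yields the stop command. The sketch below only builds the command as an argument list and does not execute it. The `seldon experiment stop <name>` form is assumed from the Seldon Core 2 CLI and should be verified against the installed CLI version. The error-rate threshold and the experiment name are hypothetical.

```python
import shlex


def stop_command(experiment_name):
    """Argument list for stopping an experiment (assumed CLI form)."""
    return ["seldon", "experiment", "stop", experiment_name]


def rollback_decision(error_rate, threshold, experiment_name):
    """Return the stop command if the error rate exceeds the threshold, else None."""
    if error_rate > threshold:
        return stop_command(experiment_name)
    return None


# Hypothetical numbers: 8% observed errors against a 5% tolerance.
command = rollback_decision(0.08, 0.05, "experiment-sample")
if command is not None:
    print(shlex.join(command))
```

Returning the command instead of running it keeps the decision logic testable; a caller could pass the list to `subprocess.run` once the CLI form has been confirmed.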
Experiment Conclusion
- Analyze traffic distribution and candidate performance
- Determine the winning candidate based on metrics
- Stop the experiment
- Promote the winning candidate by making it the new default model
- Unload losing candidates to free resources
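The conclusion steps can be summarized as picking a winner from collected metrics and deriving the follow-up actions. The sketch below is a minimal illustration: the `conclude` function, the metric values, and the model names are all hypothetical, and real decisions would normally weigh several metrics and statistical significance.

```python
def conclude(metrics, default, higher_is_better=True):
    """Pick a winner from {model name: score} and list follow-up actions.

    Returns a dict with the winner, whether promotion is needed (the winner
    is not already the default), and the losing models that could be unloaded.
    On a tie, the model listed first in `metrics` wins.
    """
    pick = max if higher_is_better else min
    winner = pick(metrics, key=metrics.get)
    losers = sorted(name for name in metrics if name != winner)
    return {"winner": winner, "promote": winner != default, "unload": losers}


# Hypothetical accuracy scores gathered during the experiment.
plan = conclude({"model-a": 0.91, "model-b": 0.94}, default="model-a")
print(plan)
```

If the winner is not the current default, the old default appears in the `unload` list; it should only be unloaded after the winner has been promoted, so that the default endpoint is never left without a model.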
Related Pages
- SeldonIO_Seldon_core_Seldon_Experiment_Stop — implements this principle — Concrete CLI tool for stopping experiments and managing experiment lifecycle in Seldon Core 2.
- SeldonIO_Seldon_core_Experiment_Traffic_Analysis — prerequisite principle — Monitoring which candidate model serves each request during an experiment.
- SeldonIO_Seldon_core_Experiment_Execution — related principle — Activating an experiment to begin traffic splitting or mirroring between model candidates.
- SeldonIO_Seldon_core_Experiment_Configuration — related principle — Declarative specification of traffic routing rules for A/B tests and traffic mirroring experiments.