Principle: SeldonIO Seldon Core Model Lifecycle Management
| Property | Value |
|---|---|
| Principle Name | Model_Lifecycle_Management |
| Overview | Operational procedures for unloading, updating, and rolling over ML model versions in production. |
| Workflow | Model_Deployment |
| Domains | MLOps, Kubernetes |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Model_Unload |
| Last Updated | 2026-02-13 00:00 GMT |
Description
Model lifecycle management includes unloading unused models (freeing server resources), performing rolling updates by resubmitting Model CRDs with updated storageUri versions, and managing the model-to-server assignment during transitions. This principle covers the operational phase that follows initial deployment, addressing the ongoing need to maintain, update, and decommission models in a production Seldon Core 2 environment.
The key lifecycle operations are:
- Unloading: Removing a model from an inference server to free memory and compute resources, using `seldon model unload` or `kubectl delete model`
- Rolling update: Submitting an updated Model CRD with a new `storageUri` (pointing to a newer model version), which triggers the scheduler to gracefully transition from the old version to the new one
- Capacity management: Leveraging server overcommit to host more models than can fit in memory simultaneously, with the scheduler evicting and reloading models on demand based on usage patterns
Theoretical Basis
Rolling updates in Seldon Core 2 follow a version progression pattern: submitting an updated Model CRD triggers the scheduler to load the new version while gradually draining the old one. This approach ensures zero-downtime model updates by maintaining at least one active version at all times during the transition.
The lifecycle management model is based on several key concepts:
Server Overcommit
Seldon Core 2 supports server overcommit, allowing more models to be assigned to a Server than can fit in memory simultaneously. The scheduler uses an LRU (Least Recently Used) eviction strategy to manage which models are actively loaded. This enables:
- Cost efficiency: A single Server can serve hundreds of models, with only the frequently accessed ones kept in memory
- Graceful degradation: Infrequently used models experience a cold-start delay when first accessed but remain available
- Resource optimization: Memory is dynamically allocated to the models that need it most
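The overcommit behavior described above can be sketched as a toy LRU loader. This is an illustration only, not Seldon Core 2 code: the class and method names here (`OvercommitServer`, `infer`) are hypothetical, and the real scheduler tracks memory in bytes rather than model slots.

```python
from collections import OrderedDict

class OvercommitServer:
    """Toy model of a server overcommitted with more models than fit in memory.

    Hypothetical names for illustration; not a Seldon Core 2 API.
    """

    def __init__(self, memory_slots):
        self.memory_slots = memory_slots  # max models loaded at once
        self.loaded = OrderedDict()       # model name -> artifact, in LRU order

    def infer(self, model_name):
        if model_name in self.loaded:
            # Warm path: mark the model as most recently used.
            self.loaded.move_to_end(model_name)
            return f"warm inference on {model_name}"
        if len(self.loaded) >= self.memory_slots:
            # Server is full: evict the least recently used model.
            self.loaded.popitem(last=False)
        # Cold start: reload the artifact from storage, then serve.
        self.loaded[model_name] = "artifact"
        return f"cold-start inference on {model_name}"

server = OvercommitServer(memory_slots=2)
server.infer("iris")    # cold start
server.infer("income")  # cold start; server is now full
server.infer("iris")    # warm; "income" becomes least recently used
server.infer("wine")    # evicts "income" to make room
```

Note that after the last call, "income" is evicted but would still be assigned to the server in a real deployment: the next request for it simply pays a cold-start delay.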
Version Transition
When a model is updated (new `storageUri`), the scheduler orchestrates a controlled transition:
1. New version load: The updated model artifact is downloaded and loaded on the Server
2. Traffic shift: Once the new version is ready, inference requests are routed to it
3. Old version drain: The previous version is unloaded after all in-flight requests complete
This is analogous to Kubernetes Deployment rolling updates but operates at the model level within a shared inference server.
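The three-phase transition can be sketched as follows. This is a conceptual illustration, not Seldon code: `rolling_update` and the routing-table dictionary are hypothetical stand-ins for the scheduler's internal state.

```python
def rolling_update(routes, model, new_uri):
    """Toy three-phase rollover for one model; returns the event log.

    `routes` maps model name -> the storageUri currently serving traffic.
    """
    events = []
    # Phase 1: download and load the new artifact alongside the old version,
    # so at least one active version exists at all times.
    events.append(f"load {model} from {new_uri}")
    # Phase 2: once the new version is ready, shift traffic to it.
    routes[model] = new_uri
    events.append(f"route traffic for {model} to {new_uri}")
    # Phase 3: let in-flight requests on the old version finish, then unload it.
    events.append(f"drain and unload old version of {model}")
    return events

routes = {"iris": "gs://models/iris/v1"}
log = rolling_update(routes, "iris", "gs://models/iris/v2")
```

The key invariant, as in a Kubernetes Deployment rolling update, is that the traffic shift (phase 2) happens only after the new version is loaded (phase 1), so there is no window with zero serving versions.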
Model Unloading
Unloading a model completely removes it from the scheduler's assignment and frees all associated resources. This is a permanent operation (unlike eviction, which is temporary). Unloading is appropriate when:
- A model is being decommissioned and will no longer receive requests
- A model needs to be replaced by a fundamentally different model (not just a version update)
- Server capacity needs to be reclaimed for higher-priority models
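The eviction/unload distinction can be made concrete with toy scheduler state. Again this is illustrative only; the dictionary and set below are hypothetical stand-ins, not Seldon internals.

```python
# Toy scheduler state: which server each model is assigned to,
# and which models are currently resident in memory.
assignments = {"iris": "mlserver"}
in_memory = {"iris"}

# Eviction (temporary): the model leaves memory but keeps its assignment,
# so the next request triggers a cold-start reload rather than an error.
in_memory.discard("iris")
evicted_still_assigned = "iris" in assignments

# Unload (permanent): the assignment itself is removed; the model is gone
# from the server and will not be reloaded on demand.
assignments.pop("iris", None)
in_memory.discard("iris")
```

After eviction the model remains available (at cold-start cost); after unload it is not.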
Usage
This principle applies when decommissioning models, updating to new model versions, or managing server capacity in a Seldon Core 2 deployment.
Unloading a Model
```bash
# Unload via the Seldon CLI
seldon model unload iris

# Unload via kubectl
kubectl delete model iris
```
Rolling Update to New Version
```bash
# Update the storageUri in the Model YAML to point to the new version,
# then resubmit:
seldon model load -f updated-model.yaml

# The scheduler handles the transition from the old version to the new one;
# wait for the model to become available again:
seldon model status iris -w ModelAvailable
```
Monitoring Server Capacity
```bash
# Check which models are loaded on a server
seldon server status mlserver-0

# List all models and their states
kubectl get models -o wide
```
Related Pages
- SeldonIO_Seldon_core_Seldon_Model_Unload implements SeldonIO_Seldon_core_Model_Lifecycle_Management
- SeldonIO_Seldon_core_Model_Deployment_Execution precedes SeldonIO_Seldon_core_Model_Lifecycle_Management
- SeldonIO_Seldon_core_Model_Readiness_Verification is used during SeldonIO_Seldon_core_Model_Lifecycle_Management
- SeldonIO_Seldon_core_V2_Inference_Protocol is affected by SeldonIO_Seldon_core_Model_Lifecycle_Management
- Heuristic:SeldonIO_Seldon_core_Over_Commit_Memory_Tip
- Heuristic:SeldonIO_Seldon_core_Autoscaling_Dual_Config_Tip