Principle:SeldonIO Seldon core Model Lifecycle Management

From Leeroopedia
Property Value
Principle Name Model_Lifecycle_Management
Overview Operational procedures for unloading, updating, and rolling over ML model versions in production.
Workflow Model_Deployment
Domains MLOps, Kubernetes
Related Implementation SeldonIO_Seldon_core_Seldon_Model_Unload
Last Updated 2026-02-13 00:00 GMT

Description

Model lifecycle management includes unloading unused models (freeing server resources), performing rolling updates by resubmitting Model CRDs with updated storageUri versions, and managing the model-to-server assignment during transitions. This principle covers the operational phase that follows initial deployment, addressing the ongoing need to maintain, update, and decommission models in a production Seldon Core 2 environment.

The key lifecycle operations are:

  • Unloading: Removing a model from an inference server to free memory and compute resources, using seldon model unload or kubectl delete model
  • Rolling update: Submitting an updated Model CRD with a new storageUri (pointing to a newer model version), which triggers the scheduler to gracefully transition from the old version to the new one
  • Capacity management: Leveraging server overcommit to host more models than can fit in memory simultaneously, with the scheduler evicting and reloading models on demand based on usage patterns
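All three operations act on Model custom resources. A minimal sketch of such a manifest and its lifecycle commands, assuming the `mlops.seldon.io/v1alpha1` API group used by Seldon Core 2; the bucket path is hypothetical:

```shell
# Write a minimal Model manifest (the storageUri path is a placeholder):
cat > iris-model.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://my-bucket/models/iris/v1"
  requirements:
  - sklearn
EOF

# Load it onto a matching server, and later unload it when decommissioned
# (commands shown for reference; they require a running cluster):
# seldon model load -f iris-model.yaml
# seldon model unload iris        # or: kubectl delete model iris
```

Resubmitting the same manifest with a changed storageUri is what distinguishes a rolling update from an unload: the model name stays stable while the artifact behind it changes.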

Theoretical Basis

Rolling updates in Seldon Core 2 follow a version progression pattern: submitting an updated Model CRD triggers the scheduler to load the new version while gradually draining the old one. This approach ensures zero-downtime model updates by maintaining at least one active version at all times during the transition.

The lifecycle management model is based on several key concepts:

Server Overcommit

Seldon Core 2 supports server overcommit, allowing more models to be assigned to a Server than can fit in memory simultaneously. The scheduler uses an LRU (Least Recently Used) eviction strategy to manage which models are actively loaded. This enables:

  • Cost efficiency: A single Server can serve hundreds of models, with only the frequently accessed ones kept in memory
  • Graceful degradation: Infrequently used models experience a cold-start delay when first accessed but remain available
  • Resource optimization: Memory is dynamically allocated to the models that need it most
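Overcommit accounting depends on each model declaring its expected footprint so the scheduler can pack models against a Server's capacity. A sketch, assuming the Model spec's memory field and illustrative sizes:

```shell
# Declare the model's memory footprint so the scheduler can pack and
# evict models against the Server's capacity (values are illustrative,
# and the storageUri path is a placeholder):
cat > tiny-model.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: tiny-classifier
spec:
  storageUri: "gs://my-bucket/models/tiny/v1"
  requirements:
  - sklearn
  memory: 100Ki
EOF
# With many such models assigned to one Server, the scheduler keeps only
# the recently used ones resident and reloads evicted ones on demand.
```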

Version Transition

When a model is updated (new storageUri), the scheduler orchestrates a controlled transition:

  1. New version load: The updated model artifact is downloaded and loaded on the Server
  2. Traffic shift: Once the new version is ready, inference requests are routed to it
  3. Old version drain: The previous version is unloaded after all in-flight requests complete

This is analogous to Kubernetes Deployment rolling updates but operates at the model level within a shared inference server.
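The three transition steps can be driven entirely by editing and resubmitting the Model manifest; a scripted sketch, with a hypothetical bucket path:

```shell
# Start from the manifest of the currently deployed model version
# (storageUri path is a placeholder):
cat > iris-model.yaml <<'EOF'
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://my-bucket/models/iris/v1"
EOF

# Point storageUri at the new artifact version:
sed -i.bak 's|/iris/v1|/iris/v2|' iris-model.yaml

# Resubmit; the scheduler loads v2, shifts traffic to it, then drains
# and unloads v1 (requires a running cluster):
# seldon model load -f iris-model.yaml
# seldon model status iris -w ModelAvailable
```

Because the resource name (iris) never changes, callers see a single stable endpoint throughout the rollover.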

Model Unloading

Unloading a model removes it entirely from the scheduler's assignment and frees all associated resources. Unlike eviction under overcommit, which temporarily swaps a model out of memory while keeping it assigned and routable, unloading is permanent. Unloading is appropriate when:

  • A model is being decommissioned and will no longer receive requests
  • A model needs to be replaced by a fundamentally different model (not just a version update)
  • Server capacity needs to be reclaimed for higher-priority models

Usage

This principle applies when decommissioning models, updating to new model versions, or managing server capacity in a Seldon Core 2 deployment.

Unloading a Model

# Unload via Seldon CLI
seldon model unload iris

# Unload via kubectl
kubectl delete model iris

Rolling Update to New Version

# Update the storageUri in the Model YAML to point to the new version
# Then resubmit:
seldon model load -f updated-model.yaml

# The scheduler will handle the transition from old to new version
seldon model status iris -w ModelAvailable

Monitoring Server Capacity

# Check which models are loaded on a server
seldon server status mlserver-0

# List all models and their states
kubectl get models -o wide

Related Pages

Implementation:SeldonIO_Seldon_core_Seldon_Model_Unload
