Principle:SeldonIO Seldon core Model Resource Definition
| Property | Value |
|---|---|
| Principle Name | Model_Resource_Definition |
| Overview | Declarative specification of ML model resources using Kubernetes Custom Resource Definitions. |
| Workflow | Model_Deployment |
| Domains | MLOps, Kubernetes |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Model_CRD |
| Last Updated | 2026-02-13 00:00 GMT |
Description
Seldon Core 2 uses a Model CRD (apiVersion: mlops.seldon.io/v1alpha1, kind: Model) to declare model artifacts with storage URIs, runtime requirements, and memory allocations. The scheduler then assigns models to matching inference servers. This declarative approach means that operators specify what model they want deployed rather than how to deploy it, and the Seldon Core 2 control plane handles the orchestration.
The Model CRD captures several critical pieces of information:
- metadata.name: A unique identifier for the model within the namespace
- spec.storageUri: The location of the model artifact (GCS, S3, MinIO, or local paths)
- spec.requirements: A list of runtime capability tags (e.g., sklearn, tensorflow, huggingface) that must match a Server's capabilities
- spec.memory: Optional memory allocation hint for the scheduler (e.g., "100Ki")
The scheduler uses the requirements list to find a compatible inference Server. For example, a model with requirements: ["sklearn"] will be assigned to a Server that has the sklearn capability, typically an MLServer instance with the scikit-learn runtime installed.
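The matching rule described above can be sketched in a few lines of Python. This is a toy illustration, not the scheduler's actual code: the `Server` class, the `compatible_servers` helper, and the capability sets are hypothetical, and the sketch assumes that "match" means every requirement must appear among a Server's capabilities.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy stand-in for a Seldon Server; only the fields needed here."""
    name: str
    capabilities: set = field(default_factory=set)


def compatible_servers(requirements, servers):
    """Return the names of servers whose capabilities cover every requirement.

    Assumption: matching is a subset test (requirements <= capabilities).
    """
    needed = set(requirements)
    return [s.name for s in servers if needed <= s.capabilities]


# Hypothetical capability sets, chosen only for illustration.
servers = [
    Server("mlserver", {"sklearn", "xgboost", "huggingface"}),
    Server("triton", {"tensorflow", "onnx"}),
]

print(compatible_servers(["sklearn"], servers))  # prints ['mlserver']
```

Under this assumption, a model that lists a tag no Server offers yields an empty candidate list and cannot be assigned.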
Theoretical Basis
Kubernetes Custom Resource Definitions (CRDs) extend the Kubernetes API with domain-specific resources. The Model CRD declaratively captures what model to load, from where, and with what runtime constraints. This follows the Kubernetes operator pattern where:
- Desired state is expressed as a CRD manifest (the Model resource)
- Actual state is tracked by the controller (model loaded on a specific Server)
- Reconciliation continuously drives actual state toward desired state
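As a rough illustration of the desired-state versus actual-state comparison above, the following Python sketch computes the actions one reconciliation pass would need. It is a simplified model rather than Seldon's controller: the `reconcile` function, the action strings, and the `old-model` entry are all hypothetical.

```python
def reconcile(desired, actual):
    """Compare desired and actual state and return the actions needed.

    Both arguments map a model name to its storage URI.
    """
    actions = []
    for name, uri in sorted(desired.items()):
        if name not in actual:
            actions.append(f"load {name} from {uri}")
        elif actual[name] != uri:
            actions.append(f"reload {name} from {uri}")
    # Anything still loaded but no longer declared gets removed.
    for name in sorted(actual.keys() - desired.keys()):
        actions.append(f"unload {name}")
    return actions


desired = {"iris": "gs://seldon-models/mlserver/iris"}
actual = {"old-model": "gs://example-bucket/old-model"}  # hypothetical entry

for action in reconcile(desired, actual):
    print(action)
```

A real controller runs a comparison like this repeatedly, so the same manifest can be re-applied safely: once actual state equals desired state, the list of actions is empty.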
The Model CRD abstracts away infrastructure concerns from ML engineers. Instead of manually configuring inference servers, mounting volumes, and managing processes, users declare their intent through a simple YAML manifest. The Seldon Core 2 control plane (the scheduler, together with the agent that runs alongside each Server) then handles:
- Server selection: Matching model requirements to Server capabilities
- Artifact retrieval: Downloading model files from remote storage via rclone
- Runtime loading: Invoking the appropriate MLServer or Triton runtime to load the model
- Capacity planning: Respecting memory constraints and server overcommit ratios
This separation of concerns enables platform teams to manage infrastructure (Servers, storage, networking) independently from ML teams who focus on model definitions.
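The capacity planning step listed above can be pictured with a small Python sketch. It is an approximation under stated assumptions, not the scheduler's algorithm: `parse_memory` handles only the binary suffixes Ki, Mi, and Gi, and `fits` assumes overcommit is a percentage added on top of a Server's memory budget. Both helper names are hypothetical.

```python
# Binary suffixes used by Kubernetes-style quantities (subset only).
_BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024 ** 2, "Gi": 1024 ** 3}


def parse_memory(quantity):
    """Convert a quantity such as "100Ki" into a number of bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: treat as plain bytes


def fits(model_memory, server_memory, used_bytes, overcommit_percent=0):
    """Check whether a model fits within the server's memory budget.

    Assumption: the budget is server memory scaled by (100 + overcommit) %.
    """
    budget = parse_memory(server_memory) * (100 + overcommit_percent) // 100
    return used_bytes + parse_memory(model_memory) <= budget


print(parse_memory("100Ki"))  # prints 102400
print(fits("100Ki", "1Mi", used_bytes=1000 * 1024))  # prints False
print(fits("100Ki", "1Mi", used_bytes=1000 * 1024, overcommit_percent=10))  # prints True
```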
Usage
This principle applies when defining any model for deployment on Seldon Core 2, regardless of framework (sklearn, TensorFlow, HuggingFace, etc.). The typical workflow is:
- Prepare the model artifact and upload it to a storage backend
- Write a Model CRD YAML specifying the storageUri and requirements
- Apply the manifest to the Kubernetes cluster
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://seldon-models/mlserver/iris"
  requirements:
  - sklearn
  memory: 100Ki
Models can also specify additional fields such as spec.server to pin the model to a specific Server, or spec.explainer to declare that the resource is an explainer for another model.
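When manifests are generated by tooling rather than written by hand, the same resource can be assembled programmatically. The sketch below is one possible approach using only the Python standard library. The `model_manifest` helper is hypothetical, and it covers only the fields named on this page (storageUri, requirements, memory, server).

```python
import json


def model_manifest(name, storage_uri, requirements, memory=None, server=None):
    """Build a Model resource as a plain dict from the fields described above."""
    if not name or not storage_uri:
        raise ValueError("name and storage_uri are required")
    spec = {"storageUri": storage_uri, "requirements": list(requirements)}
    if memory is not None:
        spec["memory"] = memory  # optional scheduler hint, e.g. "100Ki"
    if server is not None:
        spec["server"] = server  # optional pin to a named Server
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = model_manifest(
    "iris", "gs://seldon-models/mlserver/iris", ["sklearn"], memory="100Ki"
)
print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests as well as YAML, the printed output can be saved to a file and applied with `kubectl apply -f <file>`.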
Related Pages
- SeldonIO_Seldon_core_Seldon_Model_CRD implements SeldonIO_Seldon_core_Model_Resource_Definition
- SeldonIO_Seldon_core_Model_Artifact_Preparation precedes SeldonIO_Seldon_core_Model_Resource_Definition
- SeldonIO_Seldon_core_Model_Deployment_Execution follows SeldonIO_Seldon_core_Model_Resource_Definition