Principle:SeldonIO Seldon core Model Resource Definition

Property	Value
Principle Name	Model_Resource_Definition
Overview	Declarative specification of ML model resources using Kubernetes Custom Resource Definitions.
Workflow	Model_Deployment
Domains	MLOps, Kubernetes
Related Implementation	SeldonIO_Seldon_core_Seldon_Model_CRD
Last Updated	2026-02-13 00:00 GMT

Description

Seldon Core 2 uses a Model CRD (apiVersion: mlops.seldon.io/v1alpha1, kind: Model) to declare model artifacts together with their storage URIs, runtime requirements, and memory allocations. The scheduler then assigns each model to a matching inference Server. This declarative approach means that operators specify which model they want deployed rather than how to deploy it; the Seldon Core 2 control plane handles the orchestration.

The Model CRD captures several critical pieces of information:

metadata.name: A unique identifier for the model within the namespace
spec.storageUri: The location of the model artifact (GCS, S3, MinIO, or local paths)
spec.requirements: A list of runtime capability tags (e.g., sklearn, tensorflow, huggingface) that must match a Server's capabilities
spec.memory: Optional memory allocation hint for the scheduler (e.g., "100Ki")
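To make the field list concrete, here is a minimal Python sketch that assembles a Model manifest as a plain dictionary. The helper name build_model_manifest is hypothetical, and it covers only the four fields listed above. kubectl also accepts manifests serialized as JSON, so the printed output could be applied the same way as a YAML file.

```python
import json


def build_model_manifest(name, storage_uri, requirements, memory=None):
    """Hypothetical helper: return a Seldon Core 2 Model manifest as a dict.

    Only metadata.name, spec.storageUri, spec.requirements and spec.memory
    are covered; real Model resources support further fields.
    """
    if not name:
        raise ValueError("metadata.name is required")
    if not storage_uri:
        raise ValueError("spec.storageUri is required in this sketch")
    spec = {"storageUri": storage_uri, "requirements": list(requirements)}
    if memory is not None:
        # spec.memory is optional; it is a Kubernetes quantity string such as "100Ki"
        spec["memory"] = memory
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": spec,
    }


manifest = build_model_manifest(
    "iris", "gs://seldon-models/mlserver/iris", ["sklearn"], memory="100Ki"
)
print(json.dumps(manifest, indent=2))
```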

The scheduler uses the requirements list to find a compatible inference Server. For example, a model with requirements: ["sklearn"] will be assigned to a Server that has the sklearn capability, typically an MLServer instance with the scikit-learn runtime installed.
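The matching rule can be sketched as a subset check: a Server is a candidate when every requirement tag of the model appears among its capabilities. This is a simplified illustration, not the scheduler's source. The function names are hypothetical, and the capability lists are illustrative subsets chosen only to resemble an MLServer and a Triton server.

```python
def is_compatible(model_requirements, server_capabilities):
    # Assumption: every requirement tag must appear among the Server's capabilities
    return set(model_requirements) <= set(server_capabilities)


def candidate_servers(model_requirements, servers):
    # servers: hypothetical mapping of Server name -> list of capability tags
    return [
        name
        for name, caps in servers.items()
        if is_compatible(model_requirements, caps)
    ]


servers = {
    "mlserver": ["sklearn", "xgboost", "mlflow"],
    "triton": ["tensorflow", "onnx", "pytorch"],
}
print(candidate_servers(["sklearn"], servers))  # ['mlserver']
```

A model whose requirements span both servers, such as ["sklearn", "tensorflow"], matches neither and would remain unscheduled.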

Theoretical Basis

Kubernetes Custom Resource Definitions (CRDs) extend the Kubernetes API with domain-specific resources. The Model CRD declaratively captures what model to load, from where, and with what runtime constraints. This follows the Kubernetes operator pattern where:

Desired state is expressed as a CRD manifest (the Model resource)
Actual state is tracked by the controller (model loaded on a specific Server)
Reconciliation continuously drives actual state toward desired state
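The desired/actual/reconcile cycle can be illustrated with a toy reconciler that compares two dictionaries and emits the actions needed to converge. It is a hypothetical sketch of the pattern, not Seldon's controller or scheduler code, and it reduces state to a model name plus its storageUri.

```python
def reconcile(desired, actual):
    """Compute actions that move actual state toward desired state.

    desired: model name -> storageUri declared in Model resources
    actual:  model name -> storageUri currently loaded on some Server
    """
    actions = []
    for name, uri in sorted(desired.items()):
        if name not in actual:
            actions.append(("load", name))
        elif actual[name] != uri:
            # Spec changed (e.g. a new artifact location): reload the model
            actions.append(("reload", name))
    for name in sorted(actual):
        if name not in desired:
            # Model resource was deleted: unload it
            actions.append(("unload", name))
    return actions


desired = {"iris": "gs://bucket/iris", "income": "gs://bucket/income-v2"}
actual = {"income": "gs://bucket/income-v1", "old": "gs://bucket/old"}
print(reconcile(desired, actual))
```

Because the function only compares state, running it again after the actions succeed yields an empty list. That property is what lets a controller re-run reconciliation safely at any time.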

The Model CRD abstracts away infrastructure concerns from ML engineers. Instead of manually configuring inference servers, mounting volumes, and managing processes, users declare their intent through a simple YAML manifest. The Seldon scheduler then handles:

Server selection: Matching model requirements to Server capabilities
Artifact retrieval: Downloading model files from remote storage via rclone
Runtime loading: Invoking the appropriate MLServer or Triton runtime to load the model
Capacity planning: Respecting memory constraints and server overcommit ratios
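Capacity planning depends on converting Kubernetes quantities such as "100Ki" into bytes. The sketch below parses a subset of the quantity suffixes (integer values only) and checks whether a model fits on a Server. The function names are hypothetical, and the overcommit rule (effective capacity equals server memory scaled by (100 + overcommit_pct) / 100) is a simplifying assumption rather than Seldon's exact placement algorithm.

```python
import re

# Subset of Kubernetes quantity suffixes: decimal (k, M, G, T) and binary (Ki, Mi, Gi, Ti)
_SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
}


def parse_quantity(quantity):
    """Convert an integer quantity such as "100Ki" or "1Gi" to bytes."""
    match = re.fullmatch(r"(\d+)(Ki|Mi|Gi|Ti|k|M|G|T)?", quantity)
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    return int(match.group(1)) * _SUFFIXES[match.group(2) or ""]


def fits(model_memory, used_bytes, server_memory, overcommit_pct=0):
    """Return True if the model fits alongside already-loaded models.

    Assumption: overcommit raises the effective capacity by a percentage.
    """
    capacity = parse_quantity(server_memory) * (100 + overcommit_pct) // 100
    return used_bytes + parse_quantity(model_memory) <= capacity


print(parse_quantity("100Ki"))  # 102400
print(fits("600Ki", 500 * 1024, "1Mi"))  # False: 1100Ki exceeds 1024Ki
print(fits("600Ki", 500 * 1024, "1Mi", overcommit_pct=10))  # True
```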

This separation of concerns enables platform teams to manage infrastructure (Servers, storage, networking) independently from ML teams who focus on model definitions.

Usage

This principle applies when defining any model for deployment on Seldon Core 2, regardless of framework (sklearn, TensorFlow, HuggingFace, etc.). The typical workflow is:

Prepare the model artifact and upload it to a storage backend
Write a Model CRD YAML specifying the storageUri and requirements
Apply the manifest to the Kubernetes cluster

```yaml
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: iris
spec:
  storageUri: "gs://seldon-models/mlserver/iris"
  requirements:
  - sklearn
  memory: 100Ki
```
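The manifest is usually applied with kubectl apply -f. For readers working in Python, the following sketch submits the same iris manifest through the official kubernetes client's CustomObjectsApi. The helper names, the namespace seldon-mesh, and the naive pluralization (kind lowercased plus "s", giving models) are assumptions. Unlike kubectl apply, create_namespaced_custom_object only creates the resource and fails with a conflict if it already exists.

```python
iris = {
    "apiVersion": "mlops.seldon.io/v1alpha1",
    "kind": "Model",
    "metadata": {"name": "iris"},
    "spec": {
        "storageUri": "gs://seldon-models/mlserver/iris",
        "requirements": ["sklearn"],
        "memory": "100Ki",
    },
}


def crd_coordinates(manifest):
    """Split apiVersion into (group, version) and derive the plural resource name."""
    group, version = manifest["apiVersion"].split("/", 1)
    # Assumption: naive pluralization, which yields "models" for kind Model
    plural = manifest["kind"].lower() + "s"
    return group, version, plural


def apply_model(manifest, namespace="seldon-mesh"):
    # Hypothetical helper; requires the `kubernetes` package and cluster credentials
    from kubernetes import client, config

    config.load_kube_config()
    group, version, plural = crd_coordinates(manifest)
    return client.CustomObjectsApi().create_namespaced_custom_object(
        group=group, version=version, namespace=namespace, plural=plural, body=manifest
    )


print(crd_coordinates(iris))  # ('mlops.seldon.io', 'v1alpha1', 'models')
```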

Models can also specify additional fields, such as spec.server to pin the model to a specific named Server, or spec.explainer to declare the model as an Alibi Explain explainer that references another model or pipeline.
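Pinning with spec.server narrows the candidate set that requirement matching would otherwise produce. The sketch below adds that filter. It assumes requirements are still enforced on a pinned Server, and the function name, server names, and capability lists are hypothetical.

```python
def select_servers(model_spec, servers):
    """Return Server names eligible for a model, honoring an optional spec.server pin.

    servers: hypothetical mapping of Server name -> list of capability tags.
    """
    required = set(model_spec.get("requirements", []))
    pinned = model_spec.get("server")
    eligible = []
    for name, caps in servers.items():
        if pinned is not None and name != pinned:
            continue  # spec.server restricts placement to one named Server
        if required <= set(caps):
            eligible.append(name)
    return eligible


servers = {
    "mlserver": ["sklearn", "xgboost"],
    "mlserver-custom": ["sklearn", "python"],
}
print(select_servers({"requirements": ["sklearn"]}, servers))
print(select_servers({"requirements": ["sklearn"], "server": "mlserver-custom"}, servers))
```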
