Principle:SeldonIO Seldon core HuggingFace Model Resource Definition

Field	Value
Overview	Declaring HuggingFace Transformer models as Seldon Core 2 Model resources with the huggingface runtime requirement.
Domains	MLOps, NLP, Kubernetes
Related Implementation	SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace
Knowledge Sources	Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/)
Last Updated	2026-02-13 00:00 GMT

Description

HuggingFace models are declared with the standard Model CRD. Setting requirements: ["huggingface"] routes a model to an MLServer instance that provides the HuggingFace runtime. Models for different tasks (for example, the sentiment, text-gen, and whisper models) all use the same requirement tag. Larger models can declare their memory needs through the spec.memory field.

The Model CRD provides a uniform interface for declaring HuggingFace models regardless of their specific task type:

Sentiment analysis models (e.g., a model named sentiment)
Text generation models (e.g., a model named text-gen)
Speech-to-text models (e.g., a model named whisper)

Each model declaration specifies:

A name identifying the model in the Seldon Core 2 system
A storageUri pointing to the serialized HuggingFace pipeline artifacts
A requirements list containing huggingface to select the correct runtime
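
As a rough illustration of the three fields above, the following sketch builds a Model manifest as a plain Python dict and prints it as JSON, which kubectl accepts as well as YAML. The helper name huggingface_model and the gs://example-bucket/... URIs are hypothetical placeholders, and the apiVersion value is the one assumed here for Seldon Core 2 Model resources; check it against your installed CRDs.

```python
import json


def huggingface_model(name, storage_uri, memory=None):
    """Build a Seldon Core 2 Model manifest as a plain dict (illustrative helper)."""
    spec = {
        "storageUri": storage_uri,          # location of the serialized pipeline artifacts
        "requirements": ["huggingface"],    # selects servers offering the HuggingFace runtime
    }
    if memory is not None:
        spec["memory"] = memory             # optional, e.g. "3Gi" for a larger model
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",  # assumed API group/version
        "kind": "Model",
        "metadata": {"name": name},
        "spec": spec,
    }


# Placeholder bucket paths; substitute the location of your own artifacts.
sentiment = huggingface_model("sentiment", "gs://example-bucket/huggingface/sentiment")
whisper = huggingface_model("whisper", "gs://example-bucket/huggingface/whisper", memory="3Gi")

if __name__ == "__main__":
    print(json.dumps(whisper, indent=2))
```

Both manifests differ only in name, storageUri, and the optional memory field; the requirements list is identical regardless of task type.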

Theoretical Basis

The huggingface requirement tag acts as a capability selector: the Seldon Core 2 scheduler matches models to inference servers that have the HuggingFace MLServer runtime installed. This abstraction decouples model definition from server provisioning.

The scheduling mechanism works as follows:

The Model CRD declares requirements: ["huggingface"].
The Seldon Core 2 scheduler inspects available inference server configurations.
Servers with matching capability tags (i.e., servers running MLServer with the HuggingFace runtime) are selected as deployment targets.
The model artifacts are downloaded from the storageUri and loaded by the HuggingFace runtime on the matched server.

This capability-based routing means that the same model definition pattern applies to all HuggingFace model types. The runtime itself determines how to load and serve the model based on the pipeline metadata stored in the artifact directory.
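
The matching step can be pictured as a subset check between a model's requirements and each server's capability tags. The sketch below is a simplified illustration only, not the actual Seldon Core 2 scheduler, which may also weigh other factors such as available memory. The Server class, the server names, and the capability lists are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical stand-in for an inference server and its capability tags."""
    name: str
    capabilities: set = field(default_factory=set)


def candidate_servers(requirements, servers):
    """Return names of servers whose capabilities cover every requirement."""
    required = set(requirements)
    return [s.name for s in servers if required <= s.capabilities]


# Illustrative server inventory; real capability tags depend on your deployment.
servers = [
    Server("mlserver", {"mlserver", "sklearn", "huggingface"}),
    Server("triton", {"triton", "onnx"}),
]

matches = candidate_servers(["huggingface"], servers)

if __name__ == "__main__":
    print(matches)
```

Because the check depends only on tags, a sentiment, text generation, or speech-to-text model with requirements: ["huggingface"] would be matched to the same set of servers.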

Usage

This principle applies when defining HuggingFace models for deployment on Seldon Core 2, including:

Declaring any HuggingFace Transformer model as a Kubernetes Model resource
Specifying memory requirements for large models (e.g., the Whisper model declares memory: "3Gi")
Defining multiple HuggingFace models that will be composed into pipelines

Related Pages

SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace -- implements this principle with concrete YAML manifests
SeldonIO_Seldon_core_HuggingFace_Model_Preparation -- precedes this principle; artifacts must be prepared before defining the Model resource
SeldonIO_Seldon_core_HuggingFace_Model_Deployment_And_Verification -- follows this principle; after defining the Model CRD, it must be deployed and verified
SeldonIO_Seldon_core_Model_Resource_Definition -- generalizes this principle for all model types in Seldon Core 2
