Principle:SeldonIO Seldon core HuggingFace Model Resource Definition
| Field | Value |
|---|---|
| Overview | Declaring HuggingFace Transformer models as Seldon Core 2 Model resources with the huggingface runtime requirement. |
| Domains | MLOps, NLP, Kubernetes |
| Related Implementation | SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace |
| Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/) |
| Last Updated | 2026-02-13 00:00 GMT |
Description
HuggingFace models are declared using the standard Model CRD with requirements: ["huggingface"] to route them to an MLServer instance with the HuggingFace runtime. Different model types (sentiment, text-gen, whisper) all use the same requirement tag. Memory allocation may be specified for larger models via the spec.memory field.
The Model CRD provides a uniform interface for declaring HuggingFace models regardless of their specific task type:
- Sentiment analysis models (e.g., sentiment)
- Text generation models (e.g., text-gen)
- Speech-to-text models (e.g., whisper)
Each model declaration specifies:
- A name identifying the model in the Seldon Core 2 system
- A storageUri pointing to the serialized HuggingFace pipeline artifacts
- A requirements list containing huggingface to select the correct runtime
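The three fields above map onto a manifest along the following lines. This is a minimal sketch, assuming the mlops.seldon.io/v1alpha1 API group used by Seldon Core 2. The bucket path in storageUri is a hypothetical placeholder and is not taken from this page.

```yaml
# Minimal sketch of a HuggingFace Model resource (illustrative values).
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: sentiment            # name identifying the model in Seldon Core 2
spec:
  # Hypothetical location; point this at the prepared pipeline artifacts.
  storageUri: "gs://example-bucket/huggingface/sentiment"
  requirements:
  - huggingface              # selects a server with the HuggingFace runtime
```

Under the same assumptions, a text-gen model would differ only in metadata.name and storageUri; the requirements entry stays the same.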
Theoretical Basis
The huggingface requirement tag acts as a capability selector: the Seldon Core 2 scheduler matches models to inference servers that have the HuggingFace MLServer runtime installed. This abstraction decouples model definition from server provisioning.
The scheduling mechanism works as follows:
- The Model CRD declares requirements: ["huggingface"].
- The Seldon Core 2 scheduler inspects available inference server configurations.
- Servers with matching capability tags (i.e., servers running MLServer with the HuggingFace runtime) are selected as deployment targets.
- The model artifacts are downloaded from the storageUri and loaded by the HuggingFace runtime on the matched server.
This capability-based routing means that the same model definition pattern applies to all HuggingFace model types. The runtime itself determines how to load and serve the model based on the pipeline metadata stored in the artifact directory.
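On the server side, the matching described above depends on the capability tags each inference server advertises. The following is a sketch only: it assumes the Seldon Core 2 Server resource exposes serverConfig and capabilities fields, and the server name mlserver-hf is hypothetical. Verify the field names against the CRD version installed in the cluster before relying on them.

```yaml
# Sketch of a Server that advertises the huggingface capability (assumed fields).
apiVersion: mlops.seldon.io/v1alpha1
kind: Server
metadata:
  name: mlserver-hf          # hypothetical server name
spec:
  serverConfig: mlserver     # assumed base configuration providing the MLServer image
  capabilities:              # assumed field: tags compared against Model requirements
  - huggingface
```

Under these assumptions, a Model with requirements: ["huggingface"] would be eligible for this server, while a Model requiring a tag the server does not advertise would not be scheduled onto it.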
Usage
This principle applies when defining HuggingFace models for deployment on Seldon Core 2, including:
- Declaring any HuggingFace Transformer model as a Kubernetes Model resource
- Specifying memory requirements for large models (e.g., Whisper requires memory: "3Gi")
- Defining multiple HuggingFace models that will be composed into pipelines
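For the memory case, a Whisper declaration might look like the sketch below. It assumes the same mlops.seldon.io/v1alpha1 API group as other Seldon Core 2 resources; the storageUri is a hypothetical placeholder, and only the memory value comes from this page.

```yaml
# Sketch of a larger HuggingFace model with an explicit memory requirement.
apiVersion: mlops.seldon.io/v1alpha1
kind: Model
metadata:
  name: whisper
spec:
  # Hypothetical location; point this at the prepared Whisper pipeline artifacts.
  storageUri: "gs://example-bucket/huggingface/whisper"
  requirements:
  - huggingface              # same requirement tag as every other HuggingFace model
  memory: "3Gi"              # spec.memory value cited above for Whisper
```

Smaller models can omit spec.memory; the rest of the manifest keeps the same shape.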
Related Pages
- SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace -- implements this principle with concrete YAML manifests
- SeldonIO_Seldon_core_HuggingFace_Model_Preparation -- precedes this principle; artifacts must be prepared before defining the Model resource
- SeldonIO_Seldon_core_HuggingFace_Model_Deployment_And_Verification -- follows this principle; after defining the Model CRD, it must be deployed and verified
- SeldonIO_Seldon_core_Model_Resource_Definition -- generalizes this principle for all model types in Seldon Core 2
Implementation:SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace