

Principle:SeldonIO Seldon core HuggingFace Model Resource Definition

From Leeroopedia
Overview: Declaring HuggingFace Transformer models as Seldon Core 2 Model resources with the huggingface runtime requirement.
Domains: MLOps, NLP, Kubernetes
Related Implementation: SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace
Knowledge Sources: Repo (https://github.com/SeldonIO/seldon-core), Doc (https://docs.seldon.io/projects/seldon-core/en/v2/)
Last Updated: 2026-02-13 00:00 GMT

Description

HuggingFace models are declared with the standard Model CRD and requirements: ["huggingface"], which routes them to an MLServer instance that has the HuggingFace runtime. Models for different tasks (for example, the models named sentiment, text-gen and whisper) all use this same requirement tag. Larger models can additionally request memory through the spec.memory field.

The Model CRD provides a uniform interface for declaring HuggingFace models regardless of their specific task type:

  • Sentiment analysis models (e.g., the model named sentiment)
  • Text generation models (e.g., the model named text-gen)
  • Speech-to-text models (e.g., the model named whisper)

Each model declaration specifies:

  1. A name identifying the model in the Seldon Core 2 system
  2. A storageUri pointing to the serialized HuggingFace pipeline artifacts
  3. A requirements list containing huggingface to select the correct runtime
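The three fields above, plus the optional memory request, can be sketched as a small manifest builder. This is a minimal sketch assuming the mlops.seldon.io/v1alpha1 Model schema; the function name make_hf_model and the gs://example-bucket/... URI are hypothetical placeholders, not part of Seldon Core. The output is JSON, which kubectl apply -f accepts as well as YAML.

```python
import json
from typing import Optional


def make_hf_model(name: str, storage_uri: str, memory: Optional[str] = None) -> dict:
    """Build a Seldon Core 2 Model manifest that targets the HuggingFace runtime.

    Hypothetical helper for illustration; not part of any Seldon library.
    """
    spec = {
        # Location of the serialized HuggingFace pipeline artifacts.
        "storageUri": storage_uri,
        # Capability tag the scheduler matches against server capabilities.
        "requirements": ["huggingface"],
    }
    if memory is not None:
        # Optional memory request for larger models (e.g. "3Gi").
        spec["memory"] = memory
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Model",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    # Placeholder URI; point this at the real artifact location.
    manifest = make_hf_model("sentiment", "gs://example-bucket/huggingface/sentiment")
    print(json.dumps(manifest, indent=2))
```

Keeping memory optional mirrors the page: most HuggingFace models need only name, storageUri and requirements, and only larger ones set spec.memory.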

Theoretical Basis

The huggingface requirement tag acts as a capability selector: the Seldon Core 2 scheduler matches models to inference servers that have the HuggingFace MLServer runtime installed. This abstraction decouples model definition from server provisioning.

The scheduling mechanism works as follows:

  1. The Model CRD declares requirements: ["huggingface"].
  2. The Seldon Core 2 scheduler inspects available inference server configurations.
  3. Servers with matching capability tags (i.e., servers running MLServer with the HuggingFace runtime) are selected as deployment targets.
  4. The model artifacts are downloaded from the storageUri and loaded by the HuggingFace runtime on the matched server.
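Steps 2 and 3 amount to a subset test: a server qualifies if its capability tags cover every entry in the model's requirements. The sketch below illustrates only that matching rule. The Server class, the candidate_servers function and the capability sets are hypothetical, and the real scheduler also weighs memory, replicas and load, which this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Server:
    name: str
    # Capability tags advertised by the server (illustrative values only).
    capabilities: Set[str] = field(default_factory=set)


def candidate_servers(requirements: List[str], servers: List[Server]) -> List[str]:
    """Return names of servers whose capabilities cover every requirement.

    Simplified illustration of capability matching, not Seldon's scheduler.
    """
    needed = set(requirements)
    return [s.name for s in servers if needed <= s.capabilities]


if __name__ == "__main__":
    servers = [
        Server("mlserver", {"mlserver", "sklearn", "xgboost", "huggingface"}),
        Server("triton", {"triton", "onnx", "tensorflow"}),
    ]
    print(candidate_servers(["huggingface"], servers))  # ['mlserver']
```

Because the model only names a capability, adding another server that advertises huggingface makes it an eligible target without changing any Model definition.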

This capability-based routing means that the same model definition pattern applies to all HuggingFace model types. The runtime itself determines how to load and serve the model based on the pipeline metadata stored in the artifact directory.
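One way the artifact directory can carry that task metadata is an MLServer model-settings.json whose parameters.extra.task names the HuggingFace pipeline task. The sketch below assumes that convention; the pairing of this page's model names with the task strings sentiment-analysis, text-generation and automatic-speech-recognition is an assumption for illustration, and TASKS, model_settings and write_settings are hypothetical names.

```python
import json
import tempfile
from pathlib import Path

# Assumed pairing of this page's model names with HuggingFace task names.
TASKS = {
    "sentiment": "sentiment-analysis",
    "text-gen": "text-generation",
    "whisper": "automatic-speech-recognition",
}


def model_settings(name: str, task: str) -> dict:
    """Return model-settings.json content selecting MLServer's HuggingFace runtime."""
    return {
        "name": name,
        "implementation": "mlserver_huggingface.HuggingFaceRuntime",
        # The runtime reads the task from parameters.extra to build the pipeline.
        "parameters": {"extra": {"task": task}},
    }


def write_settings(root: Path) -> list:
    """Write one model-settings.json per model directory under root."""
    written = []
    for name, task in TASKS.items():
        model_dir = root / name
        model_dir.mkdir(parents=True, exist_ok=True)
        path = model_dir / "model-settings.json"
        path.write_text(json.dumps(model_settings(name, task), indent=2))
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_settings(Path(tmp)):
            task = json.loads(p.read_text())["parameters"]["extra"]["task"]
            print(p.relative_to(tmp), task)
```

Only the task string differs between the three directories, which is why the Seldon Model declarations themselves can stay identical apart from name and storageUri.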

Usage

This principle applies when defining HuggingFace models for deployment on Seldon Core 2, including:

  • Declaring any HuggingFace Transformer model as a Kubernetes Model resource
  • Specifying memory requirements for large models (e.g., the Whisper model is declared with memory: "3Gi")
  • Defining multiple HuggingFace models that will be composed into pipelines
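Once several HuggingFace models are declared, they can be chained by name in a Seldon Pipeline resource. The sketch below builds a linear pipeline manifest, assuming the mlops.seldon.io/v1alpha1 Pipeline schema with spec.steps (each step naming a Model and its inputs) and spec.output.steps. make_pipeline and the pipeline name speech-to-sentiment are hypothetical; a real whisper-to-sentiment chain may also need tensor renaming between steps, which this sketch omits.

```python
import json
from typing import Dict, List


def make_pipeline(name: str, chain: List[str]) -> Dict:
    """Chain already-declared Model resources into a linear Pipeline manifest.

    Hypothetical helper: each step feeds the next, and the last step is the output.
    """
    if not chain:
        raise ValueError("a pipeline needs at least one model")
    steps = []
    for i, model in enumerate(chain):
        step = {"name": model}
        if i > 0:
            # Wire this step's input to the previous model's output.
            step["inputs"] = [chain[i - 1]]
        steps.append(step)
    return {
        "apiVersion": "mlops.seldon.io/v1alpha1",
        "kind": "Pipeline",
        "metadata": {"name": name},
        "spec": {"steps": steps, "output": {"steps": [chain[-1]]}},
    }


if __name__ == "__main__":
    print(json.dumps(make_pipeline("speech-to-sentiment", ["whisper", "sentiment"]), indent=2))
```

The pipeline refers to models only by name, so it relies on each Model already being declared with requirements: ["huggingface"] and scheduled onto a matching server.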

Related Pages

Implementation:SeldonIO_Seldon_core_Seldon_Model_CRD_HuggingFace
