Principle:SeldonIO Seldon core HuggingFace Model Preparation
| Field | Value |
|---|---|
| Overview | Downloading and serializing HuggingFace Transformer models for deployment on MLServer. |
| Domains | NLP, Model_Serialization |
| Related Implementation | SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained |
| Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://huggingface.co/docs/transformers) |
| Last Updated | 2026-02-13 00:00 GMT |
Description
HuggingFace models are prepared for Seldon Core 2 by downloading pre-trained weights and tokenizers using the transformers library, wrapping them in a pipeline object, and saving the artifacts using save_pretrained(). The saved directory contains all files needed by MLServer's HuggingFace runtime.
The preparation workflow follows three steps:
- Download the pre-trained tokenizer and model weights from the HuggingFace Hub using the respective from_pretrained() class methods.
- Wrap the tokenizer and model in a pipeline() object, specifying the task type (e.g., text-generation, sentiment-analysis, automatic-speech-recognition).
- Serialize the pipeline using save_pretrained(), which writes all model weights, tokenizer vocabulary files, and configuration into a single directory.
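The three steps can be sketched in Python as follows. This is a minimal illustration rather than the repository's own code: it assumes the PyTorch Auto classes (AutoTokenizer, AutoModelForCausalLM) and the text-generation task, and the function name and output directory are hypothetical. Running it requires the transformers library, a backend such as PyTorch, and network access to the Hub.

```python
from pathlib import Path


def prepare_text_generation_pipeline(model_name: str, output_dir: str) -> Path:
    """Download a causal language model, wrap it in a pipeline, and save it."""
    # Imported lazily so this module can be imported without transformers
    # installed; calling the function needs transformers, a backend such as
    # PyTorch, and network access to the HuggingFace Hub.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Step 1: download the tokenizer and model weights from the Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Step 2: wrap both in a pipeline for the chosen task.
    text_generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)

    # Step 3: write weights, tokenizer files, and configuration to one directory.
    text_generator.save_pretrained(output_dir)
    return Path(output_dir)


# Example call (downloads GPT-2; the directory name is illustrative only):
# prepare_text_generation_pipeline("gpt2", "text-generation-model-artefacts")
```

Other tasks would use a different model class (or a TensorFlow class) in step 1 and a different task string in step 2; the save step stays the same.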
The resulting artifact directory is then uploaded to a remote storage location (e.g., Google Cloud Storage) for consumption by Seldon Core 2's model loading mechanism.
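As a rough sketch of the upload step, the helper below only plans the transfer: it pairs each local file with the object name it would receive under a remote prefix. The function name plan_upload and the prefix layout are assumptions made for illustration; the actual upload is left to whichever storage client or CLI is in use.

```python
from pathlib import Path


def plan_upload(artifact_dir: str, remote_prefix: str) -> list[tuple[Path, str]]:
    """Pair every file under artifact_dir with the object name it would get."""
    root = Path(artifact_dir)
    prefix = remote_prefix.strip("/")
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Keep the directory layout so the runtime sees the same tree.
            relative = path.relative_to(root).as_posix()
            pairs.append((path, f"{prefix}/{relative}" if prefix else relative))
    return pairs
```

Each (path, object_name) pair could then be passed to a storage client. With the google-cloud-storage package, for example, that would be bucket.blob(object_name).upload_from_filename(str(path)); the bucket and prefix names are deployment-specific.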
Theoretical Basis
Transfer learning reuses pre-trained models (e.g., GPT-2, BERT, Whisper) by downloading their weights from the HuggingFace Hub. Because these models were trained on large text or audio corpora, their learned representations can be applied directly to downstream tasks or fine-tuned for specific domains.
The save_pretrained() method serializes both model weights and tokenizer vocabulary into a portable directory format that can be loaded by any compatible runtime. This format includes:
- config.json -- model architecture configuration
- model weights (e.g., tf_model.h5 for TensorFlow, pytorch_model.bin for PyTorch, or model.safetensors in recent versions of transformers)
- tokenizer files -- vocabulary, merges, special tokens map
- tokenizer_config.json -- tokenizer settings and task metadata
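A quick sanity check of the saved directory can be sketched with the standard library alone. The helper name missing_artifacts and the exact file lists are assumptions for illustration. Tokenizer vocabulary file names vary by model, so they are not checked, and sharded checkpoints for large models are not handled.

```python
from pathlib import Path

# Files expected in every saved pipeline directory (per the list above).
REQUIRED_FILES = ("config.json", "tokenizer_config.json")
# At least one weights file should be present; which one depends on the
# framework and the transformers version (assumed list, not exhaustive).
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin", "tf_model.h5")


def missing_artifacts(artifact_dir: str) -> list[str]:
    """Return a description of each expected artifact that is absent."""
    root = Path(artifact_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append("model weights (one of: " + ", ".join(WEIGHT_FILES) + ")")
    return missing
```

An empty list suggests the directory is at least structurally complete before it is uploaded; it does not verify that the files load correctly.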
By saving the complete pipeline (model + tokenizer + task configuration), the serialized artifact is self-contained and can be loaded without needing to know the original model class or task type at inference time.
Usage
This principle applies when preparing HuggingFace Transformer models for serving on Seldon Core 2, including:
- Text generation models (GPT-2, GPT-Neo) for auto-regressive text completion
- Sentiment analysis models (DistilBERT, RoBERTa) for text classification
- Speech-to-text models (Whisper) for automatic speech recognition
- Any other HuggingFace pipeline-compatible task type
Related Pages
- SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained -- implements this principle with concrete code for downloading and serializing GPT-2
- SeldonIO_Seldon_core_HuggingFace_Model_Resource_Definition -- follows from this principle; defines the Model CRD that references the serialized artifacts
- SeldonIO_Seldon_core_Model_Artifact_Preparation -- generalizes this principle for all model types in Seldon Core 2
Implementation:SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained