Principle:SeldonIO Seldon core HuggingFace Model Preparation

Field	Value
Overview	Downloading and serializing HuggingFace Transformer models for deployment on MLServer.
Domains	NLP, Model_Serialization
Related Implementation	SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained
Knowledge Sources	Repo (https://github.com/SeldonIO/seldon-core), Doc (https://huggingface.co/docs/transformers)
Last Updated	2026-02-13 00:00 GMT

Description

HuggingFace models are prepared for Seldon Core 2 by downloading pre-trained weights and tokenizers using the transformers library, wrapping them in a pipeline object, and saving the artifacts using save_pretrained(). The saved directory contains all files needed by MLServer's HuggingFace runtime.

The preparation workflow follows three steps:

1. Download the pre-trained tokenizer and model weights from the HuggingFace Hub using the respective from_pretrained() class methods.
2. Wrap the tokenizer and model in a pipeline() object, specifying the task type (e.g., text-generation, sentiment-analysis, automatic-speech-recognition).
3. Serialize the pipeline using save_pretrained(), which writes all model weights, tokenizer vocabulary files, and configuration into a single directory.
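
The three steps above can be sketched as follows. This is a minimal sketch rather than the repository's own code: it assumes a PyTorch backend and the text-generation task, and the function name prepare_text_generation_pipeline and its parameters are hypothetical. The import is deferred so that defining the function does not load the library.

```python
def prepare_text_generation_pipeline(model_name, output_dir):
    """Download a causal LM and its tokenizer, wrap them in a pipeline, and save it.

    Hypothetical helper for illustration; assumes transformers with a PyTorch backend.
    """
    # Deferred import: transformers is heavy, so load it only when the function runs.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Step 1: download the tokenizer and weights from the HuggingFace Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Step 2: wrap both in a pipeline for the chosen task.
    text_generator = pipeline("text-generation", model=model, tokenizer=tokenizer)

    # Step 3: write weights, tokenizer files, and configuration to one directory.
    text_generator.save_pretrained(output_dir)
    return output_dir
```

Calling prepare_text_generation_pipeline("gpt2", "text-generation-model") would require network access to the Hub. Other tasks would swap in the matching model class and task string.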

The resulting artifact directory is then uploaded to a remote storage location (e.g., Google Cloud Storage) for consumption by Seldon Core 2's model loading mechanism.
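
One way to perform the upload is sketched below, assuming the google-cloud-storage client library and credentials already configured in the environment. The helper names plan_uploads and upload_directory, the bucket name, and the prefix are all hypothetical. Other tools such as a command-line copy would work equally well.

```python
from pathlib import Path


def plan_uploads(local_dir, prefix):
    """Pair each file under local_dir with the object name it would get under prefix."""
    root = Path(local_dir)
    pairs = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(root).as_posix()
            # Join prefix and relative path without producing a leading slash.
            blob_name = "/".join(part for part in (prefix.strip("/"), relative) if part)
            pairs.append((path, blob_name))
    return pairs


def upload_directory(local_dir, bucket_name, prefix):
    """Upload every file in local_dir to the bucket (assumes google-cloud-storage)."""
    # Deferred import so that plan_uploads can be used without the client library.
    from google.cloud import storage

    bucket = storage.Client().bucket(bucket_name)
    for path, blob_name in plan_uploads(local_dir, prefix):
        bucket.blob(blob_name).upload_from_filename(str(path))
```

Separating the planning step from the upload makes it possible to inspect the object names before anything is written to the bucket.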

Theoretical Basis

Transfer learning reuses pre-trained models (e.g., GPT-2, BERT, Whisper) by downloading their weights from the HuggingFace Hub. These models have been trained on large text or audio corpora and encode representations that can be applied directly to downstream tasks or fine-tuned for specific domains.

The save_pretrained() method serializes both model weights and tokenizer vocabulary into a portable directory format that can be loaded by any compatible runtime. This format includes:

config.json -- model architecture configuration
model weights -- e.g., model.safetensors (the default in recent transformers releases), pytorch_model.bin (older PyTorch checkpoints), or tf_model.h5 (TensorFlow)
tokenizer files -- vocabulary, merges, special tokens map
tokenizer_config.json -- tokenizer settings (e.g., tokenizer class, maximum input length)
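
A saved directory can be checked against this layout before upload. The sketch below is illustrative only: the helper name missing_artifacts is hypothetical, and the file-name lists are assumptions. In particular, model.safetensors is included on the assumption that a recent transformers release wrote the directory, and sharded checkpoints use different file names that this check would not recognize.

```python
from pathlib import Path

# Assumed file names; the exact set depends on the model and library version.
REQUIRED_FILES = ("config.json", "tokenizer_config.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin", "tf_model.h5")


def missing_artifacts(directory):
    """Return the names of expected artifacts that are absent from directory."""
    root = Path(directory)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Any one of the known weight files is enough.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append("model weights")
    return missing
```

An empty return value means every expected file was found.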

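Because the saved directory holds the model, tokenizer, and configuration together, the artifact is self-contained: config.json records the architecture, so a runtime can reconstruct the model and tokenizer without hard-coding the original model class. The task type is not necessarily recoverable from these files, so it is safest to record it alongside the artifact (for example, in the runtime's model settings).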
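
As a local sanity check before upload, the saved directory can be loaded back into a pipeline. This is a minimal sketch with a hypothetical helper name, load_saved_pipeline. It passes the task explicitly, which is a conservative assumption of this sketch rather than something the source prescribes.

```python
def load_saved_pipeline(artifact_dir, task):
    """Rebuild a pipeline from a directory written by save_pretrained().

    Hypothetical helper for illustration; assumes transformers is installed.
    """
    # Deferred import: transformers is heavy, so load it only when the function runs.
    from transformers import pipeline

    # model= accepts a local directory; the tokenizer is read from the same place.
    return pipeline(task, model=artifact_dir)
```

For example, load_saved_pipeline("text-generation-model", "text-generation") would rebuild a text-generation pipeline from a local directory of that name, without contacting the Hub.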

Usage

This principle applies when preparing HuggingFace Transformer models for serving on Seldon Core 2, including:

Text generation models (GPT-2, GPT-Neo) for auto-regressive text completion
Sentiment analysis models (DistilBERT, RoBERTa) for text classification
Speech-to-text models (Whisper) for automatic speech recognition
Any other HuggingFace pipeline-compatible task type

Related Pages

SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained -- implements this principle with concrete code for downloading and serializing GPT-2
SeldonIO_Seldon_core_HuggingFace_Model_Resource_Definition -- follows from this principle; defines the Model CRD that references the serialized artifacts
SeldonIO_Seldon_core_Model_Artifact_Preparation -- generalizes this principle for all model types in Seldon Core 2
