Principle:SeldonIO Seldon core HuggingFace Model Preparation

From Leeroopedia
Field | Value
Overview | Downloading and serializing HuggingFace Transformer models for deployment on MLServer.
Domains | NLP, Model_Serialization
Related Implementation | SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained
Knowledge Sources | Repo (https://github.com/SeldonIO/seldon-core), Doc (https://huggingface.co/docs/transformers)
Last Updated | 2026-02-13 00:00 GMT

Description

HuggingFace models are prepared for Seldon Core 2 by downloading pre-trained weights and tokenizers using the transformers library, wrapping them in a pipeline object, and saving the artifacts using save_pretrained(). The saved directory contains all files needed by MLServer's HuggingFace runtime.

The preparation workflow follows three steps:

  1. Download the pre-trained tokenizer and model weights from the HuggingFace Hub using the respective from_pretrained() class methods.
  2. Wrap the tokenizer and model in a pipeline() object, specifying the task type (e.g., text-generation, sentiment-analysis, automatic-speech-recognition).
  3. Serialize the pipeline using save_pretrained(), which writes all model weights, tokenizer vocabulary files, and configuration into a single directory.
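The three steps above can be sketched as follows. This is a minimal illustration rather than the repository's own script: it assumes the PyTorch backend, a text-generation task, and the generic `AutoTokenizer` / `AutoModelForCausalLM` classes. The function name and its arguments are hypothetical.

```python
from pathlib import Path


def prepare_text_generation_pipeline(model_id: str, output_dir: str) -> Path:
    """Download a causal LM and its tokenizer, wrap them in a pipeline, save it."""
    # Imported lazily: transformers (plus a backend such as PyTorch, assumed
    # here) is only required when the function is actually called.
    from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

    # Step 1: download the tokenizer and the weights from the HuggingFace Hub.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # Step 2: wrap both in a pipeline object for the chosen task.
    pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

    # Step 3: write weights, tokenizer files and configuration to one directory.
    target = Path(output_dir)
    pipe.save_pretrained(str(target))
    return target
```

Calling `prepare_text_generation_pipeline("distilgpt2", "./text-generation-model")` would, under these assumptions, leave a directory ready for upload. The model ID and path are placeholders, and other tasks would need a different `AutoModelFor...` class.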

The resulting artifact directory is then uploaded to a remote storage location (e.g., Google Cloud Storage) for consumption by Seldon Core 2's model loading mechanism.
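As a rough sketch of the upload step, the helper below first maps each local file to an object name and then pushes the files with the `google-cloud-storage` client. The function names, the prefix layout, and the choice of that client library are assumptions made for illustration. The source only states that the directory ends up in remote storage.

```python
from pathlib import Path, PurePosixPath


def plan_uploads(artifact_dir: str, prefix: str) -> list[tuple[Path, str]]:
    """Pair every file under artifact_dir with its destination object name."""
    root = Path(artifact_dir)
    base = PurePosixPath(prefix.strip("/"))
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            # Object names always use forward slashes, whatever the local OS.
            relative = path.relative_to(root).as_posix()
            plan.append((path, str(base / relative)))
    return plan


def upload_to_gcs(plan: list[tuple[Path, str]], bucket_name: str) -> None:
    """Upload the planned files; assumes google-cloud-storage and credentials."""
    from google.cloud import storage  # assumption: package is installed

    bucket = storage.Client().bucket(bucket_name)
    for local_path, object_name in plan:
        bucket.blob(object_name).upload_from_filename(str(local_path))
```

Keeping the planning step separate makes it possible to inspect the destination names before any network call is made.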

Theoretical Basis

Transfer learning reuses pre-trained models (e.g., GPT-2 and BERT for text, Whisper for speech) by downloading their weights from the HuggingFace Hub. Because these models have already been trained on large corpora, their learned representations can be applied directly to downstream tasks or fine-tuned for specific domains.

The save_pretrained() method serializes both model weights and tokenizer vocabulary into a portable directory format that can be loaded by any compatible runtime. This format includes:

  • config.json -- model architecture configuration
  • model weights (e.g., tf_model.h5 for TensorFlow, pytorch_model.bin for PyTorch, or model.safetensors in recent transformers releases)
  • tokenizer files -- vocabulary, merges, special tokens map
  • tokenizer_config.json -- tokenizer settings (e.g., tokenizer class, maximum length)
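A quick sanity check before uploading can confirm that the files listed above are present. The sketch below is a hypothetical helper using only the standard library. It assumes unsharded weights under one of three common file names and does not inspect tokenizer vocabulary files, whose names vary by model.

```python
from pathlib import Path

# Common single-file weight names; sharded checkpoints are not handled here.
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin", "tf_model.h5")
REQUIRED_FILES = ("config.json", "tokenizer_config.json")


def missing_artifacts(artifact_dir: str) -> list[str]:
    """Return the names of expected artifacts that are absent from the directory."""
    root = Path(artifact_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Any one weight file is enough, since the name depends on the backend.
    if not any((root / name).is_file() for name in WEIGHT_FILES):
        missing.append("model weights")
    return missing
```

An empty list means the directory has the basic layout. It does not prove that the files are valid or mutually consistent.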

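By saving the complete pipeline (model and tokenizer together with their configuration), the serialized artifact is self-contained and can be loaded without hard-coding the original model class, which is recorded in config.json. The task type is usually still declared in the serving runtime's configuration, since the saved files do not reliably identify it.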
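One reason the model class need not be known in advance is that config.json written by save_pretrained() records it. The following sketch reads those fields back using only the standard library. The helper name is hypothetical, and the `model_type` and `architectures` keys are assumed to be present, as they typically are in saved configurations.

```python
import json
from pathlib import Path


def describe_artifact(artifact_dir: str) -> dict:
    """Read the model family and class names recorded in config.json."""
    config_path = Path(artifact_dir) / "config.json"
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return {
        # Short family identifier, e.g. "gpt2".
        "model_type": config.get("model_type"),
        # Class names a loader can use to rebuild the model.
        "architectures": config.get("architectures", []),
    }
```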

Usage

This principle applies when preparing HuggingFace Transformer models for serving on Seldon Core 2, including:

  • Text generation models (GPT-2, GPT-Neo) for auto-regressive text completion
  • Sentiment analysis models (DistilBERT, RoBERTa) for text classification
  • Speech-to-text models (Whisper) for automatic speech recognition
  • Any other HuggingFace pipeline-compatible task type

Related Pages

Implementation:SeldonIO_Seldon_core_Transformers_Pipeline_Save_Pretrained
