Implementation:Pytorch Serve Transformers Model Dowloader

From Leeroopedia
Field | Value
Page Type | Implementation
Title | Transformers Model Dowloader
Type | API Doc
Short Description | Script that downloads pretrained HuggingFace Transformer models and tokenizers, optionally traces them to TorchScript, and saves all artifacts to a local directory for model archiving
Domains | NLP, Model_Serving
Source | examples/Huggingface_Transformers/Download_Transformer_models.py:L22-146
Knowledge Sources | TorchServe
Workflow | HuggingFace_Transformer_Serving
Last Updated | 2026-02-13 00:00 GMT

Overview

The Download_Transformer_models.py script provides the transformers_model_dowloader function, which downloads a pretrained HuggingFace Transformer model and its tokenizer and saves them to a local directory in one of two formats: pretrained or TorchScript. The script reads its configuration from a YAML file, supports four NLP task modes, and can trace models for AWS Inferentia (Neuron and NeuronX).

Description

The function handles the complete model preparation pipeline: it selects the model class that matches the NLP task mode, downloads the pretrained weights and tokenizer (or loads them from the local cache), and serializes the artifacts to the ./Transformer_model/ directory. It supports two serialization strategies (pretrained and TorchScript) and three tracing targets (standard PyTorch, Neuron, and NeuronX).
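The mode-to-class selection described above can be pictured as a simple lookup. The sketch below is an assumption rather than the script's actual code: the pairing of each mode string with an Auto class is inferred from the script's import list, and MODE_TO_AUTO_CLASS and auto_class_name_for are hypothetical names introduced only for illustration.

```python
# Hypothetical lookup table; the pairing is inferred from the script's
# import list, not copied from the script itself.
MODE_TO_AUTO_CLASS = {
    "sequence_classification": "AutoModelForSequenceClassification",
    "question_answering": "AutoModelForQuestionAnswering",
    "token_classification": "AutoModelForTokenClassification",
    "text_generation": "AutoModelForCausalLM",
}


def auto_class_name_for(mode: str) -> str:
    """Return the name of the transformers Auto class assumed for a task mode."""
    try:
        return MODE_TO_AUTO_CLASS[mode]
    except KeyError:
        raise ValueError(f"Unsupported mode: {mode!r}") from None


if __name__ == "__main__":
    for mode in MODE_TO_AUTO_CLASS:
        print(mode, "->", auto_class_name_for(mode))
```

With the transformers package installed, the returned name could be resolved with getattr(transformers, name) and the class's from_pretrained method used to load the weights. Keeping the table as plain strings lets the sketch run without that dependency.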

Usage

The script is invoked from the command line with an optional YAML configuration file argument. A relative path is resolved against the script's own directory, not the current working directory:

# Using default model-config.yaml
python Download_Transformer_models.py

# Using a custom configuration file
python Download_Transformer_models.py my-custom-config.yaml

Code Reference

Source Location

Field | Value
Repository | pytorch/serve
File | examples/Huggingface_Transformers/Download_Transformer_models.py
Lines | L22-146

Signature

def transformers_model_dowloader(
    mode,
    pretrained_model_name,
    num_labels,
    do_lower_case,
    max_length,
    torchscript,
    hardware,
    batch_size,
):
    """This function, save the checkpoint, config file along with tokenizer config and vocab files
    of a transformer model of your choice.
    """

Parameters

Parameter | Type | Description
mode | str | NLP task mode: "sequence_classification", "question_answering", "token_classification", or "text_generation"
pretrained_model_name | str | HuggingFace model identifier (e.g., "bert-base-uncased")
num_labels | int | Number of output labels for classification tasks
do_lower_case | bool | Whether the tokenizer should lowercase input text
max_length | int | Maximum token sequence length for TorchScript tracing dummy input
torchscript | bool | Whether to enable TorchScript mode in the model configuration
hardware | str or None | Hardware target for tracing: "neuron", "neuronx", or None for standard PyTorch
batch_size | int | Batch size for hardware-specific (Neuron/NeuronX) tracing
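Because a misspelled mode or hardware string would only surface after the download has started, a caller may want to check arguments first. The following is a minimal sketch of such a check. check_downloader_args is a hypothetical helper that is not part of the original script, and the lower bound on batch_size is an assumption.

```python
VALID_MODES = (
    "sequence_classification",
    "question_answering",
    "token_classification",
    "text_generation",
)
VALID_HARDWARE = (None, "neuron", "neuronx")


def check_downloader_args(mode, num_labels, hardware, batch_size):
    """Hypothetical pre-flight check; raises ValueError on undocumented values."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if hardware not in VALID_HARDWARE:
        raise ValueError(f"hardware must be one of {VALID_HARDWARE}, got {hardware!r}")
    if num_labels < 1:
        raise ValueError("num_labels must be a positive integer")
    # Assumption: a batch size below 1 is never meaningful for tracing.
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")


if __name__ == "__main__":
    check_downloader_args("sequence_classification", 2, "neuronx", 4)
    print("arguments look valid")
```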

Return Value

Returns None. The function produces side effects by writing model and tokenizer files to the ./Transformer_model/ directory.

Import

import os
import sys
import torch
import transformers
import yaml
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
    set_seed,
)

I/O Contract

Input

Input | Type | Description
mode | str | One of four NLP task mode strings
pretrained_model_name | str | A valid HuggingFace model identifier or local path
num_labels | int | Positive integer specifying classification head size
do_lower_case | bool | Tokenizer casing flag
max_length | int | Sequence length for padding during TorchScript tracing
torchscript | bool | Flag derived from save_mode in YAML
hardware | str or None | Optional hardware target string
batch_size | int | Batch size (defaults to 1 if not specified in YAML)

Output

Output Format | Description
Pretrained mode | Directory ./Transformer_model/ containing model weights (pytorch_model.bin or model.safetensors), config.json, tokenizer files (tokenizer.json, vocab.txt, etc.)
TorchScript mode (standard) | File ./Transformer_model/traced_model.pt
TorchScript mode (neuron) | File ./Transformer_model/traced_{model_name}_model_neuron_batch_{batch_size}.pt
TorchScript mode (neuronx) | File ./Transformer_model/traced_{model_name}_model_neuronx_batch_{batch_size}.pt
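The TorchScript file names listed above follow a fixed pattern, which a downstream packaging step could reproduce to locate the traced file. The sketch below assumes the patterns exactly as documented here. expected_traced_path is a hypothetical helper, not a function exported by the script.

```python
OUTPUT_DIR = "./Transformer_model"


def expected_traced_path(model_name, hardware=None, batch_size=1):
    """Build the traced-model path from the documented naming patterns."""
    if hardware in ("neuron", "neuronx"):
        # Hardware-specific traces embed the model name, target, and batch size.
        filename = f"traced_{model_name}_model_{hardware}_batch_{batch_size}.pt"
    else:
        # Standard PyTorch tracing uses a fixed file name.
        filename = "traced_model.pt"
    return f"{OUTPUT_DIR}/{filename}"


if __name__ == "__main__":
    print(expected_traced_path("bert-base-uncased"))
    print(expected_traced_path("bert-base-uncased", "neuronx", 4))
```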

Side Effects

  • Creates ./Transformer_model/ directory if it does not exist
  • Downloads model weights from HuggingFace Hub (or uses cached versions)
  • Writes model and tokenizer artifacts to disk
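Since the function's only observable result is files on disk, one way to confirm a run succeeded is to list what landed in the output directory. This is a small sketch using only the standard library. list_artifacts is a hypothetical helper, and the default directory is the one documented above.

```python
from pathlib import Path


def list_artifacts(output_dir="./Transformer_model"):
    """Return sorted file names in the output directory, or [] if it is missing."""
    path = Path(output_dir)
    if not path.is_dir():
        return []
    # Only regular files are reported; subdirectories are skipped.
    return sorted(p.name for p in path.iterdir() if p.is_file())


if __name__ == "__main__":
    files = list_artifacts()
    print(files if files else "no artifacts found")
```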

Usage Examples

Example 1: Download for Sequence Classification (Pretrained Mode)

transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=150,
    torchscript=False,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/ contains pretrained model and tokenizer files

Example 2: Download for Question Answering (TorchScript Mode)

transformers_model_dowloader(
    mode="question_answering",
    pretrained_model_name="bert-large-uncased-whole-word-masking-finetuned-squad",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/traced_model.pt

Example 3: Download for AWS Inferentia (NeuronX)

transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware="neuronx",
    batch_size=4,
)
# Result: ./Transformer_model/traced_bert-base-uncased_model_neuronx_batch_4.pt

Example 4: Command-Line Invocation

# The script reads settings from model-config.yaml by default
python Download_Transformer_models.py

# Or specify a custom config
python Download_Transformer_models.py custom-config.yaml

The script's __main__ block reads the YAML file, extracts handler settings, and calls the function:

# Adapted from __main__ (L149-180); requires os, sys, and yaml
dirname = os.path.dirname(__file__)
# The config path is resolved relative to the script's directory
if len(sys.argv) > 1:
    filename = os.path.join(dirname, sys.argv[1])
else:
    filename = os.path.join(dirname, "model-config.yaml")
# A context manager closes the file handle once parsing is done
with open(filename) as f:
    model_yaml_config = yaml.safe_load(f)
settings = model_yaml_config["handler"]
mode = settings["mode"]
model_name = settings["model_name"]
num_labels = int(settings["num_labels"])
do_lower_case = settings["do_lower_case"]
max_length = settings["max_length"]
save_mode = settings["save_mode"]
# Any save_mode other than "torchscript" selects pretrained mode
torchscript = save_mode == "torchscript"
hardware = settings.get("hardware")
batch_size = int(settings.get("batch_size", "1"))
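The same extraction can be written as a function that turns the handler section into keyword arguments, which makes the mapping between YAML keys and function parameters explicit. This is a sketch and not the script's own structure. parse_handler_settings is a hypothetical name, and the sample dictionary stands in for a parsed YAML file with illustrative values.

```python
def parse_handler_settings(settings):
    """Map the YAML handler section onto the downloader's parameter names."""
    return {
        "mode": settings["mode"],
        "pretrained_model_name": settings["model_name"],
        "num_labels": int(settings["num_labels"]),
        "do_lower_case": settings["do_lower_case"],
        "max_length": settings["max_length"],
        # Only the exact string "torchscript" enables TorchScript mode.
        "torchscript": settings["save_mode"] == "torchscript",
        # Optional keys fall back to None and 1 respectively.
        "hardware": settings.get("hardware"),
        "batch_size": int(settings.get("batch_size", 1)),
    }


if __name__ == "__main__":
    # Illustrative values only; a real run would load these from the YAML file.
    sample = {
        "mode": "sequence_classification",
        "model_name": "bert-base-uncased",
        "num_labels": "2",
        "do_lower_case": True,
        "max_length": 150,
        "save_mode": "torchscript",
    }
    print(parse_handler_settings(sample))
```

The resulting dictionary's keys match the function signature, so it could be passed as transformers_model_dowloader(**kwargs).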
