Implementation:Pytorch Serve Transformers Model Dowloader

Field	Value
Page Type	Implementation
Title	Transformers Model Downloader
Type	API Doc
Short Description	Script that downloads pretrained HuggingFace Transformer models and tokenizers, optionally traces them to TorchScript, and saves all artifacts to a local directory for model archiving
Domains	NLP, Model_Serving
Source	examples/Huggingface_Transformers/Download_Transformer_models.py:L22-146
Knowledge Sources	TorchServe
Workflow	HuggingFace_Transformer_Serving
Last Updated	2026-02-13 00:00 GMT

Overview

The Download_Transformer_models.py script provides the transformers_model_dowloader function, which downloads a pretrained HuggingFace Transformer model and its tokenizer and saves them to a local directory in one of two formats: pretrained (HuggingFace save format) or TorchScript. The script reads its configuration from a YAML file, supports four NLP task modes, and can trace models for AWS Inferentia hardware through the `neuron` and `neuronx` targets.

Description

The function handles the complete model preparation pipeline: it selects the model class for the NLP task mode, downloads (or loads from the local cache) the pretrained weights and tokenizer, and serializes the artifacts to the ./Transformer_model/ directory. It supports two serialization strategies (pretrained and TorchScript) and, in TorchScript mode, three tracing targets: standard PyTorch, Neuron, and NeuronX.
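The mode-dependent model selection can be pictured as a lookup table. The sketch below is illustrative rather than the script's own code: `MODE_TO_AUTO_CLASS` and `auto_class_for_mode` are hypothetical names, and the mapping pairs each documented mode with the `transformers` Auto class from the script's import list. Rejecting unknown modes with a `ValueError` is a design choice of this sketch.

```python
# Hypothetical lookup table: each documented task mode mapped to the name of
# the transformers Auto class that would load the matching model head.
MODE_TO_AUTO_CLASS = {
    "sequence_classification": "AutoModelForSequenceClassification",
    "question_answering": "AutoModelForQuestionAnswering",
    "token_classification": "AutoModelForTokenClassification",
    "text_generation": "AutoModelForCausalLM",
}


def auto_class_for_mode(mode: str) -> str:
    """Return the Auto class name for a task mode; raise ValueError for unknown modes."""
    try:
        return MODE_TO_AUTO_CLASS[mode]
    except KeyError:
        supported = ", ".join(sorted(MODE_TO_AUTO_CLASS))
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of: {supported}"
        ) from None
```

With `transformers` installed, the returned name could be resolved with `getattr(transformers, name)` before calling its `from_pretrained` method.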

Usage

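The script is invoked from the command line with an optional YAML configuration file argument. A relative path is resolved against the directory containing the script, not the current working directory: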

# Using default model-config.yaml
python Download_Transformer_models.py

# Using a custom configuration file
python Download_Transformer_models.py my-custom-config.yaml

Code Reference

Source Location

Field	Value
Repository	pytorch/serve
File	`examples/Huggingface_Transformers/Download_Transformer_models.py`
Lines	L22-146

Signature

def transformers_model_dowloader(
    mode,
    pretrained_model_name,
    num_labels,
    do_lower_case,
    max_length,
    torchscript,
    hardware,
    batch_size,
):
    """This function, save the checkpoint, config file along with tokenizer config and vocab files
    of a transformer model of your choice.
    """

Parameters

Parameter	Type	Description
mode	str	NLP task mode: `"sequence_classification"`, `"question_answering"`, `"token_classification"`, or `"text_generation"`
pretrained_model_name	str	HuggingFace model identifier (e.g., `"bert-base-uncased"`)
num_labels	int	Number of output labels for classification tasks
do_lower_case	bool	Whether the tokenizer should lowercase input text
max_length	int	Maximum sequence length (in tokens) of the dummy input used for TorchScript tracing
torchscript	bool	Whether to enable TorchScript mode in the model configuration
hardware	str or None	Hardware target for tracing: `"neuron"`, `"neuronx"`, or `None` for standard PyTorch
batch_size	int	Batch size for hardware-specific (Neuron/NeuronX) tracing
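The constraints in the table above can be checked before any download starts. This is a hypothetical pre-flight helper, not part of the script. The allowed `mode` and `hardware` values come from the table; requiring `num_labels`, `max_length`, and `batch_size` to be positive integers is an assumption of this sketch.

```python
from typing import Optional

# Allowed values, taken from the Parameters table.
SUPPORTED_MODES = {
    "sequence_classification",
    "question_answering",
    "token_classification",
    "text_generation",
}
SUPPORTED_HARDWARE = {None, "neuron", "neuronx"}


def validate_downloader_args(
    mode: str,
    num_labels: int,
    max_length: int,
    hardware: Optional[str],
    batch_size: int,
) -> None:
    """Hypothetical check; raises ValueError on the first invalid argument."""
    if mode not in SUPPORTED_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if hardware not in SUPPORTED_HARDWARE:
        raise ValueError(f"unknown hardware target: {hardware!r}")
    numeric = (
        ("num_labels", num_labels),
        ("max_length", max_length),
        ("batch_size", batch_size),
    )
    for name, value in numeric:
        # bool is a subclass of int, so exclude it explicitly (assumption).
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
```

For example, the arguments of the NeuronX example later on this page pass unchanged, while `hardware="gpu"` would be rejected.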

Return Value

Returns None. The function produces side effects by writing model and tokenizer files to the ./Transformer_model/ directory.

Import

import os
import sys
import torch
import transformers
import yaml
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
    set_seed,
)

I/O Contract

Input

Input	Type	Description
mode	str	One of four NLP task mode strings
pretrained_model_name	str	A valid HuggingFace model identifier or local path
num_labels	int	Positive integer specifying classification head size
do_lower_case	bool	Tokenizer casing flag
max_length	int	Sequence length for padding during TorchScript tracing
torchscript	bool	`True` when `save_mode` in the YAML is `"torchscript"`, otherwise `False`
hardware	str or None	Optional hardware target string
batch_size	int	Batch size (defaults to 1 if not specified in YAML)

Output

Output	Format	Description
Pretrained mode	Directory	`./Transformer_model/` containing model weights (`pytorch_model.bin` or `model.safetensors`), `config.json`, tokenizer files (`tokenizer.json`, `vocab.txt`, etc.)
TorchScript mode (standard)	File	`./Transformer_model/traced_model.pt`
TorchScript mode (neuron)	File	`./Transformer_model/traced_{model_name}_model_neuron_batch_{batch_size}.pt`
TorchScript mode (neuronx)	File	`./Transformer_model/traced_{model_name}_model_neuronx_batch_{batch_size}.pt`
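The naming rules in the Output table reduce to a small function. `expected_output_path` is a hypothetical helper that reproduces the table's patterns. It is not taken from the script, and it returns only the directory for pretrained mode because that mode writes several files.

```python
import os
from typing import Optional

OUTPUT_DIR = "./Transformer_model"


def expected_output_path(
    torchscript: bool, hardware: Optional[str], model_name: str, batch_size: int
) -> str:
    """Return the artifact location listed in the Output table (hypothetical helper)."""
    if not torchscript:
        # Pretrained mode writes weights, config, and tokenizer files into the directory.
        return OUTPUT_DIR
    if hardware in ("neuron", "neuronx"):
        # Hardware-specific traces encode model name, target, and batch size.
        filename = f"traced_{model_name}_model_{hardware}_batch_{batch_size}.pt"
    else:
        filename = "traced_model.pt"
    return os.path.join(OUTPUT_DIR, filename)
```

Under this pattern, a namespaced identifier such as `org/model` would insert a path separator into the file name.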

Side Effects

Creates ./Transformer_model/ directory if it does not exist
Downloads model weights from HuggingFace Hub (or uses cached versions)
Writes model and tokenizer artifacts to disk
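Creating the output directory can be made idempotent. This minimal sketch uses `os.makedirs(..., exist_ok=True)`; `ensure_output_dir` is a hypothetical name, and the script itself may handle an existing directory differently.

```python
import os


def ensure_output_dir(path: str = "./Transformer_model") -> str:
    """Create the output directory (and any parents) if missing; no-op if it exists."""
    os.makedirs(path, exist_ok=True)
    return path
```

Calling it twice is safe, so repeated runs can write into the same directory without an existence check.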

Usage Examples

Example 1: Download for Sequence Classification (Pretrained Mode)

transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=150,
    torchscript=False,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/ contains pretrained model and tokenizer files

Example 2: Download for Question Answering (TorchScript Mode)

transformers_model_dowloader(
    mode="question_answering",
    pretrained_model_name="bert-large-uncased-whole-word-masking-finetuned-squad",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/traced_model.pt

Example 3: Download for AWS Inferentia (NeuronX)

transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware="neuronx",
    batch_size=4,
)
# Result: ./Transformer_model/traced_bert-base-uncased_model_neuronx_batch_4.pt
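With `max_length=128` and `batch_size=4`, the traced input in this example would be 4 rows of 128 token positions. The sketch below uses plain lists to show only that shape logic. It assumes the tokenizer has already produced a list of token ids. The helper name `build_dummy_batch`, the pad id `0`, and the sample ids are assumptions. A real tracing step would use tensors produced by the tokenizer.

```python
from typing import List, Tuple


def build_dummy_batch(
    token_ids: List[int], max_length: int, batch_size: int, pad_id: int = 0
) -> Tuple[List[List[int]], List[List[int]]]:
    """Pad (or truncate) one encoded example to max_length and repeat it batch_size times.

    Returns (input_ids, attention_mask). pad_id=0 is an assumption; the real
    padding id depends on the tokenizer.
    """
    real = token_ids[:max_length]
    pad = max_length - len(real)
    input_row = real + [pad_id] * pad
    # 1 marks a real token, 0 marks padding.
    mask_row = [1] * len(real) + [0] * pad
    # Copy each row so the batch entries do not share the same list object.
    input_ids = [list(input_row) for _ in range(batch_size)]
    attention_mask = [list(mask_row) for _ in range(batch_size)]
    return input_ids, attention_mask
```

For instance, `build_dummy_batch([101, 2023, 102], max_length=128, batch_size=4)` yields two 4 x 128 nested lists, each mask row containing three 1s.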

Example 4: Command-Line Invocation

# The script reads settings from model-config.yaml by default
python Download_Transformer_models.py

# Or specify a custom config
python Download_Transformer_models.py custom-config.yaml

The script's __main__ block reads the YAML file, extracts handler settings, and calls the function:

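# Condensed from the script's __main__ block (L149-180)
import os
import sys

import yaml

if __name__ == "__main__":
    dirname = os.path.dirname(__file__)
    # A relative config path is resolved against the script's directory
    if len(sys.argv) > 1:
        filename = os.path.join(dirname, sys.argv[1])
    else:
        filename = os.path.join(dirname, "model-config.yaml")
    # Close the file as soon as the YAML has been parsed
    with open(filename) as f:
        model_yaml_config = yaml.safe_load(f)
    settings = model_yaml_config["handler"]
    mode = settings["mode"]
    model_name = settings["model_name"]
    num_labels = int(settings["num_labels"])
    do_lower_case = settings["do_lower_case"]
    max_length = settings["max_length"]
    save_mode = settings["save_mode"]
    # TorchScript is enabled only when save_mode is "torchscript"
    torchscript = save_mode == "torchscript"
    # Optional keys: hardware defaults to None, batch_size to 1
    hardware = settings.get("hardware")
    batch_size = int(settings.get("batch_size", "1"))

    transformers_model_dowloader(
        mode=mode,
        pretrained_model_name=model_name,
        num_labels=num_labels,
        do_lower_case=do_lower_case,
        max_length=max_length,
        torchscript=torchscript,
        hardware=hardware,
        batch_size=batch_size,
    )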

Related Pages

Principle:Pytorch_Serve_Transformer_Model_Preparation - The principle describing why and how models are prepared for serving
Implementation:Pytorch_Serve_Transformer_Handler_Config - The YAML configuration file consumed by this script
