Implementation:Pytorch Serve Transformers Model Dowloader
| Field | Value |
|---|---|
| Page Type | Implementation |
| Title | Transformers Model Dowloader |
| Type | API Doc |
| Short Description | Script that downloads pretrained HuggingFace Transformer models and tokenizers, optionally traces them to TorchScript, and saves all artifacts to a local directory for model archiving |
| Domains | NLP, Model_Serving |
| Source | examples/Huggingface_Transformers/Download_Transformer_models.py:L22-146 |
| Knowledge Sources | TorchServe |
| Workflow | HuggingFace_Transformer_Serving |
| Last Updated | 2026-02-13 00:00 GMT |
Overview
The Download_Transformer_models.py script provides the transformers_model_dowloader function (the identifier is spelled this way in the source). The function downloads a pretrained HuggingFace Transformer model and its tokenizer, then saves them to a local directory in one of two formats: HuggingFace pretrained format or TorchScript. The script reads its configuration from a YAML file, supports four NLP task modes, and can trace models for AWS Inferentia through the neuron and neuronx hardware targets.
Description
The function handles the complete model preparation pipeline. It selects the model class for the requested NLP task mode, downloads (or loads from cache) the pretrained weights and tokenizer, and serializes the artifacts to the ./Transformer_model/ directory. It supports two serialization strategies (pretrained and TorchScript) and three tracing targets (standard PyTorch, neuron, and neuronx).
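The mode-based dispatch described above can be sketched as a lookup from mode string to the Auto class the script imports for that task. This is a minimal sketch, not the upstream code: the helper names MODE_TO_AUTO_CLASS, resolve_auto_class_name, and load_model_and_tokenizer are hypothetical, and passing num_labels and torchscript through AutoConfig is an assumption based on the parameter list.

```python
# Hypothetical mapping from each supported mode to the transformers Auto class
# that the script imports for it.
MODE_TO_AUTO_CLASS = {
    "sequence_classification": "AutoModelForSequenceClassification",
    "question_answering": "AutoModelForQuestionAnswering",
    "token_classification": "AutoModelForTokenClassification",
    "text_generation": "AutoModelForCausalLM",
}


def resolve_auto_class_name(mode):
    """Return the Auto class name for a mode, or raise ValueError."""
    try:
        return MODE_TO_AUTO_CLASS[mode]
    except KeyError:
        raise ValueError(
            f"unsupported mode {mode!r}; expected one of {sorted(MODE_TO_AUTO_CLASS)}"
        ) from None


def load_model_and_tokenizer(mode, pretrained_model_name, num_labels, do_lower_case, torchscript):
    """Download (or load from cache) the model and tokenizer for `mode`."""
    import transformers  # imported lazily so the mapping above works without it

    auto_cls = getattr(transformers, resolve_auto_class_name(mode))
    # Assumption: the config carries num_labels and the torchscript flag.
    config = transformers.AutoConfig.from_pretrained(
        pretrained_model_name, num_labels=num_labels, torchscript=torchscript
    )
    model = auto_cls.from_pretrained(pretrained_model_name, config=config)
    tokenizer = transformers.AutoTokenizer.from_pretrained(
        pretrained_model_name, do_lower_case=do_lower_case
    )
    return model, tokenizer
```

Raising early on an unknown mode surfaces a typo in the YAML before any download starts.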
Usage
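The script is run from the command line and takes an optional argument naming a YAML configuration file. The file name is resolved relative to the script's directory; if no argument is given, model-config.yaml is used: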
# Using default model-config.yaml
python Download_Transformer_models.py
# Using a custom configuration file
python Download_Transformer_models.py my-custom-config.yaml
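The configuration lookup can be expressed as a small pure function. This sketch mirrors the __main__ excerpt later on this page; resolve_config_path is a hypothetical name, not part of the script.

```python
import os


def resolve_config_path(script_path, argv):
    """Mirror the script's lookup: the optional first CLI argument, otherwise
    model-config.yaml, joined to the directory that contains the script."""
    dirname = os.path.dirname(script_path)
    name = argv[1] if len(argv) > 1 else "model-config.yaml"
    return os.path.join(dirname, name)
```

Because the argument is joined to the script's directory, a bare filename is looked up next to the script rather than in the current working directory. An absolute path still works, since os.path.join discards earlier components when it meets an absolute one.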
Code Reference
Source Location
| Field | Value |
|---|---|
| Repository | pytorch/serve |
| File | examples/Huggingface_Transformers/Download_Transformer_models.py |
| Lines | L22-146 |
Signature
def transformers_model_dowloader(
    mode,
    pretrained_model_name,
    num_labels,
    do_lower_case,
    max_length,
    torchscript,
    hardware,
    batch_size,
):
    """This function, save the checkpoint, config file along with tokenizer config and vocab files
    of a transformer model of your choice.
    """
Parameters
| Parameter | Type | Description |
|---|---|---|
| mode | str | NLP task mode: "sequence_classification", "question_answering", "token_classification", or "text_generation" |
| pretrained_model_name | str | HuggingFace model identifier (e.g., "bert-base-uncased") |
| num_labels | int | Number of output labels for classification tasks |
| do_lower_case | bool | Whether the tokenizer should lowercase input text |
| max_length | int | Sequence length to which the dummy input is padded for TorchScript tracing |
| torchscript | bool | Whether to enable TorchScript mode in the model configuration |
| hardware | str or None | Tracing target: "neuron", "neuronx", or None for standard PyTorch |
| batch_size | int | Batch size of the dummy input for Neuron/NeuronX tracing |
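To make the roles of max_length and batch_size concrete, the following pure-Python sketch builds a dummy batch of shape [batch_size][max_length] from one already-encoded sequence. It is an illustration only: the script presumably works with tokenizer output tensors, the helper name build_dummy_batch is hypothetical, pad_id=0 is an assumption (it matches BERT's vocabulary but not every model's), and the naive truncation here does not preserve special tokens the way a real tokenizer does.

```python
def build_dummy_batch(token_ids, max_length, batch_size=1, pad_id=0):
    """Pad or truncate one encoded sequence to max_length and repeat it batch_size times.

    Returns (input_ids, attention_mask) as nested lists of shape [batch_size][max_length].
    """
    if max_length <= 0 or batch_size <= 0:
        raise ValueError("max_length and batch_size must be positive")
    ids = list(token_ids)[:max_length]
    # Real tokens get mask 1, padding positions get mask 0.
    mask = [1] * len(ids) + [0] * (max_length - len(ids))
    ids = ids + [pad_id] * (max_length - len(ids))
    # Independent copies per row so callers can mutate rows safely.
    input_ids = [list(ids) for _ in range(batch_size)]
    attention_mask = [list(mask) for _ in range(batch_size)]
    return input_ids, attention_mask


ids, mask = build_dummy_batch([101, 7592, 102], max_length=5, batch_size=2)
# ids  == [[101, 7592, 102, 0, 0], [101, 7592, 102, 0, 0]]
# mask == [[1, 1, 1, 0, 0], [1, 1, 1, 0, 0]]
```

A traced graph for Neuron targets is compiled for a fixed input shape, which is why batch_size has to be chosen at trace time.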
Return Value
Returns None. The function produces side effects by writing model and tokenizer files to the ./Transformer_model/ directory.
Import
import os
import sys
import torch
import transformers
import yaml
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    AutoModelForQuestionAnswering,
    AutoModelForSequenceClassification,
    AutoModelForTokenClassification,
    AutoTokenizer,
    set_seed,
)
I/O Contract
Input
| Input | Type | Description |
|---|---|---|
| mode | str | One of the four NLP task mode strings |
| pretrained_model_name | str | A valid HuggingFace model identifier or local path |
| num_labels | int | Positive integer specifying the classification head size |
| do_lower_case | bool | Tokenizer casing flag |
| max_length | int | Sequence length used to pad the dummy input during TorchScript tracing |
| torchscript | bool | True when save_mode in the YAML is "torchscript", otherwise False |
| hardware | str or None | Optional hardware target string |
| batch_size | int | Batch size (defaults to 1 if not specified in the YAML) |
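The contract above does not say how invalid arguments are handled. A hypothetical pre-check such as validate_downloader_args (not part of the script) could reject them before any download starts, reporting every problem at once:

```python
VALID_MODES = frozenset({
    "sequence_classification",
    "question_answering",
    "token_classification",
    "text_generation",
})
VALID_HARDWARE = (None, "neuron", "neuronx")


def validate_downloader_args(mode, num_labels, max_length, hardware, batch_size):
    """Raise ValueError listing every problem found; return None if all checks pass."""
    problems = []
    if mode not in VALID_MODES:
        problems.append(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if int(num_labels) < 1:
        problems.append(f"num_labels must be a positive integer, got {num_labels!r}")
    if int(max_length) < 1:
        problems.append(f"max_length must be a positive integer, got {max_length!r}")
    if hardware not in VALID_HARDWARE:
        problems.append(f"hardware must be None, 'neuron' or 'neuronx', got {hardware!r}")
    if int(batch_size) < 1:
        problems.append(f"batch_size must be a positive integer, got {batch_size!r}")
    if problems:
        raise ValueError("; ".join(problems))
```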
Output
| Output | Format | Description |
|---|---|---|
| Pretrained mode | Directory | ./Transformer_model/ containing model weights (pytorch_model.bin or model.safetensors), config.json, and tokenizer files (tokenizer.json, vocab.txt, etc.) |
| TorchScript mode (standard) | File | ./Transformer_model/traced_model.pt |
| TorchScript mode (neuron) | File | ./Transformer_model/traced_{model_name}_model_neuron_batch_{batch_size}.pt |
| TorchScript mode (neuronx) | File | ./Transformer_model/traced_{model_name}_model_neuronx_batch_{batch_size}.pt |
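The naming rules in the output table can be captured in one helper that returns where the artifact is expected to land. expected_artifact is a hypothetical name, and raising on an unrecognized hardware value is a choice made for this sketch rather than documented behavior.

```python
import os


def expected_artifact(out_dir, torchscript, hardware=None, model_name="", batch_size=1):
    """Return the path the downloader is expected to write.

    Pretrained mode writes a directory of files, so the directory itself is returned.
    """
    if not torchscript:
        return out_dir
    if hardware is None:
        return os.path.join(out_dir, "traced_model.pt")
    if hardware in ("neuron", "neuronx"):
        return os.path.join(
            out_dir, f"traced_{model_name}_model_{hardware}_batch_{batch_size}.pt"
        )
    raise ValueError(f"unrecognized hardware target {hardware!r}")
```

If model_name contains a slash (for example an organization-prefixed Hub ID), the neuron and neuronx patterns would produce a path with an extra directory component.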
Side Effects
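- Creates the ./Transformer_model/ directory (relative to the current working directory) if it does not exist
- Downloads model weights and tokenizer files from the HuggingFace Hub (or uses cached versions)
- Writes the model artifacts (and, in pretrained mode, the tokenizer files) to disk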
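Directory creation can be made idempotent with os.makedirs(exist_ok=True). This is a minimal equivalent, not necessarily how the script does it; ensure_output_dir is a hypothetical name. Note that ./Transformer_model/ is resolved against the current working directory, unlike the configuration file, which is looked up relative to the script.

```python
import os


def ensure_output_dir(path="./Transformer_model"):
    """Create path (and any missing parents) if needed; return its absolute form."""
    os.makedirs(path, exist_ok=True)  # no error if the directory already exists
    return os.path.abspath(path)
```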
Usage Examples
Example 1: Download for Sequence Classification (Pretrained Mode)
transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=150,
    torchscript=False,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/ contains pretrained model and tokenizer files
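After Example 1 completes, the pretrained-mode output can be checked and reloaded with the standard from_pretrained methods. has_pretrained_artifacts and load_saved_classifier are hypothetical helpers; the weight file names come from the output table above, and the check does not account for sharded checkpoints.

```python
import os

# Weight file names listed in the output table for pretrained mode.
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def has_pretrained_artifacts(model_dir):
    """True if model_dir holds a config.json and at least one known weight file."""
    if not os.path.isfile(os.path.join(model_dir, "config.json")):
        return False
    return any(os.path.isfile(os.path.join(model_dir, w)) for w in WEIGHT_FILES)


def load_saved_classifier(model_dir="./Transformer_model"):
    """Reload the model and tokenizer written in pretrained mode."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return model.eval(), tokenizer  # eval() returns the module itself
```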
Example 2: Download for Question Answering (TorchScript Mode)
transformers_model_dowloader(
    mode="question_answering",
    pretrained_model_name="bert-large-uncased-whole-word-masking-finetuned-squad",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware=None,
    batch_size=1,
)
# Result: ./Transformer_model/traced_model.pt
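A file produced by Example 2 can be loaded with torch.jit.load and called with input IDs and an attention mask, assuming the model was traced with (input_ids, attention_mask) as its inputs. Because TorchScript mode writes only traced_model.pt, this sketch re-creates the tokenizer from the Hub identifier and pads to the same max_length used at trace time, which is the safe choice for a traced graph. run_traced_model is a hypothetical name; torch and transformers are imported lazily so the definition loads without them.

```python
def run_traced_model(model_path, tokenizer_name, text, text_pair=None, max_length=128):
    """Load a TorchScript file and run one padded example through it.

    Returns whatever the traced model returns (a tuple of tensors when the
    model was traced with torchscript=True in its config).
    """
    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    inputs = tokenizer(
        text,
        text_pair,
        max_length=max_length,
        padding="max_length",  # match the fixed length used when tracing
        truncation=True,
        return_tensors="pt",
    )
    traced = torch.jit.load(model_path)
    traced.eval()
    with torch.no_grad():
        return traced(inputs["input_ids"], inputs["attention_mask"])


# Usage after Example 2 (downloads the tokenizer from the Hub):
# outputs = run_traced_model(
#     "./Transformer_model/traced_model.pt",
#     "bert-large-uncased-whole-word-masking-finetuned-squad",
#     "What does TorchServe serve?",
#     "TorchServe serves PyTorch models.",
#     max_length=128,
# )
```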
Example 3: Download for AWS Inferentia (NeuronX)
transformers_model_dowloader(
    mode="sequence_classification",
    pretrained_model_name="bert-base-uncased",
    num_labels=2,
    do_lower_case=True,
    max_length=128,
    torchscript=True,
    hardware="neuronx",
    batch_size=4,
)
# Result: ./Transformer_model/traced_bert-base-uncased_model_neuronx_batch_4.pt
Example 4: Command-Line Invocation
# The script reads settings from model-config.yaml by default
python Download_Transformer_models.py
# Or specify a custom config
python Download_Transformer_models.py custom-config.yaml
The script's __main__ block reads the YAML file, extracts handler settings, and calls the function:
# From __main__ (L149-180)
dirname = os.path.dirname(__file__)
if len(sys.argv) > 1:
    filename = os.path.join(dirname, sys.argv[1])
else:
    filename = os.path.join(dirname, "model-config.yaml")
f = open(filename)
model_yaml_config = yaml.safe_load(f)
settings = model_yaml_config["handler"]
mode = settings["mode"]
model_name = settings["model_name"]
num_labels = int(settings["num_labels"])
do_lower_case = settings["do_lower_case"]
max_length = settings["max_length"]
save_mode = settings["save_mode"]
if save_mode == "torchscript":
    torchscript = True
else:
    torchscript = False
hardware = settings.get("hardware")
batch_size = int(settings.get("batch_size", "1"))
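The settings handling in this excerpt can be condensed into one function that returns keyword arguments for transformers_model_dowloader. settings_to_kwargs and load_handler_settings are hypothetical names; the key names and defaults are taken from the excerpt, and the with statement closes the file that the excerpt leaves open.

```python
def load_handler_settings(path):
    """Read the YAML file and return its "handler" section."""
    import yaml  # PyYAML, imported lazily

    with open(path) as f:
        return yaml.safe_load(f)["handler"]


def settings_to_kwargs(settings):
    """Translate handler settings into keyword arguments for the downloader."""
    return {
        "mode": settings["mode"],
        "pretrained_model_name": settings["model_name"],
        "num_labels": int(settings["num_labels"]),
        "do_lower_case": settings["do_lower_case"],
        "max_length": settings["max_length"],
        # Only save_mode == "torchscript" enables TorchScript; anything else means pretrained.
        "torchscript": settings["save_mode"] == "torchscript",
        "hardware": settings.get("hardware"),  # None when the key is absent
        "batch_size": int(settings.get("batch_size", "1")),
    }


# Usage, assuming transformers_model_dowloader accepts the keyword names
# from the signature documented above:
# kwargs = settings_to_kwargs(load_handler_settings("model-config.yaml"))
# transformers_model_dowloader(**kwargs)
```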
Related Pages
- Principle:Pytorch_Serve_Transformer_Model_Preparation - The principle describing why and how models are prepared for serving
- Implementation:Pytorch_Serve_Transformer_Handler_Config - The YAML configuration file consumed by this script