Implementation:Triton inference server Server Convert Checkpoint

Metadata

Field	Value
Type	Implementation
Workflow	LLM_Deployment_With_TRT_LLM
Repo	Triton_inference_server_Server
Source	docs/getting_started/llm.md:L88-95
Domains	NLP, Model_Optimization
Knowledge_Sources	TRT-LLM Docs (https://nvidia.github.io/TensorRT-LLM/), Triton Server repository (https://github.com/triton-inference-server/server)
implements	Principle:Triton_inference_server_Server_Weight_Conversion
2026-02-13 17:00 GMT

Overview

This page documents the TRT-LLM checkpoint conversion step for HuggingFace models: the exact CLI invocation of convert_checkpoint.py that transforms HuggingFace model weights into TRT-LLM's checkpoint format.

Description

The convert_checkpoint.py script is a model-specific conversion utility included in the TensorRT-LLM examples directory. Each supported model architecture (Phi-3, LLaMA, GPT, Falcon, etc.) has its own conversion script, which understands that architecture's source weight layout and produces TRT-LLM-compatible checkpoints.

The script reads the HuggingFace model directory, extracts weight tensors from safetensors or PyTorch bin files, applies dtype conversion, and writes the result in TRT-LLM's checkpoint format with an accompanying configuration file.
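Because the script starts by reading the HuggingFace model directory, a quick pre-flight check can catch an incomplete download before a long conversion run. The following is a minimal sketch, not part of convert_checkpoint.py: the helper names find_weight_files and check_model_dir are hypothetical, and it assumes the weights sit at the top level of the directory next to the HuggingFace config.json.

```python
from pathlib import Path


def find_weight_files(model_dir):
    """Return the weight files found at the top level of model_dir, sorted."""
    root = Path(model_dir)
    # The two on-disk formats named above: safetensors and PyTorch .bin files.
    files = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    return sorted(files)


def check_model_dir(model_dir):
    """Raise FileNotFoundError if model_dir does not look like a downloaded model.

    Hypothetical helper: it only checks that files are present, not that
    their contents are valid.
    """
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"model directory not found: {root}")
    if not (root / "config.json").is_file():
        raise FileNotFoundError(f"missing config.json in {root}")
    weights = find_weight_files(root)
    if not weights:
        raise FileNotFoundError(f"no *.safetensors or *.bin files in {root}")
    return weights
```

Calling check_model_dir("./Phi-3-mini-4k-instruct") returns the list of weight files, or raises with a message naming what is missing.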

Usage

Run the script from the TRT-LLM examples directory for the target model architecture (for Phi-3, TensorRT-LLM/examples/phi). The HuggingFace model directory produced by the previous download step must already exist.

Code Reference

Source Location

Item	Value
File	docs/getting_started/llm.md
Lines	L88-95
Repo	https://github.com/triton-inference-server/server
Script location	TensorRT-LLM/examples/phi/convert_checkpoint.py (model-specific)

Signature

python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint \
    --dtype float16

Import / Verification

# Verify the checkpoint directory was created
ls -lh ./phi-checkpoint/

# Check the config file
cat ./phi-checkpoint/config.json
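For scripted checks, the same config file can be read programmatically instead of with cat. This is a rough sketch: the function names are hypothetical, and the default key names ("architecture", "dtype") are assumptions about what the generated config.json contains rather than a documented schema, so absent keys are reported as None instead of raising.

```python
import json
from pathlib import Path


def load_checkpoint_config(checkpoint_dir):
    """Parse config.json from a converted checkpoint directory."""
    config_path = Path(checkpoint_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        return json.load(f)


def summarize_config(config, keys=("architecture", "dtype")):
    """Pick a few top-level fields for a quick look.

    The default key names are assumptions; any key that is not present
    maps to None.
    """
    return {key: config.get(key) for key in keys}
```

For example, summarize_config(load_checkpoint_config("./phi-checkpoint")) gives a small dictionary that can be compared against the --dtype passed on the command line.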

I/O Contract

Inputs

Name	Type	Description
`--model_dir`	Directory path	Path to HuggingFace model weights directory (e.g., `./Phi-3-mini-4k-instruct`)
`--output_dir`	Directory path	Target directory for TRT-LLM checkpoint output (e.g., `./phi-checkpoint`)
`--dtype`	String	Target data type for weight conversion: `float16` or `bfloat16`
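When the conversion is driven from a script, the inputs above can be assembled into an argument list and handed to subprocess. The sketch below is one possible wrapper, not part of TRT-LLM: build_convert_command and run_conversion are hypothetical names, the dtype check only mirrors the two values listed in this table (the script itself may accept others), and the optional --tp_size flag is taken from the tensor-parallel example on this page.

```python
import subprocess

# Values listed for --dtype in the Inputs table; this restriction is an
# assumption of the sketch, not a statement about what the script accepts.
ALLOWED_DTYPES = ("float16", "bfloat16")


def build_convert_command(model_dir, output_dir, dtype="float16", tp_size=None):
    """Return the convert_checkpoint.py invocation as an argument list."""
    if dtype not in ALLOWED_DTYPES:
        raise ValueError(f"unsupported dtype {dtype!r}; expected one of {ALLOWED_DTYPES}")
    cmd = [
        "python3", "./convert_checkpoint.py",
        "--model_dir", str(model_dir),
        "--output_dir", str(output_dir),
        "--dtype", dtype,
    ]
    if tp_size is not None:
        cmd += ["--tp_size", str(tp_size)]
    return cmd


def run_conversion(examples_dir, **kwargs):
    """Run the conversion from examples_dir; raises CalledProcessError on failure."""
    cmd = build_convert_command(**kwargs)
    subprocess.run(cmd, cwd=examples_dir, check=True)
```

Passing the arguments as a list avoids shell quoting issues with paths, and check=True makes a failed conversion stop the calling script instead of passing silently.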

Outputs

Name	Type	Description
Checkpoint directory	Directory	`./phi-checkpoint/` containing converted weight files
config.json	JSON file	TRT-LLM model configuration with architecture parameters
Weight files	Binary files	Converted weight tensors in TRT-LLM format

Usage Examples

Convert Phi-3-mini-4k with float16

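# Work from the Phi example directory, where this architecture's script lives
cd TensorRT-LLM/examples/phi

# Convert the HuggingFace weights into a float16 TRT-LLM checkpoint
python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint \
    --dtype float16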

Key Parameters

Parameter	Description	Example Value
`--model_dir`	Path to the HuggingFace model directory	`./Phi-3-mini-4k-instruct`
`--output_dir`	Path for the TRT-LLM checkpoint output	`./phi-checkpoint`
`--dtype`	Target precision for converted weights	`float16`, `bfloat16`

Convert with tensor parallelism for multi-GPU

# Same conversion, written to a separate directory with --tp_size 2 for two GPUs
python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint-tp2 \
    --dtype float16 \
    --tp_size 2
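After a tensor-parallel conversion it can be useful to confirm that every rank produced its weight shard. The sketch below assumes the output directory holds one file per rank named rank0.safetensors, rank1.safetensors, and so on; that naming is an assumption not stated on this page, and missing_rank_files is a hypothetical helper, so adjust the pattern to whatever ls shows in the output directory.

```python
from pathlib import Path


def missing_rank_files(checkpoint_dir, tp_size):
    """Return the expected per-rank file names that are absent.

    Assumes one weight file per rank named rank<N>.safetensors; this
    naming is an assumption of the sketch.
    """
    root = Path(checkpoint_dir)
    expected = [f"rank{rank}.safetensors" for rank in range(tp_size)]
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from missing_rank_files("./phi-checkpoint-tp2", 2) means both shards are present under the assumed naming.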

Related Pages

Principle:Triton_inference_server_Server_Weight_Conversion
Implementation:Triton_inference_server_Server_Git_LFS_Clone — Prerequisite: model weight download
Implementation:Triton_inference_server_Server_Trtllm_Build — Next step: engine compilation
Environment:Triton_inference_server_Server_TRT_LLM_Deployment
