Implementation:Triton inference server Server Convert Checkpoint

From Leeroopedia

Metadata

Field | Value
Type | Implementation
Workflow | LLM_Deployment_With_TRT_LLM
Repo | Triton_inference_server_Server
Source | docs/getting_started/llm.md:L88-95
Domains | NLP, Model_Optimization
Knowledge_Sources | TRT-LLM Docs (https://nvidia.github.io/TensorRT-LLM/), Triton Server repo (https://github.com/triton-inference-server/server)
implements Principle:Triton_inference_server_Server_Weight_Conversion
2026-02-13 17:00 GMT

Overview

The concrete TRT-LLM checkpoint conversion step for HuggingFace models. This page covers the CLI invocation of convert_checkpoint.py, which transforms HuggingFace model weights into TRT-LLM's checkpoint format.

Description

The convert_checkpoint.py script is a model-specific conversion utility included in the TensorRT-LLM examples directory. Each supported model architecture (Phi-3, LLaMA, GPT, Falcon, etc.) has its own conversion script that understands the source weight layout and produces TRT-LLM-compatible checkpoints.

The script reads the HuggingFace model directory, extracts weight tensors from safetensors or PyTorch bin files, applies dtype conversion, and writes the result in TRT-LLM's checkpoint format with an accompanying configuration file.
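As a rough illustration of the first step described above (locating the source weights), the sketch below lists the weight files in a HuggingFace model directory, preferring safetensors and falling back to PyTorch bin files. The function name `find_weight_files` is hypothetical, and this is not taken from convert_checkpoint.py itself; it only mirrors the file types named in the description.

```python
from pathlib import Path


def find_weight_files(model_dir):
    """Return the weight files found in a HuggingFace model directory.

    Hypothetical helper for illustration only: prefers *.safetensors
    and falls back to PyTorch *.bin files when none are present.
    """
    root = Path(model_dir)
    safetensors = sorted(root.glob("*.safetensors"))
    if safetensors:
        return safetensors
    return sorted(root.glob("*.bin"))
```

An empty result is a quick sign that `--model_dir` points at the wrong directory or that the download step did not complete.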

Usage

Run the script from the TRT-LLM examples directory for the target model architecture (for Phi-3, TensorRT-LLM/examples/phi). It requires the HuggingFace model directory produced by the previous download step.

Code Reference

Source Location

Item | Value
File | docs/getting_started/llm.md
Lines | L88-95
Repo | https://github.com/triton-inference-server/server
Script location | TensorRT-LLM/examples/phi/convert_checkpoint.py (model-specific)

Signature

python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint \
    --dtype float16
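When the conversion is driven from a Python automation script rather than typed by hand, the same invocation can be assembled as an argument list. The sketch below is one possible wrapper, not part of TRT-LLM: `build_convert_command` and `run_conversion` are hypothetical names, and the dtype check only reflects the two values documented on this page (the real script may accept others).

```python
import subprocess

# Values documented on this page; the real script may accept more.
DOCUMENTED_DTYPES = ("float16", "bfloat16")


def build_convert_command(model_dir, output_dir, dtype="float16", tp_size=None):
    """Assemble the convert_checkpoint.py invocation as an argv list."""
    if dtype not in DOCUMENTED_DTYPES:
        raise ValueError(f"dtype {dtype!r} is not one of {DOCUMENTED_DTYPES}")
    cmd = [
        "python3", "./convert_checkpoint.py",
        "--model_dir", str(model_dir),
        "--output_dir", str(output_dir),
        "--dtype", dtype,
    ]
    # --tp_size is only appended for the multi-GPU case.
    if tp_size is not None:
        cmd += ["--tp_size", str(tp_size)]
    return cmd


def run_conversion(cmd, examples_dir):
    """Run the command from the model's examples directory.

    check=True raises CalledProcessError if the script exits non-zero.
    """
    subprocess.run(cmd, cwd=examples_dir, check=True)
```

Passing the arguments as a list avoids shell quoting issues when a model path contains spaces.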

Import / Verification

# Verify the checkpoint directory was created
ls -lh ./phi-checkpoint/

# Check the config file
cat ./phi-checkpoint/config.json
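The same check can be scripted. The sketch below loads config.json and reports a couple of fields alongside the other files in the directory. The key names `architecture` and `dtype` are assumptions about the TRT-LLM config layout, so they are read with `.get()` and come back as `None` if absent; `summarize_checkpoint` is a hypothetical helper name.

```python
import json
from pathlib import Path


def summarize_checkpoint(checkpoint_dir):
    """Summarize a converted checkpoint directory.

    Raises FileNotFoundError when config.json is missing, which usually
    means the conversion did not finish.
    """
    root = Path(checkpoint_dir)
    config_path = root / "config.json"
    if not config_path.is_file():
        raise FileNotFoundError(f"no config.json in {root}")
    with config_path.open() as f:
        config = json.load(f)
    return {
        # Assumed key names; None is returned if the config lacks them.
        "architecture": config.get("architecture"),
        "dtype": config.get("dtype"),
        "other_files": sorted(
            p.name for p in root.iterdir() if p.name != "config.json"
        ),
    }
```

Comparing the reported `dtype` against the `--dtype` value passed on the command line is a cheap sanity check before building an engine.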

I/O Contract

Inputs

Name | Type | Description
--model_dir | Directory path | Path to HuggingFace model weights directory (e.g., ./Phi-3-mini-4k-instruct)
--output_dir | Directory path | Target directory for TRT-LLM checkpoint output (e.g., ./phi-checkpoint)
--dtype | String | Target data type for weight conversion: float16 or bfloat16
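Both documented dtypes store each weight in 16 bits, so the choice between them does not change the on-disk size of the converted weights. A back-of-the-envelope estimate, assuming the checkpoint is dominated by the weight tensors and ignoring file-format overhead, can be sketched as follows (`estimate_weight_bytes` is a hypothetical helper):

```python
# Storage width of the two dtypes documented on this page.
BYTES_PER_ELEMENT = {"float16": 2, "bfloat16": 2}


def estimate_weight_bytes(num_parameters, dtype="float16"):
    """Estimate raw weight storage: one element per parameter."""
    return num_parameters * BYTES_PER_ELEMENT[dtype]
```

For a model of roughly 3.8 billion parameters (the commonly quoted size of Phi-3-mini, used here as an assumption), this gives about 7.6e9 bytes, or roughly 7.1 GiB, which is a reasonable lower bound for the free disk space needed under `--output_dir`.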

Outputs

Name | Type | Description
Checkpoint directory | Directory | ./phi-checkpoint/, containing the converted weight files
config.json | JSON file | TRT-LLM model configuration with architecture parameters
Weight files | Binary files | Converted weight tensors in TRT-LLM format

Usage Examples

Convert Phi-3-mini-4k with float16

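# Work from the Phi examples directory, where this model's script lives
cd TensorRT-LLM/examples/phi

# Assumes ./Phi-3-mini-4k-instruct was downloaded into this directory
python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint \
    --dtype float16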

Key Parameters

Parameter | Description | Example Value
--model_dir | Path to the HuggingFace model directory | ./Phi-3-mini-4k-instruct
--output_dir | Path for the TRT-LLM checkpoint output | ./phi-checkpoint
--dtype | Target precision for converted weights | float16, bfloat16

Convert with tensor parallelism for multi-GPU

python3 ./convert_checkpoint.py \
    --model_dir ./Phi-3-mini-4k-instruct \
    --output_dir ./phi-checkpoint-tp2 \
    --dtype float16 \
    --tp_size 2
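With tensor parallelism, the weights are split so that each GPU loads its own shard. The sketch below checks that one shard exists per rank. It assumes shards are named `rank0.safetensors`, `rank1.safetensors`, and so on; that naming is an assumption about the TRT-LLM checkpoint layout and should be confirmed against the actual output directory. `missing_rank_files` is a hypothetical helper.

```python
from pathlib import Path


def missing_rank_files(checkpoint_dir, tp_size):
    """Return the expected per-rank shard names that are not on disk.

    Assumes one file per tensor-parallel rank, named rank<N>.safetensors.
    """
    root = Path(checkpoint_dir)
    expected = [f"rank{rank}.safetensors" for rank in range(tp_size)]
    return [name for name in expected if not (root / name).is_file()]
```

An empty list from `missing_rank_files("./phi-checkpoint-tp2", 2)` would indicate that both shards were written.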
