Principle:LaurentMazare Tch rs LLM Weight Conversion
| Knowledge Sources | |
|---|---|
| Domains | NLP, Model_Serialization |
| Last Updated | 2026-02-08 14:00 GMT |
Overview
Converts original LLaMA checkpoint weights from PyTorch's pickle format to the safetensors format, renaming keys to match the Rust model architecture.
Description
Large language model checkpoints are typically distributed in PyTorch's pickle format (.pth) with key names matching the original Python model implementation. To load these in a Rust model with different naming conventions, the weights must be renamed and optionally converted to a different precision (e.g., float16). The conversion also handles concatenation of separate query/key/value projection weights into a single combined attention weight matrix, and remaps layer names to match the tch-rs model path hierarchy.
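The fused attention weight can be illustrated with a small, dependency-free sketch. It assumes the usual linear-layer convention in which a weight matrix has shape (out_features, in_features), so that stacking the Q, K and V matrices along dim 0 yields one matrix whose output is the three projections laid end to end. The helper names (`matvec`, `concat_dim0`) and the toy values are hypothetical and are not taken from the tch-rs code.

```python
def matvec(weight, x):
    """Multiply a (rows x cols) weight matrix by a vector of length cols."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def concat_dim0(*matrices):
    """Stack matrices row-wise, i.e. concatenate them along dim 0."""
    return [row for matrix in matrices for row in matrix]


# Toy 2x2 projection weights (hypothetical values, for illustration only).
wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[2.0, 0.0], [0.0, 2.0]]
wv = [[0.0, 1.0], [1.0, 0.0]]
x = [3.0, 4.0]

# One fused 6x2 matrix replaces three separate 2x2 matrices.
fused = concat_dim0(wq, wk, wv)
qkv = matvec(fused, x)

# Splitting the fused output into equal thirds recovers Q, K and V.
n = len(wq)
q, k, v = qkv[:n], qkv[n:2 * n], qkv[2 * n:]
```

The order of concatenation matters: the consumer of the fused weight has to split the output in the same Q, K, V order that was used when stacking.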
Usage
Run this conversion once before running the Rust LLaMA inference. The output safetensors file is the input for the memory-mapped weight loading step.
Theoretical Basis
Conversion Steps:
1. Load original .pth checkpoint with PyTorch
2. Rename keys:
   layers.{N}.attention.{wq,wk,wv}.weight → transformer.h.{N}.attn.c_attn.weight (the three projections are concatenated into one tensor, Q then K then V)
   layers.{N}.feed_forward.w1.weight → transformer.h.{N}.mlp.c_fc1.weight
   tok_embeddings.weight → transformer.wte.weight
3. Concatenate Q, K, V weights along dim=0 for fused attention
4. Optionally convert the dtype (e.g., to float16)
5. Save as safetensors format
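The renaming and grouping logic in the steps above can be sketched as a pure-Python "conversion plan" that maps each target key to the ordered source keys that feed it. This is a minimal sketch, not the project's actual conversion script. It covers only the three mappings listed above, and the function and constant names (`plan_conversion`, `DIRECT`, `LAYER_DIRECT`, `QKV_SUFFIXES`) are hypothetical. The `wk` and `wv` key names are assumed to follow the same pattern as `wq`.

```python
import re

# Top-level one-to-one renames listed in the steps above.
DIRECT = {"tok_embeddings.weight": "transformer.wte.weight"}

# Per-layer one-to-one renames (suffix after "layers.{N}.").
LAYER_DIRECT = {"feed_forward.w1.weight": "mlp.c_fc1.weight"}

# Per-layer sources fused into c_attn, in concatenation order.
# The wk/wv names are an assumption based on the wq pattern.
QKV_SUFFIXES = [
    "attention.wq.weight",
    "attention.wk.weight",
    "attention.wv.weight",
]

LAYER_RE = re.compile(r"^layers\.(\d+)\.(.+)$")


def plan_conversion(source_keys):
    """Return (plan, unmapped).

    plan maps each target key to the ordered list of source keys to
    concatenate along dim 0 (a single-element list is a plain rename).
    unmapped lists source keys not covered by the mappings above.
    """
    source_keys = list(source_keys)
    known = set(source_keys)
    plan, unmapped = {}, []
    for key in source_keys:
        if key in DIRECT:
            plan[DIRECT[key]] = [key]
            continue
        match = LAYER_RE.match(key)
        if match:
            n, suffix = match.group(1), match.group(2)
            if suffix in LAYER_DIRECT:
                plan[f"transformer.h.{n}.{LAYER_DIRECT[suffix]}"] = [key]
                continue
            if suffix in QKV_SUFFIXES:
                # Always record the sources in Q, K, V order.
                target = f"transformer.h.{n}.attn.c_attn.weight"
                plan[target] = [f"layers.{n}.{s}" for s in QKV_SUFFIXES]
                continue
        unmapped.append(key)
    # A fused target is only valid if all of its sources exist.
    missing = [s for srcs in plan.values() for s in srcs if s not in known]
    if missing:
        raise KeyError(f"missing source tensors: {missing}")
    return plan, unmapped


example_keys = [
    "tok_embeddings.weight",
    "layers.0.attention.wq.weight",
    "layers.0.attention.wk.weight",
    "layers.0.attention.wv.weight",
    "layers.0.feed_forward.w1.weight",
    "norm.weight",  # not covered by the mappings listed in this section
]
plan, unmapped = plan_conversion(example_keys)
```

In a real script, each entry of `plan` would then presumably be materialized by loading the checkpoint with `torch.load`, joining the listed tensors with `torch.cat(..., dim=0)`, casting them to the target dtype, and writing the resulting dictionary with `save_file` from `safetensors.torch`. Reporting unmapped keys rather than silently dropping them makes it easier to notice a mapping that is still missing.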