Principle:LaurentMazare Tch rs LLM Weight Conversion

Knowledge Sources	tch-rs
Domains	NLP, Model_Serialization
Last Updated	2026-02-08 14:00 GMT

Overview

This principle covers converting the original LLaMA checkpoint weights from PyTorch's pickle format to the safetensors format, with keys renamed to match the Rust model architecture.

Description

Large language model checkpoints are typically distributed in PyTorch's pickle format (.pth), with key names that match the original Python model implementation. To load them into a Rust model that uses different naming conventions, the keys must be renamed and the tensors optionally converted to a different precision (e.g., float16). The conversion also concatenates the separate query, key, and value projection weights into a single combined attention weight matrix, and remaps layer names to match the tch-rs model path hierarchy.
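The renaming part can be sketched as a small lookup. This is a minimal illustration, not the project's converter: the function name `rename_key` and the table layout are assumptions, and only the two one-to-one mappings listed on this page are included. Keys that are fused rather than renamed (such as the attention projections) are deliberately left unmapped.

```python
import re

# Only the one-to-one mappings listed on this page are included here.
# A real converter would need an entry for every key in the checkpoint.
STATIC_KEYS = {
    "tok_embeddings.weight": "transformer.wte.weight",
}

# Per-layer keys carry the layer index {N}, captured by the regex group.
LAYER_PATTERNS = [
    (
        re.compile(r"^layers\.(\d+)\.feed_forward\.w1\.weight$"),
        "transformer.h.{n}.mlp.c_fc1.weight",
    ),
]


def rename_key(key):
    """Return the Rust-side name for a checkpoint key, or None if unmapped."""
    if key in STATIC_KEYS:
        return STATIC_KEYS[key]
    for pattern, template in LAYER_PATTERNS:
        match = pattern.match(key)
        if match:
            return template.format(n=match.group(1))
    return None  # unmapped: fused elsewhere or not covered by this sketch


print(rename_key("layers.3.feed_forward.w1.weight"))
```

Returning None for unknown keys, rather than passing them through unchanged, makes it easier to spot checkpoint entries that the mapping does not yet cover.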

Usage

Run this conversion once before running the Rust LLaMA inference. The output safetensors file is the input for the memory-mapped weight loading step.
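Before handing the converted file to the Rust loader, it can help to confirm that the key names came out as expected. The sketch below reads only the header of a safetensors file, relying on the published file layout: an 8-byte little-endian header length, followed by a JSON header, followed by the raw tensor bytes. The function names and the tiny demo file are hypothetical and stand in for a real converted checkpoint.

```python
import json
import os
import struct
import tempfile


def read_safetensors_header(path):
    """Return the JSON header of a safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: header size as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def tensor_names(path):
    """List tensor names, skipping the optional "__metadata__" entry."""
    header = read_safetensors_header(path)
    return sorted(name for name in header if name != "__metadata__")


# Demo: write a tiny one-tensor file by hand so the reader can be tried
# without a real checkpoint (hypothetical file name and contents).
header = json.dumps(
    {"transformer.wte.weight": {"dtype": "F16", "shape": [1, 2], "data_offsets": [0, 4]}}
).encode("utf-8")
demo_path = os.path.join(tempfile.mkdtemp(), "demo.safetensors")
with open(demo_path, "wb") as f:
    f.write(struct.pack("<Q", len(header)))
    f.write(header)
    f.write(b"\x00" * 4)  # two float16 zeros = 4 bytes of tensor data

names = tensor_names(demo_path)
print(names)
```

Because only the header is read, this check stays cheap even for multi-gigabyte weight files.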

Theoretical Basis

Conversion Steps:
  1. Load the original .pth checkpoint with PyTorch
  2. Rename keys that map one-to-one:
     layers.{N}.feed_forward.w1.weight → transformer.h.{N}.mlp.c_fc1.weight
     tok_embeddings.weight → transformer.wte.weight
  3. Concatenate the Q, K, V weights along dim=0 into a single fused attention weight:
     layers.{N}.attention.{wq,wk,wv}.weight → transformer.h.{N}.attn.c_attn.weight
  4. Convert the dtype (e.g., to float16)
  5. Save in safetensors format
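The Q/K/V concatenation listed above can be illustrated at the shape level. This sketch uses plain nested lists as stand-ins for 2-D tensors so that it runs without PyTorch; with real tensors the same operation would be `torch.cat([wq, wk, wv], dim=0)`. The helper name `fuse_qkv` and the toy 2x2 weights are assumptions made for illustration, and the Q, K, V row order follows the order given in the steps.

```python
def fuse_qkv(state, layer):
    """Stack wq, wk, wv row-wise (dim 0) into one c_attn matrix."""
    prefix = f"layers.{layer}.attention."
    wq = state[prefix + "wq.weight"]
    wk = state[prefix + "wk.weight"]
    wv = state[prefix + "wv.weight"]
    # For nested lists, joining the outer lists appends rows, which mirrors
    # concatenating 2-D tensors along dim 0.
    fused = wq + wk + wv
    return f"transformer.h.{layer}.attn.c_attn.weight", fused


# Toy 2x2 projections standing in for real [dim, dim] weight matrices.
state = {
    "layers.0.attention.wq.weight": [[1.0, 2.0], [3.0, 4.0]],
    "layers.0.attention.wk.weight": [[5.0, 6.0], [7.0, 8.0]],
    "layers.0.attention.wv.weight": [[9.0, 10.0], [11.0, 12.0]],
}

name, fused = fuse_qkv(state, 0)
print(name, len(fused), len(fused[0]))
```

Three [dim, dim] matrices become one [3 * dim, dim] matrix, so the consumer of the fused weight has to split its output back into Q, K, and V in the same order.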

Related Pages

Implemented By

Implementation:LaurentMazare_Tch_rs_Convert_Checkpoint
