Principle:LaurentMazare Tch rs LLM Weight Conversion

From Leeroopedia


Knowledge Sources
Domains: NLP, Model_Serialization
Last Updated: 2026-02-08 14:00 GMT

Overview

The process of converting the original LLaMA checkpoint weights from PyTorch's pickle format to the safetensors format, with keys renamed to match the Rust model architecture.

Description

Large language model checkpoints are typically distributed in PyTorch's pickle format (.pth), with key names that match the original Python model implementation. To load them into a Rust model that uses different naming conventions, the weights must be renamed and, optionally, converted to a different precision (e.g., float16). The conversion also concatenates the separate query, key, and value projection weights into a single combined attention weight matrix, and remaps layer names to match the tch-rs model path hierarchy.
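To make the precision step concrete, here is a minimal sketch of what rounding to float16 does to individual values. It uses only the Python standard library and stores a matrix as a list of rows; the helper names `to_float16` and `convert_rows` are hypothetical and not part of any conversion script. With real PyTorch tensors the cast would normally be a single dtype conversion such as `tensor.to(torch.float16)`.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    # The "e" format code packs a binary16 float; unpacking returns it as a Python float.
    return struct.unpack("<e", struct.pack("<e", value))[0]


def convert_rows(rows):
    """Apply the float16 round trip to every entry of a matrix stored as a list of rows."""
    return [[to_float16(x) for x in row] for row in rows]


# Toy weights (illustrative values only).
weights = [[1.0, 0.1], [-2.5, 0.333]]
half = convert_rows(weights)

# Values exactly representable in float16 (1.0, -2.5) survive unchanged;
# values such as 0.1 are rounded to the nearest representable neighbour.
print(half)
```

Halving the precision halves the size of the stored weights, at the cost of small rounding errors like the ones visible here.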

Usage

Run this conversion once before running the Rust LLaMA inference. The output safetensors file is the input for the memory-mapped weight loading step.

Theoretical Basis

Conversion Steps:
  1. Load original .pth checkpoint with PyTorch
  2. Rename keys:
     layers.{N}.attention.{wq,wk,wv}.weight → transformer.h.{N}.attn.c_attn.weight (Q, K, V concatenated into one matrix)
     layers.{N}.feed_forward.w1.weight → transformer.h.{N}.mlp.c_fc1.weight
     tok_embeddings.weight → transformer.wte.weight
  3. Concatenate Q, K, V weights along dim=0 for fused attention
  4. Convert dtype to float16
  5. Save as safetensors format
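The renaming and concatenation steps above can be sketched as follows. This is an illustration under stated assumptions, not the actual conversion script: matrices are plain lists of rows rather than tensors, only the three mappings listed above are handled, unknown keys are passed through unchanged, and the function name `convert_keys` is hypothetical. Loading the .pth file, the float16 cast, and the safetensors write are left out.

```python
import re

# Direct renames taken from the mapping listed above (assumed subset of all keys).
DIRECT_RENAMES = [
    (re.compile(r"^layers\.(\d+)\.feed_forward\.w1\.weight$"),
     r"transformer.h.\1.mlp.c_fc1.weight"),
    (re.compile(r"^tok_embeddings\.weight$"), "transformer.wte.weight"),
]
# Matches the separate query, key, and value projections of one layer.
QKV_PATTERN = re.compile(r"^layers\.(\d+)\.attention\.w([qkv])\.weight$")


def convert_keys(state):
    """Return a new dict with renamed keys and fused Q/K/V weights.

    `state` maps checkpoint key names to matrices stored as lists of rows.
    """
    converted = {}
    qkv = {}  # layer index -> {"q": rows, "k": rows, "v": rows}
    for key, rows in state.items():
        match = QKV_PATTERN.match(key)
        if match:
            # Collect Q, K, V per layer; they are fused after the loop.
            qkv.setdefault(match.group(1), {})[match.group(2)] = rows
            continue
        for pattern, target in DIRECT_RENAMES:
            if pattern.match(key):
                converted[pattern.sub(target, key)] = rows
                break
        else:
            # Assumption: keys without a known mapping keep their name.
            converted[key] = rows
    for layer, parts in qkv.items():
        # Appending rows in Q, K, V order is concatenation along dim=0.
        # A layer missing one of the three projections raises KeyError here.
        fused = parts["q"] + parts["k"] + parts["v"]
        converted[f"transformer.h.{layer}.attn.c_attn.weight"] = fused
    return converted


# Toy checkpoint with one layer and 2x2 projections (illustrative values only).
toy = {
    "tok_embeddings.weight": [[0.1, 0.2]],
    "layers.0.attention.wq.weight": [[1.0, 0.0], [0.0, 1.0]],
    "layers.0.attention.wk.weight": [[2.0, 0.0], [0.0, 2.0]],
    "layers.0.attention.wv.weight": [[3.0, 0.0], [0.0, 3.0]],
    "layers.0.feed_forward.w1.weight": [[0.5, 0.5]],
}
result = convert_keys(toy)
for name in sorted(result):
    print(name, len(result[name]), "rows")
```

With PyTorch tensors, the row concatenation would typically be written as `torch.cat([wq, wk, wv], dim=0)`, which turns three matrices of shape (d, d) into one of shape (3d, d). The Q, K, V order must match the order in which the Rust model splits the fused matrix.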

Related Pages

Implemented By
