Weight Extraction and Mapping
Extracting weight tensors from framework-specific formats and mapping them to GGML naming conventions.
Overview
Different ML frameworks (PyTorch, TensorFlow, HuggingFace, Keras) each use their own tensor naming and layout conventions. Converting a trained model to GGML format requires extracting the raw weight tensors from the source framework and systematically mapping them into GGML's expected naming scheme and data layout. This is a Pattern Doc — users implement this pattern per-model type rather than calling a single unified API.
Theory
Every ML framework persists trained weights differently:
- PyTorch stores tensors in a `state_dict` keyed by module path (e.g., `transformer.h.0.attn.c_attn.weight`).
- TensorFlow / Keras use variable scopes and layer names.
- HuggingFace Transformers wraps framework-specific checkpoints with its own naming layer.
GGML defines its own flat naming convention for tensors. A conversion script must bridge the gap between the source format and GGML's expectations.
Name Mapping
Framework-specific tensor names must be converted to the GGML naming convention. For example:
| Source (HuggingFace GPT-2) | GGML Name |
|---|---|
| `transformer.h.0.attn.c_attn.weight` | `model/h0/attn/c_attn/w` |
| `transformer.h.0.ln_1.weight` | `model/h0/ln_1/g` |
| `transformer.h.0.ln_1.bias` | `model/h0/ln_1/b` |
The mapping is typically performed through string manipulation, regex substitution, or lookup tables defined per-model architecture.
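As a sketch of the lookup-table/regex approach, the rules below reproduce the GPT-2 mappings from the table above. The rule list itself is hypothetical and would be defined per architecture:

```python
import re

# Hypothetical per-architecture rules: (regex on the source name, builder
# for the GGML name). These cover only the rows shown in the table above.
NAME_RULES = [
    (r"^transformer\.h\.(\d+)\.attn\.c_attn\.(weight|bias)$",
     lambda m: f"model/h{m.group(1)}/attn/c_attn/"
               + ("w" if m.group(2) == "weight" else "b")),
    (r"^transformer\.h\.(\d+)\.ln_1\.weight$",
     lambda m: f"model/h{m.group(1)}/ln_1/g"),
    (r"^transformer\.h\.(\d+)\.ln_1\.bias$",
     lambda m: f"model/h{m.group(1)}/ln_1/b"),
]

def map_name(src_name: str) -> str:
    """Map a framework-specific tensor name to its GGML name."""
    for pattern, build in NAME_RULES:
        m = re.match(pattern, src_name)
        if m:
            return build(m)
    raise KeyError(f"no mapping rule for {src_name!r}")
```

An explicit rule table like this fails loudly on unmapped tensors, which is usually preferable to silently carrying a wrong name into the output file.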
Type Conversion
- 2D weights (matrices) — typically converted from float32 to float16 for size reduction while preserving acceptable precision.
- 1D tensors (biases, layer norms) — kept in float32 because they are small and more sensitive to quantisation error.
This heuristic (dimension-based type selection) is the most common pattern across GGML conversion scripts.
Transposition
Some frameworks store weight matrices transposed relative to GGML's expected layout. Conversion scripts must detect and apply transposition where needed. For instance, HuggingFace GPT-2 projection matrices are stored in a transposed shape and must be flipped before writing to the GGML file.
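One way to handle this is a per-name rule that flags which tensors need flipping. The suffix list below is a hypothetical example for a GPT-2-style model, not an exhaustive inventory:

```python
import numpy as np

# Hypothetical list of GGML names whose source tensors are stored
# transposed (e.g., HuggingFace GPT-2 Conv1D projection weights).
TRANSPOSED_SUFFIXES = ("attn/c_attn/w", "attn/c_proj/w",
                       "mlp/c_fc/w", "mlp/c_proj/w")

def maybe_transpose(ggml_name: str, tensor: np.ndarray) -> np.ndarray:
    """Flip flagged matrices into GGML's expected layout; copy so the
    result is contiguous in memory before it is written out."""
    if ggml_name.endswith(TRANSPOSED_SUFFIXES):
        return np.ascontiguousarray(tensor.T)
    return tensor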
Pattern Summary
The general extraction and mapping pattern is:
for each tensor in source_model:
1. Map the framework-specific name to the GGML name
2. Convert type: float32 -> float16 for 2D, keep float32 for 1D
3. Transpose the tensor if the framework stores it in a different layout
4. Write the tensor to the GGML output format
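The four steps above can be sketched as a single loop. The helpers here are deliberately minimal stand-ins (`map_name` and `needs_transpose` would be per-architecture, and `write_tensor` is a hypothetical GGML output writer, not a real API):

```python
import numpy as np

def map_name(src_name):
    # Toy renaming stub; a real script uses per-architecture rules.
    return src_name.replace("transformer.h.", "model/h").replace(".", "/")

def needs_transpose(ggml_name):
    # Toy layout rule; a real script flags each transposed tensor.
    return ggml_name.endswith("/c_attn/weight")

def convert(source_tensors, write_tensor):
    """source_tensors: dict of name -> np.ndarray (e.g., a state_dict
    converted to NumPy); write_tensor(name, tensor): output callback."""
    for src_name, t in source_tensors.items():
        name = map_name(src_name)             # 1. rename to GGML convention
        if t.ndim == 2:
            t = t.astype(np.float16)          # 2. f32 -> f16 for matrices
        else:
            t = t.astype(np.float32)          #    keep f32 for 1D tensors
        if needs_transpose(name):
            t = np.ascontiguousarray(t.T)     # 3. fix layout if needed
        write_tensor(name, t)                 # 4. emit to the GGML file
```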
Last updated: 2025-05-15 12:00 GMT