Weight Extraction and Mapping
Extracting weight tensors from framework-specific formats and mapping them to GGML naming conventions.
Overview
Different ML frameworks (PyTorch, TensorFlow, HuggingFace, Keras) each use their own tensor naming and layout conventions. Converting a trained model to GGML format requires extracting the raw weight tensors from the source framework and systematically mapping them into GGML's expected naming scheme and data layout. This is a Pattern Doc — users implement this pattern per-model type rather than calling a single unified API.
Theory
Every ML framework persists trained weights differently:
- PyTorch stores tensors in a `state_dict` keyed by module path (e.g., `transformer.h.0.attn.c_attn.weight`).
- TensorFlow / Keras use variable scopes and layer names.
- HuggingFace Transformers wraps framework-specific checkpoints with its own naming layer.
GGML defines its own flat naming convention for tensors. A conversion script must bridge the gap between the source format and GGML's expectations.
Name Mapping
Framework-specific tensor names must be converted to the GGML naming convention. For example:
| Source (HuggingFace GPT-2) | GGML Name |
|---|---|
| `transformer.h.0.attn.c_attn.weight` | `model/h0/attn/c_attn/w` |
| `transformer.h.0.ln_1.weight` | `model/h0/ln_1/g` |
| `transformer.h.0.ln_1.bias` | `model/h0/ln_1/b` |
The mapping is typically performed through string manipulation, regex substitution, or lookup tables defined per-model architecture.
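As a sketch of the lookup-table/regex approach, the rules below reproduce the GPT-2 mappings from the table above. The rule list itself is hypothetical and would be defined per architecture:

```python
import re

# Hypothetical per-architecture rules: (regex on the source name, builder
# for the GGML name). These cover only the rows shown in the table above.
NAME_RULES = [
    (r"^transformer\.h\.(\d+)\.attn\.c_attn\.(weight|bias)$",
     lambda m: f"model/h{m.group(1)}/attn/c_attn/"
               + ("w" if m.group(2) == "weight" else "b")),
    (r"^transformer\.h\.(\d+)\.ln_1\.weight$",
     lambda m: f"model/h{m.group(1)}/ln_1/g"),
    (r"^transformer\.h\.(\d+)\.ln_1\.bias$",
     lambda m: f"model/h{m.group(1)}/ln_1/b"),
]

def map_name(src_name: str) -> str:
    """Map a framework-specific tensor name to its GGML name."""
    for pattern, build in NAME_RULES:
        m = re.match(pattern, src_name)
        if m:
            return build(m)
    raise KeyError(f"no mapping rule for {src_name!r}")
```

An explicit rule table like this fails loudly on unmapped tensors, which is usually preferable to silently carrying a wrong name into the output file.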
Type Conversion
- 2D weights (matrices) — typically converted from float32 to float16 for size reduction while preserving acceptable precision.
- 1D tensors (biases, layer norms) — kept in float32 because they are small and more sensitive to quantisation error.
This heuristic (dimension-based type selection) is the most common pattern across GGML conversion scripts.
Transposition
Some frameworks store weight matrices transposed relative to GGML's expected layout. Conversion scripts must detect and apply transposition where needed. For instance, HuggingFace GPT-2 projection matrices are stored in a transposed shape and must be flipped before writing to the GGML file.
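One way to handle this is a per-name rule that flags which tensors need flipping. The suffix list below is a hypothetical example for a GPT-2-style model, not an exhaustive inventory:

```python
import numpy as np

# Hypothetical list of GGML names whose source tensors are stored
# transposed (e.g., HuggingFace GPT-2 Conv1D projection weights).
TRANSPOSED_SUFFIXES = ("attn/c_attn/w", "attn/c_proj/w",
                       "mlp/c_fc/w", "mlp/c_proj/w")

def maybe_transpose(ggml_name: str, tensor: np.ndarray) -> np.ndarray:
    """Flip flagged matrices into GGML's expected layout; copy so the
    result is contiguous in memory before it is written out."""
    if ggml_name.endswith(TRANSPOSED_SUFFIXES):
        return np.ascontiguousarray(tensor.T)
    return tensor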
Pattern Summary
The general extraction and mapping pattern is:
for each tensor in source_model:
1. Map the framework-specific name to the GGML name
2. Convert type: float32 -> float16 for 2D, keep float32 for 1D
3. Transpose the tensor if the framework stores it in a different layout
4. Write the tensor to the GGML output format
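The four steps above can be sketched as a single loop. The helpers here are deliberately minimal stand-ins (`map_name` and `needs_transpose` would be per-architecture, and `write_tensor` is a hypothetical GGML output writer, not a real API):

```python
import numpy as np

def map_name(src_name):
    # Toy renaming stub; a real script uses per-architecture rules.
    return src_name.replace("transformer.h.", "model/h").replace(".", "/")

def needs_transpose(ggml_name):
    # Toy layout rule; a real script flags each transposed tensor.
    return ggml_name.endswith("/c_attn/weight")

def convert(source_tensors, write_tensor):
    """source_tensors: dict of name -> np.ndarray (e.g., a state_dict
    converted to NumPy); write_tensor(name, tensor): output callback."""
    for src_name, t in source_tensors.items():
        name = map_name(src_name)             # 1. rename to GGML convention
        if t.ndim == 2:
            t = t.astype(np.float16)          # 2. f32 -> f16 for matrices
        else:
            t = t.astype(np.float32)          #    keep f32 for 1D tensors
        if needs_transpose(name):
            t = np.ascontiguousarray(t.T)     # 3. fix layout if needed
        write_tensor(name, t)                 # 4. emit to the GGML file
```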
Last updated: 2025-05-15 12:00 GMT