Implementation:Ggml_org_Ggml_Weight_Extraction_Pattern
Weight Extraction Pattern
Type: Pattern Doc (not a single API, but a user-implemented pattern per model type)
Language: Python
Repository: https://github.com/ggml-org/ggml
Overview
This document catalogues the recurring pattern used across GGML conversion scripts to extract weight tensors from framework-specific checkpoint formats and write them into GGML-compatible files. Each model type implements this pattern independently, adapting name mapping, type conversion, and transposition logic to the source framework and architecture.
Interface Pattern
# Pattern: Weight extraction and mapping
for name, tensor in source_model.items():
    # 1. Map name: framework_name -> ggml_name
    # 2. Convert type: float32 -> float16 for 2D, keep float32 for 1D
    # 3. Transpose if needed
    # 4. Write to output format
    ...
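The four steps above can be fleshed out into a minimal runnable sketch. The mapping table, tensor names, and helper function here are invented for illustration; real conversion scripts use per-model mapping tables or regexes.

```python
import numpy as np

# Hypothetical name-mapping table; real scripts build this per model.
NAME_MAP = {"encoder.layer_norm.weight": "model/ln_f/g"}

def convert_tensor(name, tensor):
    """Apply the pattern's steps to one tensor; returns (ggml_name, array)."""
    # 1. Map name: framework convention -> GGML convention.
    ggml_name = NAME_MAP.get(name, name)
    data = np.asarray(tensor)
    # 2. Convert type: float16 for 2D weight matrices, float32 otherwise.
    data = data.astype(np.float16 if data.ndim == 2 else np.float32)
    # 3. Transpose if needed (a no-op here; framework-specific in practice).
    # 4. The caller writes (ggml_name, data) to the GGML output file.
    return ggml_name, data

source_model = {
    "encoder.layer_norm.weight": np.ones(8, dtype=np.float32),  # 1D -> f32
    "encoder.proj.weight": np.ones((8, 8), dtype=np.float32),   # 2D -> f16
}
converted = dict(convert_tensor(n, t) for n, t in source_model.items())
```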
Example Implementations
1. HuggingFace GPT-2
File: examples/gpt-2/convert-h5-to-ggml.py:L105-190
- Uses regex-based name mapping to translate HuggingFace tensor paths into GGML names.
- Applies transposition for projection matrices (attention and MLP layers).
- Converts 2D tensors to float16; keeps 1D tensors (biases, layer-norm weights) in float32.
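The regex-based mapping can be sketched as an ordered rule list. The rules below are illustrative, covering only a small subset of tensor names; the target names follow the `model/h<n>/...` convention used in the GGML examples, but consult the script itself for the authoritative table.

```python
import re

# Illustrative (pattern, replacement) rules in the spirit of
# convert-h5-to-ggml.py; the real script's table is longer.
RULES = [
    (r"^h\.(\d+)\.attn\.c_attn\.weight$", r"model/h\1/attn/c_attn/w"),
    (r"^h\.(\d+)\.attn\.c_attn\.bias$",   r"model/h\1/attn/c_attn/b"),
    (r"^h\.(\d+)\.mlp\.c_fc\.weight$",    r"model/h\1/mlp/c_fc/w"),
    (r"^ln_f\.weight$",                   r"model/ln_f/g"),
]

def map_name(hf_name):
    """Translate a HuggingFace tensor path into a GGML tensor name."""
    for pattern, template in RULES:
        if re.match(pattern, hf_name):
            return re.sub(pattern, template, hf_name)
    return hf_name  # no rule matched: keep the original name
```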
2. PyTorch SAM (Segment Anything)
File: examples/sam/convert-pth-to-ggml.py:L83-142
- Iterates directly over the PyTorch state_dict.
- Performs direct name-to-name mapping with type conversion.
- Simpler mapping logic because SAM's architecture uses relatively flat naming.
3. Darknet YOLO
File: examples/yolo/convert-yolov3-tiny.py:L6-22 (save_conv2d_layer)
- Reads from Darknet's binary format rather than a Python-native checkpoint.
- Handles interleaved batch-normalisation / bias / weight data within each convolutional layer.
- Writes each component separately with appropriate type tagging.
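Reading the interleaved binary layout can be sketched as below. The per-layer ordering shown (biases, then batch-norm scales, rolling mean and variance, then convolution weights, all float32) reflects Darknet's format as understood here; the function and variable names are illustrative, not copied from save_conv2d_layer.

```python
import numpy as np

def _read_f32(f, count):
    # Read `count` little-endian float32 values from a binary stream.
    return np.frombuffer(f.read(4 * count), dtype=np.float32).copy()

def load_conv2d(f, n_filters, ksize, in_channels, batch_normalize=True):
    """Sketch: read one convolutional layer from a Darknet weights stream."""
    biases = _read_f32(f, n_filters)
    bn = None
    if batch_normalize:
        scales = _read_f32(f, n_filters)
        rolling_mean = _read_f32(f, n_filters)
        rolling_var = _read_f32(f, n_filters)
        bn = (scales, rolling_mean, rolling_var)
    weights = _read_f32(f, n_filters * in_channels * ksize * ksize)
    weights = weights.reshape(n_filters, in_channels, ksize, ksize)
    return biases, bn, weights
```

Each returned component can then be written to the output with its own type tag, as the bullet above describes.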
Common Pattern
All implementations follow the same high-level sequence:
- Iterate over the source model's state_dict, variable list, or binary stream.
- Map names from the framework convention to the GGML convention.
- Convert types — float32 to float16 for 2D weight matrices; float32 preserved for 1D tensors (biases, norms).
- Write the converted tensors to the GGML output format with appropriate headers.
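The final write step can be sketched as follows, assuming the legacy per-tensor layout that several of the example scripts use: a packed header of (n_dims, name length, ftype), the dimensions in reverse order, the UTF-8 name, then the raw data. Treat this as a sketch of the pattern, not an authoritative format definition.

```python
import struct
import numpy as np

def write_tensor(fout, name, data):
    """Write one converted tensor in a legacy GGML-style per-tensor layout."""
    name_bytes = name.encode("utf-8")
    ftype = 1 if data.dtype == np.float16 else 0  # 0 = f32, 1 = f16
    fout.write(struct.pack("iii", data.ndim, len(name_bytes), ftype))
    for i in range(data.ndim):
        # Dimensions are written in reverse order.
        fout.write(struct.pack("i", data.shape[data.ndim - 1 - i]))
    fout.write(name_bytes)
    fout.write(data.tobytes())
```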
Key Decisions
- Which tensors to skip — certain tensors are not needed in the GGML graph (e.g., attn.masked_bias in GPT-2 is a causal mask computed at runtime).
- Which tensors to transpose — depends on the source framework's storage convention versus GGML's expected layout.
- Type precision per dimension — the 2D-to-float16 / 1D-to-float32 heuristic is the most common, but some scripts override this for specific tensors.
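In practice these decisions often take the form of small predicate functions consulted inside the conversion loop. The patterns below are illustrative; only attn.masked_bias comes from the text above, and the transpose rules are hypothetical examples of the kind of suffix matching involved.

```python
import re

# Illustrative rule lists; real scripts hard-code these per model.
SKIP_PATTERNS = [r"\.attn\.masked_bias$"]           # runtime-computed causal mask
TRANSPOSE_PATTERNS = [r"\.c_attn\.weight$",          # hypothetical examples of
                      r"\.c_fc\.weight$"]            # projection-matrix suffixes

def should_skip(name):
    """True if the tensor is not needed in the GGML graph."""
    return any(re.search(p, name) for p in SKIP_PATTERNS)

def should_transpose(name):
    """True if the source layout must be transposed for GGML."""
    return any(re.search(p, name) for p in TRANSPOSE_PATTERNS)
```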
Dependencies
| Dependency | Role |
|---|---|
| numpy | Array manipulation, type casting, transposition |
| torch | Loading PyTorch checkpoints (.pth, .bin) |
| tensorflow | Loading TensorFlow checkpoints |
| transformers | Loading HuggingFace model weights and tokeniser configs |
Related
- Principle:Ggml_org_Ggml_Weight_Extraction_and_Mapping
- Environment:Ggml_org_Ggml_Python_Tooling_Environment
Last updated: 2025-05-15 12:00 GMT