Implementation:Ggml_org_Ggml_Weight_Extraction_Pattern
Weight Extraction Pattern
Type: Pattern Doc (not a single API, but a user-implemented pattern per model type)
Language: Python
Repository: https://github.com/ggml-org/ggml
Overview
This document catalogues the recurring pattern used across GGML conversion scripts to extract weight tensors from framework-specific checkpoint formats and write them into GGML-compatible files. Each model type implements this pattern independently, adapting name mapping, type conversion, and transposition logic to the source framework and architecture.
Interface Pattern
# Pattern: Weight extraction and mapping
for name, tensor in source_model.items():
    # 1. Map name: framework_name -> ggml_name
    # 2. Convert type: float32 -> float16 for 2D, keep float32 for 1D
    # 3. Transpose if needed
    # 4. Write to output format
    ...
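The four steps above can be fleshed out into a minimal runnable sketch. The mapping table, tensor names, and helper function here are invented for illustration; real conversion scripts use per-model mapping tables or regexes.

```python
import numpy as np

# Hypothetical name-mapping table; real scripts build this per model.
NAME_MAP = {"encoder.layer_norm.weight": "model/ln_f/g"}

def convert_tensor(name, tensor):
    """Apply the pattern's steps to one tensor; returns (ggml_name, array)."""
    # 1. Map name: framework convention -> GGML convention.
    ggml_name = NAME_MAP.get(name, name)
    data = np.asarray(tensor)
    # 2. Convert type: float16 for 2D weight matrices, float32 otherwise.
    data = data.astype(np.float16 if data.ndim == 2 else np.float32)
    # 3. Transpose if needed (a no-op here; framework-specific in practice).
    # 4. The caller writes (ggml_name, data) to the GGML output file.
    return ggml_name, data

source_model = {
    "encoder.layer_norm.weight": np.ones(8, dtype=np.float32),  # 1D -> f32
    "encoder.proj.weight": np.ones((8, 8), dtype=np.float32),   # 2D -> f16
}
converted = dict(convert_tensor(n, t) for n, t in source_model.items())
```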
Example Implementations
1. HuggingFace GPT-2
File: examples/gpt-2/convert-h5-to-ggml.py:L105-190
- Uses regex-based name mapping to translate HuggingFace tensor paths into GGML names.
- Applies transposition for projection matrices (attention and MLP layers).
- Converts 2D tensors to float16; keeps 1D tensors (biases, layer-norm weights) in float32.
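The regex-based mapping can be sketched as an ordered rule list. The rules below are illustrative, covering only a small subset of tensor names; the target names follow the `model/h<n>/...` convention used in the GGML examples, but consult the script itself for the authoritative table.

```python
import re

# Illustrative (pattern, replacement) rules in the spirit of
# convert-h5-to-ggml.py; the real script's table is longer.
RULES = [
    (r"^h\.(\d+)\.attn\.c_attn\.weight$", r"model/h\1/attn/c_attn/w"),
    (r"^h\.(\d+)\.attn\.c_attn\.bias$",   r"model/h\1/attn/c_attn/b"),
    (r"^h\.(\d+)\.mlp\.c_fc\.weight$",    r"model/h\1/mlp/c_fc/w"),
    (r"^ln_f\.weight$",                   r"model/ln_f/g"),
]

def map_name(hf_name):
    """Translate a HuggingFace tensor path into a GGML tensor name."""
    for pattern, template in RULES:
        if re.match(pattern, hf_name):
            return re.sub(pattern, template, hf_name)
    return hf_name  # no rule matched: keep the original name
```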
2. PyTorch SAM (Segment Anything)
File: examples/sam/convert-pth-to-ggml.py:L83-142
- Iterates directly over the PyTorch state_dict.
- Performs direct name-to-name mapping with type conversion.
- Simpler mapping logic because SAM's architecture uses relatively flat naming.
3. Darknet YOLO
File: examples/yolo/convert-yolov3-tiny.py:L6-22 (save_conv2d_layer)
- Reads from Darknet's binary format rather than a Python-native checkpoint.
- Handles interleaved batch-normalisation / bias / weight data within each convolutional layer.
- Writes each component separately with appropriate type tagging.
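Reading the interleaved binary layout can be sketched as below. The per-layer ordering shown (biases, then batch-norm scales, rolling mean and variance, then convolution weights, all float32) reflects Darknet's format as understood here; the function and variable names are illustrative, not copied from save_conv2d_layer.

```python
import numpy as np

def _read_f32(f, count):
    # Read `count` little-endian float32 values from a binary stream.
    return np.frombuffer(f.read(4 * count), dtype=np.float32).copy()

def load_conv2d(f, n_filters, ksize, in_channels, batch_normalize=True):
    """Sketch: read one convolutional layer from a Darknet weights stream."""
    biases = _read_f32(f, n_filters)
    bn = None
    if batch_normalize:
        scales = _read_f32(f, n_filters)
        rolling_mean = _read_f32(f, n_filters)
        rolling_var = _read_f32(f, n_filters)
        bn = (scales, rolling_mean, rolling_var)
    weights = _read_f32(f, n_filters * in_channels * ksize * ksize)
    weights = weights.reshape(n_filters, in_channels, ksize, ksize)
    return biases, bn, weights
```

Each returned component can then be written to the output with its own type tag, as the bullet above describes.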
Common Pattern
All implementations follow the same high-level sequence:
- Iterate over the source model's state_dict, variable list, or binary stream.
- Map names from the framework convention to the GGML convention.
- Convert types — float32 to float16 for 2D weight matrices; float32 preserved for 1D tensors (biases, norms).
- Write the converted tensors to the GGML output format with appropriate headers.
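The final write step can be sketched as follows, assuming the legacy per-tensor layout that several of the example scripts use: a packed header of (n_dims, name length, ftype), the dimensions in reverse order, the UTF-8 name, then the raw data. Treat this as a sketch of the pattern, not an authoritative format definition.

```python
import struct
import numpy as np

def write_tensor(fout, name, data):
    """Write one converted tensor in a legacy GGML-style per-tensor layout."""
    name_bytes = name.encode("utf-8")
    ftype = 1 if data.dtype == np.float16 else 0  # 0 = f32, 1 = f16
    fout.write(struct.pack("iii", data.ndim, len(name_bytes), ftype))
    for i in range(data.ndim):
        # Dimensions are written in reverse order.
        fout.write(struct.pack("i", data.shape[data.ndim - 1 - i]))
    fout.write(name_bytes)
    fout.write(data.tobytes())
```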
Key Decisions
- Which tensors to skip — certain tensors are not needed in the GGML graph (e.g., attn.masked_bias in GPT-2 is a causal mask computed at runtime).
- Which tensors to transpose — depends on the source framework's storage convention versus GGML's expected layout.
- Type precision per dimension — the 2D-to-float16 / 1D-to-float32 heuristic is the most common, but some scripts override this for specific tensors.
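In practice these decisions often take the form of small predicate functions consulted inside the conversion loop. The patterns below are illustrative; only attn.masked_bias comes from the text above, and the transpose rules are hypothetical examples of the kind of suffix matching involved.

```python
import re

# Illustrative rule lists; real scripts hard-code these per model.
SKIP_PATTERNS = [r"\.attn\.masked_bias$"]           # runtime-computed causal mask
TRANSPOSE_PATTERNS = [r"\.c_attn\.weight$",          # hypothetical examples of
                      r"\.c_fc\.weight$"]            # projection-matrix suffixes

def should_skip(name):
    """True if the tensor is not needed in the GGML graph."""
    return any(re.search(p, name) for p in SKIP_PATTERNS)

def should_transpose(name):
    """True if the source layout must be transposed for GGML."""
    return any(re.search(p, name) for p in TRANSPOSE_PATTERNS)
```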
Dependencies
| Dependency | Role |
|---|---|
| numpy | Array manipulation, type casting, transposition |
| torch | Loading PyTorch checkpoints (.pth, .bin) |
| tensorflow | Loading TensorFlow checkpoints |
| transformers | Loading HuggingFace model weights and tokeniser configs |
Related
- Principle:Ggml_org_Ggml_Weight_Extraction_and_Mapping
- Environment:Ggml_org_Ggml_Python_Tooling_Environment
Last updated: 2025-05-15 12:00 GMT