

Weight Extraction Pattern

Type: Pattern Doc (not a single API, but a user-implemented pattern per model type)

Language: Python

Repository: https://github.com/ggml-org/ggml

Overview

This document catalogues the recurring pattern used across GGML conversion scripts to extract weight tensors from framework-specific checkpoint formats and write them into GGML-compatible files. Each model type implements this pattern independently, adapting name mapping, type conversion, and transposition logic to the source framework and architecture.

Interface Pattern

# Pattern: Weight extraction and mapping
for name, tensor in source_model.items():
    # 1. Map name: framework_name -> ggml_name
    # 2. Convert type: float32 -> float16 for 2D, keep float32 for 1D
    # 3. Transpose if needed
    # 4. Write to output format
    pass  # per-model implementation fills in these steps
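As a concrete illustration of the skeleton above, here is a minimal sketch using numpy arrays in place of a real checkpoint. The tensor names and the rename rule are hypothetical, not taken from any actual conversion script:

```python
import numpy as np

# Hypothetical source checkpoint with HuggingFace-style names.
source_model = {
    "transformer.h.0.attn.c_attn.weight": np.ones((4, 12), dtype=np.float32),
    "transformer.h.0.ln_1.bias": np.zeros(4, dtype=np.float32),
}

def extract(source_model):
    out = {}
    for name, tensor in source_model.items():
        # 1. Map name: framework convention -> GGML convention (illustrative rule).
        ggml_name = name.replace("transformer.h.", "layers.")
        # 2. Convert type: float16 for 2D weight matrices, float32 for 1D tensors.
        if tensor.ndim == 2:
            tensor = tensor.astype(np.float16)
        # 3. Transpose if the source layout differs from GGML's expected layout
        #    (here: transpose the attention projection weight).
        if ggml_name.endswith("c_attn.weight"):
            tensor = tensor.T
        # 4. Collect for writing to the output format.
        out[ggml_name] = tensor
    return out

converted = extract(source_model)
```

The skip / transpose / precision rules differ per model; only the loop shape is shared.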

Example Implementations

1. HuggingFace GPT-2

File: examples/gpt-2/convert-h5-to-ggml.py:L105-190

  • Uses regex-based name mapping to translate HuggingFace tensor paths into GGML names.
  • Applies transposition for projection matrices (attention and MLP layers).
  • Converts 2D tensors to float16; keeps 1D tensors (biases, layer-norm weights) in float32.
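The regex-based renaming step can be sketched as a rule table; the patterns below are illustrative, not copied from `convert-h5-to-ggml.py`:

```python
import re

# Illustrative (not verbatim) HuggingFace -> GGML rename rules for GPT-2.
RENAME_RULES = [
    (r"^transformer\.h\.(\d+)\.attn\.c_attn\.(weight|bias)$", r"model/h\1/attn/c_attn/\2"),
    (r"^transformer\.h\.(\d+)\.mlp\.c_fc\.(weight|bias)$",    r"model/h\1/mlp/c_fc/\2"),
    (r"^transformer\.ln_f\.(weight|bias)$",                   r"model/ln_f/\1"),
]

def map_name(hf_name):
    for pattern, repl in RENAME_RULES:
        if re.match(pattern, hf_name):
            return re.sub(pattern, repl, hf_name)
    return None  # no rule matched: caller decides whether to skip the tensor

print(map_name("transformer.h.3.attn.c_attn.weight"))  # model/h3/attn/c_attn/weight
```

Returning None for unmatched names lets the same table drive the "which tensors to skip" decision discussed below.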

2. PyTorch SAM (Segment Anything)

File: examples/sam/convert-pth-to-ggml.py:L83-142

  • Iterates directly over the PyTorch state_dict.
  • Performs direct name-to-name mapping with type conversion.
  • Simpler mapping logic because SAM's architecture uses relatively flat naming.
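Because the names map one-to-one, a plain lookup table can replace regex logic. A sketch with hypothetical names (not the script's actual entries), using numpy stand-ins for the state_dict tensors:

```python
import numpy as np

# Illustrative one-to-one rename table (names are hypothetical).
NAME_MAP = {
    "image_encoder.patch_embed.proj.weight": "enc.patch_embed.w",
    "image_encoder.patch_embed.proj.bias":   "enc.patch_embed.b",
}

# Stand-in for torch.load(...)'s state_dict.
state_dict = {
    "image_encoder.patch_embed.proj.weight": np.ones((8, 3), dtype=np.float32),
    "image_encoder.patch_embed.proj.bias":   np.zeros(8, dtype=np.float32),
}

converted = {}
for src_name, tensor in state_dict.items():
    dst_name = NAME_MAP[src_name]
    # Same per-dimension precision rule: float16 for 2D, float32 for 1D.
    converted[dst_name] = tensor.astype(np.float16) if tensor.ndim == 2 else tensor
```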

3. Darknet YOLO

File: examples/yolo/convert-yolov3-tiny.py:L6-22 (save_conv2d_layer)

  • Reads from Darknet's binary format rather than a Python-native checkpoint.
  • Handles interleaved batch-normalisation / bias / weight data within each convolutional layer.
  • Writes each component separately with appropriate type tagging.
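Reading the interleaved components reduces to consuming fixed-size float runs from the binary stream in Darknet's storage order. A sketch of that idea (function name and buffer handling are illustrative, not the script's code):

```python
import numpy as np

def read_conv2d_layer(buf, offset, n_filters, n_weights):
    """Read one conv layer with batch norm from a raw float32 buffer.
    Darknet stores, in order: biases, BN scales, BN rolling mean,
    BN rolling variance (n_filters floats each), then the conv weights."""
    parts = {}
    for key, count in [("biases", n_filters), ("scales", n_filters),
                       ("rolling_mean", n_filters), ("rolling_variance", n_filters),
                       ("weights", n_weights)]:
        parts[key] = np.frombuffer(buf, dtype=np.float32, count=count, offset=offset)
        offset += count * 4  # advance past `count` float32 values
    return parts, offset

# Example: 2 filters, 6 conv weights -> buffer of 4*2 + 6 = 14 floats.
buf = np.arange(14, dtype=np.float32).tobytes()
parts, end = read_conv2d_layer(buf, 0, n_filters=2, n_weights=6)
```

Each returned component can then be written separately with its own type tag.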

Common Pattern

All implementations follow the same high-level sequence:

  1. Iterate over the source model's state_dict, variable list, or binary stream.
  2. Map names from the framework convention to the GGML convention.
  3. Convert types — float32 to float16 for 2D weight matrices; float32 preserved for 1D tensors (biases, norms).
  4. Write the converted tensors to the GGML output format with appropriate headers.
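Step 4 above can be sketched as a per-tensor header write. The field layout below follows the general shape used by legacy GGML convert scripts (n_dims, name length, type tag, dims, name bytes, raw data), but the exact fields and their order vary per script, so treat this as an assumption-laden sketch:

```python
import io
import struct
import numpy as np

def write_tensor(fout, name, tensor):
    """Write one tensor with a GGML-style header (sketch, fields vary per script)."""
    ftype = 1 if tensor.dtype == np.float16 else 0  # 0 = f32, 1 = f16
    name_bytes = name.encode("utf-8")
    fout.write(struct.pack("iii", tensor.ndim, len(name_bytes), ftype))
    for dim in reversed(tensor.shape):  # dims written in reverse order
        fout.write(struct.pack("i", dim))
    fout.write(name_bytes)
    fout.write(tensor.tobytes())  # raw tensor data follows the header

buf = io.BytesIO()
write_tensor(buf, "layers.0.w", np.ones((3, 2), dtype=np.float16))
```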

Key Decisions

  • Which tensors to skip — certain tensors are not needed in the GGML graph (e.g., attn.masked_bias in GPT-2 is a causal mask computed at runtime).
  • Which tensors to transpose — depends on the source framework's storage convention versus GGML's expected layout.
  • Type precision per dimension — the 2D-to-float16 / 1D-to-float32 heuristic is the most common, but some scripts override this for specific tensors.
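These three decisions are often easiest to express as small policy tables consulted per tensor. A hypothetical sketch (the suffixes are illustrative, not an actual script's lists):

```python
# Illustrative policy tables encoding the three decisions (names hypothetical).
SKIP_SUFFIXES = (".attn.masked_bias",)                 # recomputed at runtime
TRANSPOSE_SUFFIXES = (".c_attn.weight", ".c_fc.weight", ".c_proj.weight")
FORCE_F32 = ("wte.weight",)   # per-tensor override of the 2D -> f16 heuristic

def plan(name, ndim):
    """Return conversion decisions for one tensor, or None to skip it."""
    if name.endswith(SKIP_SUFFIXES):
        return None
    transpose = name.endswith(TRANSPOSE_SUFFIXES)
    dtype = "f32" if ndim == 1 or name.endswith(FORCE_F32) else "f16"
    return {"transpose": transpose, "dtype": dtype}
```

Keeping the decisions in data rather than scattered conditionals makes each script's divergence from the common heuristic easy to spot.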

Dependencies

  • numpy: Array manipulation, type casting, transposition
  • torch: Loading PyTorch checkpoints (.pth, .bin)
  • tensorflow: Loading TensorFlow checkpoints
  • transformers: Loading HuggingFace model weights and tokeniser configs

Last updated: 2025-05-15 12:00 GMT
