Implementation:Ggml org Ggml Sam model load
Ggml_org_Ggml_Sam_model_load
Summary
Loads vision model weights in the GGML examples through two functions: the SAM model loader (sam_model_load) and the YOLO model loader (load_model).
API
SAM Model Loader
bool sam_model_load(const sam_params & params, sam_model & model)
- Source: examples/sam/sam.cpp:L508-1129 (621 lines)
- Repository: https://github.com/ggml-org/ggml
- Parameters:
  - params: sam_params struct containing the model file path and runtime configuration.
  - model: output sam_model struct, populated by the function.
- Returns: bool, true on success. On completion, the sam_model is fully populated with all encoder/decoder tensors and a backend buffer is allocated.
YOLO Model Loader
load_model(const std::string & fname, yolo_model & model)
- Source: examples/yolo/yolov3-tiny.cpp:L77-138
- Repository: https://github.com/ggml-org/ggml
- Note: Uses the GGUF reader for weight deserialization.
Behavior
Architecture Auto-Detection (SAM)
The SAM loader auto-detects the ViT variant from the n_enc_state dimension read from the model file:
| n_enc_state | Detected Variant | Derived Parameters |
|---|---|---|
| 768 | ViT-B (Base) | 12 heads, 12 encoder blocks |
| 1024 | ViT-L (Large) | 16 heads, 24 encoder blocks |
| 1280 | ViT-H (Huge) | 16 heads, 32 encoder blocks |
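The lookup in the table above can be sketched as a small function. This is a minimal illustration, not the code in sam.cpp: the VitVariant struct and detect_vit_variant function are hypothetical names introduced here, and the bool-plus-output-parameter shape simply mirrors the style of the loader's signature.

```cpp
#include <cstdint>
#include <string>

// Hypothetical summary of a detected encoder variant (not a type from sam.cpp).
struct VitVariant {
    std::string name;     // human-readable variant label
    int32_t     n_heads;  // attention heads per encoder block
    int32_t     n_blocks; // number of encoder transformer blocks
};

// Map the encoder state dimension to a variant, following the table above.
// Returns false (leaving `out` untouched) for dimensions the table does not list.
bool detect_vit_variant(int32_t n_enc_state, VitVariant & out) {
    switch (n_enc_state) {
        case  768: out = VitVariant{"ViT-B", 12, 12}; return true;
        case 1024: out = VitVariant{"ViT-L", 16, 24}; return true;
        case 1280: out = VitVariant{"ViT-H", 16, 32}; return true;
        default:   return false;
    }
}
```

Returning false for an unlisted dimension lets a caller reject an unsupported file early instead of continuing with mismatched head or block counts.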
Components Loaded (SAM)
The function loads weights for all three major SAM components:
Image Encoder:
- Patch embedding (convolution weights and biases)
- Transformer blocks (attention QKV, projection, MLP layers, layer norms)
- Neck (convolution layers for channel reduction)
Prompt Encoder:
- Positional encoding (PE) matrix
- Point embeddings (foreground, background, top-left corner, bottom-right corner)
- Not-a-point embedding
- Mask downscaling layers
Mask Decoder:
- Two-way transformer (self-attention, cross-attention, MLP per layer)
- Upscaling layers (transposed convolutions, layer norms)
- Hypernetwork MLPs (one per output mask token)
- IoU prediction head
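One way to picture the three-way grouping above is to sort tensor names by their top-level prefix. The sketch below assumes the prefixes image_encoder., prompt_encoder., and mask_decoder., which follow the naming of the upstream SAM checkpoint; whether the loader uses exactly these strings is an assumption, and SamComponent and classify_tensor are hypothetical names for illustration only.

```cpp
#include <string>

// Hypothetical tag for the three SAM components listed above.
enum class SamComponent { ImageEncoder, PromptEncoder, MaskDecoder, Unknown };

// True when `s` begins with `prefix`.
static bool starts_with(const std::string & s, const std::string & prefix) {
    return s.compare(0, prefix.size(), prefix) == 0;
}

// Assign a tensor name to a component by its leading prefix.
// The prefixes are assumed from the upstream checkpoint naming.
SamComponent classify_tensor(const std::string & name) {
    if (starts_with(name, "image_encoder."))  return SamComponent::ImageEncoder;
    if (starts_with(name, "prompt_encoder.")) return SamComponent::PromptEncoder;
    if (starts_with(name, "mask_decoder."))   return SamComponent::MaskDecoder;
    return SamComponent::Unknown;
}
```

A loader could use a check like this to report tensors that belong to no known component rather than silently ignoring them.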
Components Loaded (YOLO)
The YOLO loader reads convolutional backbone weights and detection head parameters from a GGUF file, mapping them onto the sequential convolution and batch-normalization (conv/bn) layers in the yolo_model struct.
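The mapping onto sequential layers can be sketched as a per-layer name lookup. The example below is an illustration under stated assumptions: the l<index>_<field> naming scheme, the field names, and both helper functions are hypothetical, and a std::set of names stands in for the tensors a GGUF file would provide.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Tensor names one conv layer would need under the assumed naming scheme.
// Batch-normalized layers need three extra tensors besides weights and biases.
std::vector<std::string> expected_tensor_names(int layer, bool batch_normalize) {
    const std::string p = "l" + std::to_string(layer) + "_";
    std::vector<std::string> names = { p + "weights", p + "biases" };
    if (batch_normalize) {
        names.push_back(p + "scales");
        names.push_back(p + "rolling_mean");
        names.push_back(p + "rolling_variance");
    }
    return names;
}

// Walk the layers in order and collect every expected name that is absent
// from `available` (a stand-in for the tensor names found in the file).
std::vector<std::string> missing_tensors(const std::vector<bool> & layers_bn,
                                         const std::set<std::string> & available) {
    std::vector<std::string> missing;
    for (std::size_t i = 0; i < layers_bn.size(); ++i) {
        for (const std::string & name : expected_tensor_names((int) i, layers_bn[i])) {
            if (available.count(name) == 0) {
                missing.push_back(name);
            }
        }
    }
    return missing;
}
```

Collecting all missing names before failing gives a more useful error than stopping at the first absent tensor.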
Dependencies
| Header | Purpose | Used By |
|---|---|---|
| ggml.h | Core tensor operations and graph construction | SAM, YOLO |
| ggml-alloc.h | Tensor memory allocation | SAM, YOLO |
| ggml-backend.h | Backend buffer management | SAM, YOLO |
| gguf.h | GGUF file format reader | YOLO |
| ggml-cpu.h | CPU backend initialization | SAM, YOLO |
Language
- C++
Metadata
- Last updated: 2025-05-15 12:00 GMT