Implementation: ggml-org/ggml sam_build_fast_graph

Summary

sam_build_fast_graph is a C++ function that constructs a GGML computation graph for prompt-guided mask decoding in SAM (Segment Anything Model). It combines prompt encoding and mask decoding into a single graph. When computed, the graph produces low-resolution segmentation mask logits and IoU quality predictions. Its inputs are pre-computed image embeddings and user-provided prompts.

API

struct ggml_cgraph * sam_build_fast_graph(
    const sam_model & model,
    sam_state & state,
    int nx,
    int ny,
    const sam_prompt & prompt,
    bool multimask_output
)

Source

File: examples/sam/sam.cpp, lines 2005-2094
Repository: https://github.com/ggml-org/ggml

Parameters

Parameter	Type	Description
`model`	`const sam_model &`	Loaded SAM model containing all weights (image encoder, prompt encoder, mask decoder)
`state`	`sam_state &`	Runtime state holding pre-computed image embeddings from the image encoder; also receives output tensors
`nx`	`int`	Original image width in pixels
`ny`	`int`	Original image height in pixels
`prompt`	`const sam_prompt &`	User prompt containing point and/or bounding box coordinates for segmentation
`multimask_output`	`bool`	When true, outputs multiple candidate masks, each paired with a predicted IoU score that is later used to filter them; when false, outputs a single mask
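Some context on `multimask_output` from the upstream PyTorch SAM reference (`segment_anything/modeling/mask_decoder.py`): the mask decoder always predicts four masks, three multimask candidates plus one single-mask output. The flag only selects which of them are returned, each together with its IoU score. The sketch below illustrates that reference slicing with a hypothetical helper, `select_mask_indices`. It is not taken from sam.cpp, and it is an assumption that the ggml port slices identically.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper mirroring the upstream SAM reference: the decoder
// always predicts num_mask_tokens = num_multimask_outputs + 1 = 4 masks,
// and multimask_output only chooses which of them are returned.
std::vector<int> select_mask_indices(bool multimask_output, int num_mask_tokens = 4) {
    assert(num_mask_tokens >= 2);
    std::vector<int> idx;
    if (multimask_output) {
        // reference slice(1, None): drop token 0, keep the multimask candidates
        for (int i = 1; i < num_mask_tokens; ++i) idx.push_back(i);
    } else {
        // reference slice(0, 1): keep only token 0, the single-mask output
        idx.push_back(0);
    }
    return idx;
}
```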

Returns

Type: struct ggml_cgraph *
Description: A GGML computation graph that, when executed via ggml_backend_sched_graph_compute, populates state.low_res_masks (low-resolution mask logits) and state.iou_predictions (quality scores for each mask)

Implementation

The function orchestrates two major sub-functions into a unified computation graph:

1. Prompt Encoding (sam_encode_prompt)

Source: examples/sam/sam.cpp, lines 1449-1521
Input: Point/box coordinates from the prompt
Process:
1. Rescale point and box coordinates from the original nx x ny image into SAM's 1024-pixel input frame, then normalize them to [-1, 1]
2. Apply Gaussian positional encoding (random Fourier features) to produce spatial embeddings
3. Add learned foreground/background type embeddings to point encodings
4. For boxes, encode top-left and bottom-right corners with learned corner-type embeddings
5. Concatenate all point and box embeddings into a sparse embedding tensor
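6. Produce dense embeddings from learned parameters: with no mask prompt, the learned no-mask embedding is broadcast over the image-embedding grid, so the result does not depend on the specific prompt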
Output: Sparse prompt embeddings and dense prompt embeddings
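Steps 2 and 3 for a single point can be sketched as follows. The sketch follows the upstream SAM reference (`PositionEmbeddingRandom` and `_embed_points`):

- The coordinate is shifted to the pixel centre.
- It is normalized to [-1, 1].
- It is projected onto a fixed random Gaussian matrix and scaled by 2π.
- The result is mapped through sin and cos.

The coordinate is assumed to be already rescaled into the 1024-pixel frame. The names `fourier_pe` and `add_type_embedding`, and the row-major 2 x F layout of `gauss`, are illustrative assumptions and do not reflect how sam.cpp stores its tensors.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal sketch of SAM-style random Fourier positional encoding for one
// point (x, y) in the img_size x img_size model input frame.
// `gauss` is a hypothetical row-major 2 x F matrix standing in for the
// model's fixed Gaussian projection; row 0 multiplies x, row 1 multiplies y.
// Returns 2*F values: F sines followed by F cosines.
std::vector<float> fourier_pe(float x, float y, int img_size,
                              const std::vector<float> & gauss, int F) {
    assert((int) gauss.size() == 2 * F);
    const float kPi = 3.14159265358979f;
    // shift to the pixel centre, then map [0, img_size] -> [0, 1] -> [-1, 1]
    const float u = 2.0f * ((x + 0.5f) / img_size) - 1.0f;
    const float v = 2.0f * ((y + 0.5f) / img_size) - 1.0f;
    std::vector<float> out(2 * F);
    for (int f = 0; f < F; ++f) {
        // project (u, v) onto column f of the matrix, then scale by 2*pi
        const float proj = 2.0f * kPi * (u * gauss[f] + v * gauss[F + f]);
        out[f]     = std::sin(proj);
        out[F + f] = std::cos(proj);
    }
    return out;
}

// Adds a learned type embedding (e.g. foreground, background, or a box
// corner) element-wise to a positional encoding.
void add_type_embedding(std::vector<float> & pe, const std::vector<float> & type_embd) {
    assert(pe.size() == type_embd.size());
    for (size_t i = 0; i < pe.size(); ++i) pe[i] += type_embd[i];
}
```

A box would go through the same path twice, once per corner, with the top-left and bottom-right corner-type embeddings added instead of a foreground/background embedding.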

2. Mask Decoding (sam_decode_mask)

Source: examples/sam/sam.cpp, lines 1604-1844
Input: Image embeddings from state, sparse and dense prompt embeddings from encoding step
Process:
1. Prepend the IoU output token and the mask output tokens to the sparse prompt embeddings
2. Add the dense prompt embeddings to the image embeddings, then run a 2-layer two-way transformer in which prompt tokens cross-attend to image embedding tokens and image tokens cross-attend back to prompt tokens
3. Extract the IoU output token and pass it through an MLP head to predict mask quality scores
4. Extract mask output tokens and pass each through a hypernetwork MLP to produce per-mask weight vectors
5. Reshape and upscale image features via two ConvTranspose2d layers (4x spatial upscaling, from 64x64 to 256x256)
6. Compute dot product between hypernetwork weight vectors and upscaled features to produce low-resolution mask logits
Output: state.low_res_masks and state.iou_predictions
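Steps 5 and 6 reduce to shape arithmetic and a matrix product. The sketch below uses hypothetical names and shows two things:

- Why two ConvTranspose2d layers with kernel 2 and stride 2 take 64x64 to 256x256.
- How each mask's logits are the dot product of its hypernetwork vector with the feature vector at every pixel.

In the reference model the feature depth C is 32 (transformer width 256 / 8). The flat row-major layouts are assumptions, not how ggml stores these tensors.

```cpp
#include <cassert>
#include <vector>

// Output size of a ConvTranspose2d with kernel 2, stride 2, no padding:
// (in - 1) * stride + kernel = 2 * in, so each layer doubles H and W.
int conv_transpose2x2_out(int in) { return (in - 1) * 2 + 2; }

// Mask logits as dot products between per-mask hypernetwork weights and
// upscaled image features.
//   hyper:    n_masks x C     (row-major, one C-vector per mask token)
//   features: C x HW          (row-major, channel-major feature map)
//   returns:  n_masks x HW    (mask logits)
std::vector<float> mask_logits(const std::vector<float> & hyper, int n_masks,
                               const std::vector<float> & features, int C, int HW) {
    assert((int) hyper.size() == n_masks * C);
    assert((int) features.size() == C * HW);
    std::vector<float> out(n_masks * HW, 0.0f);
    for (int m = 0; m < n_masks; ++m) {
        for (int c = 0; c < C; ++c) {
            const float w = hyper[m * C + c];
            // accumulate channel c, weighted by this mask's c-th weight
            for (int p = 0; p < HW; ++p) {
                out[m * HW + p] += w * features[c * HW + p];
            }
        }
    }
    return out;
}
```

Because the weights are produced per prompt by the hypernetwork, the same upscaled feature map can yield a different mask for every prompt without recomputing the features.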

Output Processing (sam_write_masks)

Source: examples/sam/sam.cpp, lines 1846-2002
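Description: Post-processes the decoder output into final segmentation masks. This step is not part of the graph built by sam_build_fast_graph. It runs after the graph has been computed and reads state.low_res_masks and state.iou_predictions.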
Pipeline:
1. Bilinear upscale low-resolution masks from 256x256 to 1024x1024 (SAM's internal high resolution)
2. Crop and resize from 1024x1024 to the original image dimensions (nx x ny)
3. Threshold mask logits at 0.0 to produce binary masks
4. Filter candidate masks by their predicted IoU score, keeping only those above a threshold
5. Write final masks to output (e.g., as image files)
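Steps 1 to 3 for one mask are sketched below. The sequence follows the upstream reference's `postprocess_masks`: upscale to 1024, crop away the padding, resize to the original size, then threshold at 0.0. This is an illustration of that sequence, not the sam.cpp code.

- The names `resize_bilinear` and `postprocess_mask` are hypothetical.
- Half-pixel-centre sampling (the `align_corners=False` convention) is an assumption.
- IoU filtering and file writing (steps 4 and 5) are left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Bilinear resize using half-pixel centres (the align_corners=False
// convention). src is sw x sh, row-major; returns a dw x dh image.
std::vector<float> resize_bilinear(const std::vector<float> & src, int sw, int sh,
                                   int dw, int dh) {
    assert((int) src.size() == sw * sh);
    std::vector<float> dst(dw * dh);
    const float scale_x = float(sw) / dw;
    const float scale_y = float(sh) / dh;
    for (int y = 0; y < dh; ++y) {
        // source row coordinate of this output pixel's centre, clamped at 0
        const float sy = std::max(scale_y * (y + 0.5f) - 0.5f, 0.0f);
        const int y0 = std::min(int(sy), sh - 1);
        const int y1 = std::min(y0 + 1, sh - 1);
        const float dy = sy - y0;
        for (int x = 0; x < dw; ++x) {
            const float sx = std::max(scale_x * (x + 0.5f) - 0.5f, 0.0f);
            const int x0 = std::min(int(sx), sw - 1);
            const int x1 = std::min(x0 + 1, sw - 1);
            const float dx = sx - x0;
            // interpolate horizontally on two rows, then vertically
            const float top = (1 - dx) * src[y0 * sw + x0] + dx * src[y0 * sw + x1];
            const float bot = (1 - dx) * src[y1 * sw + x0] + dx * src[y1 * sw + x1];
            dst[y * dw + x] = (1 - dy) * top + dy * bot;
        }
    }
    return dst;
}

// Hypothetical helper: steps 1-3 for one mask. low is lw x lh logits;
// returns an nx x ny mask with 255 = inside, 0 = outside.
std::vector<uint8_t> postprocess_mask(const std::vector<float> & low, int lw, int lh,
                                      int nx, int ny, int n_img_size = 1024,
                                      float mask_threshold = 0.0f) {
    // 1. upscale logits to the padded n_img_size x n_img_size model input
    const std::vector<float> full = resize_bilinear(low, lw, lh, n_img_size, n_img_size);

    // 2a. keep the top-left region that held image pixels; the rest was padding
    const float scale = float(n_img_size) / std::max(nx, ny);
    const int cw = std::min(int(nx * scale + 0.5f), n_img_size);
    const int ch = std::min(int(ny * scale + 0.5f), n_img_size);
    std::vector<float> crop(cw * ch);
    for (int y = 0; y < ch; ++y) {
        std::copy(full.begin() + y * n_img_size, full.begin() + y * n_img_size + cw,
                  crop.begin() + y * cw);
    }

    // 2b. resize the cropped logits to the original image size
    const std::vector<float> orig = resize_bilinear(crop, cw, ch, nx, ny);

    // 3. threshold logits into a binary 0/255 mask
    std::vector<uint8_t> mask(orig.size());
    for (size_t i = 0; i < orig.size(); ++i) {
        mask[i] = orig[i] > mask_threshold ? 255 : 0;
    }
    return mask;
}
```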

Dependencies

GGML core: Tensor operations, computation graph construction (ggml_build_forward_expand, ggml_new_tensor, etc.)
sam_encode_prompt: Prompt encoding sub-function (lines 1449-1521)
sam_decode_mask: Mask decoding sub-function (lines 1604-1844)
sam_write_masks: Output post-processing (lines 1846-2002)
Pre-computed state: Requires state to contain image embeddings from a prior call to the image encoder
