

Implementation:Ggml org Ggml Sam build fast graph

From Leeroopedia



Summary

sam_build_fast_graph is a C++ function that constructs a GGML computation graph for SAM's (Segment Anything Model's) prompt-guided mask decoding. It combines prompt encoding and mask decoding into a single graph that starts from pre-computed image embeddings, so the image encoder does not have to run again for each new prompt. When computed, the graph produces low-resolution segmentation masks and IoU quality predictions from those embeddings and the user-provided prompt.

API

struct ggml_cgraph * sam_build_fast_graph(
    const sam_model & model,
    sam_state & state,
    int nx,
    int ny,
    const sam_prompt & prompt,
    bool multimask_output
);

Source

  • File: examples/sam/sam.cpp

Parameters

  • model (const sam_model &): Loaded SAM model containing all weights (image encoder, prompt encoder, mask decoder)
  • state (sam_state &): Runtime state holding pre-computed image embeddings from the image encoder; also receives output tensors
  • nx (int): Original image width in pixels
  • ny (int): Original image height in pixels
  • prompt (const sam_prompt &): User prompt containing point and/or bounding box coordinates for segmentation
  • multimask_output (bool): When true, generates multiple candidate masks with IoU-based ranking; when false, generates a single mask
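The effect of multimask_output can be illustrated with a small plain C++ sketch. It assumes the reference SAM layout of four mask tokens, where token 0 yields the single-mask output and tokens 1 to 3 yield the multimask candidates. sam_mask_token_range and rank_by_iou are hypothetical helpers for illustration, not functions from sam.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper (not part of the ggml example): half-open range
// [first, last) of mask-token outputs returned to the caller.
// Assumes the reference SAM layout of 4 mask tokens: token 0 is the
// single-mask output, tokens 1..3 are the multimask candidates.
std::pair<int, int> sam_mask_token_range(bool multimask_output, int n_mask_tokens = 4) {
    assert(n_mask_tokens >= 2);
    if (multimask_output) {
        return {1, n_mask_tokens};
    }
    return {0, 1};
}

// Hypothetical helper: order candidate masks by predicted IoU, best first.
// stable_sort keeps the original token order for equal scores.
std::vector<int> rank_by_iou(const std::vector<float> & iou) {
    std::vector<int> order(iou.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return iou[a] > iou[b]; });
    return order;
}
```

With multimask_output set to true, a caller would receive three candidates, and rank_by_iou would put the one with the highest predicted IoU first.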

Returns

  • Type: struct ggml_cgraph *
  • Description: A GGML computation graph that, when executed via ggml_backend_sched_graph_compute, populates state.low_res_masks (low-resolution mask logits) and state.iou_predictions (quality scores for each mask)

Implementation

The function orchestrates two major sub-functions into a unified computation graph:

1. Prompt Encoding (sam_encode_prompt)

  • Source: examples/sam/sam.cpp, lines 1449-1521
  • Input: Point/box coordinates from the prompt
  • Process:
    1. Convert point and box coordinates to normalized positions
    2. Apply Gaussian positional encoding (random Fourier features) to produce spatial embeddings
    3. Add learned foreground/background type embeddings to point encodings
    4. For boxes, encode top-left and bottom-right corners with learned corner-type embeddings
    5. Concatenate all point and box embeddings into a sparse embedding tensor
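    6. Produce the dense embedding from learned parameters alone (in the reference SAM design, a learned "no mask" embedding broadcast over the 64x64 image-embedding grid); it does not depend on the specific prompt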
  • Output: Sparse prompt embeddings and dense prompt embeddings
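Steps 1 to 3 can be sketched on the CPU in plain C++ rather than as ggml tensor operations. The helper names (normalize_point, fourier_pe, point_embedding), the 1024-pixel input size, the +0.5 pixel-centre shift and the 2 x F row-major layout of the Gaussian matrix are assumptions modelled on the reference SAM prompt encoder, not details read from sam.cpp.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 {
    float x;
    float y;
};

// Map a pixel coordinate of the original nx x ny image into [-1, 1].
// Assumption (reference SAM convention): the longest image side is resized
// to img_size, coordinates are shifted by 0.5 to the pixel centre, divided
// by img_size, and then mapped from [0, 1] to [-1, 1].
Point2 normalize_point(Point2 p, int nx, int ny, int img_size = 1024) {
    assert(nx > 0 && ny > 0 && img_size > 0);
    const float scale = float(img_size) / float(std::max(nx, ny));
    const float u = (p.x * scale + 0.5f) / float(img_size);
    const float v = (p.y * scale + 0.5f) / float(img_size);
    return {2.0f * u - 1.0f, 2.0f * v - 1.0f};
}

// Random Fourier features: `gauss` is a fixed 2 x F matrix (row-major,
// row 0 multiplies x, row 1 multiplies y) of Gaussian samples stored with
// the model. Output layout is [sin(2*pi*proj) | cos(2*pi*proj)], 2F values.
std::vector<float> fourier_pe(Point2 p, const std::vector<float> & gauss, int F) {
    assert(F > 0 && (int) gauss.size() == 2 * F);
    const float two_pi = 6.28318530718f;
    std::vector<float> out(2 * F);
    for (int f = 0; f < F; ++f) {
        const float proj = two_pi * (p.x * gauss[f] + p.y * gauss[F + f]);
        out[f]     = std::sin(proj);
        out[F + f] = std::cos(proj);
    }
    return out;
}

// Sparse embedding of one point: positional encoding plus the learned
// type embedding selected by its label (e.g. foreground or background).
std::vector<float> point_embedding(const std::vector<float> & pe,
                                   const std::vector<float> & type_embd) {
    assert(pe.size() == type_embd.size());
    std::vector<float> out(pe.size());
    for (size_t i = 0; i < pe.size(); ++i) {
        out[i] = pe[i] + type_embd[i];
    }
    return out;
}
```

In reference SAM, F is 128, so each point becomes a 256-dimensional token. Box corners would go through the same normalize_point and fourier_pe calls, with the corner-type embeddings taking the place of the foreground/background ones.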

2. Mask Decoding (sam_decode_mask)

  • Source: examples/sam/sam.cpp, lines 1604-1844
  • Input: Image embeddings from state, sparse and dense prompt embeddings from encoding step
  • Process:
    1. Prepend IoU output token and mask output tokens to the sparse prompt embeddings
    2. Run 2-layer transformer with bidirectional cross-attention between prompt tokens and image embedding tokens
    3. Extract the IoU output token and pass it through an MLP head to predict mask quality scores
    4. Extract mask output tokens and pass each through a hypernetwork MLP to produce per-mask weight vectors
    5. Reshape and upscale image features via two ConvTranspose2d layers (4x spatial upscaling, from 64x64 to 256x256)
    6. Compute the dot product between each hypernetwork weight vector and the upscaled features to produce the low-resolution mask logits
  • Output: state.low_res_masks and state.iou_predictions
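Steps 5 and 6 reduce to shape arithmetic and a batched dot product. The plain C++ sketch below assumes kernel size 2, stride 2 and no padding for each ConvTranspose2d layer (as in reference SAM) and flat row-major buffers; conv_transpose_out and mask_logits are hypothetical names, not ggml functions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Output side length of a ConvTranspose2d layer:
//   out = (in - 1) * stride - 2 * padding + kernel
// With kernel = 2, stride = 2, padding = 0 (assumed, as in reference SAM)
// each layer exactly doubles the resolution.
int conv_transpose_out(int in, int kernel = 2, int stride = 2, int padding = 0) {
    return (in - 1) * stride - 2 * padding + kernel;
}

// Low-resolution mask logits as a batched dot product:
//   masks[k][p] = sum_c hyper[k][c] * feat[c][p]
// hyper: n_masks x C, one weight vector per mask token (hypernetwork output)
// feat:  C x HW, upscaled image features with the spatial dims flattened
std::vector<float> mask_logits(const std::vector<float> & hyper,
                               const std::vector<float> & feat,
                               int n_masks, int C, int HW) {
    assert((int) hyper.size() == n_masks * C);
    assert((int) feat.size() == C * HW);
    std::vector<float> out(std::size_t(n_masks) * HW, 0.0f);
    for (int k = 0; k < n_masks; ++k) {
        for (int c = 0; c < C; ++c) {
            const float w = hyper[std::size_t(k) * C + c];
            for (int p = 0; p < HW; ++p) {
                out[std::size_t(k) * HW + p] += w * feat[std::size_t(c) * HW + p];
            }
        }
    }
    return out;
}
```

Two such layers take 64x64 to 128x128 and then to 256x256. In reference SAM the upscaled features have 32 channels (256 / 8), so C = 32 and HW = 256 * 256; inside a ggml graph the same contraction could be expressed as a single matrix multiplication such as ggml_mul_mat over the flattened spatial dimension.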

Output Processing (sam_write_masks)

  • Source: examples/sam/sam.cpp, lines 1846-2002
  • Description: Post-processes the decoder output into final segmentation masks
  • Pipeline:
    1. Bilinearly upscale the low-resolution masks from 256x256 to 1024x1024 (SAM's internal input resolution)
    2. Crop away the padded region and resize from 1024x1024 to the original image dimensions (nx x ny)
    3. Threshold mask logits at 0.0 to produce binary masks
    4. Filter candidate masks by IoU prediction score to select the best mask(s)
    5. Write final masks to output (e.g., as image files)
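Steps 1 to 4 can be sketched for a single mask in plain C++. The sketch assumes that the input image was resized so its longest side equals img_size and then padded at the bottom and right (as in the reference SAM preprocessing), that resizing uses half-pixel bilinear sampling, and that the IoU threshold is chosen by the caller. The function names are hypothetical, and the real sam_write_masks may differ, for example in its thresholds or in how it writes files.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bilinear resize of a single-channel, row-major image using half-pixel
// centres (the align_corners = false convention).
std::vector<float> resize_bilinear(const std::vector<float> & src, int sw, int sh, int dw, int dh) {
    assert(sw > 0 && sh > 0 && (int) src.size() == sw * sh && dw > 0 && dh > 0);
    std::vector<float> dst(std::size_t(dw) * dh);
    const float sx = float(sw) / float(dw);
    const float sy = float(sh) / float(dh);
    for (int y = 0; y < dh; ++y) {
        const float fy = std::max((y + 0.5f) * sy - 0.5f, 0.0f);
        const int   y0 = std::min(int(fy), sh - 1);
        const int   y1 = std::min(y0 + 1, sh - 1);
        const float wy = fy - float(y0);
        for (int x = 0; x < dw; ++x) {
            const float fx = std::max((x + 0.5f) * sx - 0.5f, 0.0f);
            const int   x0 = std::min(int(fx), sw - 1);
            const int   x1 = std::min(x0 + 1, sw - 1);
            const float wx = fx - float(x0);
            const float top = src[y0 * sw + x0] * (1.0f - wx) + src[y0 * sw + x1] * wx;
            const float bot = src[y1 * sw + x0] * (1.0f - wx) + src[y1 * sw + x1] * wx;
            dst[std::size_t(y) * dw + x] = top * (1.0f - wy) + bot * wy;
        }
    }
    return dst;
}

// Steps 1-3 for one mask: lw x lw logits -> binary nx x ny mask (0 or 255).
std::vector<std::uint8_t> postprocess_mask(const std::vector<float> & low_res, int lw,
                                           int nx, int ny, int img_size = 1024) {
    assert(nx > 0 && ny > 0 && (int) low_res.size() == lw * lw);
    // 1. upscale to the padded square input resolution
    const std::vector<float> up = resize_bilinear(low_res, lw, lw, img_size, img_size);
    // 2. crop the unpadded top-left region, then resize to the original size
    const float s  = float(img_size) / float(std::max(nx, ny));
    const int   cw = std::max(1, std::min(img_size, int(nx * s + 0.5f)));
    const int   ch = std::max(1, std::min(img_size, int(ny * s + 0.5f)));
    std::vector<float> crop(std::size_t(cw) * ch);
    for (int y = 0; y < ch; ++y)
        for (int x = 0; x < cw; ++x)
            crop[std::size_t(y) * cw + x] = up[std::size_t(y) * img_size + x];
    const std::vector<float> full = resize_bilinear(crop, cw, ch, nx, ny);
    // 3. threshold the logits at 0.0
    std::vector<std::uint8_t> mask(full.size());
    for (std::size_t i = 0; i < full.size(); ++i) {
        mask[i] = full[i] > 0.0f ? 255 : 0;
    }
    return mask;
}

// Step 4: indices of candidate masks whose predicted IoU reaches the threshold.
std::vector<int> filter_by_iou(const std::vector<float> & iou, float threshold) {
    std::vector<int> keep;
    for (int i = 0; i < (int) iou.size(); ++i) {
        if (iou[i] >= threshold) keep.push_back(i);
    }
    return keep;
}
```

Cropping before the final resize matters: resizing the full 1024x1024 square directly would stretch the padded area into the output mask.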

Dependencies

  • GGML core: Tensor operations, computation graph construction (ggml_build_forward_expand, ggml_new_tensor, etc.)
  • sam_encode_prompt: Prompt encoding sub-function (lines 1449-1521)
  • sam_decode_mask: Mask decoding sub-function (lines 1604-1844)
  • sam_write_masks: Output post-processing (lines 1846-2002)
  • Pre-computed state: Requires state to contain image embeddings from a prior call to the image encoder
