Implementation:Ollama Ollama Mtmd Llama4

From Leeroopedia
Knowledge Sources
Domains: Multimodal, VisionEncoder
Last Updated: 2025-02-15 00:00 GMT

Overview

Multimodal graph builder for the Llama 4 vision model, implementing unfold convolution, 2D RoPE, pixel shuffle, and an MLP projector.

Description

Implements clip_graph_llama4::build(), which constructs a ggml computation graph for the Llama 4 vision encoder. The graph applies the following stages in order:

  • Patch embedding through an unfold convolution (ggml_im2col followed by ggml_mul_mat).
  • A CLS token appended to the patch sequence via concatenation.
  • A standard ViT encoder that uses learned position embeddings together with 2D RoPE, driven by separate height and width position tensors for spatial awareness.
  • Removal of the CLS token after encoding.
  • Pixel shuffle downsampling, which reshapes and permutes the patch embeddings to reduce spatial resolution.
  • A two-layer MLP with GELU activation.
  • A final linear projector (Llama4MultiModalProjector).

Only square images are supported.
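The unfold convolution used for patch embedding can be illustrated without ggml. The following is a minimal sketch in plain C++, assuming non-overlapping patches (stride equal to the patch size), a channel-major image layout, and a weight matrix of shape [n_embd, C*P*P]. The helper names im2col_patches and patch_embed are hypothetical and do not appear in the source file; the real graph uses ggml_im2col and ggml_mul_mat on ggml tensors.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp).
// Step 1 (im2col): copy each non-overlapping P x P patch into one row of
// length C*P*P, ordered (channel, row-in-patch, column-in-patch).
// img is laid out as [C][H][W].
std::vector<float> im2col_patches(const std::vector<float> & img,
                                  int C, int H, int W, int P) {
    assert(P > 0 && H % P == 0 && W % P == 0);
    assert(img.size() == static_cast<std::size_t>(C) * H * W);
    const int ny = H / P;
    const int nx = W / P;
    const int K  = C * P * P;
    std::vector<float> cols(static_cast<std::size_t>(ny) * nx * K);
    for (int py = 0; py < ny; ++py) {
        for (int px = 0; px < nx; ++px) {
            float * row = &cols[(static_cast<std::size_t>(py) * nx + px) * K];
            for (int c = 0; c < C; ++c) {
                for (int ky = 0; ky < P; ++ky) {
                    for (int kx = 0; kx < P; ++kx) {
                        const std::size_t src =
                            (static_cast<std::size_t>(c) * H + py * P + ky) * W + px * P + kx;
                        row[(c * P + ky) * P + kx] = img[src];
                    }
                }
            }
        }
    }
    return cols;
}

// Hypothetical helper (not part of llama.cpp).
// Step 2 (matrix multiply): out[p][e] = sum_k cols[p][k] * weight[e][k].
std::vector<float> patch_embed(const std::vector<float> & cols,
                               const std::vector<float> & weight,
                               int n_patches, int K, int n_embd) {
    assert(cols.size()   == static_cast<std::size_t>(n_patches) * K);
    assert(weight.size() == static_cast<std::size_t>(n_embd) * K);
    std::vector<float> out(static_cast<std::size_t>(n_patches) * n_embd, 0.0f);
    for (int p = 0; p < n_patches; ++p) {
        for (int e = 0; e < n_embd; ++e) {
            float acc = 0.0f;
            for (int k = 0; k < K; ++k) {
                acc += cols[static_cast<std::size_t>(p) * K + k] *
                       weight[static_cast<std::size_t>(e) * K + k];
            }
            out[static_cast<std::size_t>(p) * n_embd + e] = acc;
        }
    }
    return out;
}
```

Splitting the convolution into a gather step and a single matrix multiply is what lets the patch embedding reuse the same matmul kernel as the rest of the encoder. The memory layout chosen here is for readability only and may not match ggml's tensor ordering.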

Usage

Automatically selected when the loaded CLIP model uses the PROJECTOR_TYPE_LLAMA4 projector type.
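Selection by projector type can be pictured as a factory that switches on an enum. The sketch below uses stand-in types only: PROJECTOR_TYPE_LLAMA4 is the name given above, while PROJECTOR_TYPE_OTHER, graph_builder, llama4_builder, other_builder, and make_builder are hypothetical and are not taken from clip.cpp. It shows the general dispatch pattern, not the actual selection code.

```cpp
#include <memory>
#include <string>

// Stand-in enum for illustration. Only PROJECTOR_TYPE_LLAMA4 is named in the
// text; PROJECTOR_TYPE_OTHER is a hypothetical placeholder for other types.
enum projector_type {
    PROJECTOR_TYPE_LLAMA4,
    PROJECTOR_TYPE_OTHER,
};

// Hypothetical base class standing in for a graph builder.
struct graph_builder {
    virtual ~graph_builder() = default;
    virtual std::string name() const = 0;
};

struct llama4_builder : graph_builder {
    std::string name() const override { return "llama4"; }
};

struct other_builder : graph_builder {
    std::string name() const override { return "other"; }
};

// Pick the builder that matches the projector type of the loaded model.
std::unique_ptr<graph_builder> make_builder(projector_type type) {
    switch (type) {
        case PROJECTOR_TYPE_LLAMA4:
            return std::make_unique<llama4_builder>();
        default:
            return std::make_unique<other_builder>();
    }
}
```

With this pattern the caller never names the concrete builder; it only reads the projector type from the loaded model and asks for the matching graph.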

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/models/llama4.cpp
  • Lines: 1-96

Signature

struct clip_graph_llama4 : public clip_graph {
    ggml_cgraph * build() override;
};

Import

#include "models.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | clip_model & | Yes | Loaded Llama 4 CLIP model with weights
img | clip_image_f32 & | Yes | Preprocessed float image (must be square)
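The square-image requirement determines the patch grid and, after pixel shuffle, the number of output tokens. Below is a small sketch of that arithmetic. The struct llama4_grid and the function compute_grid are hypothetical helpers, the divisibility checks are assumptions added for the sketch, and the values 336, 14, and 2 in the usage note are illustrative rather than taken from the source file.

```cpp
#include <cassert>

// Hypothetical result type (not part of llama.cpp).
struct llama4_grid {
    int n_patches_x;   // patches per row
    int n_patches_y;   // patches per column
    int n_patches;     // total patches before pixel shuffle
    int n_out_tokens;  // tokens after pixel shuffle
};

// Hypothetical helper: derive the patch grid for a square image.
// The checks mirror the "must be square" constraint; the divisibility
// requirements are assumptions made so the integer arithmetic is exact.
llama4_grid compute_grid(int width, int height, int patch_size, int scale_factor) {
    assert(patch_size > 0 && scale_factor > 0);
    assert(width == height);            // only square images are supported
    assert(width % patch_size == 0);    // assumption: whole number of patches
    const int n = width / patch_size;
    assert(n % scale_factor == 0);      // assumption: grid divides evenly
    const int n_out = n / scale_factor;
    return llama4_grid{ n, n, n * n, n_out * n_out };
}
```

For example, compute_grid(336, 336, 14, 2) describes a 24 x 24 grid of 576 patches that pixel shuffle reduces to 144 tokens.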

Outputs

Name | Type | Description
ggml_cgraph * | pointer | Computation graph producing pixel-shuffled, projected embeddings

Usage Examples

// Instantiated internally by clip.cpp for Llama 4 models
clip_graph_llama4 graph(ctx, img);
ggml_cgraph * gf = graph.build();
// Output: [n_mmproj_embd, n_patches / scale_factor^2] embeddings
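The n_patches / scale_factor^2 token count in the output shape comes from pixel shuffle, which folds each scale_factor x scale_factor block of patches into one wider token. The following is a minimal sketch of that idea in plain C++. The function pixel_shuffle is hypothetical, and the ordering of values inside each merged token, as well as the order of the merged tokens, is an assumption chosen for readability. The real graph produces its ordering through a ggml reshape and permute sequence, which may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of llama.cpp).
// in  : n_side * n_side patches in row-major order, n_embd floats each
// out : (n_side / s)^2 tokens, each holding n_embd * s * s floats
std::vector<float> pixel_shuffle(const std::vector<float> & in,
                                 int n_embd, int n_side, int s) {
    assert(s > 0 && n_side % s == 0);
    assert(in.size() == static_cast<std::size_t>(n_embd) * n_side * n_side);
    const int out_side = n_side / s;
    const int out_embd = n_embd * s * s;
    std::vector<float> out(static_cast<std::size_t>(out_embd) * out_side * out_side);
    for (int y = 0; y < n_side; ++y) {
        for (int x = 0; x < n_side; ++x) {
            // Which merged token this patch belongs to, and its slot inside it.
            const int token = (y / s) * out_side + (x / s);
            const int slot  = (y % s) * s + (x % s);
            for (int c = 0; c < n_embd; ++c) {
                const std::size_t src = (static_cast<std::size_t>(y) * n_side + x) * n_embd + c;
                const std::size_t dst = static_cast<std::size_t>(token) * out_embd + slot * n_embd + c;
                out[dst] = in[src];
            }
        }
    }
    return out;
}
```

No values are discarded: the token count shrinks by scale_factor^2 while the per-token width grows by the same factor, and the MLP and projector that follow map that wider vector to n_mmproj_embd.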
