Implementation:Ollama Ollama Mtmd LLaVA

From Leeroopedia
Knowledge Sources
Domains: Multimodal, VisionEncoder
Last Updated: 2025-02-15 00:00 GMT

Overview

Multimodal graph builder for LLaVA-style vision models, also serving as the default adapter for Granite and GLM-Edge variants.

Description

Implements clip_graph_llava::build(), which constructs a ggml computation graph for the LLaVA vision encoder. The builder supports optional class-embedding concatenation, learned position embeddings, pre-layer normalization, and deep feature stacking (extracting activations from multiple intermediate layers, as used by Granite vision). It also supports several projector backends: LLaVA MLP, MLP with normalization, LDP/LDPv2 (MobileVLM), the MiniCPM-V resampler, and the GLM-Edge adapter. Each projector maps ViT features into the language model's embedding space.
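To make the projector step concrete, the following is a minimal sketch of a two-layer MLP projector (linear, GELU, linear) applied to each patch embedding. It uses plain std::vector arithmetic rather than ggml tensors. The names Linear, gelu, apply_linear, and project_patches are hypothetical and do not appear in llava.cpp, and the tanh form of GELU is an assumption about the activation variant.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical dense layer: weight is [n_out][n_in], bias has n_out entries.
struct Linear {
    std::vector<std::vector<float>> weight;
    std::vector<float> bias;
};

// Tanh approximation of GELU (assumed variant; the real kernel may differ).
inline float gelu(float x) {
    const float k = 0.7978845608f; // sqrt(2 / pi)
    return 0.5f * x * (1.0f + std::tanh(k * (x + 0.044715f * x * x * x)));
}

// Computes y = W x + b for a single patch embedding.
inline std::vector<float> apply_linear(const Linear & layer, const std::vector<float> & x) {
    assert(layer.weight.size() == layer.bias.size());
    std::vector<float> y(layer.bias); // start from the bias
    for (std::size_t o = 0; o < layer.weight.size(); ++o) {
        assert(layer.weight[o].size() == x.size());
        for (std::size_t i = 0; i < x.size(); ++i) {
            y[o] += layer.weight[o][i] * x[i];
        }
    }
    return y;
}

// Maps each ViT patch feature to the (toy) language-model embedding width.
inline std::vector<std::vector<float>> project_patches(
        const Linear & fc1, const Linear & fc2,
        const std::vector<std::vector<float>> & patches) {
    std::vector<std::vector<float>> out;
    out.reserve(patches.size());
    for (const auto & p : patches) {
        std::vector<float> h = apply_linear(fc1, p);
        for (float & v : h) {
            v = gelu(v);
        }
        out.push_back(apply_linear(fc2, h));
    }
    return out;
}
```

The output has one row per input patch, with a width set by the second layer. The other projector backends listed above follow different structures and are not covered by this sketch.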

Usage

Automatically selected by the CLIP system when the loaded model uses a LLaVA-compatible projector type (MLP, MLP_NORM, LDP, LDPV2, MiniCPM-V, GLM-Edge).
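The selection rule described above can be sketched as a simple dispatch on projector type. This is an illustration only: the ProjectorType enum, its member names, and the helper functions are hypothetical stand-ins, not the identifiers used in clip.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical enum mirroring the projector types named in this section.
enum class ProjectorType { MLP, MLP_NORM, LDP, LDPV2, MINICPMV, GLM_EDGE, OTHER };

// True when the projector type is one this page lists as LLaVA-compatible.
inline bool uses_llava_graph(ProjectorType t) {
    switch (t) {
        case ProjectorType::MLP:
        case ProjectorType::MLP_NORM:
        case ProjectorType::LDP:
        case ProjectorType::LDPV2:
        case ProjectorType::MINICPMV:
        case ProjectorType::GLM_EDGE:
            return true;
        default:
            return false;
    }
}

// Human-readable label for logging (labels are illustrative).
inline const char * projector_label(ProjectorType t) {
    static const char * const labels[] = {
        "MLP", "MLP_NORM", "LDP", "LDPV2", "MiniCPM-V", "GLM-Edge", "other"};
    const std::size_t idx = static_cast<std::size_t>(t);
    assert(idx < sizeof(labels) / sizeof(labels[0]));
    return labels[idx];
}

// Name of the graph builder that would handle the given projector type.
inline std::string select_builder(ProjectorType t) {
    return uses_llava_graph(t) ? "clip_graph_llava" : "(another graph builder)";
}
```

For example, select_builder(ProjectorType::LDPV2) yields "clip_graph_llava", while a type outside the list falls through to a different builder.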

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/models/llava.cpp
  • Lines: 1-374

Signature

struct clip_graph_llava : public clip_graph {
    ggml_cgraph * build() override;
};

Import

#include "models.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | clip_model & | Yes | Loaded CLIP model with weights and hyperparameters
img | clip_image_f32 & | Yes | Preprocessed float image tensor
n_patches | int | Yes | Number of image patches after patch embedding
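As a rough guide to the n_patches input, the sketch below shows the usual ViT arithmetic for a square image split into non-overlapping square patches. The helpers count_patches and count_positions are hypothetical, and the actual builder may derive this value differently (for example, for non-square inputs).

```cpp
#include <cassert>

// Patches for a square image split into non-overlapping square patches.
// Assumes image_size is an exact multiple of patch_size.
inline int count_patches(int image_size, int patch_size) {
    assert(patch_size > 0 && image_size % patch_size == 0);
    const int per_side = image_size / patch_size;
    return per_side * per_side;
}

// Sequence length seen by the transformer when an optional class
// embedding is concatenated to the patch embeddings.
inline int count_positions(int n_patches, bool has_class_embedding) {
    return n_patches + (has_class_embedding ? 1 : 0);
}
```

With a 336-pixel image and 14-pixel patches, count_patches gives 24 × 24 = 576, and a class embedding raises the position count to 577.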

Outputs

Name | Type | Description
ggml_cgraph * | pointer | Computation graph producing embeddings for the language model

Usage Examples

// The graph builder is instantiated internally by clip.cpp
// during clip_image_encode / clip_image_batch_encode.
// ctx is assumed to be the loaded CLIP context; img is the preprocessed clip_image_f32.
clip_graph_llava graph(ctx, img);
ggml_cgraph * gf = graph.build();
// gf is then evaluated by a ggml backend to produce the embeddings
