Implementation:Ollama Ollama Mtmd MiniCPMV

From Leeroopedia
Knowledge Sources
Domains Multimodal, VisionEncoder
Last Updated 2025-02-15 00:00 GMT

Overview

Multimodal graph builder for the MiniCPM-V vision model, implementing a ViT with a query-based resampler projector.

Description

Implements clip_graph_minicpmv::build(), which constructs a ggml computation graph for the MiniCPM-V vision encoder. The encoder is a standard ViT whose learned position embeddings are selected through an index tensor. Its output feeds a "resampler" projector, a small cross-attention transformer that:

  • starts from learned query tokens;
  • computes sinusoidal 2D position embeddings from a base-frequency tensor (omega) and the per-patch height/width positions, via an outer product, sin/cos, and concatenation;
  • adds the position embeddings to the keys (k = v + pos_embed);
  • performs multi-head cross-attention between the queries and the ViT features, with the head count derived from a per-head dimension of d_head=128;
  • applies layer normalization and a final linear projection to produce a fixed number of output tokens.
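The sinusoidal 2D position embedding described above can be sketched in plain C++ without ggml. This is a minimal illustration, not the code in minicpmv.cpp: the helper names make_omega and sincos_pos_embed_2d are hypothetical, the 10000 frequency base is an assumption borrowed from common sinusoidal encodings, and the [sin x, cos x, sin y, cos y] ordering is one plausible reading of "sin/cos, and concatenation".

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: base frequencies omega[i] = 1 / 10000^(i / n_freq).
// The 10000 base is an assumption, not taken from the source file.
std::vector<float> make_omega(std::size_t n_freq) {
    std::vector<float> omega(n_freq);
    for (std::size_t i = 0; i < n_freq; ++i) {
        const float exponent = static_cast<float>(i) / static_cast<float>(n_freq);
        omega[i] = 1.0f / std::pow(10000.0f, exponent);
    }
    return omega;
}

// Builds one embedding row per patch, laid out as
// [sin(w*omega), cos(w*omega), sin(h*omega), cos(h*omega)],
// so each row holds 4 * omega.size() values.
std::vector<std::vector<float>> sincos_pos_embed_2d(
        const std::vector<float> & omega,
        const std::vector<float> & pos_h,
        const std::vector<float> & pos_w) {
    assert(pos_h.size() == pos_w.size());
    const std::size_t n_freq = omega.size();
    std::vector<std::vector<float>> out(pos_h.size(), std::vector<float>(4 * n_freq));
    for (std::size_t p = 0; p < pos_h.size(); ++p) {
        for (std::size_t i = 0; i < n_freq; ++i) {
            // Outer product of positions and frequencies, one element at a time.
            const float theta_x = pos_w[p] * omega[i];
            const float theta_y = pos_h[p] * omega[i];
            out[p][i]              = std::sin(theta_x);
            out[p][n_freq + i]     = std::cos(theta_x);
            out[p][2 * n_freq + i] = std::sin(theta_y);
            out[p][3 * n_freq + i] = std::cos(theta_y);
        }
    }
    return out;
}
```

With this layout, a projector width of n_mmproj_embd corresponds to an omega tensor of n_mmproj_embd / 4 entries, since each frequency contributes four values per patch.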

Usage

Automatically selected when the loaded CLIP model uses the PROJECTOR_TYPE_MINICPMV projector type.

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/models/minicpmv.cpp
  • Lines: 1-114

Signature

struct clip_graph_minicpmv : public clip_graph {
    ggml_cgraph * build() override;
};

Import

#include "models.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | clip_model & | Yes | Loaded MiniCPM-V model with ViT and resampler weights
img | clip_image_f32 & | Yes | Preprocessed float image tensor
omega | ggml_tensor * | Yes | Base frequency tensor for sinusoidal position embeddings
pos_h, pos_w | ggml_tensor * | Yes | 2D position coordinates for each patch

Outputs

Name | Type | Description
ggml_cgraph * | pointer | Computation graph producing fixed-count resampled embeddings

Usage Examples

// Instantiated internally by clip.cpp for MiniCPM-V models
clip_graph_minicpmv graph(ctx, img);
ggml_cgraph * gf = graph.build();
// Output: [n_mmproj_embd, minicpmv_query_num] embeddings
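The output token count is fixed because the resampler's queries are learned tokens, so the number of output rows follows the number of queries rather than the number of image patches. The sketch below shows this with single-head scaled dot-product cross-attention in plain C++. It is an illustration under simplifying assumptions (one head, no projections or normalization), and the Matrix alias and cross_attention function are hypothetical names, not part of clip.cpp or ggml.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>;  // one row per token

// Single-head cross-attention: every query row attends over all key rows
// and returns a softmax-weighted sum of the value rows.
Matrix cross_attention(const Matrix & q, const Matrix & k, const Matrix & v) {
    assert(!q.empty() && !k.empty() && k.size() == v.size());
    const std::size_t d = q[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Matrix out(q.size(), std::vector<float>(v[0].size(), 0.0f));
    for (std::size_t i = 0; i < q.size(); ++i) {
        // Scaled dot-product score of query i against every key.
        std::vector<float> scores(k.size());
        for (std::size_t j = 0; j < k.size(); ++j) {
            float dot = 0.0f;
            for (std::size_t c = 0; c < d; ++c) {
                dot += q[i][c] * k[j][c];
            }
            scores[j] = dot * scale;
        }
        // Numerically stable softmax over the scores.
        const float max_score = *std::max_element(scores.begin(), scores.end());
        float sum = 0.0f;
        for (float & s : scores) {
            s = std::exp(s - max_score);
            sum += s;
        }
        // Weighted sum of value rows.
        for (std::size_t j = 0; j < k.size(); ++j) {
            const float w = scores[j] / sum;
            for (std::size_t c = 0; c < v[j].size(); ++c) {
                out[i][c] += w * v[j][c];
            }
        }
    }
    return out;
}
```

Passing 2 query rows and 3 key/value rows yields 2 output rows; with more key/value rows the output would still have 2 rows, which mirrors how the resampler maps a variable number of patches to minicpmv_query_num embeddings.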
