Implementation:Ollama Ollama Mtmd MiniCPMV
| Knowledge Sources | |
|---|---|
| Domains | Multimodal, VisionEncoder |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Multimodal graph builder for the MiniCPM-V vision model, implementing a ViT with a query-based resampler projector.
Description
Implements clip_graph_minicpmv::build(), which constructs a ggml computation graph for the MiniCPM-V vision encoder. The encoder is a standard ViT whose learned position embeddings are selected through an index tensor. Its output feeds a "resampler" projector, a small cross-attention transformer. The resampler uses learned query tokens as the attention queries. It computes sinusoidal 2D position embeddings from the omega base-frequency tensor and the per-patch height/width positions (outer product, then sin/cos, then concatenation), and adds them to the values to form the keys (k = v + pos_embed). It then performs cross-attention between the queries and the ViT features with a configurable head count (head dimension d_head=128), and applies layer normalization and a final linear projection to produce a fixed number of output tokens.
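The sinusoidal 2D position embedding step can be sketched in plain C++ without ggml. This is a minimal illustration and is not taken from minicpmv.cpp: the helper names make_omega and sincos_pos_embed_2d are hypothetical, the base frequency of 10000 is an assumption, and the concatenation order [sin_x, cos_x, sin_y, cos_y] is assumed for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ggml or clip.cpp): base frequencies
// omega[i] = 1 / base^(i / n) with n = dim / 4. The base of 10000 is an assumption.
std::vector<float> make_omega(int dim, float base = 10000.0f) {
    assert(dim > 0 && dim % 4 == 0);
    const int n = dim / 4;
    std::vector<float> omega(static_cast<std::size_t>(n));
    for (int i = 0; i < n; ++i) {
        omega[static_cast<std::size_t>(i)] =
            1.0f / std::pow(base, static_cast<float>(i) / static_cast<float>(n));
    }
    return omega;
}

// Hypothetical helper: returns one row of length dim per patch, laid out as
// [sin(w * omega), cos(w * omega), sin(h * omega), cos(h * omega)] (assumed order).
std::vector<std::vector<float>> sincos_pos_embed_2d(
        const std::vector<float> & pos_h,
        const std::vector<float> & pos_w,
        int dim) {
    assert(pos_h.size() == pos_w.size());
    const std::vector<float> omega = make_omega(dim);
    const std::size_t n = omega.size();
    std::vector<std::vector<float>> out(
        pos_h.size(), std::vector<float>(static_cast<std::size_t>(dim)));
    for (std::size_t p = 0; p < pos_h.size(); ++p) {
        for (std::size_t i = 0; i < n; ++i) {
            // Outer product entries for (patch p, frequency i).
            const float theta_x = pos_w[p] * omega[i];
            const float theta_y = pos_h[p] * omega[i];
            out[p][i]         = std::sin(theta_x);
            out[p][n + i]     = std::cos(theta_x);
            out[p][2 * n + i] = std::sin(theta_y);
            out[p][3 * n + i] = std::cos(theta_y);
        }
    }
    return out;
}
```

Each quarter of the embedding holds dim / 4 values, which is why the sketch requires dim to be divisible by 4. In the actual graph the same arithmetic is expressed with ggml tensor operations rather than loops.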
Usage
Automatically selected when the loaded CLIP model uses the PROJECTOR_TYPE_MINICPMV projector type.
Code Reference
Source Location
- Repository: Ollama
- File: llama/llama.cpp/tools/mtmd/models/minicpmv.cpp
- Lines: 1-114
Signature
struct clip_graph_minicpmv : public clip_graph {
ggml_cgraph * build() override;
};
Import
#include "models.h"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | clip_model & | Yes | Loaded MiniCPM-V model with ViT and resampler weights |
| img | clip_image_f32 & | Yes | Preprocessed float image tensor |
| omega | ggml_tensor * | Yes | Base frequency tensor for sinusoidal position embeddings |
| pos_h, pos_w | ggml_tensor * | Yes | 2D position coordinates for each patch |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | ggml_cgraph * | Computation graph producing fixed-count resampled embeddings |
Usage Examples
// Instantiated internally by clip.cpp for MiniCPM-V models
clip_graph_minicpmv graph(ctx, img);
ggml_cgraph * gf = graph.build();
// Output: [n_mmproj_embd, minicpmv_query_num] embeddings
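The fixed output token count follows from the cross-attention design: the number of output rows equals the number of learned queries, whatever the number of image patches. The sketch below illustrates this with a single attention head in plain C++. It is a simplification under stated assumptions, not the graph built by minicpmv.cpp: the helper name resample and the Matrix alias are hypothetical, and the key/value projection, layer normalization, multi-head split, and final linear projection are all omitted.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<float>>; // row-major: [rows][cols]

// Hypothetical helper: single-head cross-attention.
// q:   [n_query][d]   learned query tokens
// v:   [n_patches][d] ViT features (used as values)
// pos: [n_patches][d] position embeddings
// Keys are k = v + pos; values are v without the position term.
Matrix resample(const Matrix & q, const Matrix & v, const Matrix & pos) {
    assert(!q.empty() && !v.empty() && v.size() == pos.size());
    const std::size_t d = q[0].size();
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    Matrix out(q.size(), std::vector<float>(d, 0.0f));
    for (std::size_t i = 0; i < q.size(); ++i) {
        // scores[j] = (q_i . k_j) / sqrt(d)
        std::vector<float> scores(v.size());
        for (std::size_t j = 0; j < v.size(); ++j) {
            float dot = 0.0f;
            for (std::size_t c = 0; c < d; ++c) {
                dot += q[i][c] * (v[j][c] + pos[j][c]);
            }
            scores[j] = dot * scale;
        }
        // Numerically stable softmax over the patches.
        const float max_s = *std::max_element(scores.begin(), scores.end());
        float sum = 0.0f;
        for (float & s : scores) {
            s = std::exp(s - max_s);
            sum += s;
        }
        // Weighted sum of the values.
        for (std::size_t j = 0; j < v.size(); ++j) {
            const float weight = scores[j] / sum;
            for (std::size_t c = 0; c < d; ++c) {
                out[i][c] += weight * v[j][c];
            }
        }
    }
    return out;
}
```

Calling resample with the same two queries on 3 patches and on 5 patches returns two output rows in both cases, which mirrors how the resampler maps a variable number of ViT features to a fixed number of embeddings.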