Implementation:Ollama Ollama Mtmd CogVLM

From Leeroopedia
Domains: Multimodal, VisionEncoder
Last Updated: 2025-02-15 00:00 GMT

Overview

Graph builder for the CogVLM vision model in mtmd. It constructs the ggml computation graph for the ViT encoder and the SwiGLU-based projector.

Description

Implements clip_graph_cogvlm::build(), which constructs the ggml computation graph for the CogVLM vision encoder and its projector.

Encoder: the class embedding is concatenated to the patch embeddings, and learned position embeddings are added. Each layer uses fused QKV attention followed by layer normalization (post-attention norm), then a SiLU-gated feed-forward block followed by its own normalization (post-FFN norm).

Projector: the CLS token is removed, and the remaining patch embeddings pass through a linear projection, post-FC layer normalization, and GELU. A SwiGLU stage follows: the h_to_4h and gate branches are merged with ggml_swiglu_split, and a down-projection maps the result to the LLM embedding width. Finally, beginning-of-image (BOI) and end-of-image (EOI) embeddings are concatenated before and after the patch embeddings.
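The gated projector stage can be illustrated outside ggml. The following is a minimal sketch on plain std::vector data, not the mtmd code: the helper names (matvec, silu, swiglu_projector) and the row-major weight layout are hypothetical. It also assumes that ggml_swiglu_split(ctx, a, b) computes silu(a) * b and that the graph passes the gate branch as a and the h_to_4h branch as b.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<float>;
using Mat = std::vector<Vec>; // hypothetical row-major layout: Mat[out][in]

// y = W * x for a row-major weight matrix
Vec matvec(const Mat & W, const Vec & x) {
    Vec y(W.size(), 0.0f);
    for (std::size_t r = 0; r < W.size(); ++r) {
        assert(W[r].size() == x.size());
        for (std::size_t c = 0; c < x.size(); ++c) {
            y[r] += W[r][c] * x[c];
        }
    }
    return y;
}

// SiLU(z) = z * sigmoid(z)
float silu(float z) { return z / (1.0f + std::exp(-z)); }

// Gated stage of the projector for a single token:
//   4h_to_h( silu(gate(x)) * h_to_4h(x) )
// x is assumed to be already projected, normalized and GELU-activated.
Vec swiglu_projector(const Mat & W_gate, const Mat & W_h_to_4h,
                     const Mat & W_4h_to_h, const Vec & x) {
    Vec g = matvec(W_gate, x);     // gate branch
    Vec h = matvec(W_h_to_4h, x);  // h_to_4h branch
    assert(g.size() == h.size());
    for (std::size_t i = 0; i < g.size(); ++i) {
        g[i] = silu(g[i]) * h[i];  // SwiGLU merge (assumed silu(gate) * h_to_4h)
    }
    return matvec(W_4h_to_h, g);   // down-projection
}
```

This processes one token at a time; the ggml graph applies the same weight matrices to every patch row at once through matrix multiplication.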

Usage

Selected automatically when the projector type of the loaded CLIP model is PROJECTOR_TYPE_COGVLM.
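The selection rule amounts to a factory keyed on the projector type, with each model family overriding build(). The sketch illustrates that pattern only: demo_projector_type, demo_graph and the derived classes are hypothetical stand-ins, build() returns a label instead of a ggml_cgraph *, and it should not be read as the actual clip.cpp dispatch code.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the mtmd types; not the real API.
enum class demo_projector_type { MLP, COGVLM };

// Base class: each model family overrides build().
struct demo_graph {
    virtual ~demo_graph() = default;
    // The real build() returns ggml_cgraph *; a label stands in here.
    virtual std::string build() = 0;
};

struct demo_graph_cogvlm : demo_graph {
    std::string build() override { return "cogvlm"; }
};

struct demo_graph_mlp : demo_graph {
    std::string build() override { return "mlp"; }
};

// Factory: pick the graph builder from the loaded model's projector type.
std::unique_ptr<demo_graph> make_graph(demo_projector_type t) {
    switch (t) {
        case demo_projector_type::COGVLM: return std::make_unique<demo_graph_cogvlm>();
        case demo_projector_type::MLP:    return std::make_unique<demo_graph_mlp>();
    }
    throw std::invalid_argument("unsupported projector type");
}
```

Keeping the switch in one place means that supporting a new projector type only requires a new subclass and one new case.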

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/models/cogvlm.cpp
  • Lines: 1-98

Signature

struct clip_graph_cogvlm : public clip_graph {
    ggml_cgraph * build() override;
};

Import

#include "models.h"

I/O Contract

Inputs

Name   Type              Required  Description
model  clip_model &      Yes       Loaded CogVLM CLIP model with weights
img    clip_image_f32 &  Yes       Preprocessed float image tensor

Outputs

Name            Type           Description
(return value)  ggml_cgraph *  Computation graph producing LLM-compatible embeddings wrapped in BOI/EOI tokens

Usage Examples

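// Instantiated internally by clip.cpp for CogVLM models.
// ctx: clip_ctx * for a model whose projector type is PROJECTOR_TYPE_COGVLM
// img: preprocessed const clip_image_f32 &
clip_graph_cogvlm graph(ctx, img);
ggml_cgraph * gf = graph.build();
// Produces embeddings: [BOI, patch_1, ..., patch_N, EOI]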
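The output layout can be checked with simple row bookkeeping. This sketch models each embedding row as a string label rather than a tensor, and both n_patches_for and assemble_rows are hypothetical helpers. For illustration it assumes that the CLS row sits at index 0 of the encoder output; the ordering used by the ggml graph is not stated on this page. Dropping CLS and adding BOI/EOI turns an encoder sequence of N + 1 rows into N + 2 output rows. For example, a 490x490 input with 14-pixel patches (sizes assumed here, not taken from this page) gives N = 35 * 35 = 1225 patches and 1227 output embeddings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of patches for a square image split into square, non-overlapping patches.
std::size_t n_patches_for(std::size_t image_size, std::size_t patch_size) {
    assert(patch_size > 0 && image_size % patch_size == 0);
    const std::size_t per_side = image_size / patch_size;
    return per_side * per_side;
}

// Hypothetical helper: given encoder rows (CLS assumed at index 0, followed by
// patch rows), drop CLS and wrap the patch rows with BOI/EOI.
std::vector<std::string> assemble_rows(const std::vector<std::string> & encoder_rows) {
    assert(!encoder_rows.empty());
    std::vector<std::string> out;
    out.reserve(encoder_rows.size() + 1); // -1 for CLS, +2 for BOI/EOI
    out.push_back("BOI");
    out.insert(out.end(), encoder_rows.begin() + 1, encoder_rows.end());
    out.push_back("EOI");
    return out;
}
```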
