Implementation:Ollama Ollama Mtmd GLM4V

From Leeroopedia
Knowledge Sources
Domains Multimodal, VisionEncoder
Last Updated 2025-02-15 00:00 GMT

Overview

Multimodal graph builder for the GLM-4V vision model, implementing dual convolution patch embedding, M-RoPE, and a patch merger projector.

Description

Implements clip_graph_glm4v::build(), which constructs a ggml computation graph for the GLM-4V vision encoder. The encoder uses dual conv2d patch embedding (the outputs of two convolution layers are summed), pixel unshuffling to merge spatial dimensions into the embedding dimension, patch bias addition, RMS normalization, bicubic-interpolated position embeddings (via resize_position_embeddings), and M-RoPE with 4-section rotary position encoding.

The projector applies a conv2d-based patch merger for spatial downsampling, then a fully connected projection layer with LayerNorm and GELU-ERF activation, and finally an FFN (up/gate/down) block that produces language-model-compatible embeddings.
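
Two of the element-wise operations named above, RMS normalization and the erf-based GELU, have simple closed forms. The following is a minimal standalone sketch of that math only, not the ggml graph code; the function names rms_norm and gelu_erf, the weight vector w, and the eps parameter are illustrative assumptions rather than identifiers taken from glm4v.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// RMS normalization of one embedding vector:
//   y[i] = x[i] / sqrt(mean(x^2) + eps) * w[i]
// w is a per-channel scale and eps a small stabilizer (both illustrative).
std::vector<float> rms_norm(const std::vector<float> & x,
                            const std::vector<float> & w,
                            float eps) {
    assert(!x.empty() && x.size() == w.size());
    double sum_sq = 0.0;
    for (float v : x) {
        sum_sq += double(v) * v;
    }
    const float scale = 1.0f / std::sqrt(float(sum_sq / x.size()) + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] * scale * w[i];
    }
    return y;
}

// Exact GELU using the error function (as opposed to the tanh approximation):
//   gelu(x) = 0.5 * x * (1 + erf(x / sqrt(2)))
float gelu_erf(float x) {
    return 0.5f * x * (1.0f + std::erf(x / std::sqrt(2.0f)));
}
```

With unit weights and eps set to zero, the output of rms_norm has a mean square of 1, which is a quick way to sanity-check the normalization.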

Usage

Automatically selected when the loaded CLIP model uses the PROJECTOR_TYPE_GLM4V projector type.
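
Selection by projector type can be pictured as a switch that maps the type to a graph builder. The sketch below is a self-contained toy under stated assumptions: only the name PROJECTOR_TYPE_GLM4V comes from this page, while graph_builder, glm4v_builder, fallback_builder, make_builder, and PROJECTOR_TYPE_OTHER are hypothetical stand-ins and do not reproduce the actual dispatch code in clip.cpp.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for the projector type enum; only
// PROJECTOR_TYPE_GLM4V is a name taken from the page.
enum projector_type {
    PROJECTOR_TYPE_GLM4V,
    PROJECTOR_TYPE_OTHER, // placeholder for every other projector
};

// Hypothetical builder interface standing in for clip_graph.
struct graph_builder {
    virtual ~graph_builder() = default;
    virtual std::string name() const = 0;
};

struct glm4v_builder : graph_builder {
    std::string name() const override { return "glm4v"; }
};

struct fallback_builder : graph_builder {
    std::string name() const override { return "other"; }
};

// Picks the builder that matches the projector type of the loaded model.
std::unique_ptr<graph_builder> make_builder(projector_type type) {
    switch (type) {
        case PROJECTOR_TYPE_GLM4V:
            return std::make_unique<glm4v_builder>();
        default:
            return std::make_unique<fallback_builder>();
    }
}
```

The caller never names the GLM-4V builder directly; it only passes the projector type recorded in the model, which is what makes the selection automatic.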

Code Reference

Source Location

  • Repository: Ollama
  • File: llama/llama.cpp/tools/mtmd/models/glm4v.cpp
  • Lines: 1-120

Signature

struct clip_graph_glm4v : public clip_graph {
    ggml_cgraph * build() override;
};

Import

#include "models.h"

I/O Contract

Inputs

Name | Type | Required | Description
model | clip_model & | Yes | Loaded GLM-4V CLIP model with weights
img | clip_image_f32 & | Yes | Preprocessed float image (must have dimensions divisible by patch_size * 2)
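
The divisibility requirement on img can be checked before building the graph. This is a minimal sketch with hypothetical helper names (glm4v_dims_ok, glm4v_round_up); it assumes the requirement applies to width and height independently, and the patch size of 14 used in the note below is only an example value.

```cpp
#include <cassert>

// True when both image dimensions are multiples of patch_size * 2.
bool glm4v_dims_ok(int width, int height, int patch_size) {
    assert(patch_size > 0);
    const int align = patch_size * 2;
    return width % align == 0 && height % align == 0;
}

// Rounds one dimension up to the next multiple of patch_size * 2,
// for example when choosing a resize or padding target.
int glm4v_round_up(int dim, int patch_size) {
    assert(patch_size > 0 && dim >= 0);
    const int align = patch_size * 2;
    return (dim + align - 1) / align * align;
}
```

For a patch size of 14 the alignment is 28, so a 336 x 336 image passes the check, while a width of 350 would need to be rounded up to 364.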

Outputs

Name | Type | Description
ggml_cgraph * | pointer | Computation graph producing spatially-downsampled LLM embeddings

Usage Examples

// Instantiated internally by clip.cpp for GLM-4V models.
// ctx is the clip context that owns the loaded model; img is the preprocessed image.
clip_graph_glm4v graph(ctx, img);
ggml_cgraph * gf = graph.build();
// Output: [n_mmproj_embd, n_patches / merge^2] embeddings
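
The output shape in the comment above follows from simple arithmetic on the image size. The sketch below works it through with a hypothetical helper (glm4v_output_shape) and a hypothetical embd_shape struct; the merge factor of 2 in the note is an assumption inferred from the patch_size * 2 divisibility requirement, and the embedding width of 4096 is an arbitrary example value.

```cpp
#include <cassert>

// Hypothetical container for the two output dimensions.
struct embd_shape {
    int n_embd;   // n_mmproj_embd: width of each output embedding
    int n_tokens; // n_patches / merge^2: embeddings handed to the LLM
};

// Computes [n_mmproj_embd, n_patches / merge^2] for an image whose
// width and height are multiples of patch_size * merge.
embd_shape glm4v_output_shape(int width, int height, int patch_size,
                              int merge, int n_mmproj_embd) {
    assert(patch_size > 0 && merge > 0);
    assert(width % (patch_size * merge) == 0);
    assert(height % (patch_size * merge) == 0);
    const int n_patches = (width / patch_size) * (height / patch_size);
    return { n_mmproj_embd, n_patches / (merge * merge) };
}
```

For a 336 x 336 image with a patch size of 14, there are 24 x 24 = 576 patches, and a merge factor of 2 reduces them to 576 / 4 = 144 output embeddings.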
