

Implementation:Ollama Ollama Convert Qwen25Vl

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen 2.5 VL (Vision-Language) multimodal architecture, handling patch embedding splitting, combined QKV projection decomposition, and vision encoder windowed attention.

Description

The qwen25VLModel struct embeds qwen2Model and adds a vision encoder configuration covering windowed attention, full-attention block indexes, spatial/temporal merge sizes, and RoPE parameters. The KV method reuses the Qwen2 text model's KV generation, remapping qwen2.* keys to qwen25vl.*, and adds vision-specific metadata. The Tensors method performs two key transformations. First, it splits each patch_embed.proj tensor along dimension 2 into two patch embedding tensors (patch_embd_0, patch_embd_1) and removes the resulting singleton dimension. Second, it splits each combined attn.qkv tensor along dimension 0 into separate Q, K, and V tensors (attn_q, attn_k, attn_v) using splitDim.
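
The key remapping described above can be illustrated with a minimal, standalone sketch. It is not the Ollama implementation: remapPrefix is a hypothetical helper, the metadata is modeled as a plain map rather than the converter's KV type, the sample keys and values are made up, and keeping non-matching keys unchanged is an assumption made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// remapPrefix is a hypothetical helper (not part of Ollama). It returns a
// copy of kv in which every key starting with oldPrefix has that prefix
// replaced by newPrefix. Keys without the prefix are copied unchanged,
// which is an assumption made for this sketch.
func remapPrefix(kv map[string]any, oldPrefix, newPrefix string) map[string]any {
	out := make(map[string]any, len(kv))
	for k, v := range kv {
		if rest, ok := strings.CutPrefix(k, oldPrefix); ok {
			out[newPrefix+rest] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Illustrative metadata only; real key names and values come from the model.
	textKV := map[string]any{
		"qwen2.block_count":    28,
		"qwen2.context_length": 4096,
		"general.name":         "example",
	}
	for k, v := range remapPrefix(textKV, "qwen2.", "qwen25vl.") {
		fmt.Println(k, "=", v)
	}
}
```

Copying into a new map rather than renaming in place avoids mutating a map while ranging over it.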

Usage

Invoked automatically when the model's architecture matches Qwen2_5_VLForConditionalGeneration.

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_qwen25vl.go
  • Lines: 1-102

Signature

type qwen25VLModel struct {
    qwen2Model
    VisionModel struct {
        Depth               uint32    `json:"depth"`
        HiddenSize          uint32    `json:"hidden_size"`
        WindowSize          uint32    `json:"window_size"`
        FullAttentionBlocks []int32   `json:"fullatt_block_indexes"`
        TemporalPatchSize   uint32    `json:"temporal_patch_size"`
    } `json:"vision_config"`
}

func (q *qwen25VLModel) KV(t *Tokenizer) KV
func (q *qwen25VLModel) Tensors(ts []Tensor) []*ggml.Tensor
func (p *qwen25VLModel) Replacements() []string
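
The json tags in the struct above suggest how the vision settings are read from a model's configuration file. The following is a rough sketch of that decoding step using only the fields listed here. The visionConfig and modelConfig types, the parseConfig helper, and the sample JSON values are all hypothetical stand-ins for illustration; the real converter embeds qwen2Model and may populate the struct differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// visionConfig mirrors only the vision fields listed in the signature above.
type visionConfig struct {
	Depth               uint32  `json:"depth"`
	HiddenSize          uint32  `json:"hidden_size"`
	WindowSize          uint32  `json:"window_size"`
	FullAttentionBlocks []int32 `json:"fullatt_block_indexes"`
	TemporalPatchSize   uint32  `json:"temporal_patch_size"`
}

// modelConfig is a hypothetical stand-in for qwen25VLModel without the
// embedded qwen2Model.
type modelConfig struct {
	VisionModel visionConfig `json:"vision_config"`
}

// parseConfig decodes raw JSON into modelConfig (hypothetical helper).
func parseConfig(raw []byte) (modelConfig, error) {
	var cfg modelConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// sampleConfig holds illustrative values only, not taken from a real model.
const sampleConfig = `{
  "vision_config": {
    "depth": 32,
    "hidden_size": 1280,
    "window_size": 112,
    "fullatt_block_indexes": [7, 15, 23, 31],
    "temporal_patch_size": 2
  }
}`

func main() {
	cfg, err := parseConfig([]byte(sampleConfig))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("vision depth:", cfg.VisionModel.Depth)
	fmt.Println("full-attention blocks:", cfg.VisionModel.FullAttentionBlocks)
}
```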

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name    Type        Required    Description
t       *Tokenizer  Yes         Tokenizer data for GGUF metadata
ts      []Tensor    Yes         Source tensors including combined QKV and patch embed tensors

Outputs

Name     Type            Description
KV       KV              GGUF metadata with qwen25vl.* keys for text and vision config
Tensors  []*ggml.Tensor  Converted tensors with split patch embeds and decomposed QKV

Usage Examples

// Converter registered for Qwen 2.5 VL
// patch_embed.proj -> patch_embd_0 + patch_embd_1 (split along dim 2)
// attn.qkv -> attn_q + attn_k + attn_v (split along dim 0)
