Implementation:Ollama Ollama Convert Qwen25Vl

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Qwen 2.5 VL (Vision-Language) multimodal architecture. The converter splits patch embedding tensors, decomposes combined QKV projections into separate tensors, and emits the metadata that describes the vision encoder's windowed attention.

Description

The qwen25VLModel struct embeds qwen2Model and adds a vision encoder configuration covering windowed attention, full-attention block indexes, spatial and temporal merge sizes, and RoPE parameters. The KV method reuses the Qwen2 text model's KV generation, remapping qwen2.* keys to qwen25vl.*, and then adds vision-specific metadata. The Tensors method performs two key transformations. First, it splits each patch_embed.proj tensor along dimension 2 into two patch embedding tensors (patch_embd_0 and patch_embd_1) and removes the singleton dimensions left in their shapes. Second, it splits each combined attn.qkv tensor along dimension 0 into separate Q, K, and V tensors using splitDim.
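
The key remapping described above can be illustrated with a minimal, standalone sketch. It is not the converter's code: remapPrefix is a hypothetical helper, the metadata map uses a plain map[string]any in place of the KV type, and the example keys and values are illustrative assumptions. Unlike the real method, which also adds vision metadata, this sketch only shows the prefix rewrite and drops keys that lack the old prefix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// remapPrefix copies every entry whose key starts with oldPrefix into a new
// map under newPrefix. Entries without the prefix are dropped (a choice made
// for this sketch only).
func remapPrefix(src map[string]any, oldPrefix, newPrefix string) map[string]any {
	out := make(map[string]any, len(src))
	for k, v := range src {
		if rest, ok := strings.CutPrefix(k, oldPrefix); ok {
			out[newPrefix+rest] = v
		}
	}
	return out
}

func main() {
	// Hypothetical text-model metadata; keys and values are illustrative.
	textKV := map[string]any{
		"qwen2.block_count":      uint32(28),
		"qwen2.embedding_length": uint32(3584),
		"general.name":           "example",
	}

	kv := remapPrefix(textKV, "qwen2.", "qwen25vl.")
	// A vision-specific entry would be added afterwards; this key name is an
	// assumption made for the example.
	kv["qwen25vl.vision.block_count"] = uint32(32)

	// Print in sorted order so the output is stable across runs.
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, kv[k])
	}
}
```

Rewriting the prefix rather than regenerating the text metadata lets the vision-language converter inherit any future change to the Qwen2 key set without duplicating it.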

Usage

Invoked automatically when the model's architecture matches Qwen2_5_VLForConditionalGeneration.
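
As a rough illustration of architecture-based selection, the sketch below reads an architecture name from a JSON config and switches on it. It assumes the name comes from an "architectures" array in the model's config.json, which this page does not state; converterFor, modelConfig, and the returned label are hypothetical names and do not reflect Ollama's internal dispatch code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// modelConfig holds the single config field this sketch needs.
// The "architectures" field name is an assumption.
type modelConfig struct {
	Architectures []string `json:"architectures"`
}

// converterFor returns a label for the converter that would handle the model.
// The label is a placeholder, not an Ollama identifier.
func converterFor(configJSON []byte) (string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config has no architectures entry")
	}
	switch cfg.Architectures[0] {
	case "Qwen2_5_VLForConditionalGeneration":
		return "qwen25vl", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	cfg := []byte(`{"architectures": ["Qwen2_5_VLForConditionalGeneration"]}`)
	name, err := converterFor(cfg)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected converter:", name)
}
```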

Code Reference

Source Location

Repository: Ollama
File: convert/convert_qwen25vl.go
Lines: 1-102

Signature

type qwen25VLModel struct {
    qwen2Model
    VisionModel struct {
        Depth               uint32    `json:"depth"`
        HiddenSize          uint32    `json:"hidden_size"`
        WindowSize          uint32    `json:"window_size"`
        FullAttentionBlocks []int32   `json:"fullatt_block_indexes"`
        TemporalPatchSize   uint32    `json:"temporal_patch_size"`
    } `json:"vision_config"`
}

func (q *qwen25VLModel) KV(t *Tokenizer) KV
func (q *qwen25VLModel) Tensors(ts []Tensor) []*ggml.Tensor
func (p *qwen25VLModel) Replacements() []string

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors including combined QKV and patch embed tensors

Outputs

Name	Type	Description
KV	KV	GGUF metadata with qwen25vl.* keys for text and vision config
Tensors	[]*ggml.Tensor	Converted tensors with split patch embeds and decomposed QKV

Usage Examples

// Converter registered for Qwen 2.5 VL
// patch_embed.proj -> patch_embd_0 + patch_embd_1 (split along dim 2)
// attn.qkv -> attn_q + attn_k + attn_v (split along dim 0)
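
The two splits noted in the comments above can be sketched on plain slices. This is an illustration under stated assumptions, not the converter's splitDim: splitAxis and dropSingletons are hypothetical helpers, data is assumed to be flat and row-major, and each split is assumed to divide the axis into equal parts. The example shapes are small stand-ins, not real model dimensions.

```go
package main

import (
	"fmt"
	"slices"
)

// splitAxis splits flat row-major data of the given shape into n equal parts
// along axis and returns the parts plus the shape shared by every part.
// It assumes shape[axis] is divisible by n.
func splitAxis(data []float32, shape []int, axis, n int) ([][]float32, []int) {
	outer, inner := 1, 1
	for _, d := range shape[:axis] {
		outer *= d
	}
	for _, d := range shape[axis+1:] {
		inner *= d
	}
	part := shape[axis] / n

	parts := make([][]float32, n)
	for o := 0; o < outer; o++ {
		base := o * shape[axis] * inner
		for p := 0; p < n; p++ {
			start := base + p*part*inner
			parts[p] = append(parts[p], data[start:start+part*inner]...)
		}
	}

	partShape := slices.Clone(shape)
	partShape[axis] = part
	return parts, partShape
}

// dropSingletons returns a copy of shape without dimensions of size 1.
func dropSingletons(shape []int) []int {
	return slices.DeleteFunc(slices.Clone(shape), func(d int) bool { return d == 1 })
}

func main() {
	// Combined QKV stand-in: 6 rows x 2 columns, split along dim 0 into Q, K, V.
	qkv := make([]float32, 12)
	for i := range qkv {
		qkv[i] = float32(i)
	}
	parts, shape := splitAxis(qkv, []int{6, 2}, 0, 3)
	fmt.Println("q:", parts[0], "k:", parts[1], "v:", parts[2], "shape:", shape)

	// Patch embedding stand-in with a size-2 axis at dim 2, split into two
	// tensors whose leftover singleton axis is then removed from the shape.
	embed := make([]float32, 2*3*2*2*2)
	halves, halfShape := splitAxis(embed, []int{2, 3, 2, 2, 2}, 2, 2)
	fmt.Println("parts:", len(halves), "shape:", halfShape, "squeezed:", dropSingletons(halfShape))
}
```

Note that dropSingletons removes every size-1 dimension, not only the split axis, so a tensor with another dimension of size 1 would lose that dimension as well.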

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Qwen25Vl
