Implementation:Ollama Ollama Convert Qwen25Vl
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the Qwen 2.5 VL (Vision-Language) multimodal architecture, handling patch embedding splitting, combined QKV projection decomposition, and vision encoder windowed attention.
Description
The qwen25VLModel struct embeds qwen2Model and adds a vision encoder configuration covering windowed attention, full-attention block indexes, spatial/temporal merge sizes, and RoPE parameters. The KV method reuses the Qwen2 text model's KV generation, renaming qwen2.* keys to qwen25vl.*, and then adds vision-specific metadata. The Tensors method performs two key transformations. First, it splits each patch_embed.proj tensor along dimension 2 into two patch embedding tensors (patch_embd_0 and patch_embd_1) and removes the singleton dimension left by the split. Second, it uses splitDim to split each combined attn.qkv tensor along dimension 0 into separate Q, K, and V tensors (attn_q, attn_k, attn_v).
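The key renaming described above can be sketched in isolation. This is a minimal illustration, not the converter's actual code: remapPrefix is a hypothetical helper, the metadata values are made up, and keeping keys without the prefix unchanged is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// remapPrefix (hypothetical helper) returns a copy of kv in which every key
// starting with from is re-emitted under the to prefix. Keys without the
// prefix are copied unchanged, which is an assumption of this sketch.
func remapPrefix(kv map[string]any, from, to string) map[string]any {
	out := make(map[string]any, len(kv))
	for k, v := range kv {
		if strings.HasPrefix(k, from) {
			out[to+strings.TrimPrefix(k, from)] = v
		} else {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Illustrative text-model metadata; the values are placeholders.
	text := map[string]any{
		"qwen2.block_count":      uint32(28),
		"qwen2.embedding_length": uint32(3584),
		"general.architecture":   "qwen2",
	}

	kv := remapPrefix(text, "qwen2.", "qwen25vl.")
	kv["general.architecture"] = "qwen25vl"

	fmt.Println(kv["qwen25vl.block_count"], kv["general.architecture"])
}
```

Vision-specific keys would then be added to the same map under the qwen25vl.* namespace.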
Usage
Invoked automatically when the model's architecture matches Qwen2_5_VLForConditionalGeneration.
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_qwen25vl.go
- Lines: 1-102
Signature
type qwen25VLModel struct {
	qwen2Model
	VisionModel struct {
		Depth               uint32  `json:"depth"`
		HiddenSize          uint32  `json:"hidden_size"`
		WindowSize          uint32  `json:"window_size"`
		FullAttentionBlocks []int32 `json:"fullatt_block_indexes"`
		TemporalPatchSize   uint32  `json:"temporal_patch_size"`
	} `json:"vision_config"`
}
func (q *qwen25VLModel) KV(t *Tokenizer) KV
func (q *qwen25VLModel) Tensors(ts []Tensor) []*ggml.Tensor
func (p *qwen25VLModel) Replacements() []string
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors including combined QKV and patch embed tensors |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with qwen25vl.* keys for text and vision config |
| Tensors | []*ggml.Tensor | Converted tensors with split patch embeds and decomposed QKV |
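To make the effect on the output tensors concrete, the following sketch computes only the resulting shapes. The helpers splitShape and squeeze are hypothetical, and the input shapes are illustrative assumptions rather than values read from a real checkpoint.

```go
package main

import (
	"fmt"
	"slices"
)

// splitShape (hypothetical helper) returns the shape of each of n equal parts
// obtained by splitting shape along dim. It panics if the dimension is not
// evenly divisible by n.
func splitShape(shape []uint64, dim, n int) []uint64 {
	if shape[dim]%uint64(n) != 0 {
		panic("dimension is not evenly divisible")
	}
	out := slices.Clone(shape)
	out[dim] /= uint64(n)
	return out
}

// squeeze (hypothetical helper) returns a copy of shape with every
// size-1 dimension removed.
func squeeze(shape []uint64) []uint64 {
	return slices.DeleteFunc(slices.Clone(shape), func(d uint64) bool { return d == 1 })
}

func main() {
	// Assumed patch_embed.proj shape: [out, in, temporal, height, width].
	patch := []uint64{1280, 3, 2, 14, 14}
	fmt.Println(squeeze(splitShape(patch, 2, 2))) // prints [1280 3 14 14]

	// Assumed fused attn.qkv shape: [3*hidden, hidden].
	qkv := []uint64{3840, 1280}
	fmt.Println(splitShape(qkv, 0, 3)) // prints [1280 1280]
}
```

Under these assumptions, each patch embedding output loses the temporal axis entirely, and each of Q, K, and V ends up as a square projection matrix.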
Usage Examples
// Converter registered for Qwen 2.5 VL
// patch_embed.proj -> patch_embd_0 + patch_embd_1 (split along dim 2)
// attn.qkv -> attn_q + attn_k + attn_v (split along dim 0)
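The dimension-0 split noted in the comments above can be illustrated on raw data. This is a simplified sketch that assumes a row-major layout and three equal parts; splitRows is a hypothetical stand-in and does not reproduce the behavior or signature of the converter's splitDim.

```go
package main

import "fmt"

// splitRows (hypothetical helper) splits a row-major matrix of rows*cols
// elements into n equally sized blocks of consecutive rows, which is a split
// along dimension 0. The returned slices share memory with data.
func splitRows(data []float32, rows, cols, n int) [][]float32 {
	if rows%n != 0 || len(data) != rows*cols {
		panic("invalid shape for split")
	}
	block := rows / n * cols
	parts := make([][]float32, n)
	for i := range parts {
		parts[i] = data[i*block : (i+1)*block]
	}
	return parts
}

func main() {
	// A tiny 6x2 stand-in for a fused QKV weight.
	qkv := []float32{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}

	names := []string{"attn_q", "attn_k", "attn_v"}
	for i, part := range splitRows(qkv, 6, 2, 3) {
		fmt.Println(names[i], part)
	}
}
```

Because the split is along the leading dimension of a row-major buffer, each part is a contiguous range and no element reordering is needed in this sketch.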