Implementation:Ollama Ollama Convert MLLama

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Meta MLlama (Llama 3.2 Vision) multimodal architecture, handling cross-attention layers, gated positional embeddings, and tanh-based gate tensor repacking.

Description

The mllamaModel struct embeds llamaModel inside its TextModel field, so the text decoder reuses the llama converter. It adds the cross-attention layer indices (cross_attention_layers) and a vision encoder configuration (vision_config).

The KV method starts from the embedded llama KV generation, renames its llama.* keys to mllama.*, and adds vision-specific metadata: block count, global layer count, intermediate layer indices, image, patch, and tile sizes, and norm epsilon.

The Tensors method handles three special cases: (1) v.position_embd.gate is split into a position gate and a tile gate, transformed with 1 - tanh and tanh respectively; (2) vision Q/K weights are repacked with an interleaved head reordering; (3) the pre- and post-tile position embedding gates are passed through tanh. Text tensors are delegated to the embedded llamaModel.Tensors.
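The name-based routing in Tensors can be sketched as a single classification function. This is a minimal sketch, not the converter's code. The `route` type and `routeTensor` function are hypothetical. The sketch assumes that vision tensors carry a `v.` prefix (as `v.position_embd.gate` does), that the pre/post tile gates are named `v.pre_tile_position_embd.gate` and `v.post_tile_position_embd.gate`, and that vision Q/K weights end in `attn_q.weight` or `attn_k.weight`. All of these naming conventions are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// route records how one source tensor is handled in this sketch.
type route struct {
	Outputs   []string // names of the output tensors emitted for this source tensor
	Transform string   // label for the repack step applied (empty if none)
	Text      bool     // true when the tensor is handed to the text (llama) converter
}

// routeTensor classifies a source tensor by name (hypothetical naming scheme).
func routeTensor(name string) route {
	switch {
	case name == "v.position_embd.gate":
		// One source gate fans out into two output tensors.
		return route{
			Outputs:   []string{"v.position_embd.gate", "v.tile_position_embd.gate"},
			Transform: "1-tanh / tanh",
		}
	case name == "v.pre_tile_position_embd.gate", name == "v.post_tile_position_embd.gate":
		return route{Outputs: []string{name}, Transform: "tanh"}
	case strings.HasPrefix(name, "v.") &&
		(strings.HasSuffix(name, "attn_q.weight") || strings.HasSuffix(name, "attn_k.weight")):
		return route{Outputs: []string{name}, Transform: "qk-interleave"}
	case strings.HasPrefix(name, "v."):
		// Other vision tensors pass through unchanged in this sketch.
		return route{Outputs: []string{name}}
	default:
		// Everything else is treated as a text tensor.
		return route{Outputs: []string{name}, Text: true}
	}
}

func main() {
	for _, n := range []string{
		"v.position_embd.gate",
		"v.pre_tile_position_embd.gate",
		"v.blk.0.attn_q.weight",
		"blk.0.attn_q.weight",
	} {
		fmt.Printf("%-32s %+v\n", n, routeTensor(n))
	}
}
```

Ordering matters in the switch: the specific gate names are matched before the generic `v.` prefix, so they are not swallowed by the pass-through case.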

Usage

Invoked automatically when the model's architecture matches MllamaForConditionalGeneration.
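Selection by architecture name can be sketched as a switch on the first entry of the `architectures` array in the checkpoint's Hugging Face `config.json`. The `hfConfig` type and `converterFor` function are hypothetical. The converter names are returned as strings purely for illustration, and the real registration code may be structured differently.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hfConfig captures only the field needed for dispatch: Hugging Face
// config.json files list the model class under "architectures".
type hfConfig struct {
	Architectures []string `json:"architectures"`
}

// converterFor returns a (hypothetical) converter name for a config.json payload.
func converterFor(configJSON []byte) (string, error) {
	var cfg hfConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config.json has no architectures entry")
	}
	switch cfg.Architectures[0] {
	case "MllamaForConditionalGeneration":
		return "mllamaModel", nil
	case "LlamaForCausalLM":
		return "llamaModel", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	name, err := converterFor([]byte(`{"architectures": ["MllamaForConditionalGeneration"]}`))
	fmt.Println(name, err) // mllamaModel <nil>
}
```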

Code Reference

Source Location

Repository: Ollama
File: convert/convert_mllama.go
Lines: 1-179

Signature

type mllamaModel struct {
    ModelParameters
    TextModel struct {
        llamaModel
        CrossAttentionLayers []int32 `json:"cross_attention_layers"`
    } `json:"text_config"`
    VisionModel struct {
        NumHiddenLayers           uint32  `json:"num_hidden_layers"`
        NumGlobalLayers           uint32  `json:"num_global_layers"`
        IntermediateLayersIndices []int32 `json:"intermediate_layers_indices"`
        ImageSize                 uint32  `json:"image_size"`
        MaxNumTiles               uint32  `json:"max_num_tiles"`
        // ... further vision fields (e.g. patch size, norm epsilon) omitted from this excerpt
    } `json:"vision_config"`
}
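Because the struct tags nest TextModel under `text_config` and VisionModel under `vision_config`, a single `json.Unmarshal` of `config.json` can fill both halves. The sketch below uses a trimmed, hypothetical copy of the struct (`mllamaConfig`, without ModelParameters or the embedded llamaModel fields) and made-up config values to show the mapping. It is not the converter's own loading code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// visionConfig mirrors the vision_config fields listed in the signature.
type visionConfig struct {
	NumHiddenLayers           uint32  `json:"num_hidden_layers"`
	NumGlobalLayers           uint32  `json:"num_global_layers"`
	IntermediateLayersIndices []int32 `json:"intermediate_layers_indices"`
	ImageSize                 uint32  `json:"image_size"`
	MaxNumTiles               uint32  `json:"max_num_tiles"`
}

// textConfig keeps only the cross-attention field; the embedded llamaModel
// fields are left out of this sketch.
type textConfig struct {
	CrossAttentionLayers []int32 `json:"cross_attention_layers"`
}

// mllamaConfig is a hypothetical, trimmed stand-in for mllamaModel.
type mllamaConfig struct {
	TextModel   textConfig   `json:"text_config"`
	VisionModel visionConfig `json:"vision_config"`
}

func parseConfig(raw []byte) (mllamaConfig, error) {
	var cfg mllamaConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// Illustrative values only, not taken from a real checkpoint.
	cfg, err := parseConfig([]byte(`{
		"text_config": {"cross_attention_layers": [3, 8]},
		"vision_config": {"num_hidden_layers": 32, "num_global_layers": 8,
			"intermediate_layers_indices": [3, 7], "image_size": 560, "max_num_tiles": 4}
	}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.TextModel.CrossAttentionLayers, cfg.VisionModel.ImageSize, cfg.VisionModel.MaxNumTiles)
}
```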

func (m *mllamaModel) KV(t *Tokenizer) KV
func (m *mllamaModel) Replacements() []string
func (m *mllamaModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *mllamaModel) repack(name string) Repacker

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors from text model, vision encoder, and cross-attention layers

Outputs

Method	Return type	Description
KV	KV	GGUF metadata with mllama.* keys for text, vision, and cross-attention config
Tensors	[]*ggml.Tensor	Converted tensors, with tanh-transformed gates and repacked vision Q/K weights
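The llama.* to mllama.* key renaming in KV amounts to a prefix rewrite over the metadata map. This sketch models the metadata as a plain `map[string]any`, which is an assumption because the converter's own KV type may differ. `remapPrefix` is a hypothetical helper, and the vision-specific keys that KV adds are not shown.

```go
package main

import (
	"fmt"
	"strings"
)

// remapPrefix returns a copy of kv in which every key starting with "llama."
// is renamed to start with "mllama."; all other keys are copied unchanged.
func remapPrefix(kv map[string]any) map[string]any {
	out := make(map[string]any, len(kv))
	for k, v := range kv {
		if rest, ok := strings.CutPrefix(k, "llama."); ok {
			k = "mllama." + rest
		}
		out[k] = v
	}
	return out
}

func main() {
	kv := map[string]any{
		"llama.block_count":    uint32(40),
		"tokenizer.ggml.model": "gpt2",
	}
	fmt.Println(remapPrefix(kv))
}
```

Using `strings.CutPrefix` rather than `strings.ReplaceAll` keeps the rewrite anchored to the start of the key, so a key that merely contains "llama." elsewhere is left alone.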

Usage Examples

// Converter registered for MllamaForConditionalGeneration
// v.position_embd.gate is split into v.position_embd.gate (1 - tanh) and v.tile_position_embd.gate (tanh)
// Vision Q/K weights get interleaved head reordering

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_MLLama
