Implementation:Ollama Ollama Convert MLLama

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, GGUF Format
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Meta MLlama (Llama 3.2 Vision) multimodal architecture, handling cross-attention layers, gated positional embeddings, and tanh-based gate tensor repacking.

Description

The mllamaModel struct embeds llamaModel for the text model and adds the cross-attention layer indices and the vision encoder configuration.

The KV method reuses the KV generation of the embedded llamaModel, renaming its llama.* keys to mllama.*, and adds vision-specific metadata: block count, global layer count, intermediate layer indices, image, patch, and tile sizes, and norm epsilon.

The Tensors method handles three special cases:

  • v.position_embd.gate is split into a position gate and a tile gate, transformed with 1 - tanh(x) and tanh(x) respectively.
  • Vision Q/K weights are repacked with interleaved head reordering.
  • The pre- and post-tile position embedding gates are transformed with tanh(x).

Text tensors are delegated to the embedded llamaModel.Tensors.
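The key remapping performed by KV can be illustrated with a minimal sketch. It is not the Ollama source: remapPrefix is a hypothetical helper, a plain map stands in for the converter's KV type, and the key names and values are made up for illustration. The sketch also assumes that keys outside the llama.* namespace are simply skipped by the remap.

```go
package main

import (
	"fmt"
	"strings"
)

// remapPrefix (hypothetical helper) copies every entry whose key starts with
// `from` into a new map, replacing that prefix with `to`. Entries with other
// prefixes are skipped.
func remapPrefix(src map[string]any, from, to string) map[string]any {
	out := make(map[string]any, len(src))
	for k, v := range src {
		if strings.HasPrefix(k, from) {
			out[to+strings.TrimPrefix(k, from)] = v
		}
	}
	return out
}

func main() {
	// Illustrative metadata only; these keys and values are assumptions.
	llamaKV := map[string]any{
		"llama.block_count":          uint32(40),
		"llama.attention.head_count": uint32(32),
		"general.name":               "example",
	}

	kv := remapPrefix(llamaKV, "llama.", "mllama.")

	// Vision-specific metadata would then be added next to the remapped keys.
	kv["mllama.vision.block_count"] = uint32(32)

	for k, v := range kv {
		fmt.Println(k, v)
	}
}
```

Keeping the remap as a pure function over a map makes it easy to check in isolation before any vision metadata is layered on top.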

Usage

Invoked automatically when the model's architecture matches MllamaForConditionalGeneration.
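As a rough illustration of this dispatch, the sketch below reads the architectures field of a Hugging Face style config.json and selects a converter by name. The converterFor function and its string return value are hypothetical and only stand in for whatever selection logic the convert package actually uses.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelConfig holds the only field this sketch needs from config.json.
type modelConfig struct {
	Architectures []string `json:"architectures"`
}

// converterFor (hypothetical) returns the name of the converter that would
// handle the model described by configJSON.
func converterFor(configJSON []byte) (string, error) {
	var cfg modelConfig
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", errors.New("config lists no architecture")
	}

	switch cfg.Architectures[0] {
	case "MllamaForConditionalGeneration":
		return "mllamaModel", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	name, err := converterFor([]byte(`{"architectures": ["MllamaForConditionalGeneration"]}`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("selected converter:", name)
}
```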

Code Reference

Source Location

  • Repository: Ollama
  • File: convert/convert_mllama.go
  • Lines: 1-179

Signature

type mllamaModel struct {
    ModelParameters
    TextModel struct {
        llamaModel
        CrossAttentionLayers []int32 `json:"cross_attention_layers"`
    } `json:"text_config"`
    VisionModel struct {
        NumHiddenLayers           uint32  `json:"num_hidden_layers"`
        NumGlobalLayers           uint32  `json:"num_global_layers"`
        IntermediateLayersIndices []int32 `json:"intermediate_layers_indices"`
        ImageSize                 uint32  `json:"image_size"`
        MaxNumTiles               uint32  `json:"max_num_tiles"`
    } `json:"vision_config"`
}

func (m *mllamaModel) KV(t *Tokenizer) KV
func (m *mllamaModel) Replacements() []string
func (m *mllamaModel) Tensors(ts []Tensor) []*ggml.Tensor
func (m *mllamaModel) repack(name string) Repacker

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name  Type        Required  Description
t     *Tokenizer  Yes       Tokenizer data for GGUF metadata
ts    []Tensor    Yes       Source tensors from text model, vision encoder, and cross-attention layers

Outputs

Name     Type            Description
KV       KV              GGUF metadata with mllama.* keys for text, vision, and cross-attention config
Tensors  []*ggml.Tensor  Converted tensors with tanh-transformed gates and repacked Q/K weights
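The tanh-transformed gates in the output can be sketched on plain float32 slices. This is a simplified illustration, not the converter's code: tanhGate and splitPositionGate are hypothetical names, and the real implementation works through a Repacker rather than on bare slices.

```go
package main

import (
	"fmt"
	"math"
)

// tanhGate returns tanh(x) for every element of data.
func tanhGate(data []float32) []float32 {
	out := make([]float32, len(data))
	for i, x := range data {
		out[i] = float32(math.Tanh(float64(x)))
	}
	return out
}

// splitPositionGate (hypothetical) derives both output gates from the single
// source gate: the tile gate is tanh(x), the position gate is 1 - tanh(x).
func splitPositionGate(src []float32) (position, tile []float32) {
	tile = tanhGate(src)
	position = make([]float32, len(src))
	for i, t := range tile {
		position[i] = 1 - t
	}
	return position, tile
}

func main() {
	position, tile := splitPositionGate([]float32{0, 0.5, -2})
	fmt.Println("position_embd.gate:     ", position)
	fmt.Println("tile_position_embd.gate:", tile)
}
```

Because the two gates are complementary, each pair of output values sums to 1 up to floating-point rounding.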

Usage Examples

// Converter registered for MllamaForConditionalGeneration
// v.position_embd.gate is split into position_embd.gate (1-tanh) and tile_position_embd.gate (tanh)
// Vision Q/K weights get interleaved head reordering
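The page does not spell out the exact permutation used for the Q/K weights. The sketch below assumes it is the reshape-and-transpose commonly applied to Llama-style attention weights: rows are viewed as [heads, 2, headDim/2], the middle two axes are swapped, and the result is flattened again, so the two halves of each head end up interleaved. interleaveHeads is a hypothetical name, and the real converter operates through a Repacker rather than on a bare slice.

```go
package main

import (
	"errors"
	"fmt"
)

// interleaveHeads (hypothetical) reorders the rows of a row-major
// [rows x cols] matrix. Within each head, output row 2*i+j is taken from
// input row j*half+i, which interleaves the first and second half of the head.
func interleaveHeads(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || cols <= 0 || rows%(2*heads) != 0 || len(data) != rows*cols {
		return nil, errors.New("interleaveHeads: shape does not match data or head count")
	}

	headDim := rows / heads
	half := headDim / 2
	out := make([]float32, 0, len(data))

	for h := 0; h < heads; h++ {
		for i := 0; i < half; i++ {
			for j := 0; j < 2; j++ {
				src := h*headDim + j*half + i
				out = append(out, data[src*cols:(src+1)*cols]...)
			}
		}
	}
	return out, nil
}

func main() {
	// An 8x1 matrix whose values equal their row index, split over 2 heads.
	data := []float32{0, 1, 2, 3, 4, 5, 6, 7}
	out, err := interleaveHeads(data, 8, 1, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(out)
}
```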
