Implementation:Ollama Ollama Convert Mistral

Knowledge Sources	Ollama
Domains	Model Conversion, GGUF Format
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the GGUF model converter for the Mistral 3 multimodal (conditional generation) architecture, handling text model, vision encoder, multimodal projector, and advanced RoPE scaling parameters.

Description

The mistral3Model struct implements ModelConverter. Its KV method emits metadata in three groups: text configuration (including advanced RoPE parameters such as mscale, mscale_all_dim, beta_fast, beta_slow, llama4_scaling_beta, and YaRN-style scaling), vision configuration (per-head dimension, RoPE theta, patch and image sizes, number of channels), and multimodal configuration (image token index, spatial merge size, projector bias, and projector hidden activation). The Tensors method repacks Q/K weights (interleaved head reordering) for non-vision attention tensors and leaves vision tensors unchanged. The Replacements method strips the language_model.model.* namespace prefix and maps vision and multimodal projector tensor paths.
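The interleaved head reordering can be pictured with a small standalone sketch. It assumes the permutation is the reshape-and-transpose commonly used by llama-style converters, where each head's rows are split into two halves and then interleaved; the helper name repackInterleaved and its parameters are hypothetical and are not taken from convert/convert_mistral.go, which should be consulted for the actual behavior.

```go
package main

import "fmt"

// repackInterleaved is a hypothetical helper for illustration only.
// It treats data as a row-major [rows x cols] matrix whose rows are grouped
// by attention head. Within each head, the row at (half*headDim/2 + j) is
// moved to (2*j + half), which interleaves the two halves of the head.
func repackInterleaved(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || rows%heads != 0 {
		return nil, fmt.Errorf("rows %d not divisible by heads %d", rows, heads)
	}
	headDim := rows / heads
	if headDim%2 != 0 {
		return nil, fmt.Errorf("head dimension %d must be even", headDim)
	}
	if len(data) != rows*cols {
		return nil, fmt.Errorf("data length %d does not match %dx%d", len(data), rows, cols)
	}

	out := make([]float32, len(data))
	half := headDim / 2
	for h := 0; h < heads; h++ {
		for s := 0; s < 2; s++ { // s selects the first or second half of the head
			for j := 0; j < half; j++ {
				src := h*headDim + s*half + j
				dst := h*headDim + 2*j + s
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}
```

For a single head with four one-element rows 0, 1, 2, 3, this sketch produces the row order 0, 2, 1, 3. A converter following the description above would apply such a permutation only to Q and K attention weights outside the vision encoder.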

Usage

Invoked automatically when the model's architecture matches Mistral3ForConditionalGeneration.
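A minimal sketch of this kind of dispatch, assuming the architecture name is read from the first entry of the architectures array in a Hugging Face style config.json. The archConfig type and pickConverter function are hypothetical names introduced here for illustration; the real selection logic lives in the convert package and may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// archConfig captures only the field needed to choose a converter.
type archConfig struct {
	Architectures []string `json:"architectures"`
}

// pickConverter is a hypothetical helper: it returns the name of the
// converter type that would handle the given config.json contents.
func pickConverter(configJSON []byte) (string, error) {
	var c archConfig
	if err := json.Unmarshal(configJSON, &c); err != nil {
		return "", err
	}
	if len(c.Architectures) == 0 {
		return "", fmt.Errorf("config has no architectures entry")
	}
	switch c.Architectures[0] {
	case "Mistral3ForConditionalGeneration":
		return "mistral3Model", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", c.Architectures[0])
	}
}
```

Callers never construct the converter directly in this sketch; they supply the model's configuration and the matching converter is chosen by architecture name.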

Code Reference

Source Location

Repository: Ollama
File: convert/convert_mistral.go
Lines: 1-221

Signature

type mistral3Model struct {
    ModelParameters
    ImageTokenIndex    uint32 `json:"image_token_index"`
    SpatialMergeSize   uint32 `json:"spatial_merge_size"`
    TextModel          struct {
        NumHiddenLayers   uint32  `json:"num_hidden_layers"`
        HiddenSize        uint32  `json:"hidden_size"`
        NumAttentionHeads uint32  `json:"num_attention_heads"`
        RopeParameters    struct { ... } `json:"rope_parameters"`
    } `json:"text_config"`
    VisionModel struct { ... } `json:"vision_config"`
}

func (p *mistral3Model) KV(t *Tokenizer) KV
func (p *mistral3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (p *mistral3Model) Replacements() []string
func (p *mistral3Model) repack(name string, data []float32, shape []uint64) ([]float32, error)
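Since Replacements returns a flat []string, one plausible way to consume it is as old/new pairs passed to strings.NewReplacer. The sketch below shows that pattern under this assumption. Only the language_model.model. prefix comes from the description above; the vision_tower. and multi_modal_projector. source names and the v. and mm. targets are placeholders chosen for illustration, not the mappings used in the source file.

```go
package main

import "strings"

// exampleReplacements returns old/new pairs in the flat form expected by
// strings.NewReplacer. Every pair except the first is a placeholder.
func exampleReplacements() []string {
	return []string{
		"language_model.model.", "", // strip the text-model namespace prefix
		"vision_tower.", "v.", // placeholder vision mapping
		"multi_modal_projector.", "mm.", // placeholder projector mapping
	}
}

// renameTensor applies the replacement pairs to a single tensor name.
func renameTensor(name string) string {
	return strings.NewReplacer(exampleReplacements()...).Replace(name)
}
```

With these placeholder pairs, a name such as language_model.model.layers.0.self_attn.q_proj.weight loses its prefix, while names that match no pair pass through unchanged.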

Import

import "github.com/ollama/ollama/convert"

I/O Contract

Inputs

Name	Type	Required	Description
t	*Tokenizer	Yes	Tokenizer data for GGUF metadata
ts	[]Tensor	Yes	Source tensors from text model, vision encoder, and multimodal projector

Outputs

Name	Type	Description
KV	KV	GGUF metadata with mistral3.* keys for text, vision, and multimodal config
Tensors	[]*ggml.Tensor	Converted tensors with repacked Q/K attention weights
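To make the shape of the KV output concrete, the following sketch decodes a few of the config fields shown in the signature and copies them into a map under a mistral3. prefix. The JSON tags come from the struct above; the exampleConfig type, the exampleKV function, and the specific key names are assumptions for illustration and may not match the keys the converter actually writes.

```go
package main

import "encoding/json"

// exampleConfig mirrors a small subset of the fields in mistral3Model.
type exampleConfig struct {
	ImageTokenIndex  uint32 `json:"image_token_index"`
	SpatialMergeSize uint32 `json:"spatial_merge_size"`
	TextModel        struct {
		NumHiddenLayers   uint32 `json:"num_hidden_layers"`
		HiddenSize        uint32 `json:"hidden_size"`
		NumAttentionHeads uint32 `json:"num_attention_heads"`
	} `json:"text_config"`
}

// exampleKV is a hypothetical helper that builds a metadata map.
// The key names are illustrative, not confirmed against the source.
func exampleKV(configJSON []byte) (map[string]any, error) {
	var c exampleConfig
	if err := json.Unmarshal(configJSON, &c); err != nil {
		return nil, err
	}
	return map[string]any{
		"mistral3.block_count":          c.TextModel.NumHiddenLayers,
		"mistral3.embedding_length":     c.TextModel.HiddenSize,
		"mistral3.attention.head_count": c.TextModel.NumAttentionHeads,
		"mistral3.image_token_index":    c.ImageTokenIndex,
		"mistral3.spatial_merge_size":   c.SpatialMergeSize,
	}, nil
}
```

Keeping text, vision, and multimodal values under one architecture prefix lets a loader find every setting for the model in a single metadata namespace.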

Usage Examples

// Converter registered for Mistral3ForConditionalGeneration
// Q/K weights in non-vision layers are repacked with interleaved head reordering
// Vision tensors pass through unchanged

Related Pages

Principle:Ollama_Ollama_GGUF_Model_Conversion_Mistral
