Implementation:Ollama Ollama Convert Mistral
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, GGUF Format |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the GGUF model converter for the Mistral 3 multimodal (conditional generation) architecture. It handles the text model, the vision encoder, the multimodal projector, and the advanced RoPE scaling parameters.
Description
The mistral3Model struct implements ModelConverter. Its KV metadata falls into three groups: text configuration, including the advanced RoPE parameters (mscale, mscale_all_dim, beta_fast, beta_slow, llama4_scaling_beta, and YaRN-style scaling); vision configuration (per-head dimension, RoPE theta, patch and image sizes, number of channels); and multimodal configuration (image token index, spatial merge size, projector bias, and projector hidden activation). The Tensors method repacks Q/K weights (interleaved head reordering) for non-vision attention tensors. The Replacements method strips the language_model.model.* namespace prefix and maps the vision and multimodal projector paths.
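The interleaved head reordering mentioned above can be illustrated with a small standalone sketch. It assumes the common permutation in which each head's rows are viewed as [2, half, cols] and the first two axes are swapped; the page does not spell out the exact permutation, so treat this as an assumption. The helper name interleaveHeads and its parameters are hypothetical and are not part of the Ollama source.

```go
package main

import "fmt"

// interleaveHeads is a hypothetical helper, not the Ollama implementation.
// It treats data as a row-major [rows, cols] matrix whose rows are grouped
// into `heads` equal blocks. Within each block, row j of the first half is
// placed next to row j of the second half (assumed permutation).
func interleaveHeads(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || rows%(2*heads) != 0 {
		return nil, fmt.Errorf("rows %d not divisible by 2*heads (heads=%d)", rows, heads)
	}
	if len(data) != rows*cols {
		return nil, fmt.Errorf("expected %d values, got %d", rows*cols, len(data))
	}
	perHead := rows / heads
	half := perHead / 2
	out := make([]float32, len(data))
	for h := 0; h < heads; h++ {
		for j := 0; j < half; j++ {
			for s := 0; s < 2; s++ {
				src := h*perHead + s*half + j // row index in [heads, 2, half] order
				dst := h*perHead + j*2 + s    // row index in [heads, half, 2] order
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}

func main() {
	// One head, four rows, one column: rows 0,1,2,3 are reordered to 0,2,1,3.
	out, err := interleaveHeads([]float32{0, 1, 2, 3}, 4, 1, 1)
	fmt.Println(out, err)
}
```

Working on a flat slice with explicit index arithmetic keeps the sketch dependency-free; a real converter may use a tensor library to express the same reshape and transpose.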
Usage
Invoked automatically when the model's architecture matches Mistral3ForConditionalGeneration.
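As a rough illustration of architecture-based dispatch, the sketch below reads an architecture name from a JSON config and selects a converter label. It assumes the Hugging Face convention of an `architectures` array in the model config; the function pickConverter is hypothetical and only returns a string, it does not call into the convert package.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pickConverter is a hypothetical dispatcher used only for illustration.
// It returns the name of the converter type that would handle the model.
func pickConverter(configJSON []byte) (string, error) {
	var cfg struct {
		Architectures []string `json:"architectures"` // assumed config field
	}
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return "", err
	}
	if len(cfg.Architectures) == 0 {
		return "", fmt.Errorf("config lists no architectures")
	}
	switch cfg.Architectures[0] {
	case "Mistral3ForConditionalGeneration":
		return "mistral3Model", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", cfg.Architectures[0])
	}
}

func main() {
	name, err := pickConverter([]byte(`{"architectures":["Mistral3ForConditionalGeneration"]}`))
	fmt.Println(name, err)
}
```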
Code Reference
Source Location
- Repository: Ollama
- File: convert/convert_mistral.go
- Lines: 1-221
Signature
type mistral3Model struct {
    ModelParameters
    ImageTokenIndex  uint32 `json:"image_token_index"`
    SpatialMergeSize uint32 `json:"spatial_merge_size"`
    TextModel struct {
        NumHiddenLayers   uint32 `json:"num_hidden_layers"`
        HiddenSize        uint32 `json:"hidden_size"`
        NumAttentionHeads uint32 `json:"num_attention_heads"`
        RopeParameters    struct { ... } `json:"rope_parameters"`
    } `json:"text_config"`
    VisionModel struct { ... } `json:"vision_config"`
}

func (p *mistral3Model) KV(t *Tokenizer) KV
func (p *mistral3Model) Tensors(ts []Tensor) []*ggml.Tensor
func (p *mistral3Model) Replacements() []string
func (p *mistral3Model) repack(name string, data []float32, shape []uint64) ([]float32, error)
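The JSON struct tags in the signature suggest how a model config maps onto the struct. The following sketch decodes a small config fragment using only the fields visible above; the elided RopeParameters and VisionModel bodies are left out rather than guessed. The type name mistral3Config, the helper parseConfig, and the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// mistral3Config mirrors only the fields shown in the signature.
// It is a trimmed, hypothetical stand-in for mistral3Model.
type mistral3Config struct {
	ImageTokenIndex  uint32 `json:"image_token_index"`
	SpatialMergeSize uint32 `json:"spatial_merge_size"`
	TextModel        struct {
		NumHiddenLayers   uint32 `json:"num_hidden_layers"`
		HiddenSize        uint32 `json:"hidden_size"`
		NumAttentionHeads uint32 `json:"num_attention_heads"`
	} `json:"text_config"`
}

// parseConfig decodes raw JSON into the trimmed config struct.
func parseConfig(raw []byte) (mistral3Config, error) {
	var cfg mistral3Config
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// Sample values are invented for this example.
	raw := []byte(`{
		"image_token_index": 10,
		"spatial_merge_size": 2,
		"text_config": {"num_hidden_layers": 4, "hidden_size": 64, "num_attention_heads": 8}
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(cfg.ImageTokenIndex, cfg.SpatialMergeSize, cfg.TextModel.NumHiddenLayers)
}
```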
Import
import "github.com/ollama/ollama/convert"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| t | *Tokenizer | Yes | Tokenizer data for GGUF metadata |
| ts | []Tensor | Yes | Source tensors from text model, vision encoder, and multimodal projector |
Outputs
| Name | Type | Description |
|---|---|---|
| KV | KV | GGUF metadata with mistral3.* keys for text, vision, and multimodal config |
| Tensors | []*ggml.Tensor | Converted tensors with repacked Q/K attention weights |
Usage Examples
// Converter registered for Mistral3ForConditionalGeneration
// Q/K weights in non-vision layers are repacked with interleaved head reordering
// Vision tensors pass through unchanged
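The prefix stripping attributed to Replacements can be sketched with the standard library's strings.NewReplacer, which accepts a flat list of old/new pairs. Only the language_model.model. prefix comes from this page; the second pair (layers to blk) and the function names replacements and rename are hypothetical, included to show how several pairs combine.

```go
package main

import (
	"fmt"
	"strings"
)

// replacements returns old/new pairs in the flat form that
// strings.NewReplacer expects. This is an illustrative list,
// not the list returned by mistral3Model.Replacements.
func replacements() []string {
	return []string{
		"language_model.model.", "", // prefix stripping described on this page
		"layers", "blk", // hypothetical mapping, for illustration only
	}
}

// rename applies all pairs to a source tensor name in a single pass.
func rename(name string) string {
	return strings.NewReplacer(replacements()...).Replace(name)
}

func main() {
	fmt.Println(rename("language_model.model.layers.0.self_attn.q_proj.weight"))
	fmt.Println(rename("language_model.model.norm.weight"))
}
```

A flat pair list keeps the mapping declarative: adding a new rename is one more pair rather than another branch of string handling.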