
Principle:Ollama GGUF Model Conversion (Mistral)

From Leeroopedia
Domains Model Conversion, Mistral
Last Updated 2025-02-15 00:00 GMT

Overview

The Mistral converter (Mistral 3 multimodal) transforms the complete vision-language model from HuggingFace SafeTensors to GGUF format. It handles the Mistral architecture's sliding window attention, vision encoder integration, multimodal projector, and advanced RoPE scaling configurations (YaRN, mscale, llama4_scaling_beta).

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • language_model.model.norm -> output_norm
  • language_model.model. / language_model. -> (stripped)
  • layers -> blk
  • vision_tower -> v
  • ln_pre -> encoder_norm
  • embed_tokens -> token_embd
  • self_attn.{q,k,v}_proj -> attn_{q,k,v}
  • self_attn.o_proj -> attn_output
  • attention.{q,k,v}_proj -> attn_{q,k,v} (alternate naming)
  • mlp.{gate,down,up}_proj -> ffn_{gate,down,up}
  • feed_forward.{gate,down,up}_proj -> ffn_{gate,down,up} (alternate naming)
  • multi_modal_projector -> mm
  • lm_head -> output
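The replacements above can be sketched as an ordered substitution table. This is an illustrative Go sketch, not the converter's actual code; note that the longer `language_model.model.` prefix must be tried before the shorter `language_model.` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// mapTensorName applies the HuggingFace-to-GGUF replacements listed above.
// Order matters: longer prefixes are tried before their shorter variants.
func mapTensorName(name string) string {
	replacements := []struct{ from, to string }{
		{"language_model.model.norm", "output_norm"},
		{"language_model.model.", ""},
		{"language_model.", ""},
		{"layers", "blk"},
		{"vision_tower", "v"},
		{"ln_pre", "encoder_norm"},
		{"embed_tokens", "token_embd"},
		{"self_attn.q_proj", "attn_q"},
		{"self_attn.k_proj", "attn_k"},
		{"self_attn.v_proj", "attn_v"},
		{"self_attn.o_proj", "attn_output"},
		{"attention.q_proj", "attn_q"},
		{"attention.k_proj", "attn_k"},
		{"attention.v_proj", "attn_v"},
		{"feed_forward.gate_proj", "ffn_gate"},
		{"feed_forward.down_proj", "ffn_down"},
		{"feed_forward.up_proj", "ffn_up"},
		{"multi_modal_projector", "mm"},
		{"lm_head", "output"},
	}
	for _, r := range replacements {
		name = strings.ReplaceAll(name, r.from, r.to)
	}
	return name
}

func main() {
	fmt.Println(mapTensorName("language_model.model.layers.0.self_attn.q_proj.weight"))
	// -> blk.0.attn_q.weight
}
```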

Architecture-Specific Hyperparameters

The GGUF metadata is written under the mistral3.* namespace:

Text:

  • mistral3.vocab_size -- vocabulary size
  • mistral3.block_count, context_length, embedding_length, feed_forward_length
  • mistral3.attention.head_count, head_count_kv, key_length, value_length
  • mistral3.rope.dimension_count -- head dimension (or hidden_size / num_heads)
  • mistral3.rope.freq_base -- RoPE theta
  • mistral3.rope.scaling.* -- factor, type, beta_fast, beta_slow, mscale, mscale_all_dim, original_context_length
  • mistral3.rope.scaling_beta -- Llama 4-style scaling beta

Vision:

  • mistral3.vision.block_count, embedding_length, feed_forward_length
  • mistral3.vision.attention.head_count, key_length
  • mistral3.vision.image_size, patch_size, num_channels
  • mistral3.vision.rope.freq_base -- separate RoPE theta for vision

Multimodal:

  • mistral3.image_token_index, spatial_merge_size
  • mistral3.mm.projector_bias, projector_hidden_act
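To illustrate how the text-model keys above relate to the parsed config, here is a minimal sketch. The plain map, the helper name, and the example values are assumptions for illustration; the real converter writes typed key-value pairs into the GGUF header.

```go
package main

import "fmt"

// mistral3TextKV assembles the mistral3.* text keys listed above.
// key_length, value_length, and rope.dimension_count all derive from
// the head dimension (hidden_size / num_heads).
func mistral3TextKV(vocab, blocks, ctx, embd, ffn, heads, headsKV uint32, ropeTheta float64) map[string]any {
	headDim := embd / heads
	return map[string]any{
		"mistral3.vocab_size":              vocab,
		"mistral3.block_count":             blocks,
		"mistral3.context_length":          ctx,
		"mistral3.embedding_length":        embd,
		"mistral3.feed_forward_length":     ffn,
		"mistral3.attention.head_count":    heads,
		"mistral3.attention.head_count_kv": headsKV,
		"mistral3.attention.key_length":    headDim,
		"mistral3.attention.value_length":  headDim,
		"mistral3.rope.dimension_count":    headDim,
		"mistral3.rope.freq_base":          ropeTheta,
	}
}

func main() {
	// Hypothetical values for a 40-block, 5120-wide text model.
	kv := mistral3TextKV(131072, 40, 131072, 5120, 32768, 32, 8, 1e6)
	fmt.Println(kv["mistral3.rope.dimension_count"]) // 160
}
```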

Special Handling

Q/K Weight Repacking

Text model Q and K weight tensors (not vision tensors) are repacked from interleaved to contiguous head layout using the standard Llama-style permutation: reshape to [heads, 2, head_dim/2, hidden], transpose to [heads, head_dim/2, 2, hidden], then flatten.
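Treating the weight as a row-major [heads * head_dim, hidden] matrix, the repacking above can be sketched as an explicit index permutation (a minimal illustration, not the converter's own code):

```go
package main

import "fmt"

// repackQK converts a Q or K weight from interleaved to contiguous head
// layout: reshape [heads*headDim, hidden] to [heads, 2, headDim/2, hidden],
// swap the middle two axes to [heads, headDim/2, 2, hidden], then flatten.
func repackQK(w []float32, heads, headDim, hidden int) []float32 {
	half := headDim / 2
	out := make([]float32, len(w))
	for h := 0; h < heads; h++ {
		for i := 0; i < 2; i++ {
			for j := 0; j < half; j++ {
				for k := 0; k < hidden; k++ {
					src := ((h*2+i)*half+j)*hidden + k
					dst := ((h*half+j)*2+i)*hidden + k
					out[dst] = w[src]
				}
			}
		}
	}
	return out
}

func main() {
	// Toy case: one head, head_dim=4, hidden=1.
	fmt.Println(repackQK([]float32{0, 1, 2, 3}, 1, 4, 1)) // [0 2 1 3]
}
```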

Dual RoPE Configuration

The model uses separate RoPE parameters for the text and vision encoders. The text model's RoPE supports several scaling types, including YaRN, with optional mscale and mscale_all_dim parameters that are represented as pointer fields and may be absent from the config.

Nested Config Structure

Parameters are organized under text_config and vision_config with a separate rope_parameters sub-structure containing the RoPE scaling configuration.

Implementation Notes

The conversion is implemented in convert/convert_mistral.go via the mistral3Model struct. The struct handles multiple naming conventions for tensor replacements (both self_attn and attention prefixes, both mlp and feed_forward prefixes) to support different checkpoint formats from Mistral AI.
