Principle:Ollama Ollama GGUF Model Conversion Mistral Causal

Knowledge Sources	Ollama
Domains	Model Conversion, Mistral
Last Updated	2025-02-15 00:00 GMT

Overview

The Mistral causal LM conversion handles standalone Mistral text models (no vision components) exported with the MistralForCausalLM architecture class. It shares its GGUF architecture identifier with the multimodal variant, but reads a simpler flat configuration structure and handles text tensors only.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

model.norm -> output_norm
model. -> (stripped)
layers -> blk
embed_tokens -> token_embd
self_attn.{q,k,v}_proj -> attn_{q,k,v}
self_attn.o_proj -> attn_output
mlp.{down,gate,up}_proj -> ffn_{down,gate,up}
attention.{q,k,v}_proj -> attn_{q,k,v} (alternate naming)
feed_forward.{gate,down,up}_proj -> ffn_{gate,down,up} (alternate naming)
lm_head -> output
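
The replacement table above can be sketched with Go's standard strings.NewReplacer, which tries the old strings in argument order at each position, so model.norm has to be listed before the shorter model. prefix. This is a minimal illustration, not the converter's actual code; the names tensorNameReplacer and ggufTensorName are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// tensorNameReplacer mirrors the replacement table above. The pairs are
// tried in argument order, so "model.norm" must precede "model.".
// (Hypothetical name; the real converter may organise this differently.)
var tensorNameReplacer = strings.NewReplacer(
	"model.norm", "output_norm",
	"model.", "",
	"layers", "blk",
	"embed_tokens", "token_embd",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.down_proj", "ffn_down",
	"mlp.gate_proj", "ffn_gate",
	"mlp.up_proj", "ffn_up",
	// Alternate naming.
	"attention.q_proj", "attn_q",
	"attention.k_proj", "attn_k",
	"attention.v_proj", "attn_v",
	"feed_forward.gate_proj", "ffn_gate",
	"feed_forward.down_proj", "ffn_down",
	"feed_forward.up_proj", "ffn_up",
	"lm_head", "output",
)

// ggufTensorName converts a HuggingFace tensor name to its GGUF name.
func ggufTensorName(hfName string) string {
	return tensorNameReplacer.Replace(hfName)
}

func main() {
	for _, name := range []string{
		"model.embed_tokens.weight",
		"model.layers.0.self_attn.q_proj.weight",
		"model.layers.0.mlp.down_proj.weight",
		"model.norm.weight",
		"lm_head.weight",
	} {
		fmt.Printf("%s -> %s\n", name, ggufTensorName(name))
	}
}
```

With this table, model.layers.0.self_attn.q_proj.weight becomes blk.0.attn_q.weight. Tensor names that match no entry pass through unchanged apart from the stripped model. prefix.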

Architecture-Specific Hyperparameters

The GGUF metadata is written under the mistral3.* namespace (same as the multimodal variant):

mistral3.vocab_size -- vocabulary size
mistral3.block_count, mistral3.context_length, mistral3.embedding_length, mistral3.feed_forward_length
mistral3.attention.head_count, mistral3.attention.head_count_kv, mistral3.attention.key_length, mistral3.attention.value_length
mistral3.rope.dimension_count -- head dimension
mistral3.rope.freq_base -- RoPE theta
mistral3.rope.scaling.* -- factor, type, beta_fast, beta_slow, mscale, mscale_all_dim
mistral3.rope.scaling_beta -- Llama 4-style scaling beta
mistral3.rope.scaling.original_context_length
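
As a rough sketch of how these keys might be assembled, the following builds a key/value map under the mistral3. prefix. The textParams struct, the metadataKV function, and the sample values are all hypothetical. Setting key_length and value_length equal to the head dimension is an assumption made for illustration; the rope.scaling.* keys are omitted for brevity.

```go
package main

import (
	"fmt"
	"sort"
)

// textParams is a hypothetical, trimmed-down view of the parsed config.
type textParams struct {
	VocabSize        uint32
	HiddenLayers     uint32
	MaxPositions     uint32
	HiddenSize       uint32
	IntermediateSize uint32
	Heads            uint32
	KVHeads          uint32
	HeadDim          uint32
	RopeTheta        float32
}

// metadataKV returns the GGUF metadata keys listed above for one model.
func metadataKV(p textParams) map[string]any {
	const ns = "mistral3."
	return map[string]any{
		ns + "vocab_size":              p.VocabSize,
		ns + "block_count":             p.HiddenLayers,
		ns + "context_length":          p.MaxPositions,
		ns + "embedding_length":        p.HiddenSize,
		ns + "feed_forward_length":     p.IntermediateSize,
		ns + "attention.head_count":    p.Heads,
		ns + "attention.head_count_kv": p.KVHeads,
		// Assumption: key and value lengths both equal the head dimension.
		ns + "attention.key_length":   p.HeadDim,
		ns + "attention.value_length": p.HeadDim,
		ns + "rope.dimension_count":   p.HeadDim,
		ns + "rope.freq_base":         p.RopeTheta,
	}
}

func main() {
	// Illustrative values only; they do not describe a specific checkpoint.
	kv := metadataKV(textParams{
		VocabSize: 131072, HiddenLayers: 40, MaxPositions: 32768,
		HiddenSize: 5120, IntermediateSize: 32768,
		Heads: 32, KVHeads: 8, HeadDim: 128, RopeTheta: 1e6,
	})
	keys := make([]string, 0, len(kv))
	for k := range kv {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s = %v\n", k, kv[k])
	}
}
```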

Special Handling

Q/K Weight Repacking

The converter applies the same interleaved-to-contiguous head permutation as the multimodal Mistral variant. It is applied only to text tensors (those without a vision prefix) and uses the standard reshape-transpose-flatten pipeline.
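
A reshape-transpose-flatten pass of this kind can be sketched on a plain row-major slice. The sketch assumes the common pattern of viewing the weight as [heads, 2, headDim/2, cols], swapping the two middle axes, and flattening back to [rows, cols]. The exact axis order used by the Ollama converter is not stated here, so treat the layout, the function name repackRows, and the parameter names as assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// repackRows permutes the rows of a row-major [rows x cols] matrix.
// Within each head, the row at position a*half+b (a in {0,1}) moves to
// position b*2+a, which is what reshaping to [heads, 2, half, cols],
// swapping axes 1 and 2, and flattening would do.
func repackRows(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || cols <= 0 || rows%heads != 0 || (rows/heads)%2 != 0 {
		return nil, errors.New("rows must split into heads of even size")
	}
	if len(data) != rows*cols {
		return nil, errors.New("data length does not match shape")
	}
	headDim := rows / heads
	half := headDim / 2
	out := make([]float32, len(data))
	for h := 0; h < heads; h++ {
		for a := 0; a < 2; a++ {
			for b := 0; b < half; b++ {
				src := h*headDim + a*half + b
				dst := h*headDim + b*2 + a
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}

func main() {
	// One head, four rows, one column: rows are labelled by their index.
	out, err := repackRows([]float32{0, 1, 2, 3}, 4, 1, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints [0 2 1 3]
}
```

Only the Q and K projection weights would be passed through such a function; other text tensors and all vision-prefixed tensors are left untouched.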

Flat Configuration Structure

Unlike the multimodal variant, which nests parameters under text_config, the causal variant reads all parameters from the top-level config. The rope_parameters sub-structure sits directly inside the model config.
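
The difference in layout can be illustrated with two Go structs decoded by encoding/json. The struct names and the specific JSON keys (hidden_size, num_hidden_layers, rope_theta) follow common HuggingFace naming but are assumptions here, not a copy of the converter's types.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ropeParameters is a hypothetical subset of the rope_parameters block.
type ropeParameters struct {
	RopeTheta float32 `json:"rope_theta"`
}

// causalConfig reads every field from the top level of config.json.
type causalConfig struct {
	HiddenSize     uint32         `json:"hidden_size"`
	HiddenLayers   uint32         `json:"num_hidden_layers"`
	RopeParameters ropeParameters `json:"rope_parameters"`
}

// multimodalConfig shows the nested layout for comparison: the same text
// fields sit one level down, under text_config.
type multimodalConfig struct {
	TextConfig causalConfig `json:"text_config"`
}

// parseCausal decodes a flat (causal) config.
func parseCausal(raw []byte) (causalConfig, error) {
	var cfg causalConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	flat := []byte(`{"hidden_size": 5120, "num_hidden_layers": 40,
		"rope_parameters": {"rope_theta": 1000000}}`)
	cfg, err := parseCausal(flat)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.HiddenSize, cfg.HiddenLayers, cfg.RopeParameters.RopeTheta)

	nested := []byte(`{"text_config": {"hidden_size": 5120}}`)
	var mm multimodalConfig
	if err := json.Unmarshal(nested, &mm); err != nil {
		panic(err)
	}
	fmt.Println(mm.TextConfig.HiddenSize)
}
```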

Shared Architecture Identifier

Both the multimodal and causal variants use mistral3 as the GGUF architecture identifier, enabling the same GGML inference backend to handle both.

Optional Pointer Fields

The sliding_window, Mscale, MscaleAllDim, and Llama4ScalingBeta fields use Go pointer types so that the converter can distinguish a value that is absent from the HuggingFace config from one that is explicitly set to zero.
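
The absent-versus-zero distinction is a general property of encoding/json with pointer fields, shown below for sliding_window only. The struct and function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// optionalFields is a hypothetical struct with one optional field.
// A nil pointer means the key was missing (or null) in the JSON.
type optionalFields struct {
	SlidingWindow *uint32 `json:"sliding_window"`
}

// describeWindow reports whether sliding_window was present in raw.
func describeWindow(raw string) (string, error) {
	var f optionalFields
	if err := json.Unmarshal([]byte(raw), &f); err != nil {
		return "", err
	}
	if f.SlidingWindow == nil {
		return "absent", nil
	}
	return fmt.Sprintf("set to %d", *f.SlidingWindow), nil
}

func main() {
	for _, raw := range []string{
		`{}`,
		`{"sliding_window": null}`,
		`{"sliding_window": 0}`,
		`{"sliding_window": 4096}`,
	} {
		s, err := describeWindow(raw)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s -> %s\n", raw, s)
	}
}
```

With a plain uint32 field, the first three inputs would all decode to 0 and could not be told apart.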

Implementation Notes

The conversion is implemented in convert/convert_mistral_causal.go via the mistral3CausalModel struct. This converter is selected when the HuggingFace architectures field contains MistralForCausalLM. The implementation is structurally similar to the multimodal variant but without vision-related fields and with a flat parameter layout.
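
The selection rule can be sketched as a check on the architectures list in config.json. This is only an illustration of the rule stated above; the function name usesCausalConverter is hypothetical and the real dispatch logic in the convert package may look different.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// causalArchitecture is the HuggingFace class name named in the text.
const causalArchitecture = "MistralForCausalLM"

// usesCausalConverter reports whether the architectures field of a
// config.json document contains MistralForCausalLM.
func usesCausalConverter(configJSON []byte) (bool, error) {
	var cfg struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(configJSON, &cfg); err != nil {
		return false, err
	}
	for _, arch := range cfg.Architectures {
		if arch == causalArchitecture {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := usesCausalConverter([]byte(`{"architectures": ["MistralForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(ok) // prints true
}
```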

Related Pages

Implementation:Ollama_Ollama_Convert_Mistral_Causal
