Principle:Ollama Ollama GGUF Model Conversion Mistral Causal
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, Mistral |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
The Mistral causal LM head variant conversion handles standalone Mistral text models (without vision components) exported with the MistralForCausalLM architecture class. It shares the same GGUF architecture identifier as the multimodal variant, but uses a simpler flat configuration structure and text-only tensor handling.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
- model.norm -> output_norm
- model. -> (stripped)
- layers -> blk
- embed_tokens -> token_embd
- self_attn.{q,k,v}_proj -> attn_{q,k,v}
- self_attn.o_proj -> attn_output
- mlp.{down,gate,up}_proj -> ffn_{down,gate,up}
- attention.{q,k,v}_proj -> attn_{q,k,v} (alternate naming)
- feed_forward.{gate,down,up}_proj -> ffn_{gate,down,up} (alternate naming)
- lm_head -> output
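The mapping above can be sketched as an ordered list of old/new pairs fed to `strings.NewReplacer`. This is an illustrative sketch only: the names `mistralCausalReplacements` and `renameTensor` are hypothetical, and the brace shorthand is expanded by hand. Order matters because `model.norm` must be tried before the shorter `model.` prefix.

```go
package main

import (
	"fmt"
	"strings"
)

// mistralCausalReplacements is a hypothetical expansion of the mapping list.
// strings.NewReplacer compares old strings in argument order, so the more
// specific "model.norm" is listed before the generic "model." prefix.
var mistralCausalReplacements = []string{
	"model.norm", "output_norm",
	"model.", "",
	"layers", "blk",
	"embed_tokens", "token_embd",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.down_proj", "ffn_down",
	"mlp.gate_proj", "ffn_gate",
	"mlp.up_proj", "ffn_up",
	"attention.q_proj", "attn_q",
	"attention.k_proj", "attn_k",
	"attention.v_proj", "attn_v",
	"feed_forward.gate_proj", "ffn_gate",
	"feed_forward.down_proj", "ffn_down",
	"feed_forward.up_proj", "ffn_up",
	"lm_head", "output",
}

// renameTensor maps a HuggingFace tensor name to its GGUF name.
func renameTensor(name string) string {
	return strings.NewReplacer(mistralCausalReplacements...).Replace(name)
}

func main() {
	names := []string{
		"model.embed_tokens.weight",
		"model.layers.0.self_attn.q_proj.weight",
		"model.layers.0.mlp.down_proj.weight",
		"model.norm.weight",
		"lm_head.weight",
	}
	for _, n := range names {
		fmt.Printf("%s -> %s\n", n, renameTensor(n))
	}
}
```

Because each replacement is applied in a single left-to-right pass without rescanning, the `output` produced for `lm_head` is never matched again by a later rule.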
Architecture-Specific Hyperparameters
The GGUF metadata is written under the mistral3.* namespace (same as the multimodal variant):
- mistral3.vocab_size -- vocabulary size
- mistral3.block_count, context_length, embedding_length, feed_forward_length
- mistral3.attention.head_count, head_count_kv, key_length, value_length
- mistral3.rope.dimension_count -- head dimension
- mistral3.rope.freq_base -- RoPE theta
- mistral3.rope.scaling.* -- factor, type, beta_fast, beta_slow, mscale, mscale_all_dim
- mistral3.rope.scaling_beta -- Llama 4-style scaling beta
- mistral3.rope.scaling.original_context_length
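As a rough illustration of how such keys might be assembled, the sketch below builds a key/value map under the mistral3 prefix. The `textParams` struct, its field names, and `buildKV` are hypothetical. Using the head dimension for `key_length` and `value_length` is an assumption, and the rope scaling keys are omitted for brevity.

```go
package main

import "fmt"

// textParams is a hypothetical, simplified stand-in for the parsed model config.
type textParams struct {
	VocabSize             uint32
	NumHiddenLayers       uint32
	MaxPositionEmbeddings uint32
	HiddenSize            uint32
	IntermediateSize      uint32
	NumAttentionHeads     uint32
	NumKeyValueHeads      uint32
	HeadDim               uint32
	RopeTheta             float32
}

// buildKV writes a subset of the hyperparameters under the mistral3.* namespace.
func buildKV(p textParams) map[string]any {
	const arch = "mistral3"
	return map[string]any{
		"general.architecture":           arch,
		arch + ".vocab_size":             p.VocabSize,
		arch + ".block_count":            p.NumHiddenLayers,
		arch + ".context_length":         p.MaxPositionEmbeddings,
		arch + ".embedding_length":       p.HiddenSize,
		arch + ".feed_forward_length":    p.IntermediateSize,
		arch + ".attention.head_count":   p.NumAttentionHeads,
		arch + ".attention.head_count_kv": p.NumKeyValueHeads,
		// Assumption: key and value lengths both equal the head dimension.
		arch + ".attention.key_length":   p.HeadDim,
		arch + ".attention.value_length": p.HeadDim,
		arch + ".rope.dimension_count":   p.HeadDim,
		arch + ".rope.freq_base":         p.RopeTheta,
	}
}

func main() {
	kv := buildKV(textParams{NumHiddenLayers: 40, HeadDim: 128, RopeTheta: 1e6})
	fmt.Println(kv["mistral3.block_count"], kv["mistral3.rope.dimension_count"])
}
```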
Special Handling
Q/K Weight Repacking
The causal variant applies the same interleaved-to-contiguous head permutation to the Q and K projection weights as the multimodal Mistral variant. The permutation is applied only to text tensors (those without a vision prefix) and uses the standard reshape-transpose-flatten pipeline.
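The reshape-transpose-flatten idea can be sketched on a plain row-major slice. This is a minimal sketch, not the converter's code: `repackQK` is a hypothetical helper, and the axis order (view each head's rows as (2, headDim/2), swap the two axes, flatten) is an assumption modeled on the permutation commonly used for Llama-family weights.

```go
package main

import (
	"errors"
	"fmt"
)

// repackQK reorders the rows of a row-major [rows x cols] weight matrix.
// For each head, the rows are viewed as shape (2, headDim/2), the two axes
// are swapped to (headDim/2, 2), and the result is flattened back in place
// of the original rows. The axis order is an assumption of this sketch.
func repackQK(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || cols <= 0 || rows%heads != 0 || (rows/heads)%2 != 0 {
		return nil, errors.New("rows must split evenly into heads of even size")
	}
	if len(data) != rows*cols {
		return nil, errors.New("data length does not match rows*cols")
	}
	headDim := rows / heads
	half := headDim / 2
	out := make([]float32, len(data))
	for h := 0; h < heads; h++ {
		for i := 0; i < half; i++ {
			for j := 0; j < 2; j++ {
				src := h*headDim + j*half + i // index in the (2, half) view
				dst := h*headDim + i*2 + j    // index in the (half, 2) view
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}

func main() {
	// One head of dimension 4 with a single column, so each value is its row index.
	out, err := repackQK([]float32{0, 1, 2, 3}, 4, 1, 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With this axis order, a head whose rows are 0, 1, 2, 3 comes out as 0, 2, 1, 3: the two halves of the head are interleaved row by row.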
Flat Configuration Structure
Unlike the multimodal variant, which nests parameters under text_config, the causal variant reads all parameters from the top-level config. The rope_parameters sub-structure is nested directly within the model config.
Both the multimodal and causal variants use mistral3 as the GGUF architecture identifier, enabling the same GGML inference backend to handle both.
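The difference between the two layouts can be illustrated with `encoding/json`. The struct and function names below are hypothetical, and the JSON keys other than `text_config` and `rope_parameters` (for example `hidden_size` and `rope_theta`) are assumed from common HuggingFace naming rather than taken from the converter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ropeParameters is a hypothetical subset of the rope_parameters block.
type ropeParameters struct {
	RopeTheta float32 `json:"rope_theta"`
}

// causalConfig models the flat layout: everything sits at the top level.
type causalConfig struct {
	HiddenSize      uint32         `json:"hidden_size"`
	NumHiddenLayers uint32         `json:"num_hidden_layers"`
	RopeParameters  ropeParameters `json:"rope_parameters"`
}

// multimodalConfig models the nested layout for comparison.
type multimodalConfig struct {
	TextConfig causalConfig `json:"text_config"`
}

// parseCausal decodes a flat, causal-style config.
func parseCausal(raw []byte) (causalConfig, error) {
	var c causalConfig
	err := json.Unmarshal(raw, &c)
	return c, err
}

// parseMultimodal decodes a nested config and returns its text parameters.
func parseMultimodal(raw []byte) (causalConfig, error) {
	var m multimodalConfig
	err := json.Unmarshal(raw, &m)
	return m.TextConfig, err
}

func main() {
	flat := []byte(`{"hidden_size": 5120, "num_hidden_layers": 40,
		"rope_parameters": {"rope_theta": 1000000}}`)
	c, err := parseCausal(flat)
	if err != nil {
		panic(err)
	}
	fmt.Println(c.HiddenSize, c.NumHiddenLayers, c.RopeParameters.RopeTheta)
}
```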
Optional Pointer Fields
The sliding_window, Mscale, MscaleAllDim, and Llama4ScalingBeta fields use Go pointer types so that a value absent from the HuggingFace config can be distinguished from an explicit zero.
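The absent-versus-zero distinction follows from how `encoding/json` treats pointer fields: a missing key (or JSON null) leaves the pointer nil, while an explicit 0 allocates a value. The sketch below assumes a hypothetical `optionalFields` struct; the `mscale` JSON key is an assumption for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// optionalFields is a hypothetical struct with pointer-typed optional values.
type optionalFields struct {
	SlidingWindow *uint32  `json:"sliding_window"`
	Mscale        *float32 `json:"mscale"`
}

// parseOptional decodes the optional fields from raw JSON.
func parseOptional(raw []byte) (optionalFields, error) {
	var o optionalFields
	err := json.Unmarshal(raw, &o)
	return o, err
}

// describe reports whether a pointer-typed field was absent or set.
func describe(p *uint32) string {
	if p == nil {
		return "absent"
	}
	return fmt.Sprintf("set to %d", *p)
}

func main() {
	for _, raw := range []string{`{}`, `{"sliding_window": 0}`, `{"sliding_window": 4096}`} {
		o, err := parseOptional([]byte(raw))
		if err != nil {
			panic(err)
		}
		fmt.Printf("%-28s sliding_window is %s\n", raw, describe(o.SlidingWindow))
	}
}
```

With a plain `uint32` field, the first two inputs would be indistinguishable, since both would decode to 0.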
Implementation Notes
The conversion is implemented in convert/convert_mistral_causal.go via the mistral3CausalModel struct. This converter is selected when the HuggingFace architectures field contains MistralForCausalLM. The implementation is structurally similar to the multimodal variant but without vision-related fields and with a flat parameter layout.
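The selection step can be pictured as a check on the architectures array in the model's config JSON. This is a hedged sketch: `isMistralCausal` and `archConfig` are hypothetical names and do not reflect how the actual converter dispatch is written.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// archConfig is a hypothetical struct holding only the architectures field.
type archConfig struct {
	Architectures []string `json:"architectures"`
}

// isMistralCausal reports whether the config lists MistralForCausalLM.
func isMistralCausal(configJSON []byte) (bool, error) {
	var c archConfig
	if err := json.Unmarshal(configJSON, &c); err != nil {
		return false, err
	}
	for _, a := range c.Architectures {
		if a == "MistralForCausalLM" {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	ok, err := isMistralCausal([]byte(`{"architectures": ["MistralForCausalLM"]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println("use causal converter:", ok)
}
```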