Principle:Ollama Ollama GGUF Model Conversion Mistral Causal

From Leeroopedia
Knowledge Sources
Domains Model Conversion, Mistral
Last Updated 2025-02-15 00:00 GMT

Overview

The Mistral causal LM conversion handles standalone Mistral text models (without vision components) exported with the MistralForCausalLM architecture class. It shares the same GGUF architecture identifier as the multimodal variant, but uses a simpler flat configuration structure and text-only tensor handling.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • model.norm -> output_norm
  • model. -> (stripped)
  • layers -> blk
  • embed_tokens -> token_embd
  • self_attn.{q,k,v}_proj -> attn_{q,k,v}
  • self_attn.o_proj -> attn_output
  • mlp.{down,gate,up}_proj -> ffn_{down,gate,up}
  • attention.{q,k,v}_proj -> attn_{q,k,v} (alternate naming)
  • feed_forward.{gate,down,up}_proj -> ffn_{gate,down,up} (alternate naming)
  • lm_head -> output
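
The replacements above can be sketched with Go's strings.NewReplacer, which tries the old strings in argument order at each position, so model.norm must be listed before the shorter model. prefix. This is a minimal illustration, not the converter's actual code: the names mistralCausalReplacer and ggufTensorName are hypothetical, and the brace shorthand in the list is expanded by hand.

```go
package main

import "strings"

// mistralCausalReplacer is a hypothetical helper that encodes the
// HuggingFace-to-GGUF name replacements listed above. Order matters:
// "model.norm" is listed before "model." so that it wins when both match.
var mistralCausalReplacer = strings.NewReplacer(
	"model.norm", "output_norm",
	"model.", "",
	"layers", "blk",
	"embed_tokens", "token_embd",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.down_proj", "ffn_down",
	"mlp.gate_proj", "ffn_gate",
	"mlp.up_proj", "ffn_up",
	// Alternate naming scheme.
	"attention.q_proj", "attn_q",
	"attention.k_proj", "attn_k",
	"attention.v_proj", "attn_v",
	"feed_forward.gate_proj", "ffn_gate",
	"feed_forward.down_proj", "ffn_down",
	"feed_forward.up_proj", "ffn_up",
	"lm_head", "output",
)

// ggufTensorName maps a HuggingFace tensor name to its GGUF equivalent.
func ggufTensorName(hfName string) string {
	return mistralCausalReplacer.Replace(hfName)
}
```

Under these assumptions, ggufTensorName("model.layers.0.self_attn.q_proj.weight") yields blk.0.attn_q.weight, and ggufTensorName("model.norm.weight") yields output_norm.weight.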

Architecture-Specific Hyperparameters

The GGUF metadata is written under the mistral3.* namespace (same as the multimodal variant):

  • mistral3.vocab_size -- vocabulary size
  • mistral3.block_count, mistral3.context_length, mistral3.embedding_length, mistral3.feed_forward_length
  • mistral3.attention.head_count, mistral3.attention.head_count_kv, mistral3.attention.key_length, mistral3.attention.value_length
  • mistral3.rope.dimension_count -- head dimension
  • mistral3.rope.freq_base -- RoPE theta
  • mistral3.rope.scaling.* -- factor, type, beta_fast, beta_slow, mscale, mscale_all_dim
  • mistral3.rope.scaling_beta -- Llama 4-style scaling beta
  • mistral3.rope.scaling.original_context_length
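
As a rough sketch of how such metadata might be assembled, the following builds a key-value map under the mistral3.* namespace. The textParams struct, its field names, and the mistral3KV function are hypothetical and only cover a subset of the keys above; the real converter reads these values from the parsed HuggingFace config.

```go
package main

// textParams is a hypothetical container for already-parsed hyperparameters.
type textParams struct {
	VocabSize     uint32
	Layers        uint32
	ContextLength uint32
	HiddenSize    uint32
	FeedForward   uint32
	Heads         uint32
	KVHeads       uint32
	HeadDim       uint32
	RopeTheta     float32
}

// mistral3KV returns GGUF metadata entries for a subset of the keys
// listed above. The head dimension is reused for key_length,
// value_length, and rope.dimension_count (an assumption of this sketch).
func mistral3KV(p textParams) map[string]any {
	return map[string]any{
		"general.architecture":              "mistral3",
		"mistral3.vocab_size":               p.VocabSize,
		"mistral3.block_count":              p.Layers,
		"mistral3.context_length":           p.ContextLength,
		"mistral3.embedding_length":         p.HiddenSize,
		"mistral3.feed_forward_length":      p.FeedForward,
		"mistral3.attention.head_count":     p.Heads,
		"mistral3.attention.head_count_kv":  p.KVHeads,
		"mistral3.attention.key_length":     p.HeadDim,
		"mistral3.attention.value_length":   p.HeadDim,
		"mistral3.rope.dimension_count":     p.HeadDim,
		"mistral3.rope.freq_base":           p.RopeTheta,
	}
}
```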

Special Handling

Q/K Weight Repacking

The converter applies the same interleaved-to-contiguous head permutation as the multimodal Mistral variant. It is applied only to text tensors (those without a vision prefix) and uses the standard reshape-transpose-flatten pipeline.
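
A reshape-transpose-flatten pipeline of this kind can be illustrated on a plain row-major slice. The sketch below assumes the axis layout commonly used by llama-family converters: the row axis is viewed as [heads, 2, headDim/2], the two inner axes are swapped, and the result is flattened back to the original shape. The function name repackHeads and this exact axis order are assumptions, not taken from the source file.

```go
package main

import "fmt"

// repackHeads permutes the rows of a row-major [rows x cols] matrix.
// Rows are viewed as [heads, 2, half] with half = rows/heads/2, the last
// two axes are swapped to [heads, half, 2], and the result is flattened
// back to [rows x cols]. Hypothetical sketch of the repacking step.
func repackHeads(data []float32, rows, cols, heads int) ([]float32, error) {
	if heads <= 0 || rows%(2*heads) != 0 || len(data) != rows*cols {
		return nil, fmt.Errorf("invalid shape: rows=%d cols=%d heads=%d len=%d",
			rows, cols, heads, len(data))
	}
	headDim := rows / heads
	half := headDim / 2
	out := make([]float32, len(data))
	for h := 0; h < heads; h++ {
		for p := 0; p < 2; p++ {
			for i := 0; i < half; i++ {
				src := h*headDim + p*half + i // index in the [heads, 2, half] view
				dst := h*headDim + i*2 + p    // index in the [heads, half, 2] view
				copy(out[dst*cols:(dst+1)*cols], data[src*cols:(src+1)*cols])
			}
		}
	}
	return out, nil
}
```

For a single head with four one-column rows 0, 1, 2, 3, this sketch produces the row order 0, 2, 1, 3.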

Flat Configuration Structure

Unlike the multimodal variant, which nests parameters under text_config, the causal variant reads all parameters from the top-level config. The rope_parameters sub-structure is nested directly within the model config.
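
The flat layout can be illustrated with a small encoding/json sketch. The struct causalConfig, the function parseCausalConfig, and the particular JSON keys chosen here (num_hidden_layers, hidden_size, num_attention_heads, and rope_theta inside rope_parameters) are assumptions for illustration; only the top-level placement and the rope_parameters nesting come from the description above.

```go
package main

import "encoding/json"

// causalConfig is a hypothetical subset of a flat MistralForCausalLM config:
// hyperparameters sit at the top level, with rope_parameters nested
// directly inside the model config (no text_config wrapper).
type causalConfig struct {
	NumHiddenLayers   uint32 `json:"num_hidden_layers"`
	HiddenSize        uint32 `json:"hidden_size"`
	NumAttentionHeads uint32 `json:"num_attention_heads"`
	RopeParameters    struct {
		RopeTheta float32 `json:"rope_theta"`
	} `json:"rope_parameters"`
}

// parseCausalConfig decodes the raw contents of a config.json file.
func parseCausalConfig(data []byte) (causalConfig, error) {
	var c causalConfig
	err := json.Unmarshal(data, &c)
	return c, err
}
```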

Shared Architecture Identifier

Both the multimodal and causal variants use mistral3 as the GGUF architecture identifier, enabling the same GGML inference backend to handle both.

Optional Pointer Fields

The sliding_window, Mscale, MscaleAllDim, and Llama4ScalingBeta fields use Go pointer types to distinguish between absent and zero values in the HuggingFace config.
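
The effect of a pointer-typed field can be shown with a short sketch: encoding/json leaves a pointer nil when the key is missing (or explicitly null) and allocates it when a value, including zero, is present. The struct optionalFields and the function slidingWindow are hypothetical and cover only sliding_window.

```go
package main

import "encoding/json"

// optionalFields is a hypothetical subset of the config showing one
// pointer-typed field. A nil pointer means the key was absent or null.
type optionalFields struct {
	SlidingWindow *uint32 `json:"sliding_window"`
}

// slidingWindow reports the configured value and whether it was present.
func slidingWindow(data []byte) (value uint32, present bool, err error) {
	var f optionalFields
	if err = json.Unmarshal(data, &f); err != nil {
		return 0, false, err
	}
	if f.SlidingWindow == nil {
		return 0, false, nil
	}
	return *f.SlidingWindow, true, nil
}
```

With a plain uint32 field, a config that omits sliding_window and one that sets it to 0 would be indistinguishable after decoding.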

Implementation Notes

The conversion is implemented in convert/convert_mistral_causal.go via the mistral3CausalModel struct. This converter is selected when the HuggingFace architectures field contains MistralForCausalLM. The implementation is structurally similar to the multimodal variant but without vision-related fields and with a flat parameter layout.
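
The selection rule can be sketched as a check on the architectures array of config.json. The function isMistralCausal is a hypothetical illustration; how the actual converter dispatches on this value is not shown in this page.

```go
package main

import "encoding/json"

// isMistralCausal reports whether the HuggingFace config lists
// MistralForCausalLM in its architectures field. Hypothetical helper.
func isMistralCausal(configJSON []byte) (bool, error) {
	var c struct {
		Architectures []string `json:"architectures"`
	}
	if err := json.Unmarshal(configJSON, &c); err != nil {
		return false, err
	}
	for _, arch := range c.Architectures {
		if arch == "MistralForCausalLM" {
			return true, nil
		}
	}
	return false, nil
}
```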
