Principle:Ollama Ollama GGUF Model Conversion Gemma
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, Gemma |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Gemma 1 model conversion transforms Google's Gemma architecture from HuggingFace SafeTensors to GGUF format. The converter handles the GeGLU activation function, writes an explicit head dimension, and applies a critical normalization adjustment: 1.0 must be added to all RMSNorm weights.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
- model.embed_tokens -> token_embd
- model.norm -> output_norm
- model.layers -> blk
- input_layernorm -> attn_norm
- self_attn.q_proj -> attn_q
- self_attn.k_proj -> attn_k
- self_attn.v_proj -> attn_v
- self_attn.o_proj -> attn_output
- mlp.gate_proj -> ffn_gate
- mlp.down_proj -> ffn_down
- mlp.up_proj -> ffn_up
- post_attention_layernorm -> ffn_norm
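The replacements listed above behave like ordered substring substitutions. The following is a minimal sketch of that idea using the standard library's strings.NewReplacer; the renameTensor helper and the gemmaReplacements variable are hypothetical names for illustration and are not taken from the converter source.

```go
package main

import (
	"fmt"
	"strings"
)

// gemmaReplacements holds old/new substring pairs in the flat form that
// strings.NewReplacer expects. (Hypothetical variable name.)
var gemmaReplacements = []string{
	"model.embed_tokens", "token_embd",
	"model.norm", "output_norm",
	"model.layers", "blk",
	"input_layernorm", "attn_norm",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.gate_proj", "ffn_gate",
	"mlp.down_proj", "ffn_down",
	"mlp.up_proj", "ffn_up",
	"post_attention_layernorm", "ffn_norm",
}

// renameTensor maps a HuggingFace tensor name to its GGUF name by applying
// every substring replacement in a single pass. (Hypothetical helper.)
func renameTensor(name string) string {
	return strings.NewReplacer(gemmaReplacements...).Replace(name)
}

func main() {
	for _, n := range []string{
		"model.embed_tokens.weight",
		"model.layers.0.self_attn.q_proj.weight",
		"model.layers.3.post_attention_layernorm.weight",
	} {
		fmt.Println(n, "->", renameTensor(n))
	}
}
```

Because the replacer works on substrings, the layer index and the trailing .weight suffix pass through unchanged, so model.layers.0.self_attn.q_proj.weight becomes blk.0.attn_q.weight.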
Architecture-Specific Hyperparameters
The GGUF metadata is written under the gemma.* namespace:
- gemma.context_length -- maximum position embeddings
- gemma.embedding_length -- hidden size
- gemma.block_count -- number of hidden layers
- gemma.feed_forward_length -- intermediate size
- gemma.attention.head_count -- number of attention heads
- gemma.attention.head_count_kv -- number of KV heads (GQA)
- gemma.attention.layer_norm_rms_epsilon -- RMSNorm epsilon
- gemma.attention.key_length / gemma.attention.value_length -- explicit head dimension
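As a rough illustration of how these keys relate to a HuggingFace config.json, the sketch below decodes a config and builds the metadata map. The gemmaConfig struct and gemmaKV function are hypothetical stand-ins, not the converter's own types, and the JSON values are illustrative only. The JSON field names are the usual HuggingFace config keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gemmaConfig is a hypothetical struct holding the config.json fields
// that feed the gemma.* metadata keys.
type gemmaConfig struct {
	MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
	HiddenSize            uint32  `json:"hidden_size"`
	HiddenLayers          uint32  `json:"num_hidden_layers"`
	IntermediateSize      uint32  `json:"intermediate_size"`
	NumAttentionHeads     uint32  `json:"num_attention_heads"`
	NumKeyValueHeads      uint32  `json:"num_key_value_heads"`
	RMSNormEPS            float32 `json:"rms_norm_eps"`
	HeadDim               uint32  `json:"head_dim"`
}

// gemmaKV builds the gemma.* key/value metadata from a decoded config.
// (Hypothetical helper.)
func gemmaKV(c gemmaConfig) map[string]any {
	return map[string]any{
		"gemma.context_length":                   c.MaxPositionEmbeddings,
		"gemma.embedding_length":                 c.HiddenSize,
		"gemma.block_count":                      c.HiddenLayers,
		"gemma.feed_forward_length":              c.IntermediateSize,
		"gemma.attention.head_count":             c.NumAttentionHeads,
		"gemma.attention.head_count_kv":          c.NumKeyValueHeads,
		"gemma.attention.layer_norm_rms_epsilon": c.RMSNormEPS,
		// head_dim is written to both key_length and value_length.
		"gemma.attention.key_length":   c.HeadDim,
		"gemma.attention.value_length": c.HeadDim,
	}
}

func main() {
	// Illustrative values only.
	raw := []byte(`{
		"max_position_embeddings": 8192,
		"hidden_size": 2048,
		"num_hidden_layers": 18,
		"intermediate_size": 16384,
		"num_attention_heads": 8,
		"num_key_value_heads": 1,
		"rms_norm_eps": 1e-06,
		"head_dim": 256
	}`)

	var c gemmaConfig
	if err := json.Unmarshal(raw, &c); err != nil {
		panic(err)
	}
	kv := gemmaKV(c)
	fmt.Println(kv["gemma.block_count"], kv["gemma.attention.key_length"])
}
```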
Special Handling
Normalization Weight Offset
Gemma stores RMSNorm weights with a zero-centered convention: its RMSNorm scales the normalized activations by (1 + w), whereas GGML multiplies by the stored weight directly. During conversion, a repacker function therefore adds 1.0 to every element of each *_norm.weight tensor, skipping vision tensors whose names carry the v. prefix. This shifts the weights from the Gemma convention (centered at 0) to the GGML convention (centered at 1).
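The offset can be sketched on a plain float32 slice. This is a simplified illustration, not the converter's code: needsOffset is a hypothetical helper, and the addOne here is a standalone function over a slice rather than the tensor-library-based method used by the converter.

```go
package main

import (
	"fmt"
	"strings"
)

// needsOffset reports whether a GGUF tensor name is a normalization weight
// that should receive the +1.0 offset. Vision tensors (prefix "v.") are
// excluded. (Hypothetical helper.)
func needsOffset(name string) bool {
	return strings.HasSuffix(name, "_norm.weight") && !strings.HasPrefix(name, "v.")
}

// addOne returns a copy of data with 1.0 added to every element.
// (Simplified stand-in for the converter's repacker.)
func addOne(data []float32) []float32 {
	out := make([]float32, len(data))
	for i, v := range data {
		out[i] = v + 1
	}
	return out
}

func main() {
	w := []float32{0, 0.5, -0.25} // zero-centered Gemma weights
	if needsOffset("blk.0.attn_norm.weight") {
		w = addOne(w)
	}
	fmt.Println(w) // prints [1 1.5 0.75]
}
```

Returning a copy rather than mutating in place keeps the original weights intact, which is a design choice of this sketch only.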
Special Tokenizer Tokens
The converter sets specific token IDs for code infilling:
- tokenizer.ggml.eot_token_id = 107
- tokenizer.ggml.middle_token_id = 68
- tokenizer.ggml.prefix_token_id = 67
- tokenizer.ggml.suffix_token_id = 69
Explicit Head Dimension
Unlike many architectures that derive head dimension from hidden_size / num_heads, Gemma explicitly specifies head_dim in its config and writes it to both key_length and value_length.
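A small worked example shows why the explicit value matters. The numbers below are assumed to match the published Gemma 7B configuration (hidden_size 3072, 16 attention heads, head_dim 256); the headDims function is a hypothetical helper.

```go
package main

import "fmt"

// headDims returns the head dimension derived as hidden_size / num_heads
// alongside the explicit head_dim from the config. (Hypothetical helper.)
func headDims(hiddenSize, numHeads, headDim uint32) (derived, explicit uint32) {
	return hiddenSize / numHeads, headDim
}

func main() {
	// Assumed Gemma 7B values: hidden_size=3072, num_heads=16, head_dim=256.
	derived, explicit := headDims(3072, 16, 256)
	fmt.Printf("derived=%d explicit=%d\n", derived, explicit)
	// prints: derived=192 explicit=256
}
```

With these values the derived dimension (192) differs from the configured one (256), so a reader that derived the head dimension would size the attention tensors incorrectly.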
Implementation Notes
The conversion is implemented in convert/convert_gemma.go via the gemmaModel struct. Its addOne method uses the tensor library to add 1.0 element-wise to normalization weight tensors. The gemmaModel struct also serves as the base type for Gemma 3, whose model struct embeds it.