Principle:Ollama Ollama GGUF Model Conversion Gemma

Knowledge Sources	Ollama
Domains	Model Conversion, Gemma
Last Updated	2025-02-15 00:00 GMT

Overview

Gemma 1 model conversion transforms Google's Gemma architecture from HuggingFace SafeTensors to the GGUF format. The converter accounts for the GeGLU activation function, writes an explicit head dimension, and applies a critical normalization adjustment: 1.0 is added to every RMSNorm weight.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

model.embed_tokens -> token_embd
model.norm -> output_norm
model.layers -> blk
input_layernorm -> attn_norm
self_attn.q_proj -> attn_q
self_attn.k_proj -> attn_k
self_attn.v_proj -> attn_v
self_attn.o_proj -> attn_output
mlp.gate_proj -> ffn_gate
mlp.down_proj -> ffn_down
mlp.up_proj -> ffn_up
post_attention_layernorm -> ffn_norm
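
The mapping above can be sketched as an ordered list of old/new pairs fed to Go's strings.NewReplacer. This is a minimal illustration, not the converter's actual code: the names gemmaReplacements and renameTensor are hypothetical, and only the pairs themselves come from the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// gemmaReplacements holds old/new pairs in the order listed in the text.
// The variable name is hypothetical and exists only for this sketch.
var gemmaReplacements = []string{
	"model.embed_tokens", "token_embd",
	"model.norm", "output_norm",
	"model.layers", "blk",
	"input_layernorm", "attn_norm",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.gate_proj", "ffn_gate",
	"mlp.down_proj", "ffn_down",
	"mlp.up_proj", "ffn_up",
	"post_attention_layernorm", "ffn_norm",
}

// renameTensor rewrites a HuggingFace tensor name to its GGUF equivalent
// by applying every replacement pair in a single pass.
func renameTensor(name string) string {
	return strings.NewReplacer(gemmaReplacements...).Replace(name)
}

func main() {
	fmt.Println(renameTensor("model.layers.0.self_attn.q_proj.weight"))
	fmt.Println(renameTensor("model.layers.3.post_attention_layernorm.weight"))
	fmt.Println(renameTensor("model.norm.weight"))
}
```

Because each pair replaces a substring, the layer index and the .weight suffix pass through unchanged.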

Architecture-Specific Hyperparameters

The GGUF metadata is written under the gemma.* namespace:

gemma.context_length -- maximum position embeddings
gemma.embedding_length -- hidden size
gemma.block_count -- number of hidden layers
gemma.feed_forward_length -- intermediate size
gemma.attention.head_count -- number of attention heads
gemma.attention.head_count_kv -- number of KV heads (GQA)
gemma.attention.layer_norm_rms_epsilon -- RMSNorm epsilon
gemma.attention.key_length and gemma.attention.value_length -- explicit head dimension (head_dim)
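
As a rough sketch of how these keys could be populated, the snippet below copies fields from a parsed configuration into a key/value map. The gemmaConfig struct, its field names, and the gemmaKV helper are assumptions made for illustration, as are the numeric values in main; only the gemma.* key names and their meanings come from the list above.

```go
package main

import "fmt"

// gemmaConfig is a hypothetical subset of the fields found in a
// HuggingFace config.json; the field names are assumptions for this sketch.
type gemmaConfig struct {
	MaxPositionEmbeddings uint32
	HiddenSize            uint32
	NumHiddenLayers       uint32
	IntermediateSize      uint32
	NumAttentionHeads     uint32
	NumKeyValueHeads      uint32
	RMSNormEps            float32
	HeadDim               uint32
}

// gemmaKV maps the configuration onto the gemma.* metadata keys.
func gemmaKV(c gemmaConfig) map[string]any {
	return map[string]any{
		"gemma.context_length":                   c.MaxPositionEmbeddings,
		"gemma.embedding_length":                 c.HiddenSize,
		"gemma.block_count":                      c.NumHiddenLayers,
		"gemma.feed_forward_length":              c.IntermediateSize,
		"gemma.attention.head_count":             c.NumAttentionHeads,
		"gemma.attention.head_count_kv":          c.NumKeyValueHeads,
		"gemma.attention.layer_norm_rms_epsilon": c.RMSNormEps,
		// head_dim is written twice: once for keys, once for values.
		"gemma.attention.key_length":   c.HeadDim,
		"gemma.attention.value_length": c.HeadDim,
	}
}

func main() {
	// Illustrative values only; they are not read from a real checkpoint.
	kv := gemmaKV(gemmaConfig{
		MaxPositionEmbeddings: 8192,
		HiddenSize:            3072,
		NumHiddenLayers:       28,
		IntermediateSize:      24576,
		NumAttentionHeads:     16,
		NumKeyValueHeads:      16,
		RMSNormEps:            1e-6,
		HeadDim:               256,
	})
	fmt.Println(kv["gemma.block_count"], kv["gemma.attention.key_length"])
}
```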

Special Handling

Normalization Weight Offset

Gemma stores RMSNorm weights with a zero-centered convention: its RMSNorm scales the normalized activations by (1 + w), whereas GGML scales them by the stored weight directly. During conversion, 1.0 is therefore added to all *_norm.weight tensors (excluding vision tensors prefixed with v.) using a repacker function. This transforms the weights from the Gemma convention (centered at 0) to the GGML convention (centered at 1).
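
The offset can be illustrated with a plain loop over float32 values. This is a simplified sketch under stated assumptions: it does not use the tensor library, and the needsOffset helper and the addOne signature shown here are hypothetical rather than copied from the converter.

```go
package main

import (
	"fmt"
	"strings"
)

// needsOffset reports whether a GGUF tensor name is a norm weight that
// should receive the +1.0 offset. Vision tensors (prefix "v.") are skipped.
func needsOffset(name string) bool {
	return strings.HasSuffix(name, "_norm.weight") && !strings.HasPrefix(name, "v.")
}

// addOne returns a copy of data with 1.0 added to every element,
// leaving the input slice unmodified.
func addOne(data []float32) []float32 {
	out := make([]float32, len(data))
	for i, w := range data {
		out[i] = w + 1
	}
	return out
}

func main() {
	stored := []float32{-0.25, 0, 0.5} // zero-centered weights as stored by Gemma
	if needsOffset("blk.0.attn_norm.weight") {
		fmt.Println(addOne(stored)) // prints [0.75 1 1.5]
	}
}
```

A stored weight of 0 becomes 1 after conversion, so a norm layer that leaves activations unscaled in Gemma's convention still does so in GGML's.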

Special Tokenizer Tokens

The converter sets fixed IDs for four special tokens: the end-of-turn token (eot) and the three tokens used for code infilling (prefix, middle, suffix):

tokenizer.ggml.eot_token_id = 107
tokenizer.ggml.middle_token_id = 68
tokenizer.ggml.prefix_token_id = 67
tokenizer.ggml.suffix_token_id = 69

Explicit Head Dimension

Unlike many architectures that derive head dimension from hidden_size / num_heads, Gemma explicitly specifies head_dim in its config and writes it to both key_length and value_length.
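
The sketch below shows why the explicit value matters. It compares the derived quotient with a configured head_dim for an illustrative configuration in which the two disagree. The numbers (hidden size 3072, 16 heads, head_dim 256) and both helper names are assumptions chosen for the example.

```go
package main

import "fmt"

// derivedHeadDim is the fallback many architectures use when the
// configuration has no explicit head_dim.
func derivedHeadDim(hiddenSize, numHeads uint32) uint32 {
	return hiddenSize / numHeads
}

// headLengths writes the explicit head_dim to both attention length keys.
func headLengths(headDim uint32) map[string]uint32 {
	return map[string]uint32{
		"gemma.attention.key_length":   headDim,
		"gemma.attention.value_length": headDim,
	}
}

func main() {
	const hiddenSize, numHeads, headDim = 3072, 16, 256 // illustrative values

	fmt.Println(derivedHeadDim(hiddenSize, numHeads)) // prints 192
	kv := headLengths(headDim)
	fmt.Println(kv["gemma.attention.key_length"], kv["gemma.attention.value_length"]) // prints 256 256
}
```

With these values, deriving the head dimension would give 192 rather than the configured 256, so the metadata has to carry the value explicitly.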

Implementation Notes

The conversion is implemented in convert/convert_gemma.go via the gemmaModel struct. The addOne method uses the tensor library to element-wise add 1.0 to normalization weight tensors. This struct also serves as the base type for Gemma 3, which embeds it.

Related Pages

Implementation:Ollama_Ollama_Convert_Gemma