Principle:Ollama Ollama GGUF Model Conversion Gemma

From Leeroopedia
Knowledge Sources
Domains: Model Conversion, Gemma
Last Updated: 2025-02-15 00:00 GMT

Overview

Gemma 1 model conversion turns Google's Gemma checkpoints from HuggingFace SafeTensors into the GGUF format. The converter accounts for the GeGLU activation function, writes an explicit head dimension, and applies a critical normalization adjustment: 1.0 must be added to all RMSNorm weights.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • model.embed_tokens -> token_embd
  • model.norm -> output_norm
  • model.layers -> blk
  • input_layernorm -> attn_norm
  • self_attn.q_proj -> attn_q
  • self_attn.k_proj -> attn_k
  • self_attn.v_proj -> attn_v
  • self_attn.o_proj -> attn_output
  • mlp.gate_proj -> ffn_gate
  • mlp.down_proj -> ffn_down
  • mlp.up_proj -> ffn_up
  • post_attention_layernorm -> ffn_norm
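The mapping above can be sketched with the standard library's strings.NewReplacer, which takes alternating old/new pairs. This is a minimal illustration, not the converter's actual code; renameTensor is a hypothetical helper name and the sample tensor names are assumed to follow the usual HuggingFace layout.

```go
package main

import (
	"fmt"
	"strings"
)

// gemmaReplacer holds the HuggingFace -> GGUF name pairs from the list above.
// strings.NewReplacer expects alternating old/new arguments.
var gemmaReplacer = strings.NewReplacer(
	"model.embed_tokens", "token_embd",
	"model.norm", "output_norm",
	"model.layers", "blk",
	"input_layernorm", "attn_norm",
	"self_attn.q_proj", "attn_q",
	"self_attn.k_proj", "attn_k",
	"self_attn.v_proj", "attn_v",
	"self_attn.o_proj", "attn_output",
	"mlp.gate_proj", "ffn_gate",
	"mlp.down_proj", "ffn_down",
	"mlp.up_proj", "ffn_up",
	"post_attention_layernorm", "ffn_norm",
)

// renameTensor is a hypothetical helper, not a function from the converter.
func renameTensor(name string) string {
	return gemmaReplacer.Replace(name)
}

func main() {
	// Sample names are assumptions following the common HuggingFace layout.
	names := []string{
		"model.embed_tokens.weight",
		"model.layers.0.self_attn.q_proj.weight",
		"model.layers.0.post_attention_layernorm.weight",
		"model.norm.weight",
	}
	for _, n := range names {
		fmt.Println(n, "->", renameTensor(n))
	}
}
```

Every substring is replaced in a single pass, so a full tensor name such as a per-layer projection weight is rewritten in one call.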

Architecture-Specific Hyperparameters

The GGUF metadata is written under the gemma.* namespace:

  • gemma.context_length -- maximum position embeddings
  • gemma.embedding_length -- hidden size
  • gemma.block_count -- number of hidden layers
  • gemma.feed_forward_length -- intermediate size
  • gemma.attention.head_count -- number of attention heads
  • gemma.attention.head_count_kv -- number of KV heads (GQA)
  • gemma.attention.layer_norm_rms_epsilon -- RMSNorm epsilon
  • gemma.attention.key_length and gemma.attention.value_length -- explicit head dimension (head_dim)
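As a rough sketch of how these keys could be filled from a HuggingFace config.json, the example below parses a small JSON snippet and builds the metadata map. The gemmaConfig struct, parseConfig, and metadata names are hypothetical; the pairing of JSON fields to keys is inferred from the descriptions above, and the numeric values are purely illustrative.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// gemmaConfig is a hypothetical struct mirroring common HuggingFace
// config.json field names; it is not the converter's own type.
type gemmaConfig struct {
	MaxPositionEmbeddings uint32  `json:"max_position_embeddings"`
	HiddenSize            uint32  `json:"hidden_size"`
	HiddenLayers          uint32  `json:"num_hidden_layers"`
	IntermediateSize      uint32  `json:"intermediate_size"`
	NumAttentionHeads     uint32  `json:"num_attention_heads"`
	NumKeyValueHeads      uint32  `json:"num_key_value_heads"`
	RMSNormEPS            float32 `json:"rms_norm_eps"`
	HeadDim               uint32  `json:"head_dim"`
}

// parseConfig decodes a config.json payload into gemmaConfig.
func parseConfig(raw []byte) (gemmaConfig, error) {
	var cfg gemmaConfig
	err := json.Unmarshal(raw, &cfg)
	return cfg, err
}

// metadata builds the gemma.* key/value pairs described in the list above.
func (c gemmaConfig) metadata() map[string]any {
	return map[string]any{
		"gemma.context_length":                   c.MaxPositionEmbeddings,
		"gemma.embedding_length":                 c.HiddenSize,
		"gemma.block_count":                      c.HiddenLayers,
		"gemma.feed_forward_length":              c.IntermediateSize,
		"gemma.attention.head_count":             c.NumAttentionHeads,
		"gemma.attention.head_count_kv":          c.NumKeyValueHeads,
		"gemma.attention.layer_norm_rms_epsilon": c.RMSNormEPS,
		"gemma.attention.key_length":             c.HeadDim,
		"gemma.attention.value_length":           c.HeadDim,
	}
}

func main() {
	// Illustrative values only, not taken from a specific checkpoint.
	raw := []byte(`{
		"max_position_embeddings": 8192,
		"hidden_size": 2048,
		"num_hidden_layers": 18,
		"intermediate_size": 16384,
		"num_attention_heads": 8,
		"num_key_value_heads": 1,
		"rms_norm_eps": 1e-06,
		"head_dim": 256
	}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		panic(err)
	}
	kv := cfg.metadata()
	fmt.Println("key_length:", kv["gemma.attention.key_length"])
	fmt.Println("value_length:", kv["gemma.attention.value_length"])
}
```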

Special Handling

Normalization Weight Offset

Gemma stores RMSNorm weights with a zero-centered convention. During conversion, 1.0 is added to all *_norm.weight tensors (excluding vision tensors prefixed with v.) using a repacker function. This transforms the weights from the Gemma convention (centered at 0) to the GGML convention (centered at 1).
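The selection rule and the offset can be sketched as follows. This is a simplified illustration on plain float32 slices rather than the tensor library the converter uses; needsOffset is a hypothetical helper name, and the sample tensor names (including the v.-prefixed one) are assumptions chosen to exercise the rule. The motivation is that the HuggingFace Gemma RMSNorm scales the normalized input by (1 + weight), so storing weight + 1 lets a plain scale-by-weight RMSNorm give the same result.

```go
package main

import (
	"fmt"
	"strings"
)

// needsOffset reports whether a converted tensor name should receive the
// +1.0 adjustment: it must end in "_norm.weight" and must not be a vision
// tensor (prefix "v."). Hypothetical helper for illustration.
func needsOffset(name string) bool {
	return !strings.HasPrefix(name, "v.") && strings.HasSuffix(name, "_norm.weight")
}

// addOne returns a copy of data with 1.0 added to every element, moving
// weights from the zero-centered convention to the one-centered convention.
func addOne(data []float32) []float32 {
	out := make([]float32, len(data))
	for i, w := range data {
		out[i] = w + 1
	}
	return out
}

func main() {
	// Tensor names and values below are illustrative assumptions.
	tensors := map[string][]float32{
		"blk.0.attn_norm.weight":   {0.5, -0.25, 0},
		"blk.0.attn_q.weight":      {0.5, -0.25, 0},
		"v.blk.0.attn_norm.weight": {0.5, -0.25, 0},
	}
	for name, data := range tensors {
		if needsOffset(name) {
			data = addOne(data)
		}
		fmt.Println(name, data)
	}
}
```

Only the first tensor in this example is shifted; the projection weight and the v.-prefixed tensor pass through unchanged.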

Special Tokenizer Tokens

The converter sets fixed token IDs for the end-of-turn (EOT) token and for the code-infilling (prefix, suffix, middle) tokens:

  • tokenizer.ggml.eot_token_id = 107
  • tokenizer.ggml.middle_token_id = 68
  • tokenizer.ggml.prefix_token_id = 67
  • tokenizer.ggml.suffix_token_id = 69

Explicit Head Dimension

Many architectures derive the head dimension as hidden_size / num_heads. Gemma instead specifies head_dim explicitly in its config, and the converter writes that value to both key_length and value_length.
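A small sketch of why the explicit value matters: when head_dim is not equal to hidden_size / num_heads, deriving it would write the wrong key and value lengths. The numbers below are illustrative (chosen so the two disagree) and the helper names derivedHeadDim and headDimKV are hypothetical.

```go
package main

import "fmt"

// derivedHeadDim is the common fallback used by many architectures.
func derivedHeadDim(hiddenSize, numHeads uint32) uint32 {
	return hiddenSize / numHeads
}

// headDimKV returns the two metadata entries that carry the explicit
// head dimension. Hypothetical helper for illustration.
func headDimKV(headDim uint32) map[string]uint32 {
	return map[string]uint32{
		"gemma.attention.key_length":   headDim,
		"gemma.attention.value_length": headDim,
	}
}

func main() {
	// Illustrative config values where the explicit head_dim differs
	// from the derived one.
	const hiddenSize, numHeads, headDim = 3072, 16, 256

	fmt.Println("derived: ", derivedHeadDim(hiddenSize, numHeads))
	fmt.Println("explicit:", headDim)
	fmt.Println(headDimKV(headDim))
}
```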

Implementation Notes

The conversion is implemented in convert/convert_gemma.go via the gemmaModel struct. The addOne method uses the tensor library to element-wise add 1.0 to normalization weight tensors. This struct also serves as the base type for Gemma 3, which embeds it.
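The embedding relationship can be illustrated with Go struct embedding. This is a sketch under assumptions: the field sets are placeholders, gemma3Model is a stand-in name for the Gemma 3 type, and addOne here works on a plain slice instead of the tensor library.

```go
package main

import "fmt"

// gemmaModel is a pared-down stand-in for the converter's struct.
type gemmaModel struct {
	HeadDim uint32
}

// addOne adds 1.0 to each element (plain-slice version for illustration).
func (m *gemmaModel) addOne(data []float32) []float32 {
	out := make([]float32, len(data))
	for i, w := range data {
		out[i] = w + 1
	}
	return out
}

// gemma3Model is a hypothetical name for a type that embeds gemmaModel.
// Fields and methods of the embedded struct are promoted to the outer type.
type gemma3Model struct {
	gemmaModel
	Note string // placeholder field, not from the real struct
}

func main() {
	m := gemma3Model{Note: "placeholder"}
	m.HeadDim = 256 // promoted field from gemmaModel

	// The promoted method is callable directly on the embedding type.
	fmt.Println(m.HeadDim, m.addOne([]float32{0, 0.5}))
}
```

Embedding lets the derived converter reuse the base behavior and override only what differs.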
