Principle:Ollama Ollama GGUF Model Conversion Gemma3n

Knowledge Sources	Ollama
Domains	Model Conversion, Multimodal
Last Updated	2025-02-15 00:00 GMT

Overview

The Gemma 3n converter handles Google's Gemma 3n architecture, which adds audio and vision extensions and a novel AltUp (Alternating Updates) mechanism. It transforms the model from HuggingFace SafeTensors to GGUF format, computing activation sparsity quantiles and merging AltUp projection tensors along the way.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

model.language_model.embed_tokens_per_layer -> per_layer_token_embd
model.language_model.embed_tokens -> token_embd
model.language_model.per_layer_model_projection -> per_layer_model_proj
model.language_model.per_layer_projection_norm -> per_layer_proj_norm
model.language_model.altup_projections -> altup_proj
model.language_model.altup_unembed_projections -> altup_unembd_proj
model.language_model.norm -> output_norm
model.language_model.layers -> blk
per_layer_input_gate -> inp_gate
per_layer_projection -> proj
altup.prediction_coefs -> altup_predict_coef
altup.correction_coefs -> altup_correct_coef
altup.correct_output_scale -> correct_scale.weight
laurel.linear_left -> laurel_l
laurel.linear_right -> laurel_r
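
The mapping above can be sketched as an ordered substring replacement. This is a minimal illustration using the standard library's strings.NewReplacer, not the converter's actual code; the helper name newGemma3nReplacer and the sample tensor names are hypothetical. The sketch lists longer names before their prefixes (embed_tokens_per_layer before embed_tokens) because the replacer tries old strings in argument order.

```go
package main

import (
	"fmt"
	"strings"
)

// newGemma3nReplacer (hypothetical helper) builds a replacer from the
// old/new pairs listed in this section. Longer names are listed before
// their prefixes so they match first at a given position.
func newGemma3nReplacer() *strings.Replacer {
	return strings.NewReplacer(
		"model.language_model.embed_tokens_per_layer", "per_layer_token_embd",
		"model.language_model.embed_tokens", "token_embd",
		"model.language_model.per_layer_model_projection", "per_layer_model_proj",
		"model.language_model.per_layer_projection_norm", "per_layer_proj_norm",
		"model.language_model.altup_projections", "altup_proj",
		"model.language_model.altup_unembed_projections", "altup_unembd_proj",
		"model.language_model.norm", "output_norm",
		"model.language_model.layers", "blk",
		"per_layer_input_gate", "inp_gate",
		"per_layer_projection", "proj",
		"altup.prediction_coefs", "altup_predict_coef",
		"altup.correction_coefs", "altup_correct_coef",
		"altup.correct_output_scale", "correct_scale.weight",
		"laurel.linear_left", "laurel_l",
		"laurel.linear_right", "laurel_r",
	)
}

func main() {
	r := newGemma3nReplacer()
	// Sample names are illustrative, not taken from a real checkpoint.
	for _, name := range []string{
		"model.language_model.embed_tokens_per_layer.weight",
		"model.language_model.layers.0.altup.prediction_coefs",
		"model.language_model.layers.2.laurel.linear_left.weight",
	} {
		fmt.Println(name, "->", r.Replace(name))
	}
}
```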

Architecture-Specific Hyperparameters

The GGUF metadata is written under the gemma3n.* namespace:

gemma3n.activation_sparsity_scale -- per-layer quantile values computed from normal distribution CDF
gemma3n.altup.active_idx -- active AltUp input index
gemma3n.altup.correct_scale -- correction scaling flag
gemma3n.altup.lr_multiplier -- learning rate multiplier
gemma3n.altup.num_inputs -- number of AltUp input streams
gemma3n.attention.shared_kv_layers -- number of shared KV layers
gemma3n.attention.sliding_window_pattern -- per-layer local/global attention boolean array
gemma3n.embedding_length_per_layer_input -- per-layer input hidden size
gemma3n.head_dim -- explicit head dimension
gemma3n.rope.freq_base_local, gemma3n.rope.freq_base -- separate RoPE frequency bases for local and global attention, respectively
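
As an illustration of how the per-layer boolean array for gemma3n.attention.sliding_window_pattern might be produced, the sketch below maps a list of per-layer attention labels to booleans. The input field and the "sliding_attention" / "full_attention" labels are assumptions about the HuggingFace config, and slidingWindowPattern is a hypothetical helper rather than the converter's function.

```go
package main

import "fmt"

// slidingWindowPattern (hypothetical helper) marks each layer as local
// (true) or global (false). The "sliding_attention" label is an assumption
// about how the source config names local-attention layers.
func slidingWindowPattern(layerTypes []string) []bool {
	pattern := make([]bool, len(layerTypes))
	for i, t := range layerTypes {
		pattern[i] = t == "sliding_attention"
	}
	return pattern
}

func main() {
	// Illustrative layout: four local layers followed by one global layer.
	layerTypes := []string{
		"sliding_attention", "sliding_attention",
		"sliding_attention", "sliding_attention",
		"full_attention",
	}
	fmt.Println(slidingWindowPattern(layerTypes))
}
```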

Special Handling

Activation Sparsity Quantile Computation

The activation_sparsity_pattern from the config contains probability values that are converted to quantile values using the inverse CDF (percent-point function) of the standard normal distribution. This is computed at conversion time using the Gonum statistics library.
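
The probability-to-quantile step can be sketched as follows. To keep the example dependency-free, it swaps Gonum for the standard library: the standard normal quantile is computed as sqrt(2) * erfinv(2p - 1) with math.Erfinv, which should agree with the Quantile method of distuv.Normal{Mu: 0, Sigma: 1}. The function names and the sample pattern values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// stdNormalQuantile returns the inverse CDF of the standard normal
// distribution at probability p, via the identity
// quantile(p) = sqrt(2) * erfinv(2p - 1).
func stdNormalQuantile(p float64) float64 {
	return math.Sqrt2 * math.Erfinv(2*p-1)
}

// sparsityScales (hypothetical helper) converts per-layer sparsity
// probabilities into per-layer quantile values.
func sparsityScales(pattern []float32) []float32 {
	out := make([]float32, len(pattern))
	for i, p := range pattern {
		out[i] = float32(stdNormalQuantile(float64(p)))
	}
	return out
}

func main() {
	// Illustrative values only, not taken from a real config.
	pattern := []float32{0.95, 0.95, 0.5}
	fmt.Println(sparsityScales(pattern))
}
```

Under this formula a probability of 0.5 maps to a quantile of 0, and a probability of 0 maps to negative infinity.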

AltUp Projection Merging

Individual AltUp projection tensors (pattern: altup_proj.*.weight and altup_unembd_proj.*.weight) are merged into single stacked tensors using mergeTensors.
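
The stacking idea can be sketched with a simplified tensor type. This is not the mergeTensors implementation; the tensor struct, the stack helper, and the choice to place the new dimension first are all assumptions made for illustration.

```go
package main

import "fmt"

// tensor is a hypothetical stand-in for a converter tensor: a name,
// a shape, and row-major data.
type tensor struct {
	name  string
	shape []int
	data  []float32
}

// sameShape reports whether two shapes are identical.
func sameShape(a, b []int) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// stack concatenates equally shaped tensors, in the order given, along a
// new leading dimension (an assumption of this sketch).
func stack(name string, parts []tensor) (tensor, error) {
	if len(parts) == 0 {
		return tensor{}, fmt.Errorf("no tensors to merge into %q", name)
	}
	base := parts[0].shape
	out := tensor{name: name, shape: append([]int{len(parts)}, base...)}
	for _, p := range parts {
		if !sameShape(p.shape, base) {
			return tensor{}, fmt.Errorf("%s: shape %v does not match %v", p.name, p.shape, base)
		}
		out.data = append(out.data, p.data...)
	}
	return out, nil
}

func main() {
	parts := []tensor{
		{name: "altup_proj.0.weight", shape: []int{2, 2}, data: []float32{1, 2, 3, 4}},
		{name: "altup_proj.1.weight", shape: []int{2, 2}, data: []float32{5, 6, 7, 8}},
	}
	merged, err := stack("altup_proj.weight", parts)
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.name, merged.shape, merged.data)
}
```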

Coefficient Clamping

The altup_predict_coef and altup_correct_coef tensors are clamped to [-altup_coef_clip, +altup_coef_clip] using a repacker function when the clip value is configured.
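
A minimal sketch of the clamping step follows. The clampCoefs helper is hypothetical and operates on a flat slice rather than on the converter's repacker interface; treating a clip value of 0 as "not configured" is an assumption of this sketch.

```go
package main

import "fmt"

// clampCoefs (hypothetical helper) limits every value to [-clip, +clip].
// A clip of 0 or less is treated as "not configured" and the data is
// returned unchanged; that convention is an assumption of this sketch.
func clampCoefs(data []float32, clip float32) []float32 {
	if clip <= 0 {
		return data
	}
	out := make([]float32, len(data))
	for i, v := range data {
		switch {
		case v > clip:
			out[i] = clip
		case v < -clip:
			out[i] = -clip
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	// Illustrative coefficients and clip value.
	coefs := []float32{-200, -3.5, 0, 42, 150}
	fmt.Println(clampCoefs(coefs, 120))
}
```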

Audio and Vision Tower Exclusion

Tensors from audio_tower, embed_audio, vision_tower, and embed_vision are currently skipped during conversion (marked as TODO in the codebase).
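
One way to express such a skip filter is shown below. The isSkipped helper and the sample tensor names are hypothetical; matching on whole dot-separated name components is a design choice of this sketch, not a description of how the converter tests names.

```go
package main

import (
	"fmt"
	"strings"
)

// skippedComponents lists the name components whose tensors are skipped.
var skippedComponents = map[string]bool{
	"audio_tower":  true,
	"embed_audio":  true,
	"vision_tower": true,
	"embed_vision": true,
}

// isSkipped (hypothetical helper) reports whether any dot-separated
// component of the tensor name belongs to a skipped tower or embedder.
func isSkipped(name string) bool {
	for _, part := range strings.Split(name, ".") {
		if skippedComponents[part] {
			return true
		}
	}
	return false
}

func main() {
	// Sample names are illustrative, not taken from a real checkpoint.
	for _, name := range []string{
		"model.audio_tower.conformer.0.weight",
		"model.embed_vision.embedding.weight",
		"model.language_model.layers.0.laurel.linear_left.weight",
	} {
		fmt.Println(name, "skipped:", isSkipped(name))
	}
}
```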

Implementation Notes

The conversion is implemented in convert/convert_gemma3n.go via the gemma3nModel struct. It uses the gonum.org/v1/gonum/stat/distuv package to compute normal distribution quantiles for activation sparsity. The Laurel (Learned Augmented Residual Layer) layers use distinct tensor names for their left and right linear projections (laurel_l and laurel_r).

Related Pages

Implementation:Ollama_Ollama_Convert_Gemma3n
