Principle:Ollama Ollama GGUF Model Conversion Gemma3n
| Knowledge Sources | |
|---|---|
| Domains | Model Conversion, Multimodal |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Gemma 3n conversion handles Google's Gemma 3n architecture, which adds audio and vision extensions and the AltUp (Alternating Updates) mechanism. The converter transforms the model from HuggingFace SafeTensors to GGUF format, computing activation sparsity quantiles and merging the AltUp projection tensors along the way.
Core Concepts
Tensor Name Mapping
The converter applies the following HuggingFace-to-GGUF tensor name replacements:
| HuggingFace name | GGUF name |
|---|---|
| model.language_model.embed_tokens_per_layer | per_layer_token_embd |
| model.language_model.embed_tokens | token_embd |
| model.language_model.per_layer_model_projection | per_layer_model_proj |
| model.language_model.per_layer_projection_norm | per_layer_proj_norm |
| model.language_model.altup_projections | altup_proj |
| model.language_model.altup_unembed_projections | altup_unembd_proj |
| model.language_model.norm | output_norm |
| model.language_model.layers | blk |
| per_layer_input_gate | inp_gate |
| per_layer_projection | proj |
| altup.prediction_coefs | altup_predict_coef |
| altup.correction_coefs | altup_correct_coef |
| altup.correct_output_scale | correct_scale.weight |
| laurel.linear_left | laurel_l |
| laurel.linear_right | laurel_r |
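The replacements above can be tried out with Go's strings.NewReplacer. This is a minimal sketch rather than the converter's own code: mapTensorName and the sample tensor names are hypothetical, and it assumes the pairs are applied in the order listed. Order matters because the replacer tries old strings in argument order at each position, so model.language_model.embed_tokens_per_layer has to come before its prefix model.language_model.embed_tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// gemma3nReplacer holds the old/new pairs from the mapping above.
// At each position the old strings are compared in argument order,
// so the longer embed_tokens_per_layer pair is listed first.
var gemma3nReplacer = strings.NewReplacer(
	"model.language_model.embed_tokens_per_layer", "per_layer_token_embd",
	"model.language_model.embed_tokens", "token_embd",
	"model.language_model.per_layer_model_projection", "per_layer_model_proj",
	"model.language_model.per_layer_projection_norm", "per_layer_proj_norm",
	"model.language_model.altup_projections", "altup_proj",
	"model.language_model.altup_unembed_projections", "altup_unembd_proj",
	"model.language_model.norm", "output_norm",
	"model.language_model.layers", "blk",
	"per_layer_input_gate", "inp_gate",
	"per_layer_projection", "proj",
	"altup.prediction_coefs", "altup_predict_coef",
	"altup.correction_coefs", "altup_correct_coef",
	"altup.correct_output_scale", "correct_scale.weight",
	"laurel.linear_left", "laurel_l",
	"laurel.linear_right", "laurel_r",
)

// mapTensorName is a hypothetical helper (not part of the converter)
// that rewrites a HuggingFace tensor name into its GGUF form.
func mapTensorName(name string) string {
	return gemma3nReplacer.Replace(name)
}

func main() {
	// Sample names are illustrative only.
	for _, name := range []string{
		"model.language_model.embed_tokens_per_layer.weight",
		"model.language_model.embed_tokens.weight",
		"model.language_model.layers.3.altup.prediction_coefs.weight",
		"model.language_model.layers.0.laurel.linear_left.weight",
	} {
		fmt.Println(name, "->", mapTensorName(name))
	}
}
```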
Architecture-Specific Hyperparameters
The GGUF metadata is written under the gemma3n.* namespace:
- gemma3n.activation_sparsity_scale -- per-layer quantile values computed from the inverse CDF of the standard normal distribution
- gemma3n.altup.active_idx -- active AltUp input index
- gemma3n.altup.correct_scale -- correction scaling flag
- gemma3n.altup.lr_multiplier -- learning rate multiplier
- gemma3n.altup.num_inputs -- number of AltUp input streams
- gemma3n.attention.shared_kv_layers -- number of shared KV layers
- gemma3n.attention.sliding_window_pattern -- per-layer boolean array marking local versus global attention
- gemma3n.embedding_length_per_layer_input -- per-layer input hidden size
- gemma3n.head_dim -- explicit head dimension
- gemma3n.rope.freq_base_local / freq_base -- separate RoPE bases for local and global attention
Special Handling
Activation Sparsity Quantile Computation
The activation_sparsity_pattern from the config contains probability values that are converted to quantile values using the inverse CDF (percent-point function) of the standard normal distribution. This is computed at conversion time using the Gonum statistics library.
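The probability-to-quantile step can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the converter's code: the converter uses Gonum, whereas the example below swaps in the identity Φ⁻¹(p) = √2 · erfinv(2p − 1) through math.Erfinv. The names normalQuantile and sparsityScales and the sample pattern values are hypothetical.

```go
package main

import (
	"fmt"
	"math"
)

// normalQuantile returns the inverse CDF (percent-point function) of the
// standard normal distribution at probability p, using
// quantile(p) = sqrt(2) * erfinv(2p - 1).
func normalQuantile(p float64) float64 {
	return math.Sqrt2 * math.Erfinv(2*p-1)
}

// sparsityScales converts a per-layer list of probabilities into per-layer
// quantiles. Hypothetical helper: how the converter treats entries of
// exactly 0 or 1 (which map to -Inf and +Inf here) is not covered.
func sparsityScales(pattern []float64) []float32 {
	out := make([]float32, len(pattern))
	for i, p := range pattern {
		out[i] = float32(normalQuantile(p))
	}
	return out
}

func main() {
	// Illustrative probabilities, not values read from a real config.
	pattern := []float64{0.95, 0.5, 0.9}
	fmt.Println(sparsityScales(pattern))
}
```

A probability of 0.5 maps to a quantile of 0, and higher probabilities map to larger positive cutoffs, which is why the stored values act as per-layer thresholds.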
AltUp Projection Merging
Individual AltUp projection tensors (pattern: altup_proj.*.weight and altup_unembd_proj.*.weight) are merged into single stacked tensors using mergeTensors.
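The stacking idea can be sketched as follows. This is an illustration only and does not reproduce the mergeTensors signature: the tensor type and stackTensors helper are hypothetical, the sketch assumes every matched tensor has the same shape, and it places the new dimension first (the dimension order used in the GGUF file may differ).

```go
package main

import (
	"fmt"
	"path"
	"sort"
	"strconv"
	"strings"
)

// tensor is a hypothetical in-memory stand-in for a converter tensor.
type tensor struct {
	Name  string
	Shape []int
	Data  []float32
}

// stackTensors collects tensors named prefix.<index>.weight, orders them by
// index, and concatenates their data into one tensor named prefix.weight
// with a new leading dimension. Unmatched tensors are returned in rest.
func stackTensors(ts []tensor, prefix string) (merged tensor, rest []tensor, err error) {
	pattern := prefix + ".*.weight"
	var matched []tensor
	for _, t := range ts {
		if ok, _ := path.Match(pattern, t.Name); ok {
			matched = append(matched, t)
		} else {
			rest = append(rest, t)
		}
	}
	if len(matched) == 0 {
		return tensor{}, rest, fmt.Errorf("no tensors match %q", pattern)
	}
	// Sort numerically so that index 10 follows 9 rather than 1.
	index := func(name string) int {
		s := strings.TrimSuffix(strings.TrimPrefix(name, prefix+"."), ".weight")
		n, _ := strconv.Atoi(s)
		return n
	}
	sort.Slice(matched, func(i, j int) bool {
		return index(matched[i].Name) < index(matched[j].Name)
	})
	merged.Name = prefix + ".weight"
	merged.Shape = append([]int{len(matched)}, matched[0].Shape...)
	for _, t := range matched {
		merged.Data = append(merged.Data, t.Data...)
	}
	return merged, rest, nil
}

func main() {
	in := []tensor{
		{Name: "altup_proj.1.weight", Shape: []int{1, 2}, Data: []float32{3, 4}},
		{Name: "token_embd.weight", Shape: []int{1}, Data: []float32{9}},
		{Name: "altup_proj.0.weight", Shape: []int{1, 2}, Data: []float32{1, 2}},
	}
	merged, rest, err := stackTensors(in, "altup_proj")
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Name, merged.Shape, merged.Data, len(rest))
}
```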
Coefficient Clamping
The altup_predict_coef and altup_correct_coef tensors are clamped to [-altup_coef_clip, +altup_coef_clip] using a repacker function when the clip value is configured.
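The clamping itself is a simple element-wise operation. The sketch below is not the converter's repacker: clampCoefs is a hypothetical helper, the clip value of 120 is illustrative, and treating a non-positive clip as "not configured" is an assumption.

```go
package main

import "fmt"

// clampCoefs returns a copy of data with every value limited to
// [-clip, +clip]. A non-positive clip is treated as "not configured"
// (an assumption of this sketch) and the input is returned unchanged.
func clampCoefs(data []float32, clip float32) []float32 {
	if clip <= 0 {
		return data
	}
	out := make([]float32, len(data))
	for i, v := range data {
		switch {
		case v < -clip:
			out[i] = -clip
		case v > clip:
			out[i] = clip
		default:
			out[i] = v
		}
	}
	return out
}

func main() {
	coefs := []float32{-200, 0.5, 130}
	fmt.Println(clampCoefs(coefs, 120))
}
```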
Audio and Vision Tower Exclusion
Tensors from audio_tower, embed_audio, vision_tower, and embed_vision are currently skipped during conversion (marked as TODO in the codebase).
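A filter of this kind might look like the sketch below. It is an assumption-laden illustration: isSkipped is a hypothetical helper, the sample tensor names are made up, and matching on whole dot-separated name components is a choice of this sketch, not necessarily how the converter matches.

```go
package main

import (
	"fmt"
	"strings"
)

// skippedComponents lists the name components whose tensors are skipped.
var skippedComponents = map[string]bool{
	"audio_tower":  true,
	"embed_audio":  true,
	"vision_tower": true,
	"embed_vision": true,
}

// isSkipped reports whether any dot-separated component of the tensor name
// is one of the skipped audio or vision components.
func isSkipped(name string) bool {
	for _, part := range strings.Split(name, ".") {
		if skippedComponents[part] {
			return true
		}
	}
	return false
}

func main() {
	// Sample names are illustrative only.
	for _, name := range []string{
		"model.audio_tower.example.weight",
		"model.embed_vision.example.weight",
		"model.language_model.embed_tokens.weight",
	} {
		fmt.Println(name, "skipped:", isSkipped(name))
	}
}
```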
Implementation Notes
The conversion is implemented in convert/convert_gemma3n.go via the gemma3nModel struct. It uses the gonum.org/v1/gonum/stat/distuv package to compute normal distribution quantiles for activation sparsity. The Laurel (Learned Augmented Residual Layer) layers use distinct names for their left and right linear projections (laurel_l and laurel_r).