
Principle:Ollama Ollama GGUF Model Conversion Gemma3n

From Leeroopedia
Knowledge Sources
Domains Model Conversion, Multimodal
Last Updated 2025-02-15 00:00 GMT

Overview

Gemma 3n conversion handles Google's Gemma 3n architecture, which adds audio and vision components and a novel AltUp (Alternating Updates) mechanism. The converter transforms the model from HuggingFace SafeTensors to GGUF format. Along the way it computes activation sparsity quantiles and merges the AltUp projection tensors.

Core Concepts

Tensor Name Mapping

The converter applies the following HuggingFace-to-GGUF tensor name replacements:

  • model.language_model.embed_tokens_per_layer -> per_layer_token_embd
  • model.language_model.embed_tokens -> token_embd
  • model.language_model.per_layer_model_projection -> per_layer_model_proj
  • model.language_model.per_layer_projection_norm -> per_layer_proj_norm
  • model.language_model.altup_projections -> altup_proj
  • model.language_model.altup_unembed_projections -> altup_unembd_proj
  • model.language_model.norm -> output_norm
  • model.language_model.layers -> blk
  • per_layer_input_gate -> inp_gate
  • per_layer_projection -> proj
  • altup.prediction_coefs -> altup_predict_coef
  • altup.correction_coefs -> altup_correct_coef
  • altup.correct_output_scale -> correct_scale.weight
  • laurel.linear_left -> laurel_l
  • laurel.linear_right -> laurel_r
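The renaming can be sketched with Go's strings.NewReplacer. This sketch assumes the pairs above are applied as one replacer over each HuggingFace tensor name; the function name ggufName is hypothetical. strings.NewReplacer compares old strings in argument order, so the longer embed_tokens_per_layer key must come before embed_tokens. If it did not, per-layer embeddings would be renamed to token_embd_per_layer.

```go
package main

import (
	"fmt"
	"strings"
)

// gemma3nReplacements lists the old/new pairs from the table above.
// Order matters: strings.NewReplacer tries old strings in argument order,
// so "embed_tokens_per_layer" must precede "embed_tokens".
var gemma3nReplacements = []string{
	"model.language_model.embed_tokens_per_layer", "per_layer_token_embd",
	"model.language_model.embed_tokens", "token_embd",
	"model.language_model.per_layer_model_projection", "per_layer_model_proj",
	"model.language_model.per_layer_projection_norm", "per_layer_proj_norm",
	"model.language_model.altup_projections", "altup_proj",
	"model.language_model.altup_unembed_projections", "altup_unembd_proj",
	"model.language_model.norm", "output_norm",
	"model.language_model.layers", "blk",
	"per_layer_input_gate", "inp_gate",
	"per_layer_projection", "proj",
	"altup.prediction_coefs", "altup_predict_coef",
	"altup.correction_coefs", "altup_correct_coef",
	"altup.correct_output_scale", "correct_scale.weight",
	"laurel.linear_left", "laurel_l",
	"laurel.linear_right", "laurel_r",
}

var renamer = strings.NewReplacer(gemma3nReplacements...)

// ggufName (hypothetical helper) maps a HuggingFace tensor name to its GGUF name.
func ggufName(hf string) string {
	return renamer.Replace(hf)
}

func main() {
	for _, n := range []string{
		"model.language_model.embed_tokens_per_layer.weight",
		"model.language_model.layers.0.per_layer_input_gate.weight",
		"model.language_model.layers.3.altup.correct_output_scale",
	} {
		fmt.Println(n, "->", ggufName(n))
	}
}
```

The last example shows why altup.correct_output_scale maps to correct_scale.weight: the source parameter has no ".weight" suffix, so the replacement supplies one.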

Architecture-Specific Hyperparameters

The GGUF metadata is written under the gemma3n.* namespace:

  • gemma3n.activation_sparsity_scale -- per-layer quantile values computed with the inverse CDF of the standard normal distribution
  • gemma3n.altup.active_idx -- active AltUp input index
  • gemma3n.altup.correct_scale -- correction scaling flag
  • gemma3n.altup.lr_multiplier -- learning rate multiplier
  • gemma3n.altup.num_inputs -- number of AltUp input streams
  • gemma3n.attention.shared_kv_layers -- number of shared KV layers
  • gemma3n.attention.sliding_window_pattern -- per-layer local/global attention boolean array
  • gemma3n.embedding_length_per_layer_input -- per-layer input hidden size
  • gemma3n.head_dim -- explicit head dimension
  • gemma3n.rope.freq_base_local -- RoPE base frequency for local (sliding-window) attention layers
  • gemma3n.rope.freq_base -- RoPE base frequency for global attention layers
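The attention-related keys can be illustrated with a small sketch. It makes several assumptions that do not come from this page. First, the HuggingFace config exposes a per-layer layer_types list with "sliding_attention"/"full_attention" entries, plus rope_local_base_freq and rope_theta. Second, true in the boolean array marks a local layer. The struct hypotheticalTextConfig and the function attentionKV are hypothetical.

```go
package main

import "fmt"

// hypotheticalTextConfig holds ASSUMED HuggingFace config fields.
type hypotheticalTextConfig struct {
	LayerTypes        []string // assumed entries: "sliding_attention" or "full_attention"
	RopeTheta         float32  // assumed RoPE base for global layers
	RopeLocalBaseFreq float32  // assumed RoPE base for local layers
}

// slidingWindowPattern returns one bool per layer; in this sketch
// true marks a local (sliding-window) layer.
func slidingWindowPattern(layerTypes []string) []bool {
	pattern := make([]bool, len(layerTypes))
	for i, t := range layerTypes {
		pattern[i] = t == "sliding_attention"
	}
	return pattern
}

// attentionKV builds the gemma3n.* attention/RoPE metadata entries.
func attentionKV(c hypotheticalTextConfig) map[string]any {
	return map[string]any{
		"gemma3n.attention.sliding_window_pattern": slidingWindowPattern(c.LayerTypes),
		"gemma3n.rope.freq_base_local":             c.RopeLocalBaseFreq,
		"gemma3n.rope.freq_base":                   c.RopeTheta,
	}
}

func main() {
	c := hypotheticalTextConfig{
		LayerTypes:        []string{"sliding_attention", "sliding_attention", "full_attention"},
		RopeTheta:         1_000_000,
		RopeLocalBaseFreq: 10_000,
	}
	fmt.Println(attentionKV(c))
}
```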

Special Handling

Activation Sparsity Quantile Computation

The activation_sparsity_pattern from the config contains probability values that are converted to quantile values using the inverse CDF (percent-point function) of the standard normal distribution. This is computed at conversion time using the Gonum statistics library.

AltUp Projection Merging

Individual AltUp projection tensors (matching altup_proj.*.weight and altup_unembd_proj.*.weight) are stacked into one tensor per pattern using mergeTensors.
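The stacking step can be sketched as follows. This is not Ollama's mergeTensors. The tensor type, the helper stackByIndex and the merged name "<prefix>.weight" are all hypothetical, and shapes are treated as logical row-major dimensions. The sketch sorts parts by their numeric index rather than by name, so "10" would follow "9" instead of "1". It also rejects parts whose shapes differ.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// hypotheticalTensor is a stand-in for a converted tensor (not Ollama's type).
type hypotheticalTensor struct {
	Name  string
	Shape []uint64 // logical row-major shape
	Data  []float32
}

func sameShape(a, b []uint64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// stackByIndex collects tensors named "<prefix>.<i>.weight", orders them by
// numeric index i and concatenates them into one [n, ...] tensor.
func stackByIndex(prefix string, ts []hypotheticalTensor) (hypotheticalTensor, error) {
	type indexed struct {
		idx int
		t   hypotheticalTensor
	}
	var parts []indexed
	for _, t := range ts {
		if !strings.HasPrefix(t.Name, prefix+".") || !strings.HasSuffix(t.Name, ".weight") {
			continue
		}
		mid := strings.TrimSuffix(strings.TrimPrefix(t.Name, prefix+"."), ".weight")
		idx, err := strconv.Atoi(mid)
		if err != nil {
			continue // not of the form "<prefix>.<i>.weight"
		}
		parts = append(parts, indexed{idx, t})
	}
	if len(parts) == 0 {
		return hypotheticalTensor{}, fmt.Errorf("no tensors match %s.*.weight", prefix)
	}
	sort.Slice(parts, func(a, b int) bool { return parts[a].idx < parts[b].idx })

	base := parts[0].t.Shape
	var data []float32
	for _, p := range parts {
		if !sameShape(p.t.Shape, base) {
			return hypotheticalTensor{}, fmt.Errorf("%s: shape %v differs from %v", p.t.Name, p.t.Shape, base)
		}
		data = append(data, p.t.Data...)
	}
	shape := append([]uint64{uint64(len(parts))}, base...)
	return hypotheticalTensor{Name: prefix + ".weight", Shape: shape, Data: data}, nil
}

func main() {
	merged, err := stackByIndex("altup_proj", []hypotheticalTensor{
		{Name: "altup_proj.1.weight", Shape: []uint64{1, 2}, Data: []float32{3, 4}},
		{Name: "altup_proj.0.weight", Shape: []uint64{1, 2}, Data: []float32{1, 2}},
		{Name: "altup_unembd_proj.0.weight", Shape: []uint64{1, 2}, Data: []float32{9, 9}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(merged.Name, merged.Shape, merged.Data)
}
```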

Coefficient Clamping

The altup_predict_coef and altup_correct_coef tensors are clamped to [-altup_coef_clip, +altup_coef_clip] using a repacker function when the clip value is configured.

Audio and Vision Tower Exclusion

Tensors from audio_tower, embed_audio, vision_tower, and embed_vision are currently skipped during conversion (marked as TODO in the codebase).
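The exclusion amounts to a name filter. The sketch below, with the hypothetical helper skipTensor, matches whole dot-separated name segments against the four components listed above. Matching segments rather than substrings avoids accidental hits. The example tensor names are illustrative and not taken from a real checkpoint.

```go
package main

import (
	"fmt"
	"strings"
)

// skippedComponents are the multimodal components excluded from conversion.
var skippedComponents = map[string]bool{
	"audio_tower":  true,
	"embed_audio":  true,
	"vision_tower": true,
	"embed_vision": true,
}

// skipTensor reports whether any dot-separated segment of a HuggingFace
// tensor name names a skipped component.
func skipTensor(name string) bool {
	for _, seg := range strings.Split(name, ".") {
		if skippedComponents[seg] {
			return true
		}
	}
	return false
}

func main() {
	for _, n := range []string{
		"model.vision_tower.timm_model.stem.conv.weight",
		"model.language_model.embed_tokens.weight",
	} {
		fmt.Println(n, skipTensor(n))
	}
}
```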

Implementation Notes

The conversion is implemented in convert/convert_gemma3n.go via the gemma3nModel struct. It uses the gonum.org/v1/gonum/stat/distuv package to compute normal distribution quantiles for activation sparsity. The LAuReL (Learned Augmented Residual Layer) blocks use a pair of low-rank left/right linear projections, which are renamed to laurel_l and laurel_r.
