Implementation:Ollama Ollama Imagegen Gemma3

From Leeroopedia
Knowledge Sources
Domains: Image Generation, LLM Inference
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the Gemma 3 text and multimodal model architecture for MLX-based inference.

Description

The gemma3.go file implements the full Gemma 3 text model with Q/K normalization, sliding window attention, and Gemma-style RMSNorm, which scales the normalized input by (1 + weight) rather than by weight alone. TextConfig holds the hidden size, number of hidden layers, number of attention heads, number of grouped-query attention (GQA) key-value heads, head dimension, sliding window size, and sliding window pattern. TextModel holds the token embedding, the decoder layers (each with norms before and after attention and before and after the feed-forward block), a final norm, and an output projection tied to the embedding weights. The Attention struct selects global or sliding window attention based on the layer index and precomputes its norm scaling factors for efficiency. The model is loaded from safetensors files, with companion tokenizer and config files.

Usage

Used for text generation with Gemma 3 models in the standalone MLX engine, supporting both text-only and multimodal (with SigLIP vision tower) configurations.

Code Reference

Source Location

  • Repository: Ollama
  • File: x/imagegen/models/gemma3/gemma3.go
  • Lines: 1-614

Signature

type TextConfig struct {
	HiddenSize            int32   `json:"hidden_size"`
	NumHiddenLayers       int32   `json:"num_hidden_layers"`
	NumAttentionHeads     int32   `json:"num_attention_heads"`
	NumKeyValueHeads      int32   `json:"num_key_value_heads"`
	HeadDim               int32   `json:"head_dim"`
	SlidingWindow         int32   `json:"sliding_window"`
	SlidingWindowPattern  int32   `json:"sliding_window_pattern"`
}
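
One way SlidingWindowPattern and SlidingWindow could drive attention is sketched below. It assumes that every pattern-th layer (counting from 1) is global and all others are sliding, and that a sliding layer lets a query see at most the last `window` positions including itself. Both conventions and the helper names isSliding and canAttend are assumptions; gemma3.go may define them differently.

```go
package main

import "fmt"

// isSliding reports whether the 0-based layer uses sliding window attention,
// assuming every pattern-th layer (1-based) is a global layer.
// Hypothetical helper, not part of gemma3.go.
func isSliding(layer, pattern int32) bool {
	if pattern <= 0 {
		return false // no pattern configured: treat every layer as global
	}
	return (layer+1)%pattern != 0
}

// canAttend reports whether query position q may attend to key position k.
// All layers are causal; sliding layers additionally drop keys that are
// `window` or more positions behind the query. Hypothetical helper.
func canAttend(q, k, window int32, sliding bool) bool {
	if k > q {
		return false
	}
	if sliding && q-k >= window {
		return false
	}
	return true
}

func main() {
	const pattern, layers = 6, 12
	for i := int32(0); i < layers; i++ {
		kind := "global"
		if isSliding(i, pattern) {
			kind = "sliding"
		}
		fmt.Printf("layer %2d: %s\n", i, kind)
	}
	fmt.Println(canAttend(10, 5, 4, true), canAttend(10, 5, 4, false))
}
```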

type TextModel struct {
	EmbedTokens *nn.Embedding   `weight:"model.embed_tokens"`
	Layers      []*DecoderLayer `weight:"model.layers"`
	Norm        *nn.RMSNorm     `weight:"model.norm"`
	Output      *nn.Linear      `weight:"-"` // Tied to EmbedTokens
}
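
The `weight:"-"` tag on Output suggests that the output projection has no weights of its own and reuses the embedding table. The sketch below shows the general idea of weight tying on plain slices: the same [vocab][hidden] table serves both the input lookup and the output projection. The embedding type and its methods are hypothetical and are not the nn.Embedding or nn.Linear API.

```go
package main

import "fmt"

// embedding is a [vocab][hidden] table shared by the input lookup and the
// output projection. Hypothetical type, not part of gemma3.go.
type embedding [][]float32

// lookup returns the hidden vector for a token ID.
func (e embedding) lookup(token int) []float32 {
	return e[token]
}

// logits projects a hidden vector onto the vocabulary using the same table:
// logits[v] is the dot product of h with row v.
func (e embedding) logits(h []float32) []float32 {
	out := make([]float32, len(e))
	for v, row := range e {
		var dot float32
		for i, w := range row {
			dot += w * h[i]
		}
		out[v] = dot
	}
	return out
}

func main() {
	e := embedding{{1, 0}, {0, 1}, {1, 1}}
	h := e.lookup(2)         // input side: token ID to vector
	fmt.Println(e.logits(h)) // output side: vector to vocabulary scores
}
```

Tying removes a separate vocab-by-hidden matrix from the checkpoint, which is why a loader has to wire Output up after EmbedTokens is loaded rather than read it from disk.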

func LoadText(modelPath string) (*TextModel, error)
func (m *TextModel) Forward(tokens *mlx.Array, caches []cache.Cache) *mlx.Array

Import

import "github.com/ollama/ollama/x/imagegen/models/gemma3"

I/O Contract

Inputs

Name | Type | Required | Description
modelPath | string | Yes | Directory path containing model files
tokens | *mlx.Array | Yes | Input token IDs [B, L]
caches | []cache.Cache | Yes | KV caches for each layer

Outputs

Name | Type | Description
*TextModel | *TextModel | Loaded model ready for inference (returned by LoadText, alongside an error)
*mlx.Array | *mlx.Array | Logits [B, L, vocab_size] (returned by Forward)
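
For generation, only the logits at the last sequence position are needed to pick the next token. The sketch below does a greedy pick over a flat, row-major [B, L, vocab_size] buffer. It assumes the logits have already been copied out of the *mlx.Array into a []float32 (that step is not shown, and the function name lastTokenArgmax is hypothetical).

```go
package main

import "fmt"

// lastTokenArgmax returns, for each batch row, the vocabulary index with the
// highest logit at the final sequence position. logits is a flat row-major
// buffer of shape [b, l, v]. Hypothetical helper, not part of gemma3.go.
func lastTokenArgmax(logits []float32, b, l, v int) []int {
	next := make([]int, b)
	for i := 0; i < b; i++ {
		base := (i*l + (l - 1)) * v // offset of row i, last position
		best := 0
		for j := 1; j < v; j++ {
			if logits[base+j] > logits[base+best] {
				best = j
			}
		}
		next[i] = best
	}
	return next
}

func main() {
	// One batch row, two positions, vocabulary of three.
	logits := []float32{
		0.1, 0.9, 0.2, // position 0
		0.3, 0.1, 0.7, // position 1 (last)
	}
	fmt.Println(lastTokenArgmax(logits, 1, 2, 3))
}
```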

Usage Examples

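// Load the model from the model directory.
model, err := gemma3.LoadText("/path/to/gemma3")
if err != nil {
    return err
}

// tokens is an *mlx.Array of token IDs with shape [B, L], prepared by the caller.
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // logits: [B, L, vocab_size]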
