Implementation:Ollama Ollama Imagegen Gemma3

Knowledge Sources	Ollama
Domains	Image Generation, LLM Inference
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the Gemma 3 text and multimodal model architecture for MLX-based inference.

Description

The gemma3.go file implements the full Gemma 3 text model, including Q/K normalization, sliding window attention, and Gemma-style RMSNorm, which scales normalized activations by (1 + weight) rather than by weight alone. TextConfig holds the model hyperparameters: hidden size, number of attention heads, number of key-value heads for grouped-query attention (GQA), and the sliding window size and pattern. TextModel holds the token embedding, the decoder layers (each with pre- and post-attention norms and pre- and post-feed-forward norms), the final norm, and an output projection tied to the embedding weights. The Attention struct selects global or sliding window attention from the layer index and precomputes its norm scaling factors for efficiency. The model loads from safetensors files alongside companion tokenizer and config files.
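To make the "(1 + weight)" scaling concrete, the following is a minimal sketch in plain Go over float slices. It is not the code from gemma3.go, which operates on MLX arrays; the function name gemmaRMSNorm and the eps value are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"math"
)

// gemmaRMSNorm divides x by its root mean square and then scales each
// element by (1 + weight[i]). A standard RMSNorm would scale by weight[i]
// alone. The name and the eps handling are illustrative assumptions.
func gemmaRMSNorm(x, weight []float32, eps float32) []float32 {
	var sumSq float64
	for _, v := range x {
		sumSq += float64(v) * float64(v)
	}
	// Inverse root mean square, with eps added for numerical stability.
	inv := 1.0 / math.Sqrt(sumSq/float64(len(x))+float64(eps))

	out := make([]float32, len(x))
	for i, v := range x {
		out[i] = float32(float64(v) * inv * (1.0 + float64(weight[i])))
	}
	return out
}

func main() {
	x := []float32{3, 4}
	// With an all-zero weight the output is the plain normalized input.
	fmt.Println(gemmaRMSNorm(x, []float32{0, 0}, 1e-6))
	// With weight 1 every element is scaled by 2.
	fmt.Println(gemmaRMSNorm(x, []float32{1, 1}, 1e-6))
}
```

One consequence of this parameterization is that a zero-initialized weight leaves the normalized activations unchanged.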

Usage

Used for text generation with Gemma 3 models in the standalone MLX engine, supporting both text-only and multimodal (with SigLIP vision tower) configurations.

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/models/gemma3/gemma3.go
Lines: 1-614

Signature

type TextConfig struct {
	HiddenSize            int32   `json:"hidden_size"`
	NumHiddenLayers       int32   `json:"num_hidden_layers"`
	NumAttentionHeads     int32   `json:"num_attention_heads"`
	NumKeyValueHeads      int32   `json:"num_key_value_heads"`
	HeadDim               int32   `json:"head_dim"`
	SlidingWindow         int32   `json:"sliding_window"`
	SlidingWindowPattern  int32   `json:"sliding_window_pattern"`
}
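The sketch below shows how a config with these fields might be decoded and used to decide per-layer attention type and GQA head mapping. It assumes the convention used by the Hugging Face reference implementation, where every sliding_window_pattern-th layer is global and the rest use the sliding window; gemma3.go may implement this differently. The helper names isSlidingLayer and kvHeadFor, and all numeric values in the JSON, are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TextConfig mirrors the fields shown in the signature above.
type TextConfig struct {
	HiddenSize           int32 `json:"hidden_size"`
	NumHiddenLayers      int32 `json:"num_hidden_layers"`
	NumAttentionHeads    int32 `json:"num_attention_heads"`
	NumKeyValueHeads     int32 `json:"num_key_value_heads"`
	HeadDim              int32 `json:"head_dim"`
	SlidingWindow        int32 `json:"sliding_window"`
	SlidingWindowPattern int32 `json:"sliding_window_pattern"`
}

// isSlidingLayer reports whether a layer uses sliding window attention.
// Assumption: layers where (layerIdx+1) is a multiple of pattern are global.
func isSlidingLayer(layerIdx, pattern int32) bool {
	if pattern <= 0 {
		return false // no pattern configured: treat every layer as global
	}
	return (layerIdx+1)%pattern != 0
}

// kvHeadFor maps a query head to the key-value head it shares under GQA.
func kvHeadFor(queryHead int32, cfg TextConfig) int32 {
	group := cfg.NumAttentionHeads / cfg.NumKeyValueHeads
	return queryHead / group
}

func main() {
	// Illustrative values only, not taken from any released checkpoint.
	raw := []byte(`{"hidden_size": 1024, "num_hidden_layers": 12,
		"num_attention_heads": 8, "num_key_value_heads": 4,
		"head_dim": 128, "sliding_window": 512, "sliding_window_pattern": 6}`)

	var cfg TextConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	for i := int32(0); i < cfg.NumHiddenLayers; i++ {
		kind := "global"
		if isSlidingLayer(i, cfg.SlidingWindowPattern) {
			kind = "sliding"
		}
		fmt.Printf("layer %d: %s\n", i, kind)
	}
	fmt.Println("query head 5 uses kv head", kvHeadFor(5, cfg))
}
```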

type TextModel struct {
	EmbedTokens *nn.Embedding   `weight:"model.embed_tokens"`
	Layers      []*DecoderLayer `weight:"model.layers"`
	Norm        *nn.RMSNorm     `weight:"model.norm"`
	Output      *nn.Linear      `weight:"-"` // Tied to EmbedTokens
}
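The `weight:"-"` tag on Output indicates that no separate tensor is loaded for it; the output projection reuses the embedding matrix. As a rough illustration of what weight tying means, the plain-Go sketch below computes logits as dot products between a hidden state and each embedding row. The function tiedLogits is hypothetical and does not reflect how gemma3.go wires the MLX layers.

```go
package main

import "fmt"

// tiedLogits computes logits[v] = dot(hidden, embed[v]), reusing the
// embedding table (one row per vocabulary entry) as the output projection.
// This is an illustrative helper, not part of the gemma3 package.
func tiedLogits(hidden []float32, embed [][]float32) []float32 {
	logits := make([]float32, len(embed))
	for v, row := range embed {
		var sum float32
		for i, h := range hidden {
			sum += h * row[i]
		}
		logits[v] = sum
	}
	return logits
}

func main() {
	// Toy embedding table: vocabulary of 3 tokens, hidden size 2.
	embed := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	hidden := []float32{2, 3}
	fmt.Println(tiedLogits(hidden, embed))
}
```

Tying removes one vocab_size × hidden_size matrix from the checkpoint, which is why the struct has no weight path for Output.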

func LoadText(modelPath string) (*TextModel, error)
func (m *TextModel) Forward(tokens *mlx.Array, caches []cache.Cache) *mlx.Array

Import

import "github.com/ollama/ollama/x/imagegen/models/gemma3"

I/O Contract

Inputs

Name	Type	Required	Description
modelPath	string	Yes	Directory path containing model files
tokens	*mlx.Array	Yes	Input token IDs [B, L]
caches	[]cache.Cache	Yes	KV caches for each layer

Outputs

Name	Type	Description
model	*TextModel	Loaded model ready for inference (returned by LoadText)
err	error	Non-nil if loading fails (returned by LoadText)
logits	*mlx.Array	Logits [B, L, vocab_size] (returned by Forward)
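Given logits of shape [B, L, vocab_size], a caller typically reads only the final sequence position to choose the next token. The sketch below shows greedy selection over a row-major float slice. How values are extracted from an *mlx.Array is not covered on this page, so the flattened slice and the helper greedyNextTokens are assumptions for illustration.

```go
package main

import "fmt"

// greedyNextTokens returns, for each batch row, the index of the largest
// logit at the last sequence position. logits is assumed to be a row-major
// flattening of a [b, l, v] tensor.
func greedyNextTokens(logits []float32, b, l, v int) []int {
	next := make([]int, b)
	for i := 0; i < b; i++ {
		// Offset of the last position's vocabulary slice for batch row i.
		base := (i*l + (l - 1)) * v
		best := 0
		for j := 1; j < v; j++ {
			if logits[base+j] > logits[base+best] {
				best = j
			}
		}
		next[i] = best
	}
	return next
}

func main() {
	// One batch row, two positions, vocabulary of three tokens.
	logits := []float32{
		0.1, 0.9, 0.0, // position 0
		0.2, 0.1, 0.7, // position 1 (last)
	}
	fmt.Println(greedyNextTokens(logits, 1, 2, 3))
}
```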

Usage Examples

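// Load the weights, tokenizer, and config from the model directory.
model, err := gemma3.LoadText("/path/to/gemma3")
if err != nil {
    return err
}

// Create the per-layer KV caches, then run a forward pass.
// tokens is an *mlx.Array of token IDs with shape [B, L], prepared by the caller.
caches := model.NewCache(0)
logits := model.Forward(tokens, caches) // logits: [B, L, vocab_size]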

Related Pages

Principle:Ollama_Ollama_ImageGeneration
