Implementation:Ollama Ollama Imagegen Qwen3 TextEncoder

Knowledge Sources	Ollama
Domains	Image Generation, Text Encoding
Last Updated	2025-02-15 00:00 GMT

Overview

Implements a shared Qwen3 text encoder used by multiple image generation models (FLUX.2 Klein, Z-Image) for prompt encoding.

Description

The text_encoder.go file implements the Qwen3 text encoder with a SwiGLU MLP, QK normalization, a custom RoPE implementation, and GQA (Grouped Query Attention) support. The Attention struct applies per-head RMSNorm to Q and K, applies rotary embeddings via applyRoPEQwen3, and computes scaled dot-product attention with optional attention sinks. ForwardWithLayerOutputs returns hidden states from selected layers alongside the final output (e.g., layers 8, 17, and 26 for FLUX.2). RepeatKV implements GQA by repeating each key/value head so that the number of K/V heads matches the number of query heads. Configuration parameters, including head_dim, num_key_value_heads, and rope_theta, are loaded from the model manifest.
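To make the GQA step concrete, here is a minimal slice-based sketch of what a RepeatKV-style expansion does. The helper name repeatKV and the [][]float32 "one slice per head" layout are hypothetical; the real code operates on mlx arrays. It assumes each KV head is repeated nRep = num_attention_heads / num_key_value_heads times in a row, so query head h reads KV head h / nRep.

```go
package main

import "fmt"

// repeatKV is a hypothetical stand-in for RepeatKV. It expands len(heads)
// key/value heads to len(heads)*nRep heads by repeating each head nRep times
// consecutively, so query head h maps to KV head h/nRep.
func repeatKV(heads [][]float32, nRep int) [][]float32 {
	if nRep == 1 {
		return heads
	}
	out := make([][]float32, 0, len(heads)*nRep)
	for _, h := range heads {
		for r := 0; r < nRep; r++ {
			out = append(out, h) // shares backing data; fine for read-only use
		}
	}
	return out
}

func main() {
	// 2 KV heads serving 4 query heads (nRep = 4 / 2 = 2).
	kv := [][]float32{{1, 1}, {2, 2}}
	for q, h := range repeatKV(kv, 2) {
		fmt.Printf("query head %d -> %v\n", q, h)
	}
}
```

Sharing the underlying slices mirrors the idea that GQA saves memory: only num_key_value_heads distinct K/V tensors ever exist.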

Usage

Used by FLUX.2 Klein and Z-Image pipelines to encode text prompts into embeddings for conditioning the diffusion transformer.
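When several layers' hidden states are used for conditioning, one plausible way to combine them is to concatenate them per token along the feature axis. The sketch below shows that operation on plain slices. The helper concatLayers is hypothetical, and whether FLUX.2 Klein or Z-Image combine layer outputs exactly this way is an assumption, not something this page states.

```go
package main

import "fmt"

// concatLayers joins per-token hidden states from several encoder layers along
// the feature axis. Each input is [L][hidden]; the result is
// [L][len(layers)*hidden]. Assumes every layer has the same sequence length.
func concatLayers(layers [][][]float32) [][]float32 {
	if len(layers) == 0 {
		return nil
	}
	seqLen := len(layers[0])
	out := make([][]float32, seqLen)
	for t := 0; t < seqLen; t++ {
		for _, layer := range layers {
			out[t] = append(out[t], layer[t]...)
		}
	}
	return out
}

func main() {
	// Two layers, sequence length 2, hidden size 2.
	layer8 := [][]float32{{1, 2}, {3, 4}}
	layer17 := [][]float32{{5, 6}, {7, 8}}
	fmt.Println(concatLayers([][][]float32{layer8, layer17})) // [[1 2 5 6] [3 4 7 8]]
}
```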

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/models/qwen3/text_encoder.go
Lines: 1-390

Signature

type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	HeadDim           int32   `json:"head_dim"`
	RopeTheta         float32 `json:"rope_theta"`
}
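The Config fields determine the projection widths used by the Attention struct. As a sketch, the block below unmarshals an illustrative config.json fragment into a copy of Config and derives the q_proj output width, the k_proj/v_proj output width, and the GQA group size. The numbers are illustrative only, and the helper projectionDims is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config mirrors the struct shown above so this example compiles on its own.
type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	HeadDim           int32   `json:"head_dim"`
	RopeTheta         float32 `json:"rope_theta"`
}

// projectionDims (hypothetical) returns the q_proj output width, the
// k_proj/v_proj output width, and the number of query heads per KV head.
func projectionDims(c Config) (qOut, kvOut, group int32, err error) {
	if c.NumKeyValueHeads <= 0 || c.NumAttentionHeads%c.NumKeyValueHeads != 0 {
		return 0, 0, 0, fmt.Errorf("num_attention_heads (%d) must be a multiple of num_key_value_heads (%d)",
			c.NumAttentionHeads, c.NumKeyValueHeads)
	}
	return c.NumAttentionHeads * c.HeadDim, c.NumKeyValueHeads * c.HeadDim,
		c.NumAttentionHeads / c.NumKeyValueHeads, nil
}

func main() {
	// Illustrative values; real ones come from text_encoder/config.json.
	raw := []byte(`{"hidden_size": 2560, "num_hidden_layers": 36, "num_attention_heads": 32,
		"num_key_value_heads": 8, "head_dim": 128, "rope_theta": 1000000}`)
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		panic(err)
	}
	qOut, kvOut, group, err := projectionDims(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println(qOut, kvOut, group) // 4096 1024 4
}
```

With these illustrative numbers, num_attention_heads × head_dim (4096) differs from hidden_size (2560), which is one reason head_dim is read explicitly instead of being derived as hidden_size / num_attention_heads.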

type Attention struct {
	QProj nn.LinearLayer `weight:"q_proj"`
	KProj nn.LinearLayer `weight:"k_proj"`
	VProj nn.LinearLayer `weight:"v_proj"`
	OProj nn.LinearLayer `weight:"o_proj"`
	QNorm *nn.RMSNorm    `weight:"q_norm"`
	KNorm *nn.RMSNorm    `weight:"k_norm"`
}

func (attn *Attention) Forward(x *mlx.Array, mask *mlx.Array, maskMode string) *mlx.Array
func applyRoPEQwen3(x *mlx.Array, seqLen int32, theta float32) *mlx.Array

Import

import "github.com/ollama/ollama/x/imagegen/models/qwen3"

I/O Contract

Inputs

Name	Type	Required	Description
x	*mlx.Array	Yes	Input embeddings [B, L, hidden_size]
mask	*mlx.Array	No	Attention mask for padding
maskMode	string	No	Mask mode ("causal", "none", etc.)

Outputs

Name	Type	Description
(return value)	*mlx.Array	Encoded text embeddings [B, L, hidden_size]

Usage Examples

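// manifest is the loaded model manifest; tokenEmbed and mask are prepared
// by the calling pipeline.
encoder := &qwen3.TextEncoder{}
if err := encoder.Load(manifest, "text_encoder/config.json"); err != nil {
	return err
}

// Encode the prompt and also return hidden states from layers 8, 17 and 26
// (the layers used by FLUX.2).
embeddings, layerOutputs := encoder.ForwardWithLayerOutputs(tokenEmbed, mask, []int{8, 17, 26})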
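Conceptually, extracting intermediate layer outputs means running the blocks in order and keeping a reference to the hidden state after each requested layer. The sketch below shows that pattern with toy layers. The layerFn type and forwardWithLayerOutputs helper are hypothetical, and treating the indices as 0-based "output of layer i" is an assumption; check the real ForwardWithLayerOutputs for its exact convention (for example, whether a final norm is applied).

```go
package main

import "fmt"

// layerFn stands in for one transformer block (hypothetical).
type layerFn func(h []float32) []float32

// forwardWithLayerOutputs runs layers in order and records the hidden state
// produced by each layer whose 0-based index is listed in want.
func forwardWithLayerOutputs(h []float32, layers []layerFn, want []int) ([]float32, map[int][]float32) {
	keep := make(map[int]bool, len(want))
	for _, i := range want {
		keep[i] = true
	}
	outputs := make(map[int][]float32, len(want))
	for i, layer := range layers {
		h = layer(h)
		if keep[i] {
			outputs[i] = h // safe: each toy layer returns a fresh slice
		}
	}
	return h, outputs
}

func main() {
	// 30 toy layers that each add 1 to every element.
	layers := make([]layerFn, 30)
	for i := range layers {
		layers[i] = func(h []float32) []float32 {
			out := make([]float32, len(h))
			for j, v := range h {
				out[j] = v + 1
			}
			return out
		}
	}
	final, outs := forwardWithLayerOutputs([]float32{0}, layers, []int{8, 17, 26})
	fmt.Println(final, outs[8], outs[17], outs[26]) // [30] [9] [18] [27]
}
```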

Related Pages

Principle:Ollama_Ollama_ImageGeneration
