Implementation:Ollama Ollama Imagegen Qwen3 TextEncoder
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Text Encoding |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements a shared Qwen3 text encoder used by multiple image generation models (FLUX.2 Klein, Z-Image) for prompt encoding.
Description
The text_encoder.go file provides the Qwen3 text encoder, which uses a SwiGLU MLP, QK normalization, a custom RoPE implementation, and GQA (Grouped Query Attention). The Attention struct applies per-head RMSNorm to the query and key projections, applies rotary embeddings via applyRoPEQwen3, and then computes scaled dot-product attention with optional attention sinks. ForwardWithLayerOutputs returns hidden states from selected layers in addition to the final output (e.g., layers 8, 17, 26 for FLUX.2). RepeatKV handles GQA by repeating each key/value head so that the number of key/value heads matches the number of query heads. Configuration includes the head_dim, num_key_value_heads, and rope_theta parameters, which are loaded from the model manifest.
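The key/value expansion used for GQA can be illustrated without MLX. The following is a minimal sketch on plain Go slices, not the code in text_encoder.go: the function name repeatKV, the [head][position][dim] layout, and the consecutive ordering of the repeated heads are assumptions made for illustration.

```go
package main

import "fmt"

// repeatKV expands key/value heads for grouped query attention.
// kv is indexed [kvHead][position][dim]. The result has len(kv)*nRep
// heads, with each original head repeated nRep times consecutively,
// so query heads 0..nRep-1 share KV head 0, and so on.
// Hypothetical helper: layout and ordering are assumptions.
func repeatKV(kv [][][]float32, nRep int) [][][]float32 {
	if nRep <= 1 {
		return kv
	}
	out := make([][][]float32, 0, len(kv)*nRep)
	for _, head := range kv {
		for r := 0; r < nRep; r++ {
			// The repeated entries share the same backing data,
			// which is fine as long as they are only read.
			out = append(out, head)
		}
	}
	return out
}

func main() {
	// Two KV heads, one position, two dims; four query heads => nRep = 2.
	kv := [][][]float32{
		{{1, 2}},
		{{3, 4}},
	}
	numQueryHeads := 4
	expanded := repeatKV(kv, numQueryHeads/len(kv))
	fmt.Println(len(expanded))                  // 4
	fmt.Println(expanded[1][0], expanded[2][0]) // [1 2] [3 4]
}
```

With num_attention_heads = 4 and num_key_value_heads = 2, each key/value head serves two query heads, which is why the repeat factor is the ratio of the two counts.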
Usage
Used by FLUX.2 Klein and Z-Image pipelines to encode text prompts into embeddings for conditioning the diffusion transformer.
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/models/qwen3/text_encoder.go
- Lines: 1-390
Signature
type Config struct {
	HiddenSize        int32   `json:"hidden_size"`
	NumHiddenLayers   int32   `json:"num_hidden_layers"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumKeyValueHeads  int32   `json:"num_key_value_heads"`
	HeadDim           int32   `json:"head_dim"`
	RopeTheta         float32 `json:"rope_theta"`
}

type Attention struct {
	QProj nn.LinearLayer `weight:"q_proj"`
	KProj nn.LinearLayer `weight:"k_proj"`
	VProj nn.LinearLayer `weight:"v_proj"`
	OProj nn.LinearLayer `weight:"o_proj"`
	QNorm *nn.RMSNorm    `weight:"q_norm"`
	KNorm *nn.RMSNorm    `weight:"k_norm"`
}

func (attn *Attention) Forward(x *mlx.Array, mask *mlx.Array, maskMode string) *mlx.Array

func applyRoPEQwen3(x *mlx.Array, seqLen int32, theta float32) *mlx.Array
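To make the role of theta in applyRoPEQwen3 concrete, here is a rough sketch of rotary position embeddings on plain Go slices. It is not the MLX implementation from the source file. It assumes the half-split pairing (element i is rotated together with element i + headDim/2) commonly used by Qwen-family reference implementations; the actual pairing in text_encoder.go may differ. The name applyRoPE and the theta value in main are illustrative only.

```go
package main

import (
	"fmt"
	"math"
)

// applyRoPE returns a rotated copy of one head's activations.
// x is indexed [position][headDim]; headDim must be even.
// Assumption: element i is paired with element i+half (half-split layout).
func applyRoPE(x [][]float64, theta float64) [][]float64 {
	out := make([][]float64, len(x))
	for pos, row := range x {
		d := len(row)
		half := d / 2
		out[pos] = make([]float64, d)
		for i := 0; i < half; i++ {
			// Per-pair frequency: theta^(-2i/d).
			freq := math.Pow(theta, -2*float64(i)/float64(d))
			s, c := math.Sincos(float64(pos) * freq)
			a, b := row[i], row[i+half]
			// 2D rotation of the pair (a, b) by the position-dependent angle.
			out[pos][i] = a*c - b*s
			out[pos][i+half] = a*s + b*c
		}
	}
	return out
}

func main() {
	x := [][]float64{
		{1, 0, 0, 0}, // position 0: angle is zero, so the row is unchanged
		{1, 0, 0, 0}, // position 1: the first pair is rotated by 1 radian
	}
	rotated := applyRoPE(x, 1e6) // illustrative theta, not read from a real config
	fmt.Println(rotated[0])
	fmt.Println(rotated[1])
}
```

Because each pair undergoes a pure rotation, the vector norm at every position is preserved; only the relative phase between positions changes, which is what lets attention scores depend on relative offsets.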
Import
import "github.com/ollama/ollama/x/imagegen/models/qwen3"
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | *mlx.Array | Yes | Input embeddings [B, L, hidden_size] |
| mask | *mlx.Array | No | Attention mask for padding |
| maskMode | string | No | Mask mode ("causal", "none", etc.) |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | *mlx.Array | Encoded text embeddings [B, L, hidden_size] |
Usage Examples
encoder := &qwen3.TextEncoder{}
if err := encoder.Load(manifest, "text_encoder/config.json"); err != nil {
	return err
}
// Encode prompt with multi-layer output extraction
embeddings, layerOutputs := encoder.ForwardWithLayerOutputs(tokenEmbed, mask, []int{8, 17, 26})
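The multi-layer extraction pattern can be sketched independently of MLX. The example below is a simplified illustration, not the encoder's code: layerFn, forwardWithLayerOutputs, the 0-based layer indices, and the choice to capture the hidden state after each requested layer are all assumptions.

```go
package main

import "fmt"

// layerFn stands in for one transformer block (hypothetical type).
type layerFn func(h []float32) []float32

// forwardWithLayerOutputs runs the layers in order and returns the final
// hidden state plus a copy of the hidden state after each requested layer.
// Assumptions: indices in want are 0-based, and outputs follow the order of want.
func forwardWithLayerOutputs(h []float32, layers []layerFn, want []int) ([]float32, [][]float32) {
	slotOf := make(map[int]int, len(want)) // layer index -> position in outputs
	for slot, idx := range want {
		slotOf[idx] = slot
	}
	outputs := make([][]float32, len(want))
	for i, layer := range layers {
		h = layer(h)
		if slot, ok := slotOf[i]; ok {
			// Copy so later layers cannot alias the captured state.
			outputs[slot] = append([]float32(nil), h...)
		}
	}
	return h, outputs
}

// addOne is a toy layer that returns a new slice with every element incremented.
func addOne(h []float32) []float32 {
	out := make([]float32, len(h))
	for i, v := range h {
		out[i] = v + 1
	}
	return out
}

func main() {
	layers := []layerFn{addOne, addOne, addOne, addOne}
	final, captured := forwardWithLayerOutputs([]float32{0}, layers, []int{0, 2})
	fmt.Println(final)    // [4]
	fmt.Println(captured) // [[1] [3]]
}
```

A downstream pipeline can then combine the captured states (for example by concatenating them along the feature dimension) to build the conditioning input; how FLUX.2 Klein or Z-Image combine them is not specified here.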