Implementation:Ollama Ollama Imagegen Flux2 Transformer

Knowledge Sources	Ollama
Domains	Image Generation, Diffusion Models
Last Updated	2025-02-15 00:00 GMT

Overview

Implements the FLUX.2 Klein dual-stream and single-stream transformer architecture for diffusion denoising.

Description

The transformer.go file defines the complete Flux2Transformer2DModel, built from 5 dual-stream and 20 single-stream transformer blocks. Its supporting types are TimestepEmbedder (sinusoidal timestep conditioning), Modulation (adaptive layer normalization, AdaLN), TransformerBlockAttn (attention with separate Q/K/V projections for the image and text streams, plus QK normalization), and FeedForward (SwiGLU activation). TransformerBlock implements the dual-stream block and SingleTransformerBlock the single-stream block. TransformerConfig captures architectural parameters such as attention_head_dim (128), num_attention_heads (24), joint_attention_dim (7680), and rope_theta (2000). Weights are loaded through struct tags that map each field to its diffusers-style parameter name.

Usage

Used as the core denoising network within the FLUX.2 Klein pipeline, receiving noisy latents and text embeddings to predict velocity for flow-match scheduling.
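The predicted velocity is consumed by a flow-matching sampler. A minimal Euler sketch, assuming the rectified-flow convention used by diffusers' FlowMatchEulerDiscreteScheduler, where noise levels run from 1 down to 0 and each step computes x_next = x + (sigma_next − sigma)·v. The names eulerStep, linearSigmas, and denoise are hypothetical, the schedule is unshifted, and the velocity callback stands in for the transformer's Forward call.

```go
package main

import "fmt"

// eulerStep advances latents from sigma to sigmaNext using the predicted
// velocity: x_next = x + (sigmaNext - sigma) * v.
func eulerStep(x, v []float32, sigma, sigmaNext float32) []float32 {
	out := make([]float32, len(x))
	dt := sigmaNext - sigma
	for i := range x {
		out[i] = x[i] + dt*v[i]
	}
	return out
}

// linearSigmas returns n+1 noise levels from 1 down to 0 (a hypothetical,
// unshifted schedule; real pipelines typically apply a timestep shift).
func linearSigmas(n int) []float32 {
	s := make([]float32, n+1)
	for i := 0; i <= n; i++ {
		s[i] = 1 - float32(i)/float32(n)
	}
	return s
}

// denoise runs the sampling loop with a caller-supplied velocity model.
func denoise(x []float32, steps int, velocity func(x []float32, sigma float32) []float32) []float32 {
	sigmas := linearSigmas(steps)
	for i := 0; i < steps; i++ {
		v := velocity(x, sigmas[i])
		x = eulerStep(x, v, sigmas[i], sigmas[i+1])
	}
	return x
}

func main() {
	// Toy setup: the clean sample is the zero vector, so x_t = sigma * noise
	// and the ideal velocity (noise - clean) equals x_t / sigma.
	noise := []float32{0.5, -1, 2}
	ideal := func(x []float32, sigma float32) []float32 {
		v := make([]float32, len(x))
		for i := range x {
			v[i] = x[i] / sigma
		}
		return v
	}
	fmt.Println(denoise(noise, 4, ideal)) // prints [0 0 0]
}
```

The toy velocity model is exact, so Euler integration recovers the clean sample; with a learned transformer each step only approximates the true velocity field.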

Code Reference

Source Location

Repository: Ollama
File: x/imagegen/models/flux2/transformer.go
Lines: 1-562

Signature

type TransformerConfig struct {
	AttentionHeadDim  int32   `json:"attention_head_dim"`
	AxesDimsRoPE      []int32 `json:"axes_dims_rope"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumLayers         int32   `json:"num_layers"`
	NumSingleLayers   int32   `json:"num_single_layers"`
	JointAttentionDim int32   `json:"joint_attention_dim"`
	MLPRatio          float32 `json:"mlp_ratio"`
	RopeTheta         int32   `json:"rope_theta"`
}

type TimestepEmbedder struct { ... }
type TransformerBlockAttn struct { ... }
type FeedForward struct { ... }
type TransformerBlock struct { ... }
type Flux2Transformer2DModel struct { ... }

func (t *TimestepEmbedder) Forward(timesteps *mlx.Array) *mlx.Array
func (ff *FeedForward) Forward(x *mlx.Array) *mlx.Array

Import

import "github.com/ollama/ollama/x/imagegen/models/flux2"

I/O Contract

Inputs

Name	Type	Required	Description
x	*mlx.Array	Yes	Noisy latent tensor [B, L, inner_dim]
timesteps	*mlx.Array	Yes	Timestep values [B] for conditioning
encoderHiddenStates	*mlx.Array	Yes	Text embeddings from encoder
ropeCache	*RoPECache	Yes	Precomputed cos/sin position embeddings

Outputs

Name	Type	Description
(return value)	*mlx.Array	Predicted velocity tensor [B, L, inner_dim]

Usage Examples

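cfg := &flux2.TransformerConfig{
    AttentionHeadDim:  128,
    NumAttentionHeads: 24, // inner_dim = 24 * 128 = 3072
    NumLayers:         5,  // dual-stream blocks
    NumSingleLayers:   20, // single-stream blocks
    JointAttentionDim: 7680,
    RopeTheta:         2000,
    // AxesDimsRoPE and MLPRatio are left unset in this excerpt.
}

// Forward pass inside the denoising loop. transformer (the loaded
// Flux2Transformer2DModel), noisyLatents, timestep, textEmbed, and
// ropeCache are supplied by the surrounding pipeline.
velocity := transformer.Forward(noisyLatents, timestep, textEmbed, ropeCache)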
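Several widths follow from these configuration fields. The sketch below mirrors TransformerConfig locally so it compiles without the Ollama module, and adds hypothetical helpers InnerDim, MLPHiddenDim, and Validate that are not part of the real type. It assumes, as in the diffusers Flux models, that inner_dim = num_attention_heads × attention_head_dim, that the feed-forward hidden width is inner_dim × mlp_ratio, and that the per-axis RoPE dimensions (each even) sum to attention_head_dim. The axes_dims_rope and mlp_ratio values in main are placeholders, since this page does not state them.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TransformerConfig mirrors the fields shown on this page (local copy for illustration).
type TransformerConfig struct {
	AttentionHeadDim  int32   `json:"attention_head_dim"`
	AxesDimsRoPE      []int32 `json:"axes_dims_rope"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumLayers         int32   `json:"num_layers"`
	NumSingleLayers   int32   `json:"num_single_layers"`
	JointAttentionDim int32   `json:"joint_attention_dim"`
	MLPRatio          float32 `json:"mlp_ratio"`
	RopeTheta         int32   `json:"rope_theta"`
}

// InnerDim is the model width shared by the attention and MLP layers.
func (c *TransformerConfig) InnerDim() int32 {
	return c.NumAttentionHeads * c.AttentionHeadDim
}

// MLPHiddenDim is the feed-forward hidden width (assumed inner_dim * mlp_ratio).
func (c *TransformerConfig) MLPHiddenDim() int32 {
	return int32(float32(c.InnerDim()) * c.MLPRatio)
}

// Validate checks that RoPE axis dims are even and together cover the head dimension.
func (c *TransformerConfig) Validate() error {
	var sum int32
	for _, d := range c.AxesDimsRoPE {
		if d%2 != 0 {
			return fmt.Errorf("rope axis dim %d must be even", d)
		}
		sum += d
	}
	if sum != c.AttentionHeadDim {
		return fmt.Errorf("axes_dims_rope sum %d != attention_head_dim %d", sum, c.AttentionHeadDim)
	}
	return nil
}

func main() {
	// axes_dims_rope and mlp_ratio below are placeholders, not values from this page.
	raw := `{"attention_head_dim":128,"axes_dims_rope":[32,32,32,32],
		"num_attention_heads":24,"num_layers":5,"num_single_layers":20,
		"joint_attention_dim":7680,"mlp_ratio":3.0,"rope_theta":2000}`
	var cfg TransformerConfig
	if err := json.Unmarshal([]byte(raw), &cfg); err != nil {
		panic(err)
	}
	fmt.Println(cfg.InnerDim(), cfg.MLPHiddenDim(), cfg.Validate()) // prints 3072 9216 <nil>
}
```

Decoding through the json tags exercises the same field names a config.json would use, so a mismatch between a tag and the file shows up as a zero-valued field rather than an error.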

Related Pages

Principle:Ollama_Ollama_ImageGeneration
