Implementation:Ollama Ollama Imagegen Flux2 Transformer
| Knowledge Sources | |
|---|---|
| Domains | Image Generation, Diffusion Models |
| Last Updated | 2025-02-15 00:00 GMT |
Overview
Implements the FLUX.2 Klein dual-stream and single-stream transformer architecture for diffusion denoising.
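The split between the two block types can be pictured as a data-flow schedule. This standalone Go sketch is schematic. Token, concatTextImage, dualStreamBlock, singleStreamBlock, and forwardSchedule are hypothetical names. The block bodies are identity placeholders, not real attention or feed-forward layers. The text-before-image ordering of the joint sequence is an assumption borrowed from the diffusers FLUX implementations. The sketch only shows three things: where the streams stay separate, where they are concatenated, and where the image tokens are sliced back out.

```go
package main

import "fmt"

// Token is a hypothetical stand-in for one row of an mlx.Array: the feature
// vector of a single image or text token.
type Token []float32

// concatTextImage builds the joint sequence, text tokens first (assumed ordering).
func concatTextImage(txt, img []Token) []Token {
	seq := make([]Token, 0, len(txt)+len(img))
	seq = append(seq, txt...)
	return append(seq, img...)
}

// dualStreamBlock is a placeholder for one dual-stream block. Image and text
// keep separate projection and feed-forward weights, but attention spans the
// joint [text; image] sequence. Here the per-stream updates are identity.
func dualStreamBlock(img, txt []Token) ([]Token, []Token) {
	joint := concatTextImage(txt, img) // keys/values would come from both streams
	_ = joint                          // placeholder: a real block runs joint attention here
	return img, txt
}

// singleStreamBlock is a placeholder for one single-stream block, which applies
// one shared set of weights to the already concatenated sequence.
func singleStreamBlock(seq []Token) []Token { return seq }

// forwardSchedule runs numDual dual-stream blocks, concatenates the streams,
// runs numSingle single-stream blocks, and returns only the image tokens.
func forwardSchedule(img, txt []Token, numDual, numSingle int) []Token {
	for i := 0; i < numDual; i++ {
		img, txt = dualStreamBlock(img, txt)
	}
	seq := concatTextImage(txt, img)
	for i := 0; i < numSingle; i++ {
		seq = singleStreamBlock(seq)
	}
	return seq[len(txt):] // drop text positions before the output projection
}

func main() {
	img := []Token{{1}, {2}, {3}}
	txt := []Token{{9}, {8}}
	out := forwardSchedule(img, txt, 5, 20) // 5 dual + 20 single blocks, as in FLUX.2 Klein
	fmt.Println(len(out), out[0][0])
}
```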
Description
The transformer.go file defines the full Flux2Transformer2DModel, which stacks 5 dual-stream blocks followed by 20 single-stream blocks. Its components are TimestepEmbedder, which produces sinusoidal timestep conditioning; Modulation layers, which produce the parameters for adaptive layer normalization (AdaLN); TransformerBlockAttn, which has separate Q/K/V projections for image and text tokens and applies QK normalization; FeedForward, which uses a SwiGLU activation; and the two block types, TransformerBlock (dual-stream) and SingleTransformerBlock (single-stream). TransformerConfig captures architectural parameters such as attention_head_dim (128), num_attention_heads (24), joint_attention_dim (7680), and rope_theta (2000). Weights are loaded through struct tags that map fields to diffusers parameter names.
Usage
Used as the core denoising network in the FLUX.2 Klein pipeline. It receives noisy latents, timesteps, and text embeddings and predicts the velocity consumed by the flow-matching scheduler.
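One denoising step can be sketched as an Euler update on the predicted velocity. The sketch assumes the rectified-flow convention x_σ = (1 − σ)·x₀ + σ·ε used by diffusers' FlowMatchEulerDiscreteScheduler, under which the target velocity is ε − x₀. linearSigmas, eulerStep, denoise, and the velocityFn callback are hypothetical stand-ins, and velocityFn takes the place of the transformer's Forward call. The timestep shift that real FLUX pipelines apply to the sigma schedule is omitted.

```go
package main

import "fmt"

// linearSigmas returns n+1 noise levels evenly spaced from 1 (pure noise) to
// 0 (clean latent). Real pipelines usually shift this schedule; omitted here.
func linearSigmas(n int) []float64 {
	s := make([]float64, n+1)
	for i := range s {
		s[i] = 1 - float64(i)/float64(n)
	}
	return s
}

// eulerStep applies x_next = x + (sigmaNext - sigma) * v, where v is the
// predicted velocity.
func eulerStep(x, v []float64, sigma, sigmaNext float64) []float64 {
	out := make([]float64, len(x))
	dt := sigmaNext - sigma // negative: sigmas decrease toward 0
	for i := range x {
		out[i] = x[i] + dt*v[i]
	}
	return out
}

// denoise integrates from noise toward data. velocityFn is a hypothetical
// stand-in for the transformer forward pass.
func denoise(noise []float64, steps int, velocityFn func(x []float64, sigma float64) []float64) []float64 {
	sigmas := linearSigmas(steps)
	x := noise
	for i := 0; i < steps; i++ {
		x = eulerStep(x, velocityFn(x, sigmas[i]), sigmas[i], sigmas[i+1])
	}
	return x
}

func main() {
	data := []float64{1, 2}
	noise := []float64{3, -1}
	// On the straight path between data and noise, the exact velocity is noise - data.
	oracle := func(x []float64, sigma float64) []float64 {
		return []float64{noise[0] - data[0], noise[1] - data[1]}
	}
	fmt.Println(denoise(noise, 4, oracle))
}
```

With an exact, constant velocity the Euler updates land on the clean sample. A learned, state-dependent velocity introduces integration error that depends on the step count.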
Code Reference
Source Location
- Repository: Ollama
- File: x/imagegen/models/flux2/transformer.go
- Lines: 1-562
Signature
```go
type TransformerConfig struct {
	AttentionHeadDim  int32   `json:"attention_head_dim"`
	AxesDimsRoPE      []int32 `json:"axes_dims_rope"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumLayers         int32   `json:"num_layers"`
	NumSingleLayers   int32   `json:"num_single_layers"`
	JointAttentionDim int32   `json:"joint_attention_dim"`
	MLPRatio          float32 `json:"mlp_ratio"`
	RopeTheta         int32   `json:"rope_theta"`
}

// Struct bodies are elided in this summary.
type TimestepEmbedder struct { ... }
type TransformerBlockAttn struct { ... }
type FeedForward struct { ... }
type TransformerBlock struct { ... }
type Flux2Transformer2DModel struct { ... }

func (t *TimestepEmbedder) Forward(timesteps *mlx.Array) *mlx.Array
func (ff *FeedForward) Forward(x *mlx.Array) *mlx.Array
```
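Because TransformerConfig carries json tags, a diffusers-style config.json can be decoded directly with encoding/json. This sketch uses a local copy of the struct so it compiles without the Ollama module, and it fills in only the values quoted on this page. It adds a hypothetical InnerDim helper computing num_attention_heads × attention_head_dim (24 × 128 = 3072). Assuming the diffusers FLUX layout, that is the hidden width both streams are projected to. loadConfig and exampleConfigJSON are also illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TransformerConfig is a local copy of flux2.TransformerConfig (same fields
// and json tags) so this sketch compiles on its own.
type TransformerConfig struct {
	AttentionHeadDim  int32   `json:"attention_head_dim"`
	AxesDimsRoPE      []int32 `json:"axes_dims_rope"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumLayers         int32   `json:"num_layers"`
	NumSingleLayers   int32   `json:"num_single_layers"`
	JointAttentionDim int32   `json:"joint_attention_dim"`
	MLPRatio          float32 `json:"mlp_ratio"`
	RopeTheta         int32   `json:"rope_theta"`
}

// exampleConfigJSON holds only values quoted on this page; axes_dims_rope and
// mlp_ratio are left out, so they decode to their zero values.
const exampleConfigJSON = `{
	"attention_head_dim": 128,
	"num_attention_heads": 24,
	"num_layers": 5,
	"num_single_layers": 20,
	"joint_attention_dim": 7680,
	"rope_theta": 2000
}`

// loadConfig decodes a config using the struct's json tags.
func loadConfig(raw string) (TransformerConfig, error) {
	var cfg TransformerConfig
	err := json.Unmarshal([]byte(raw), &cfg)
	return cfg, err
}

// InnerDim is one attention_head_dim slice per head (hypothetical helper).
func (c TransformerConfig) InnerDim() int32 {
	return c.NumAttentionHeads * c.AttentionHeadDim
}

func main() {
	cfg, err := loadConfig(exampleConfigJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.InnerDim(), cfg.NumLayers+cfg.NumSingleLayers)
}
```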
Import
```go
import "github.com/ollama/ollama/x/imagegen/models/flux2"
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| x | *mlx.Array | Yes | Noisy latent tensor [B, L, inner_dim] |
| timesteps | *mlx.Array | Yes | Timestep values [B] for conditioning |
| encoderHiddenStates | *mlx.Array | Yes | Text embeddings from the text encoder (feature size joint_attention_dim) |
| ropeCache | *RoPECache | Yes | Precomputed cos/sin position embeddings |
Outputs
| Name | Type | Description |
|---|---|---|
| (return value) | *mlx.Array | Predicted velocity tensor [B, L, inner_dim] |
Usage Examples
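```go
// Values from the description above; AxesDimsRoPE and MLPRatio are not documented here.
cfg := &flux2.TransformerConfig{
	AttentionHeadDim:  128,
	NumAttentionHeads: 24,
	NumLayers:         5,  // dual-stream blocks
	NumSingleLayers:   20, // single-stream blocks
	JointAttentionDim: 7680,
	RopeTheta:         2000,
}

// Forward pass during the denoising loop; transformer is assumed to be an
// already-loaded *flux2.Flux2Transformer2DModel (loading code not shown).
velocity := transformer.Forward(noisyLatents, timestep, textEmbed, ropeCache)
```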