Implementation:Ollama Ollama Imagegen Flux2 Transformer

From Leeroopedia
Knowledge Sources
Domains: Image Generation, Diffusion Models
Last Updated: 2025-02-15 00:00 GMT

Overview

Implements the FLUX.2 Klein dual-stream and single-stream transformer architecture for diffusion denoising.

Description

The transformer.go file defines Flux2Transformer2DModel, which combines 5 dual-stream layers with 20 single-stream layers. Its building blocks are TimestepEmbedder (sinusoidal timestep conditioning), Modulation layers (adaptive layer normalization, AdaLN), TransformerBlockAttn (separate image and text Q/K/V projections with QK normalization), FeedForward (SwiGLU activation), and the two block types, TransformerBlock (dual-stream) and SingleTransformerBlock (single-stream). TransformerConfig captures architectural parameters such as attention_head_dim (128), num_attention_heads (24), joint_attention_dim (7680), and rope_theta (2000). Weights are loaded through struct tags that map fields to diffusers parameter names.
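To make the AdaLN and SwiGLU terms concrete, here is a minimal pure-Go sketch on plain float64 slices. It is not the code from transformer.go, which operates on mlx.Array tensors. The function names (layerNorm, modulate, swiGLU), the epsilon value, and the exact formulas are illustrative assumptions based on the common formulations: normalize, then scale by (1 + scale) and add shift for AdaLN, and silu(gate) * up for SwiGLU.

```go
package main

import (
	"fmt"
	"math"
)

// layerNorm normalizes x to zero mean and unit variance, with no learned affine.
func layerNorm(x []float64, eps float64) []float64 {
	var mean float64
	for _, v := range x {
		mean += v
	}
	mean /= float64(len(x))

	var variance float64
	for _, v := range x {
		variance += (v - mean) * (v - mean)
	}
	variance /= float64(len(x))

	inv := 1 / math.Sqrt(variance+eps)
	out := make([]float64, len(x))
	for i, v := range x {
		out[i] = (v - mean) * inv
	}
	return out
}

// modulate applies AdaLN-style conditioning: norm(x) * (1 + scale) + shift.
// In a real model, shift and scale would be projected from the timestep embedding.
func modulate(x, shift, scale []float64) []float64 {
	n := layerNorm(x, 1e-6) // eps is an assumed value
	for i := range n {
		n[i] = n[i]*(1+scale[i]) + shift[i]
	}
	return n
}

// silu computes x * sigmoid(x).
func silu(x float64) float64 { return x / (1 + math.Exp(-x)) }

// swiGLU gates the "up" projection with silu of the "gate" projection.
func swiGLU(gate, up []float64) []float64 {
	out := make([]float64, len(gate))
	for i := range gate {
		out[i] = silu(gate[i]) * up[i]
	}
	return out
}

func main() {
	x := []float64{1, 2, 3, 4}
	zero := []float64{0, 0, 0, 0}
	fmt.Println(modulate(x, zero, zero))
	fmt.Println(swiGLU([]float64{0, 1}, []float64{5, 2}))
}
```

With zero shift and scale, modulate reduces to a plain layer norm, which is why modulation layers are often initialized near zero in this family of models.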

Usage

The transformer is the core denoising network of the FLUX.2 Klein pipeline: it receives noisy latents and text embeddings and predicts the velocity consumed by the flow-match scheduler.
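A flow-match scheduler typically consumes the predicted velocity with an Euler update, x_next = x + (sigma_next - sigma) * v. The sketch below shows that update on a plain slice. The eulerStep helper, the constant stand-in velocity, and the three-point sigma schedule are hypothetical and are not taken from the Ollama scheduler.

```go
package main

import "fmt"

// eulerStep advances latents x from noise level sigma to sigmaNext
// using a predicted velocity v: x + (sigmaNext - sigma) * v.
func eulerStep(x, v []float64, sigma, sigmaNext float64) []float64 {
	dt := sigmaNext - sigma
	out := make([]float64, len(x))
	for i := range x {
		out[i] = x[i] + dt*v[i]
	}
	return out
}

func main() {
	x := []float64{1.0, -2.0}
	v := []float64{0.5, 0.5}           // stand-in for the transformer's output
	sigmas := []float64{1.0, 0.5, 0.0} // illustrative schedule from noise to data

	for i := 0; i+1 < len(sigmas); i++ {
		// In a real loop, v would be recomputed by the transformer at each step.
		x = eulerStep(x, v, sigmas[i], sigmas[i+1])
	}
	fmt.Println(x)
}
```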

Code Reference

Source Location

  • Repository: Ollama
  • File: x/imagegen/models/flux2/transformer.go
  • Lines: 1-562

Signature

type TransformerConfig struct {
	AttentionHeadDim  int32   `json:"attention_head_dim"`
	AxesDimsRoPE      []int32 `json:"axes_dims_rope"`
	NumAttentionHeads int32   `json:"num_attention_heads"`
	NumLayers         int32   `json:"num_layers"`
	NumSingleLayers   int32   `json:"num_single_layers"`
	JointAttentionDim int32   `json:"joint_attention_dim"`
	MLPRatio          float32 `json:"mlp_ratio"`
	RopeTheta         int32   `json:"rope_theta"`
}

type TimestepEmbedder struct { ... }
type TransformerBlockAttn struct { ... }
type FeedForward struct { ... }
type TransformerBlock struct { ... }
type Flux2Transformer2DModel struct { ... }

func (t *TimestepEmbedder) Forward(timesteps *mlx.Array) *mlx.Array
func (ff *FeedForward) Forward(x *mlx.Array) *mlx.Array
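As an illustration of the kind of computation a sinusoidal timestep embedder performs, the following sketch builds the embedding for a single scalar timestep in pure Go. It is a minimal sketch under assumptions, not the body of TimestepEmbedder.Forward: the function name, the cosine-then-sine ordering, and the maxPeriod of 10000 are conventional choices that the source does not confirm, and the real method works on mlx.Array batches.

```go
package main

import (
	"fmt"
	"math"
)

// sinusoidalEmbedding maps a scalar timestep t to a dim-length vector of
// cosines followed by sines at geometrically spaced frequencies.
// dim is expected to be even. The cos/sin ordering is an assumption.
func sinusoidalEmbedding(t float64, dim int, maxPeriod float64) []float64 {
	half := dim / 2
	out := make([]float64, dim)
	for i := 0; i < half; i++ {
		// Frequencies decay from 1 toward 1/maxPeriod across the half-dimension.
		freq := math.Exp(-math.Log(maxPeriod) * float64(i) / float64(half))
		out[i] = math.Cos(t * freq)
		out[half+i] = math.Sin(t * freq)
	}
	return out
}

func main() {
	emb := sinusoidalEmbedding(0.5, 8, 10000)
	fmt.Println(len(emb), emb)
}
```

In typical designs the resulting vector is then passed through a small MLP before it conditions the modulation layers.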

Import

import "github.com/ollama/ollama/x/imagegen/models/flux2"

I/O Contract

Inputs

Name                 Type        Required  Description
x                    *mlx.Array  Yes       Noisy latent tensor [B, L, inner_dim]
timesteps            *mlx.Array  Yes       Timestep values [B] for conditioning
encoderHiddenStates  *mlx.Array  Yes       Text embeddings from encoder
ropeCache            *RoPECache  Yes       Precomputed cos/sin position embeddings

Outputs

Name            Type        Description
(return value)  *mlx.Array  Predicted velocity tensor [B, L, inner_dim]

Usage Examples

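// Architectural parameters for FLUX.2 Klein, using the values from the Description.
cfg := &flux2.TransformerConfig{
    AttentionHeadDim:  128,
    NumAttentionHeads: 24,
    NumLayers:         5,  // dual-stream blocks
    NumSingleLayers:   20, // single-stream blocks
    JointAttentionDim: 7680,
    RopeTheta:         2000,
}

// Forward pass during the denoising loop. `transformer` is assumed to be an
// already-loaded *flux2.Flux2Transformer2DModel configured with cfg; model
// construction and weight loading are not shown here.
velocity := transformer.Forward(noisyLatents, timestep, textEmbed, ropeCache)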
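The json struct tags on TransformerConfig suggest the config is decoded from a JSON file. The sketch below shows that decoding step in isolation. To stay self-contained it uses a local mirror of a subset of the struct rather than importing the flux2 package. The parseConfig and innerDim helpers and the sample JSON are hypothetical, and innerDim assumes the inner dimension is heads times head dimension, as is conventional in diffusers-style models.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// transformerConfig mirrors a subset of the fields of flux2.TransformerConfig
// so this example compiles without the Ollama module.
type transformerConfig struct {
	AttentionHeadDim  int32 `json:"attention_head_dim"`
	NumAttentionHeads int32 `json:"num_attention_heads"`
	NumLayers         int32 `json:"num_layers"`
	NumSingleLayers   int32 `json:"num_single_layers"`
	JointAttentionDim int32 `json:"joint_attention_dim"`
	RopeTheta         int32 `json:"rope_theta"`
}

// sampleConfig uses the parameter values quoted on this page.
const sampleConfig = `{
	"attention_head_dim": 128,
	"num_attention_heads": 24,
	"num_layers": 5,
	"num_single_layers": 20,
	"joint_attention_dim": 7680,
	"rope_theta": 2000
}`

// parseConfig decodes JSON bytes into the mirrored config struct.
func parseConfig(data []byte) (transformerConfig, error) {
	var c transformerConfig
	err := json.Unmarshal(data, &c)
	return c, err
}

// innerDim is a hypothetical helper: number of heads times per-head dimension.
func innerDim(c transformerConfig) int32 {
	return c.NumAttentionHeads * c.AttentionHeadDim
}

func main() {
	cfg, err := parseConfig([]byte(sampleConfig))
	if err != nil {
		panic(err)
	}
	fmt.Println("inner dim:", innerDim(cfg))
}
```

Under that assumption, 24 heads of dimension 128 give an inner dimension of 3072.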
