
Principle:DeepSeek AI Janus CFG Input Preparation for Flow

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Guided_Generation
Last Updated 2026-02-10 09:30 GMT

Overview

A technique for constructing paired conditional and unconditional inputs for classifier-free guidance in rectified flow image generation, including attention mask construction for the ODE denoising loop.

Description

CFG input preparation for rectified flow differs from autoregressive CFG in several ways:

  1. Batch structure: The first half of the batch is conditional and the second half unconditional (rather than interleaving even/odd rows, as in autoregressive CFG)
  2. Last token removal: The final <begin_of_image> token is removed from the embeddings because it is replaced by the timestep embedding at each denoising step
  3. Attention mask: An explicit attention mask is built with 1s for conditional tokens and 0s for the unconditional prompt tokens, letting the LLM distinguish the two branches
  4. Extended mask: The mask covers the full sequence length: prompt + timestep + 576 image latent tokens
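The four steps above can be sketched as follows. The function name, tensor shapes, and the pad-embedding argument are illustrative assumptions, not the actual JanusFlow API:

```python
import torch

def prepare_cfg_inputs(cond_embeds: torch.Tensor, pad_embed: torch.Tensor,
                       num_image_tokens: int = 576):
    """Hypothetical helper building paired CFG inputs for rectified flow.

    cond_embeds: (parallel_size, prompt_len, dim) conditional prompt
                 embeddings; the last position holds <begin_of_image>.
    pad_embed:   (dim,) pad-token embedding for the unconditional branch.
    """
    parallel_size, prompt_len, dim = cond_embeds.shape

    # 1. Batch structure: conditional rows first, unconditional rows second.
    uncond_embeds = pad_embed.expand(parallel_size, prompt_len, dim).clone()
    inputs = torch.cat([cond_embeds, uncond_embeds], dim=0)

    # 2. Drop the trailing <begin_of_image> embedding; the timestep
    #    embedding takes that position at every denoising step.
    inputs = inputs[:, :-1, :]

    # 3-4. Mask over prompt + 1 timestep slot + image latent tokens:
    #      all 1s for conditional rows, 0s over the prompt region
    #      of the unconditional rows.
    total_len = (prompt_len - 1) + 1 + num_image_tokens
    mask = torch.ones(2 * parallel_size, total_len)
    mask[parallel_size:, : prompt_len - 1] = 0
    return inputs, mask
```

The returned mask already spans the image-latent region, so it can be passed unchanged to every step of the ODE denoising loop.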

Usage

Use this principle after prompt formatting and before the noise initialization step in the JanusFlow pipeline.

Theoretical Basis

The CFG formula for velocity predictions in rectified flow:

v_guided = w * v_cond - (w - 1) * v_uncond

Where w is the CFG weight (typically 2.0 for JanusFlow).
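A minimal sketch of the formula; the helper name is hypothetical, and any pair of velocity tensors with matching shapes works:

```python
import torch

def guided_velocity(v_cond: torch.Tensor, v_uncond: torch.Tensor,
                    w: float = 2.0) -> torch.Tensor:
    # v_guided = w * v_cond - (w - 1) * v_uncond
    # w = 1.0 recovers the purely conditional prediction.
    return w * v_cond - (w - 1.0) * v_uncond
```

Note that this weighs the conditional and unconditional predictions against each other: larger w pushes the guided velocity further from the unconditional branch.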

Batch structure (for parallel_size=5):

  • Rows 0-4: Conditional (full prompt embeddings)
  • Rows 5-9: Unconditional (pad-masked prompt embeddings)
  • Attention mask: 1s for rows 0-4, 0s for prompt region of rows 5-9
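Because the two branches occupy the first and second halves of the batch, a single chunk call recovers them after the forward pass. Shapes below are illustrative (the latent dimension of 4 is an assumption):

```python
import torch

parallel_size = 5
# Velocity predictions from one forward pass over the 2*parallel_size
# batch, one per image latent token (576 for JanusFlow).
v = torch.randn(2 * parallel_size, 576, 4)
v_cond, v_uncond = v.chunk(2, dim=0)  # rows 0-4, then rows 5-9
w = 2.0
v_guided = w * v_cond - (w - 1) * v_uncond  # (parallel_size, 576, 4)
```

With the interleaved even/odd layout used in autoregressive CFG, this split would instead be `v[0::2]` and `v[1::2]`; the half-and-half layout keeps each branch contiguous.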

Related Pages

Implemented By

Uses Heuristic
