
Principle:DeepSeek AI Janus CFG Input Preparation for Flow

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Guided_Generation
Last Updated 2026-02-10 09:30 GMT

Overview

A technique for constructing paired conditional and unconditional inputs for classifier-free guidance in rectified flow image generation, including attention mask construction for the ODE denoising loop.

Description

CFG input preparation for rectified flow differs from autoregressive CFG in several ways:

  1. Batch structure: The first half of the batch is conditional and the second half unconditional (rather than interleaving even/odd rows, as in autoregressive CFG)
  2. Last token removal: The final <begin_of_image> token is removed from the embeddings because it is replaced by the timestep embedding at each denoising step
  3. Attention mask: An explicit attention mask is built with 1s for conditional tokens and 0s for the unconditional prompt tokens, letting the LLM distinguish the two branches
  4. Extended mask: The mask covers the full sequence length: prompt + timestep + 576 image latent tokens
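The four steps above can be sketched as follows. The function name, tensor shapes, and the pad-embedding argument are illustrative assumptions, not the actual JanusFlow API:

```python
import torch

def prepare_cfg_inputs(cond_embeds: torch.Tensor, pad_embed: torch.Tensor,
                       num_image_tokens: int = 576):
    """Hypothetical helper building paired CFG inputs for rectified flow.

    cond_embeds: (parallel_size, prompt_len, dim) conditional prompt
                 embeddings; the last position holds <begin_of_image>.
    pad_embed:   (dim,) pad-token embedding for the unconditional branch.
    """
    parallel_size, prompt_len, dim = cond_embeds.shape

    # 1. Batch structure: conditional rows first, unconditional rows second.
    uncond_embeds = pad_embed.expand(parallel_size, prompt_len, dim).clone()
    inputs = torch.cat([cond_embeds, uncond_embeds], dim=0)

    # 2. Drop the trailing <begin_of_image> embedding; the timestep
    #    embedding takes that position at every denoising step.
    inputs = inputs[:, :-1, :]

    # 3-4. Mask over prompt + 1 timestep slot + image latent tokens:
    #      all 1s for conditional rows, 0s over the prompt region
    #      of the unconditional rows.
    total_len = (prompt_len - 1) + 1 + num_image_tokens
    mask = torch.ones(2 * parallel_size, total_len)
    mask[parallel_size:, : prompt_len - 1] = 0
    return inputs, mask
```

The returned mask already spans the image-latent region, so it can be passed unchanged to every step of the ODE denoising loop.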

Usage

Use this principle after prompt formatting and before the noise initialization step in the JanusFlow pipeline.

Theoretical Basis

The CFG formula for velocity predictions in rectified flow:

v_guided = w * v_cond - (w - 1) * v_uncond

Where w is the CFG weight (typically 2.0 for JanusFlow).
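A minimal sketch of the formula; the helper name is hypothetical, and any pair of velocity tensors with matching shapes works:

```python
import torch

def guided_velocity(v_cond: torch.Tensor, v_uncond: torch.Tensor,
                    w: float = 2.0) -> torch.Tensor:
    # v_guided = w * v_cond - (w - 1) * v_uncond
    # w = 1.0 recovers the purely conditional prediction.
    return w * v_cond - (w - 1.0) * v_uncond
```

Note that this weighs the conditional and unconditional predictions against each other: larger w pushes the guided velocity further from the unconditional branch.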

Batch structure (for parallel_size=5):

  • Rows 0-4: Conditional (full prompt embeddings)
  • Rows 5-9: Unconditional (pad-masked prompt embeddings)
  • Attention mask: 1s for rows 0-4, 0s for prompt region of rows 5-9
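Because the two branches occupy the first and second halves of the batch, a single chunk call recovers them after the forward pass. Shapes below are illustrative (the latent dimension of 4 is an assumption):

```python
import torch

parallel_size = 5
# Velocity predictions from one forward pass over the 2*parallel_size
# batch, one per image latent token (576 for JanusFlow).
v = torch.randn(2 * parallel_size, 576, 4)
v_cond, v_uncond = v.chunk(2, dim=0)  # rows 0-4, then rows 5-9
w = 2.0
v_guided = w * v_cond - (w - 1) * v_uncond  # (parallel_size, 576, 4)
```

With the interleaved even/odd layout used in autoregressive CFG, this split would instead be `v[0::2]` and `v[1::2]`; the half-and-half layout keeps each branch contiguous.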

Related Pages

Implemented By

Uses Heuristic
