
Principle: DeepSeek-AI Janus CFG Input Preparation

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Guided_Generation
Last Updated 2026-02-10 09:30 GMT

Overview

A technique for constructing paired conditional and unconditional input embeddings to enable classifier-free guidance during autoregressive image token generation.

Description

Classifier-Free Guidance (CFG) improves the quality and text-alignment of generated images by computing two forward passes per generation step: one conditional (with the full text prompt) and one unconditional (with the prompt replaced by padding tokens). The final logits are a weighted combination of both.

For CFG input preparation, the tokenized prompt is duplicated into paired rows: even-indexed rows contain the full conditional prompt, while odd-indexed rows have the content tokens replaced with the pad token ID (keeping only structural tokens). Both are embedded into the language model's embedding space.
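The pairing described above can be sketched as follows. This is a minimal illustration, not the Janus source: the function name `prepare_cfg_inputs` and its arguments are hypothetical, and it assumes the prompt's first and last tokens are the structural ones to preserve in the unconditional rows.

```python
import torch

def prepare_cfg_inputs(input_ids, embed_tokens, pad_token_id, parallel_size):
    """Duplicate a tokenized prompt into interleaved conditional /
    unconditional rows for classifier-free guidance.

    input_ids: (1, seq_len) tokenized prompt
    embed_tokens: the language model's token-embedding layer
    Returns embeddings of shape (parallel_size * 2, seq_len, hidden).
    """
    seq_len = input_ids.shape[-1]
    # One (cond, uncond) pair per image: total batch = parallel_size * 2.
    tokens = input_ids.expand(parallel_size * 2, seq_len).clone()
    # Odd rows become the unconditional branch: replace the content
    # tokens with the pad token, keeping the structural first and
    # last tokens of the formatted prompt.
    tokens[1::2, 1:-1] = pad_token_id
    # Embed both branches into the language model's embedding space.
    return embed_tokens(tokens)
```

Even rows then carry the full prompt and odd rows the pad-masked prompt, so a single forward pass produces both sets of logits needed for guidance.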

Usage

Use this principle after prompt formatting and before the autoregressive token generation loop. It sets up the parallel conditional/unconditional structure needed for CFG-guided generation.

Theoretical Basis

The CFG formula for the guided logits is:

logits_guided = logits_uncond + w · (logits_cond − logits_uncond)

where w is the CFG weight (typically 5.0 for autoregressive image generation in Janus).
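Given logits for an interleaved batch (even rows conditional, odd rows unconditional), the formula reduces to a one-line combination. A minimal sketch, with the helper name `apply_cfg` chosen for illustration:

```python
import torch

def apply_cfg(logits, cfg_weight=5.0):
    """Combine interleaved conditional (even rows) and unconditional
    (odd rows) logits with the CFG formula:
        guided = uncond + w * (cond - uncond)
    logits: (parallel_size * 2, vocab) -> returns (parallel_size, vocab).
    """
    cond = logits[0::2]    # conditional branch
    uncond = logits[1::2]  # unconditional branch
    return uncond + cfg_weight * (cond - uncond)
```

With w = 1 this recovers the conditional logits unchanged; w > 1 pushes the distribution further toward the text-conditioned prediction.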

Input structure for parallel_size images:

  • Total batch size: parallel_size × 2
  • Even rows (0, 2, 4, ...): Full conditional prompt embeddings
  • Odd rows (1, 3, 5, ...): Unconditional (pad-masked) prompt embeddings
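Putting the pieces together, one step of the generation loop might look like the sketch below. This is an assumed interface, not the Janus API: `model` is any callable returning next-token logits of shape (parallel_size × 2, vocab) for the interleaved batch, and `embed_tokens` is the token-embedding layer. The key detail is that each sampled image token is fed back to both rows of its pair.

```python
import torch

def cfg_step(model, inputs_embeds, embed_tokens, cfg_weight=5.0):
    """One hypothetical CFG-guided generation step.

    Returns the sampled tokens (parallel_size, 1) and the embeddings
    to append to BOTH rows of each cond/uncond pair at the next step.
    """
    logits = model(inputs_embeds)            # both branches in one pass
    cond, uncond = logits[0::2], logits[1::2]
    guided = uncond + cfg_weight * (cond - uncond)
    probs = torch.softmax(guided, dim=-1)
    next_token = torch.multinomial(probs, num_samples=1)  # (parallel_size, 1)
    # The same sampled token continues the conditional AND the
    # unconditional sequence, so the pair stays aligned.
    paired = next_token.repeat_interleave(2, dim=0)       # (parallel_size * 2, 1)
    return next_token, embed_tokens(paired)
```

Only the prompt differs between the two rows of a pair; from the first generated token onward, both rows see identical image tokens.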

Related Pages

Implemented By

Uses Heuristic
