
Implementation:DeepSeek-AI Janus CFG Input Preparation Flow

From Leeroopedia


Knowledge Sources
Domains: Image_Generation, Guided_Generation
Last Updated: 2026-02-10 09:30 GMT

Overview

Pattern for constructing paired conditional/unconditional inputs with attention masks for classifier-free guidance in the JanusFlow rectified flow pipeline.

Description

This user-defined pattern sets up the CFG input structure for the rectified flow ODE loop. Unlike the autoregressive variant, it uses a split-batch structure (first half conditional, second half unconditional) and constructs an explicit attention mask.
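The payoff of the split-batch layout is that a single forward pass yields both velocity predictions, which are then blended with the standard classifier-free guidance formula v = v_uncond + w * (v_cond - v_uncond). A minimal sketch of that blending step, using scalar stand-ins for velocity tensors (the function name and `cfg_weight` default are illustrative, not taken from the Janus source):

```python
def cfg_combine(v, parallel_size, cfg_weight=2.0):
    """Blend a split batch of velocity predictions.

    v holds parallel_size conditional predictions followed by
    parallel_size unconditional ones (scalars here for clarity).
    """
    cond = v[:parallel_size]
    uncond = v[parallel_size:]
    # Standard CFG: push the unconditional prediction toward the conditional one
    return [u + cfg_weight * (c - u) for c, u in zip(cond, uncond)]

# Split batch of 4 predictions: first 2 conditional, last 2 unconditional
combined = cfg_combine([1.0, 3.0, 0.5, 1.5], parallel_size=2)
```

In the real pipeline the same arithmetic is applied elementwise to velocity tensors rather than scalars.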

Usage

Implement this pattern after prompt tokenization and before noise initialization.

Code Reference

Source Location

  • Repository: Janus
  • File: demo/app_janusflow.py
  • Lines: L76-94

Pattern Implementation

parallel_size = 5

# Stack tokens: first parallel_size conditional, next parallel_size unconditional
tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id  # Pad all but the first token in unconditional rows

# Get embeddings, remove last <bog> token (replaced by timestep embedding)
inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :]

# Build attention mask: [parallel_size*2, seq_len-1 + 1 + 576]
attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0  # Zero out unconditional prompt (position 0 stays attended)
attention_mask = attention_mask.int()
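The mask width works out to (seq_len - 1) + 1 + 576: the trimmed prompt embeddings, one slot for the timestep embedding that replaces the removed token, and 576 image tokens (a 24 x 24 latent grid). A pure-Python sketch of the same masking logic, with small sizes so the structure is visible (helper name is illustrative):

```python
def build_cfg_mask(parallel_size, emb_len, num_img_tokens=576):
    """Mirror the torch mask construction with nested lists.

    Rows 0..parallel_size-1 are conditional (all ones); rows
    parallel_size..2*parallel_size-1 zero out the prompt except
    position 0, matching attention_mask[parallel_size:, 1:emb_len] = 0.
    """
    cols = emb_len + 1 + num_img_tokens  # prompt + timestep slot + image tokens
    mask = [[1] * cols for _ in range(parallel_size * 2)]
    for row in range(parallel_size, parallel_size * 2):
        for col in range(1, emb_len):
            mask[row][col] = 0
    return mask

# Tiny example: 2 images, 4-token trimmed prompt, 3 "image" tokens
mask = build_cfg_mask(2, 4, num_img_tokens=3)
```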

Import

# Uses existing model and processor instances — no separate import

I/O Contract

Inputs

Name           Type                        Required  Description
input_ids      torch.LongTensor [seq_len]  Yes       Tokenized prompt with image_start_tag
parallel_size  int                         Yes       Number of images to generate (e.g., 5)
pad_id         int                         Yes       Pad token ID from vl_chat_processor.pad_id

Outputs

Name            Type                                              Description
inputs_embeds   torch.Tensor [parallel_size*2, seq_len-1, D]      CFG-paired embeddings (last token removed)
attention_mask  torch.IntTensor [parallel_size*2, seq_len-1+577]  1s for attended positions, 0s for masked positions
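Both outputs share the same trimmed sequence length, so their shapes can be derived from one another. A small helper to compute the expected shapes for given sizes (hypothetical, not part of the Janus API):

```python
def expected_cfg_shapes(parallel_size, seq_len, hidden_dim, num_img_tokens=576):
    """Return (inputs_embeds shape, attention_mask shape) per the contract above."""
    embeds = (parallel_size * 2, seq_len - 1, hidden_dim)
    mask = (parallel_size * 2, (seq_len - 1) + 1 + num_img_tokens)
    return embeds, mask

# e.g. 5 images, a 20-token prompt, and a hidden size of 2048
shapes = expected_cfg_shapes(5, 20, 2048)
```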

Usage Examples

CFG Setup for JanusFlow

parallel_size = 5
input_ids = torch.LongTensor(tokenizer.encode(text))

tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id

inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :]  # Remove last <bog> token

attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0
attention_mask = attention_mask.int()
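After this setup the pipeline initializes noise and integrates the rectified-flow ODE dz/dt = v(z, t): each step feeds the duplicated latents through the model with the mask above, blends the conditional and unconditional halves via CFG, and takes an Euler update. A scalar sketch of the Euler step alone (illustrative, not the actual Janus loop):

```python
def euler_step(z, v, dt):
    """One explicit Euler update of dz/dt = v (scalars stand in for latents)."""
    return [zi + dt * vi for zi, vi in zip(z, v)]

z_next = euler_step([0.0, 1.0], [2.0, -1.0], dt=0.5)
```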

