Implementation: DeepSeek-AI Janus CFG Input Preparation Flow
| Knowledge Sources | |
|---|---|
| Domains | Image_Generation, Guided_Generation |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
Pattern for constructing paired conditional/unconditional inputs with attention masks for classifier-free guidance in the JanusFlow rectified flow pipeline.
Description
This user-defined pattern sets up the CFG input structure for the rectified flow ODE loop. Unlike the autoregressive variant, it uses a split-batch structure (first half conditional, second half unconditional) and constructs an explicit attention mask.
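The split-batch layout pays off in the ODE loop: a single forward pass over the paired batch yields both conditional and unconditional predictions, which are then separated along the batch dimension and blended with the guidance weight. A minimal sketch of that combine step, with illustrative shapes and a hypothetical `cfg_weight` name (not the repository's exact code):

```python
import torch

# Paired model output: first half conditional, second half unconditional.
# The [parallel_size*2, 4] shape is illustrative only.
parallel_size, cfg_weight = 5, 2.0
v_pred = torch.randn(parallel_size * 2, 4)

# Split along the batch dimension, then apply the standard CFG blend.
v_cond, v_uncond = torch.chunk(v_pred, 2, dim=0)
v_guided = v_uncond + cfg_weight * (v_cond - v_uncond)  # [parallel_size, 4]
```

This ordering is why the pattern below pads the second half of the batch and zeroes its prompt positions in the mask: the unconditional half must see no prompt content.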
Usage
Implement this pattern after prompt tokenization and before noise initialization.
Code Reference
Source Location
- Repository: Janus
- File: demo/app_janusflow.py
- Lines: L76-94
Pattern Implementation
parallel_size = 5
# Stack tokens: first parallel_size conditional, next parallel_size unconditional
tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id # Mask unconditional rows
# Get embeddings, remove last <bog> token (replaced by timestep embedding)
inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :]
# Build attention mask: [parallel_size*2, seq_len-1 + 1 + 576]
attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0 # Zero out unconditional prompt
attention_mask = attention_mask.int()
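The mask geometry can be verified without loading the model. The sketch below reproduces the construction with a dummy embedding table so the `+577` bookkeeping (1 slot for the timestep embedding plus 576 image-token positions) is visible in the resulting shapes; `seq_len`, `dim`, and `pad_id` are illustrative values, not JanusFlow's:

```python
import torch

parallel_size, seq_len, dim, pad_id = 5, 10, 8, 0  # illustrative sizes
input_ids = torch.arange(1, seq_len + 1)           # dummy prompt tokens

tokens = torch.stack([input_ids] * (parallel_size * 2))
tokens[parallel_size:, 1:] = pad_id                # pad unconditional rows

embed = torch.nn.Embedding(seq_len + 2, dim)       # stand-in embedding table
inputs_embeds = embed(tokens)[:, :-1, :]           # drop last token

# (seq_len-1) prompt slots + 1 timestep slot + 576 image-token slots
attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577))
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0
attention_mask = attention_mask.int()
```

Note that position 0 stays unmasked in the unconditional rows, matching the pattern above: only positions 1 through `seq_len-2` of the prompt are zeroed.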
Import
# Uses existing model and processor instances — no separate import
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_ids | torch.LongTensor [seq_len] | Yes | Tokenized prompt with image_start_tag |
| parallel_size | int | Yes | Number of images to generate (e.g., 5) |
| pad_id | int | Yes | Pad token ID from vl_chat_processor.pad_id |
Outputs
| Name | Type | Description |
|---|---|---|
| inputs_embeds | torch.Tensor [parallel_size*2, seq_len-1, D] | CFG-paired embeddings (last token removed) |
| attention_mask | torch.IntTensor [parallel_size*2, seq_len-1+577] | 1s for attended, 0s for masked positions |
Usage Examples
CFG Setup for JanusFlow
parallel_size = 5
input_ids = torch.LongTensor(tokenizer.encode(text))
tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id
inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :] # Remove last <bog> token
attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0
attention_mask = attention_mask.int()
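Downstream, the paired batch and mask feed the rectified-flow ODE loop. A hedged sketch of that loop with simple Euler steps follows; the step count, latent shape, and `cfg_weight` value are illustrative assumptions, and the random `v_pred` stands in for the model call that would receive `inputs_embeds` and `attention_mask`:

```python
import torch

parallel_size, cfg_weight, num_steps = 5, 2.0, 30
z = torch.randn(parallel_size, 4, 24, 24)  # hypothetical latent shape
dt = 1.0 / num_steps

for _ in range(num_steps):
    z_pair = torch.cat([z, z], dim=0)    # duplicate latents for the cond/uncond halves
    v_pred = torch.randn_like(z_pair)    # stand-in for the model's velocity prediction
    v_cond, v_uncond = torch.chunk(v_pred, 2, dim=0)
    v = v_uncond + cfg_weight * (v_cond - v_uncond)
    z = z + dt * v                       # Euler update on the guided velocity
```

Duplicating `z` each step keeps the conditional and unconditional halves aligned with the mask rows built above.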
Related Pages
Implements Principle
Requires Environment
Uses Heuristic