
Implementation:DeepSeek-AI Janus CFG Input Preparation Flow

From Leeroopedia


Knowledge Sources
Domains: Image_Generation, Guided_Generation
Last Updated: 2026-02-10 09:30 GMT

Overview

Pattern for constructing paired conditional/unconditional inputs with attention masks for classifier-free guidance in the JanusFlow rectified flow pipeline.

Description

This user-defined pattern sets up the CFG input structure for the rectified flow ODE loop. Unlike the autoregressive variant, it uses a split-batch structure (first half conditional, second half unconditional) and constructs an explicit attention mask.
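The payoff of the split-batch layout is that a single forward pass yields both velocity predictions, which are then blended with the standard classifier-free guidance formula v = v_uncond + w * (v_cond - v_uncond). A minimal sketch of that blending step, using scalar stand-ins for velocity tensors (the function name and `cfg_weight` default are illustrative, not taken from the Janus source):

```python
def cfg_combine(v, parallel_size, cfg_weight=2.0):
    """Blend a split batch of velocity predictions.

    v holds parallel_size conditional predictions followed by
    parallel_size unconditional ones (scalars here for clarity).
    """
    cond = v[:parallel_size]
    uncond = v[parallel_size:]
    # Standard CFG: push the unconditional prediction toward the conditional one
    return [u + cfg_weight * (c - u) for c, u in zip(cond, uncond)]

# Split batch of 4 predictions: first 2 conditional, last 2 unconditional
combined = cfg_combine([1.0, 3.0, 0.5, 1.5], parallel_size=2)
```

In the real pipeline the same arithmetic is applied elementwise to velocity tensors rather than scalars.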

Usage

Implement this pattern after prompt tokenization and before noise initialization.

Code Reference

Source Location

  • Repository: Janus
  • File: demo/app_janusflow.py
  • Lines: L76-94

Pattern Implementation

parallel_size = 5

# Stack tokens: first parallel_size conditional, next parallel_size unconditional
tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id  # Pad all but the first token in unconditional rows

# Get embeddings, remove last <bog> token (replaced by timestep embedding)
inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :]

# Build attention mask: [parallel_size*2, seq_len-1 + 1 + 576]
attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0  # Zero out unconditional prompt (position 0 stays attended)
attention_mask = attention_mask.int()
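The mask width works out to (seq_len - 1) + 1 + 576: the trimmed prompt embeddings, one slot for the timestep embedding that replaces the removed token, and 576 image tokens (a 24 x 24 latent grid). A pure-Python sketch of the same masking logic, with small sizes so the structure is visible (helper name is illustrative):

```python
def build_cfg_mask(parallel_size, emb_len, num_img_tokens=576):
    """Mirror the torch mask construction with nested lists.

    Rows 0..parallel_size-1 are conditional (all ones); rows
    parallel_size..2*parallel_size-1 zero out the prompt except
    position 0, matching attention_mask[parallel_size:, 1:emb_len] = 0.
    """
    cols = emb_len + 1 + num_img_tokens  # prompt + timestep slot + image tokens
    mask = [[1] * cols for _ in range(parallel_size * 2)]
    for row in range(parallel_size, parallel_size * 2):
        for col in range(1, emb_len):
            mask[row][col] = 0
    return mask

# Tiny example: 2 images, 4-token trimmed prompt, 3 "image" tokens
mask = build_cfg_mask(2, 4, num_img_tokens=3)
```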

Import

# Uses existing model and processor instances — no separate import

I/O Contract

Inputs

Name           Type                        Required  Description
input_ids      torch.LongTensor [seq_len]  Yes       Tokenized prompt with image_start_tag
parallel_size  int                         Yes       Number of images to generate (e.g., 5)
pad_id         int                         Yes       Pad token ID from vl_chat_processor.pad_id

Outputs

Name            Type                                              Description
inputs_embeds   torch.Tensor [parallel_size*2, seq_len-1, D]      CFG-paired embeddings (last token removed)
attention_mask  torch.IntTensor [parallel_size*2, seq_len-1+577]  1s for attended positions, 0s for masked positions
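Both outputs share the same trimmed sequence length, so their shapes can be derived from one another. A small helper to compute the expected shapes for given sizes (hypothetical, not part of the Janus API):

```python
def expected_cfg_shapes(parallel_size, seq_len, hidden_dim, num_img_tokens=576):
    """Return (inputs_embeds shape, attention_mask shape) per the contract above."""
    embeds = (parallel_size * 2, seq_len - 1, hidden_dim)
    mask = (parallel_size * 2, (seq_len - 1) + 1 + num_img_tokens)
    return embeds, mask

# e.g. 5 images, a 20-token prompt, and a hidden size of 2048
shapes = expected_cfg_shapes(5, 20, 2048)
```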

Usage Examples

CFG Setup for JanusFlow

parallel_size = 5
input_ids = torch.LongTensor(tokenizer.encode(text))

tokens = torch.stack([input_ids] * (parallel_size * 2)).cuda()
tokens[parallel_size:, 1:] = vl_chat_processor.pad_id

inputs_embeds = vl_gpt.language_model.get_input_embeddings()(tokens)
inputs_embeds = inputs_embeds[:, :-1, :]  # Remove last <bog> token

attention_mask = torch.ones((parallel_size * 2, inputs_embeds.shape[1] + 577)).to(vl_gpt.device)
attention_mask[parallel_size:, 1:inputs_embeds.shape[1]] = 0
attention_mask = attention_mask.int()
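After this setup the pipeline initializes noise and integrates the rectified-flow ODE dz/dt = v(z, t): each step feeds the duplicated latents through the model with the mask above, blends the conditional and unconditional halves via CFG, and takes an Euler update. A scalar sketch of the Euler step alone (illustrative, not the actual Janus loop):

```python
def euler_step(z, v, dt):
    """One explicit Euler update of dz/dt = v (scalars stand in for latents)."""
    return [zi + dt * vi for zi, vi in zip(z, v)]

z_next = euler_step([0.0, 1.0], [2.0, -1.0], dt=0.5)
```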

