

Principle:Deepseek ai Janus Autoregressive VQ Token Generation

From Leeroopedia


Knowledge Sources
Domains: Image_Generation, Autoregressive_Models
Last Updated: 2026-02-10 09:30 GMT

Overview

A loop-based procedure for generating discrete VQ codebook indices one token at a time using an LLM with a generation head, guided by classifier-free guidance.

Description

Autoregressive VQ token generation is the core image generation mechanism in Janus. Rather than generating continuous pixel values, the model generates a sequence of discrete codebook indices from a VQ-VAE vocabulary. Each token represents a spatial patch of the output image.

The generation loop runs for a fixed number of steps (576 tokens for a 384×384 image with 16×16 patches). At each step:

  1. The LLM backbone produces hidden states from the current embeddings
  2. The gen_head (a 2-layer MLP) projects hidden states to VQ codebook logits
  3. CFG combines conditional and unconditional logits
  4. A token is sampled from the resulting distribution
  5. The sampled token is converted to an embedding via prepare_gen_img_embeds for the next step
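The five steps above can be sketched as a toy loop in plain Python. This is a minimal sketch under stated assumptions, not the Janus implementation. `logits_fn` is a hypothetical stand-in for "LLM backbone + gen_head" that returns conditional and unconditional logits over the codebook. The growing token list replaces the embedding feedback of step 5. The temperature-scaled softmax and the `cfg_weight`/`temperature` defaults are illustrative assumptions. A real implementation would typically keep a KV cache, so that after the first step only the newest token's embedding is fed to the backbone.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for "LLM backbone + gen_head": given the tokens generated
# so far, return (cond_logits, uncond_logits) over the VQ codebook.
LogitsFn = Callable[[Sequence[int]], Tuple[List[float], List[float]]]


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def generate_image_tokens(
    logits_fn: LogitsFn,
    image_size: int = 384,
    patch_size: int = 16,
    cfg_weight: float = 5.0,   # illustrative guidance weight (assumption)
    temperature: float = 1.0,  # illustrative sampling temperature (assumption)
    seed: int = 0,
) -> List[int]:
    rng = random.Random(seed)
    num_steps = (image_size // patch_size) ** 2  # 24 * 24 = 576 for the defaults
    tokens: List[int] = []
    for _ in range(num_steps):
        # Steps 1-2: backbone + gen_head produce conditional and unconditional logits.
        cond, uncond = logits_fn(tokens)
        # Step 3: classifier-free guidance.
        guided = [u + cfg_weight * (c - u) for c, u in zip(cond, uncond)]
        # Step 4: sample one codebook index from the temperature-scaled distribution.
        probs = softmax([g / temperature for g in guided])
        next_token = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        # Step 5: Janus would embed next_token with prepare_gen_img_embeds and feed
        # it back; in this toy the growing token list plays that role.
        tokens.append(next_token)
    return tokens


def toy_logits(prefix: Sequence[int]) -> Tuple[List[float], List[float]]:
    # Toy 8-entry codebook: the condition favours index len(prefix) % 8.
    size = 8
    cond = [1.0 if i == len(prefix) % size else 0.0 for i in range(size)]
    uncond = [0.0] * size
    return cond, uncond


tokens = generate_image_tokens(toy_logits)
print(len(tokens))  # 576
```

Lowering `temperature` sharpens the distribution toward greedy decoding, while larger `cfg_weight` pushes samples further toward what the text condition favours.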

Usage

Use this principle after CFG input preparation. The output is a tensor of VQ codebook indices that must be decoded by the VQ-VAE to produce pixel images.
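Before decoding, the flat sequence of 576 indices has to be laid back out on the 24×24 patch grid. The sketch below assumes raster (row-major) generation order; `tokens_to_grid` is a hypothetical helper, and the actual VQ-VAE decode call is not shown.

```python
from typing import List, Sequence


def tokens_to_grid(tokens: Sequence[int], image_size: int = 384, patch_size: int = 16) -> List[List[int]]:
    side = image_size // patch_size  # 384 // 16 = 24 patches per side
    if len(tokens) != side * side:
        raise ValueError(f"expected {side * side} tokens, got {len(tokens)}")
    # Assumed raster order: token k lands at row k // side, column k % side.
    return [list(tokens[r * side:(r + 1) * side]) for r in range(side)]


grid = tokens_to_grid(list(range(576)))
print(len(grid), len(grid[0]))  # 24 24
print(grid[1][0])  # 24
```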

Theoretical Basis

The autoregressive factorization for image tokens:

$$P(z_1, \ldots, z_N \mid c) = \prod_{i=1}^{N} P(z_i \mid z_{<i}, c)$$

Where z_i are VQ codebook indices, c is the text condition, and N=576 (24×24 spatial grid).
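In practice the product is evaluated as a sum of log-probabilities, because multiplying 576 per-token probabilities can underflow double precision. The sketch below applies the chain rule to a hypothetical conditional model `cond_prob`; the uniform 4-code model is a toy chosen so that the result is easy to check by hand.

```python
import math
from typing import Callable, Sequence

# Hypothetical conditional model: cond_prob(prefix, token, condition) returns
# P(z_i = token | z_<i = prefix, c = condition).
CondProb = Callable[[Sequence[int], int, str], float]


def sequence_log_prob(tokens: Sequence[int], condition: str, cond_prob: CondProb) -> float:
    """Chain rule: log P(z_1..z_N | c) = sum over i of log P(z_i | z_<i, c)."""
    total = 0.0
    for i, z in enumerate(tokens):
        total += math.log(cond_prob(tokens[:i], z, condition))
    return total


def uniform_over_4(prefix: Sequence[int], token: int, condition: str) -> float:
    # Toy model that ignores its context: each of 4 codes is equally likely.
    return 0.25


print(sequence_log_prob([0, 1, 2], "a red apple", uniform_over_4))  # 3 * log(0.25)
```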

At each step, classifier-free guidance with guidance weight w adjusts the logits:

$$\text{logits} = \text{logits}_{\text{uncond}} + w\,(\text{logits}_{\text{cond}} - \text{logits}_{\text{uncond}})$$
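The guidance formula is an elementwise interpolation (or, for w > 1, extrapolation) between the two logit vectors. The small sketch below uses a hypothetical helper `cfg_logits`: w = 0 recovers the unconditional logits, w = 1 the conditional ones, and larger w exaggerates the difference the text condition makes.

```python
from typing import List, Sequence


def cfg_logits(cond: Sequence[float], uncond: Sequence[float], w: float) -> List[float]:
    """logits = uncond + w * (cond - uncond), applied elementwise over the codebook."""
    return [u + w * (c - u) for c, u in zip(cond, uncond)]


print(cfg_logits([2.0, 0.0], [1.0, 1.0], 1.0))  # [2.0, 0.0]
print(cfg_logits([2.0, 0.0], [1.0, 1.0], 5.0))  # [6.0, -4.0]
```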

The gen_head architecture is Linear(D → D_img) → GELU → Linear(D_img → codebook_size), where D is the LLM hidden size and D_img is the intermediate width.
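A toy forward pass makes the shape flow of gen_head concrete. This is a minimal sketch, not the Janus module: the class name `GenHead`, the tiny dimensions, the random initialisation, and the use of exact (erf-based) GELU are all illustrative assumptions.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def linear(x: Sequence[float], weight: Matrix, bias: Sequence[float]) -> List[float]:
    # weight has shape (out_features, in_features), the same layout torch.nn.Linear uses.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def random_linear(in_f: int, out_f: int, rng: random.Random) -> Tuple[Matrix, List[float]]:
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_f)] for _ in range(out_f)]
    return weight, [0.0] * out_f


class GenHead:
    """Toy Linear(D -> D_img) -> GELU -> Linear(D_img -> codebook_size)."""

    def __init__(self, d: int, d_img: int, codebook_size: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1, self.b1 = random_linear(d, d_img, rng)
        self.w2, self.b2 = random_linear(d_img, codebook_size, rng)

    def __call__(self, hidden: Sequence[float]) -> List[float]:
        h = [gelu(v) for v in linear(hidden, self.w1, self.b1)]
        return linear(h, self.w2, self.b2)  # one logit per codebook entry


head = GenHead(d=8, d_img=6, codebook_size=16)
print(len(head([0.1] * 8)))  # 16
```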

prepare_gen_img_embeds converts sampled tokens back into LLM input embeddings: each codebook index is looked up in the gen_embed table and the result is passed through gen_aligner, i.e. gen_aligner(gen_embed(token)).
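The lookup-then-project composition can be sketched with a hypothetical `ToyGenEmbedder` class. Assumptions: gen_embed is modelled as a plain lookup table, and gen_aligner is simplified to a single bias-free linear map into the LLM hidden size (the real projector may be deeper). Equal tokens map to equal embeddings, which is what lets the sampled index stand in for the image patch at the next step.

```python
import random
from typing import List, Sequence

Matrix = List[List[float]]


class ToyGenEmbedder:
    """Hypothetical stand-in for gen_embed (lookup table) + gen_aligner (projector)."""

    def __init__(self, codebook_size: int, embed_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        # gen_embed: one learned vector per codebook index.
        self.table: Matrix = [[rng.gauss(0.0, 1.0) for _ in range(embed_dim)] for _ in range(codebook_size)]
        # gen_aligner: simplified here to one linear map (hidden_dim x embed_dim).
        self.proj: Matrix = [[rng.gauss(0.0, 0.1) for _ in range(embed_dim)] for _ in range(hidden_dim)]

    def gen_embed(self, token: int) -> List[float]:
        return self.table[token]

    def gen_aligner(self, vec: Sequence[float]) -> List[float]:
        return [sum(w * v for w, v in zip(row, vec)) for row in self.proj]

    def prepare_gen_img_embeds(self, tokens: Sequence[int]) -> Matrix:
        # gen_aligner(gen_embed(token)) for every sampled token.
        return [self.gen_aligner(self.gen_embed(t)) for t in tokens]


emb = ToyGenEmbedder(codebook_size=16, embed_dim=4, hidden_dim=8)
out = emb.prepare_gen_img_embeds([3, 3, 7])
print(len(out), len(out[0]))  # 3 8
```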

Related Pages

Implemented By
