

Principle: DeepSeek-AI Janus VQ-VAE Decoding

From Leeroopedia


Knowledge Sources
Domains Image_Generation, Generative_Models
Last Updated 2026-02-10 09:30 GMT

Overview

A procedure for converting discrete VQ codebook indices back into continuous pixel values using the VQ-VAE decoder.

Description

VQ-VAE decoding is the step that transforms the generated discrete tokens into actual images. The VQ-VAE (Vector Quantized Variational Autoencoder) maintains a learned codebook of embedding vectors. Given a sequence of codebook indices from the autoregressive generation, the decoder:

  1. Looks up the corresponding embedding vectors from the codebook
  2. Reshapes them into a spatial feature map
  3. Passes them through a convolutional decoder to reconstruct pixel values

In Janus, the VQ-VAE uses a VQ-16 architecture (16× spatial downsampling), a learned codebook of discrete token embeddings, and a CNN-based encoder-decoder.
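Steps 1 and 2 above can be sketched with plain numpy; the sizes below are illustrative placeholders, not Janus's actual configuration:

```python
import numpy as np

# Hypothetical sizes for illustration (not Janus's real config).
B, H, W = 2, 24, 24   # batch, token-grid height/width (e.g. 384 / 16 for VQ-16)
K, C = 16384, 8       # codebook size, embedding dimension

rng = np.random.default_rng(0)
codebook = rng.standard_normal((K, C)).astype(np.float32)

# One generated codebook index per spatial position, flattened to (B, H*W).
indices = rng.integers(0, K, size=(B, H * W))

# Step 1: codebook lookup -> (B, H*W, C)
embeddings = codebook[indices]

# Step 2: reshape into a (B, C, H, W) spatial feature map for the CNN decoder.
feature_map = embeddings.reshape(B, H, W, C).transpose(0, 3, 1, 2)

print(feature_map.shape)  # (2, 8, 24, 24)
```

The resulting feature map is what step 3 (the convolutional decoder) consumes.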

Usage

Use this principle after the autoregressive token generation loop produces VQ codebook indices. The decoded output is a tensor of pixel values in the range [-1, 1] that requires post-processing to obtain displayable images.
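A minimal sketch of that post-processing step, mapping decoder output in [-1, 1] to displayable uint8 images (the function name is illustrative, not a Janus API):

```python
import numpy as np

def to_uint8_image(decoded: np.ndarray) -> np.ndarray:
    """Map decoder output in [-1, 1], shaped (B, 3, H, W), to uint8 HWC images."""
    img = np.clip((decoded + 1.0) / 2.0, 0.0, 1.0)  # rescale to [0, 1]
    img = (img * 255.0).round().astype(np.uint8)    # quantize to [0, 255]
    return img.transpose(0, 2, 3, 1)                # channels-last for display

# One 1x1 "image": -1 maps to 0, 0 to 128, 1 to 255.
batch = np.array([[[[-1.0]], [[0.0]], [[1.0]]]])
pixels = to_uint8_image(batch)
```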

Theoretical Basis

The VQ-VAE decoding pipeline:

  1. Codebook lookup: each index z_i maps to an embedding vector e_{z_i} from the learned codebook:
     E = \text{Codebook}[z] \in \mathbb{R}^{B \times C \times H \times W}
  2. Post-quantization convolution: a 1×1 convolution maps channels from the codebook dimension to the decoder input dimension:
     E' = \text{post\_quant\_conv}(E)
  3. CNN decoder: a series of upsampling and residual blocks reconstructs the full-resolution image:
     I = \text{Decoder}(E') \in \mathbb{R}^{B \times 3 \times H_{img} \times W_{img}}
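The three stages can be chained in a toy end-to-end pipeline. Here a nearest-neighbor 16× upsample plus a fixed channel projection stands in for the learned CNN decoder, and a per-pixel linear map stands in for the 1×1 post-quant convolution; all names and sizes are illustrative, not Janus's actual modules:

```python
import numpy as np

rng = np.random.default_rng(1)
B, H, W, K, C = 1, 4, 4, 64, 8  # toy sizes, not Janus's real config
codebook = rng.standard_normal((K, C)).astype(np.float32)

def post_quant_conv(E, weight):
    # A 1x1 conv is a per-pixel linear map over channels: (B,C,H,W) -> (B,C',H,W).
    return np.einsum("oc,bchw->bohw", weight, E)

def toy_decoder(E, upsample=16):
    # Stand-in for the CNN decoder: nearest-neighbor upsample, then a fixed
    # projection of channels down to RGB, squashed into [-1, 1] with tanh.
    up = E.repeat(upsample, axis=2).repeat(upsample, axis=3)
    proj = np.ones((3, up.shape[1]), dtype=np.float32) / up.shape[1]
    return np.tanh(np.einsum("oc,bchw->bohw", proj, up))

# Stage 1: lookup generated indices and reshape to a spatial feature map.
indices = rng.integers(0, K, size=(B, H * W))
E = codebook[indices].reshape(B, H, W, C).transpose(0, 3, 1, 2)
# Stage 2: post-quantization 1x1 convolution.
E = post_quant_conv(E, rng.standard_normal((C, C)).astype(np.float32))
# Stage 3: decode to a (B, 3, H_img, W_img) image in [-1, 1].
image = toy_decoder(E)

print(image.shape)  # (1, 3, 64, 64)
```

The 16× upsample mirrors the VQ-16 downsampling factor: a 4×4 token grid decodes to a 64×64 image.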

Related Pages

Implemented By

Uses Heuristic
