Principle: DeepSeek Janus VQ-VAE Decoding
| Knowledge Sources | |
|---|---|
| Domains | Image_Generation, Generative_Models |
| Last Updated | 2026-02-10 09:30 GMT |
Overview
A procedure for converting discrete VQ codebook indices back into continuous pixel values using the VQ-VAE decoder.
Description
VQ-VAE decoding is the step that transforms the generated discrete tokens into actual images. The VQ-VAE (Vector Quantized Variational Autoencoder) maintains a learned codebook of embedding vectors. Given a sequence of codebook indices from the autoregressive generation, the decoder:
- Looks up the corresponding embedding vectors from the codebook
- Reshapes them into a spatial feature map
- Passes them through a convolutional decoder to reconstruct pixel values
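The three steps above can be sketched end to end. This is a minimal illustration, not the actual Janus implementation: the codebook is a plain array, and nearest-neighbour upsampling stands in for the real convolutional decoder.

```python
import numpy as np

def decode_indices(indices, codebook, grid_size):
    """Sketch of VQ-VAE decoding: lookup -> reshape -> decode."""
    # 1. Codebook lookup: each index selects one embedding vector.
    embeddings = codebook[indices]              # (H*W, D)
    # 2. Reshape the flat token sequence into a spatial feature map.
    h, w = grid_size
    feature_map = embeddings.reshape(h, w, -1)  # (H, W, D)
    # 3. Placeholder "decoder": 16x nearest-neighbour upsampling stands
    #    in for the CNN decoder that reconstructs pixel values.
    return feature_map.repeat(16, axis=0).repeat(16, axis=1)

# Toy example: a 4x4 token grid and a codebook of 8 vectors of dim 3.
codebook = np.random.randn(8, 3)
indices = np.random.randint(0, 8, size=16)
out = decode_indices(indices, codebook, (4, 4))
print(out.shape)  # (64, 64, 3)
```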
In Janus, the image tokenizer is a VQ tokenizer with a 16× spatial downsampling factor (hence "VQ-16"), a learned codebook of discrete embedding vectors, and a CNN-based encoder-decoder.
Usage
Use this principle after the autoregressive token generation loop produces VQ codebook indices. The decoded output is a tensor of pixel values in the range [-1, 1] that requires post-processing to obtain displayable images.
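The post-processing step can be illustrated as follows. This sketch assumes a channels-first (C, H, W) decoder output, which is common for CNN decoders but is an assumption here, not the Janus specification:

```python
import numpy as np

def to_displayable(decoded):
    """Map decoder output in [-1, 1] to uint8 pixels in [0, 255].

    Assumes channels-first (C, H, W) input; this layout is an
    illustrative assumption, not taken from the Janus code.
    """
    img = (decoded + 1.0) / 2.0            # [-1, 1] -> [0, 1]
    img = np.clip(img, 0.0, 1.0) * 255.0   # guard against overshoot
    img = img.astype(np.uint8)
    return np.transpose(img, (1, 2, 0))    # (C, H, W) -> (H, W, C)

decoded = np.tanh(np.random.randn(3, 64, 64))  # fake decoder output
pixels = to_displayable(decoded)
print(pixels.shape, pixels.dtype)  # (64, 64, 3) uint8
```

The clip before casting matters: real decoder outputs can slightly overshoot [-1, 1], and casting without clipping would wrap negative values around to large uint8 values.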
Theoretical Basis
The VQ-VAE decoding pipeline:
- Codebook lookup: Each index z_i maps to an embedding vector e_{z_i} from the learned codebook
- Post-quantization convolution: A 1×1 conv adjusts channels from the codebook dimension to the decoder input dimension, E' = post_quant_conv(E), where E is the spatial grid of looked-up embeddings
- CNN Decoder: A series of upsampling + residual blocks reconstruct the full-resolution image
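Because a 1×1 convolution touches each spatial position independently, the post-quantization step reduces to a matrix multiply over the channel dimension. A minimal sketch, with illustrative shapes (the real codebook and decoder dimensions differ):

```python
import numpy as np

def post_quant_conv(feature_map, weight, bias):
    """1x1 convolution expressed as a per-position linear projection.

    feature_map: (H, W, C_in) grid of codebook embeddings
    weight:      (C_in, C_out) projection matrix
    bias:        (C_out,) offset
    """
    # A 1x1 conv has no spatial extent, so it is exactly a linear map
    # applied independently at every (h, w) location.
    return feature_map @ weight + bias

E = np.random.randn(24, 24, 8)   # 24x24 token grid, codebook dim 8
W = np.random.randn(8, 32)       # project to decoder input dim 32
b = np.zeros(32)
E_prime = post_quant_conv(E, W, b)
print(E_prime.shape)  # (24, 24, 32)
```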