Principle:Deepseek ai Janus VAE Decoding

Knowledge Sources	Auto-Encoding Variational Bayes; High-Resolution Image Synthesis with Latent Diffusion Models
Domains	Image_Generation, Generative_Models
Last Updated	2026-02-10 09:30 GMT

Overview

A procedure for converting continuous latent representations into pixel images using the SDXL VAE decoder.

Description

VAE (Variational Autoencoder) decoding is the final image reconstruction step in the JanusFlow pipeline. After the ODE denoising loop produces a clean latent representation, the SDXL VAE's decoder converts it from the 4-channel, 48×48 latent space to a 3-channel, 384×384 pixel image.

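Before decoding, the latent must be divided by the VAE's scaling_factor (0.13025 for SDXL). This undoes the normalization applied during training, when encoded latents are multiplied by the same factor, and returns the latent to the scale the decoder expects.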
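The unscale-then-decode step can be sketched as follows. This is a minimal illustration, not the Janus implementation: the helper name `decode_latents` is hypothetical, and the `vae` argument is assumed to follow the diffusers `AutoencoderKL` interface, where `vae.config.scaling_factor` holds the scale and `vae.decode(x).sample` returns the decoded tensor.

```python
def decode_latents(vae, latents):
    """Undo the latent scaling, then run the VAE decoder.

    Assumption: `vae` exposes `config.scaling_factor` (0.13025 for SDXL)
    and a `decode(x)` method whose result has a `.sample` attribute,
    as diffusers' AutoencoderKL does.
    """
    # Divide by the scaling factor to return to the decoder's native latent scale.
    unscaled = latents / vae.config.scaling_factor
    # For a (B, 4, 48, 48) latent the decoder is expected to return (B, 3, 384, 384).
    return vae.decode(unscaled).sample
```

At inference time a call like this would typically sit inside a `torch.no_grad()` context, since no gradients are needed for decoding.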

Usage

Use this principle after the ODE denoising loop completes and before post-processing the output images.

Theoretical Basis

The VAE decoder maps from latent space to pixel space:

$I = \mathrm{Decoder}\left(\frac{z}{\sigma}\right) \in \mathbb{R}^{B \times 3 \times 384 \times 384}$

Where σ = 0.13025 is the SDXL VAE scaling factor and z is the denoised latent from the ODE loop.

The SDXL VAE has an 8× spatial downscaling factor, so 48×48 latents produce 384×384 pixel images.
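The shape arithmetic can be checked with a small helper. The function name `decoded_shape` and its parameters are illustrative only. It multiplies each spatial dimension by the 8× factor and swaps the 4 latent channels for 3 image channels.

```python
def decoded_shape(latent_shape, downscale=8, out_channels=3):
    """Return the pixel-space shape for a (B, C, H, W) latent shape."""
    batch, _latent_channels, height, width = latent_shape
    # Each latent position corresponds to a downscale x downscale pixel patch.
    return (batch, out_channels, height * downscale, width * downscale)


print(decoded_shape((2, 4, 48, 48)))  # (2, 3, 384, 384)
```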

Related Pages

Implemented By

Implementation:Deepseek_ai_Janus_AutoencoderKL_Decode
