
Implementation:Huggingface Diffusers AutoencoderKL Decode

From Leeroopedia
Knowledge Sources
Domains Diffusion_Models, Variational_Autoencoders, Latent_Space
Last Updated 2026-02-13 21:00 GMT

Overview

Concrete tool for decoding a batch of latent tensors into pixel-space images using the KL-regularized Variational Autoencoder provided by the Diffusers library.

Description

AutoencoderKL.decode takes a batch of latent tensors (output of the denoising loop, after unscaling) and passes them through the VAE decoder to reconstruct pixel-space images. The method supports sliced decoding: when use_slicing is enabled and the batch size is greater than 1, each latent in the batch is decoded individually and the results are concatenated. This reduces peak GPU memory usage at the cost of slightly slower throughput.
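The slicing behavior described above can be sketched in plain PyTorch. The `tiny_decoder` function below is a stand-in for the real VAE decoder (which is far larger), but the branch on `use_slicing` mirrors the logic: split the batch, decode each latent individually, and concatenate the results.

```python
import torch

def tiny_decoder(z: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real VAE decoder: upsample 8x spatially and keep 3 channels.
    z = torch.nn.functional.interpolate(z, scale_factor=8, mode="nearest")
    return z[:, :3]  # hypothetical channel reduction, for illustration only

def decode_sliced(z: torch.Tensor, use_slicing: bool) -> torch.Tensor:
    # Mirrors the sliced-decoding branch: one latent at a time, then concatenate.
    if use_slicing and z.shape[0] > 1:
        slices = [tiny_decoder(z_slice) for z_slice in z.split(1)]
        return torch.cat(slices)
    return tiny_decoder(z)

batch = torch.randn(4, 4, 16, 16)
out = decode_sliced(batch, use_slicing=True)
print(out.shape)  # torch.Size([4, 3, 128, 128])
```

Because the decoder is applied per-sample, the sliced path produces the same result as decoding the whole batch at once; only the peak memory footprint differs.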

Internally, the method delegates to _decode, which runs the latent through a post-quantization convolution layer and then the main decoder network. The decoder consists of residual blocks, self-attention layers, and upsampling operations that progressively increase the spatial resolution from latent dimensions to pixel dimensions.

The output is either a DecoderOutput object (a dataclass-style BaseOutput with a sample field) or a plain tuple containing just the decoded tensor, depending on the return_dict flag.
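The two return styles can be illustrated with a small stand-in; the `DecoderOutput` dataclass below is a minimal sketch of the diffusers class, and the decoder body is a placeholder:

```python
from dataclasses import dataclass
import torch

@dataclass
class DecoderOutput:
    # Minimal stand-in for diffusers' DecoderOutput (a BaseOutput subclass).
    sample: torch.Tensor

def decode(z: torch.Tensor, return_dict: bool = True):
    sample = z * 2.0  # placeholder for the real decoder network
    if not return_dict:
        return (sample,)  # plain tuple: access the tensor via [0]
    return DecoderOutput(sample=sample)  # structured: access via .sample

z = torch.ones(1, 4, 2, 2)
assert torch.equal(decode(z).sample, decode(z, return_dict=False)[0])
```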

Usage

This method is called automatically by the pipeline after the denoising loop completes (unless output_type="latent"). Call it directly when implementing custom pipelines, performing latent-space manipulations that need to be visualized, or when you need explicit control over the decoding step. The input latents must already be unscaled (divided by scaling_factor) before being passed to this method.
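The unscaling step looks like this in isolation. The literal 0.13025 below is the SDXL VAE's published scaling_factor, used here only as a stand-in; in real code always read vae.config.scaling_factor rather than hard-coding a value:

```python
import torch

scaling_factor = 0.13025  # stand-in for vae.config.scaling_factor (SDXL value)
raw_latents = torch.randn(1, 4, 128, 128)

# The denoising loop operates on scaled latents; divide by scaling_factor
# before calling vae.decode(...).
latents = raw_latents / scaling_factor
```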

Code Reference

Source Location

  • Repository: diffusers
  • File: src/diffusers/models/autoencoders/autoencoder_kl.py
  • Lines: 214-248

Signature

def decode(
    self,
    z: torch.FloatTensor,
    return_dict: bool = True,
    generator=None,
) -> DecoderOutput | torch.FloatTensor:

Import

from diffusers import AutoencoderKL

I/O Contract

Inputs

  • z (torch.FloatTensor, required): Input batch of latent vectors to decode. Shape: [batch_size, latent_channels, height, width] (e.g., [1, 4, 128, 128] for SDXL at 1024x1024). Must already be unscaled (divided by scaling_factor).
  • return_dict (bool, optional): Whether to return a DecoderOutput object or a plain tuple. Defaults to True.
  • generator (torch.Generator, optional): Random number generator for reproducibility. Currently unused in the standard decode path but available for subclasses.

Outputs

  • sample (torch.Tensor): The decoded pixel-space image tensor. Shape: [batch_size, 3, height * vae_scale_factor, width * vae_scale_factor] (e.g., [1, 3, 1024, 1024]). Values are in the range [-1, 1]. Wrapped in DecoderOutput if return_dict=True, otherwise returned as the first element of a plain tuple.

Usage Examples

Basic Usage

from diffusers import AutoencoderKL
import torch

# Load the SDXL VAE
vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="vae",
    torch_dtype=torch.float16,
).to("cuda")

# Stand-in latents; in practice these come from a denoising loop
latents = torch.randn(1, 4, 128, 128, device="cuda", dtype=torch.float16)

# Unscale the latents first
latents = latents / vae.config.scaling_factor

# Decode to pixel space
with torch.no_grad():
    decoded = vae.decode(latents, return_dict=False)[0]

# decoded shape: [1, 3, 1024, 1024], range [-1, 1]
# Normalize to [0, 1] for visualization
image_tensor = (decoded / 2 + 0.5).clamp(0, 1)

With Sliced Decoding for Memory Efficiency

from diffusers import AutoencoderKL
import torch

vae = AutoencoderKL.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    subfolder="vae",
    torch_dtype=torch.float16,
).to("cuda")

# Enable sliced decoding for large batches
vae.enable_slicing()

# Decode a batch of latents one at a time to save memory
batch_latents = torch.randn(4, 4, 128, 128, device="cuda", dtype=torch.float16)
batch_latents = batch_latents / vae.config.scaling_factor

with torch.no_grad():
    decoded = vae.decode(batch_latents).sample
# Each latent is decoded individually, then concatenated

Inside a Custom Pipeline

from diffusers import StableDiffusionXLPipeline
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Generate latents without decoding
latent_output = pipe(
    "A photo of a mountain landscape",
    output_type="latent",
    num_inference_steps=30,
).images

# Manually unscale and decode
latents = latent_output / pipe.vae.config.scaling_factor
with torch.no_grad():
    image_tensor = pipe.vae.decode(latents, return_dict=False)[0]

# Post-process manually
image = pipe.image_processor.postprocess(image_tensor, output_type="pil")[0]
image.save("mountain.png")
