Implementation:Huggingface Diffusers VaeImageProcessor Postprocess

From Leeroopedia
Knowledge Sources
Domains Diffusion_Models, Image_Processing, Tensor_Conversion
Last Updated 2026-02-13 21:00 GMT

Overview

A concrete tool, provided by the Diffusers library, for converting raw VAE decoder output tensors into user-friendly image formats (PIL images, NumPy arrays, or PyTorch tensors).

Description

VaeImageProcessor.postprocess is the final processing step in the diffusion inference pipeline. It takes the raw pixel-space tensor from the VAE decoder (values in [-1, 1], shape [B, C, H, W]) and converts it to the requested output format.

The method implements a progressive transformation chain:

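  1. Input validation: Raises a ValueError if the input is not a PyTorch tensor. The supported output types are "latent", "pt", "np", and "pil"; in the Diffusers source, an unrecognized output_type triggers a deprecation warning and falls back to "np" rather than raising.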
  2. Early return for latent: If output_type="latent", the tensor is returned as-is with no processing.
  3. Conditional denormalization: Maps values from [-1, 1] to [0, 1] using image / 2 + 0.5 and clamps the result to [0, 1]. The do_denormalize parameter is a per-image list of flags; if it is None, every image follows the processor's do_normalize config setting.
  4. Early return for pt: If output_type="pt", returns the denormalized PyTorch tensor.
  5. NumPy conversion: Converts the tensor to NumPy format via pt_to_numpy, which handles CPU transfer, channel reordering ([B,C,H,W] to [B,H,W,C]), and float32 casting.
  6. Early return for np: If output_type="np", returns the NumPy array.
  7. PIL conversion: Converts the NumPy array to a list of PIL Image objects via numpy_to_pil, which scales to [0, 255] uint8.
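
The chain above can be sketched in a few lines of plain PyTorch, NumPy, and Pillow. This is an illustrative re-implementation rather than the library source: the function name postprocess_sketch and its do_normalize argument (standing in for the processor config) are hypothetical, and output_type validation is left out for brevity.

```python
import numpy as np
import torch
from PIL import Image


def postprocess_sketch(image, output_type="pil", do_denormalize=None, do_normalize=True):
    """Illustrative re-implementation of the chain; not the library source."""
    if not isinstance(image, torch.Tensor):
        raise ValueError(f"Expected a torch.Tensor, got {type(image)}")
    if output_type == "latent":
        return image  # pass-through, no processing

    # Per-image flags; None falls back to the config-style default for all images.
    if do_denormalize is None:
        do_denormalize = [do_normalize] * image.shape[0]
    image = torch.stack(
        [(img / 2 + 0.5).clamp(0, 1) if flag else img
         for img, flag in zip(image, do_denormalize)]
    )
    if output_type == "pt":
        return image

    # CPU transfer, [B, C, H, W] -> [B, H, W, C], float32 cast.
    array = image.cpu().permute(0, 2, 3, 1).float().numpy()
    if output_type == "np":
        return array

    # Scale to [0, 255] uint8 and wrap each image in a PIL object.
    array = (array * 255).round().astype("uint8")
    return [Image.fromarray(a) for a in array]


decoded = torch.rand(2, 3, 8, 8) * 2 - 1  # stand-in for VAE output in [-1, 1]
as_np = postprocess_sketch(decoded, output_type="np")
as_pil = postprocess_sketch(decoded, output_type="pil")
print(as_np.shape, as_np.dtype)        # (2, 8, 8, 3) float32
print(as_pil[0].size, as_pil[0].mode)  # (8, 8) RGB
```

Each early return corresponds to one output format, so a caller pays only for the conversions it actually needs.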

Usage

Diffusers image pipelines typically call this method at the end of __call__, after VAE decoding. Call it directly when building custom pipelines, when you have raw VAE output that needs format conversion, or when reprocessing intermediate results.

Code Reference

Source Location

  • Repository: diffusers
  • File: src/diffusers/image_processor.py
  • Lines: 738-810

Signature

def postprocess(
    self,
    image: torch.Tensor,
    output_type: str = "pil",
    do_denormalize: list[bool] | None = None,
) -> PIL.Image.Image | np.ndarray | torch.Tensor:

Import

from diffusers.image_processor import VaeImageProcessor

I/O Contract

Inputs

Name | Type | Required | Description
image | torch.Tensor | Yes | The image tensor from the VAE decoder. Expected shape: [B, C, H, W] (e.g., [1, 3, 1024, 1024]). Values typically in range [-1, 1]. Must be a PyTorch tensor; other types will raise a ValueError.
output_type | str | No | The desired output format. One of: "pil" (PIL Image objects), "np" (NumPy array), "pt" (PyTorch tensor), or "latent" (raw pass-through). Defaults to "pil".
do_denormalize | list[bool] or None | No | Per-image flag controlling whether to denormalize from [-1, 1] to [0, 1]. Length must match batch size. If None, defaults to the processor's do_normalize config setting for all images in the batch.

Outputs

Name | Type | Description
result | list[PIL.Image.Image] | When output_type="pil": a list of PIL Image objects, one per batch item, each of size (W, H) with uint8 pixel values in [0, 255].
result | np.ndarray | When output_type="np": a NumPy array with shape [B, H, W, C] and float32 values in [0, 1].
result | torch.Tensor | When output_type="pt": a PyTorch tensor with shape [B, C, H, W] and values in [0, 1].
result | torch.Tensor | When output_type="latent": the input tensor passed through unchanged.

Usage Examples

Basic Usage

from diffusers.image_processor import VaeImageProcessor
import torch

# Create a processor with default settings
processor = VaeImageProcessor(vae_scale_factor=8)

# Simulate VAE decoder output (range [-1, 1])
fake_vae_output = torch.randn(1, 3, 1024, 1024).clamp(-1, 1)

# Convert to PIL images
pil_images = processor.postprocess(fake_vae_output, output_type="pil")
pil_images[0].save("output.png")

# Convert to NumPy array
np_images = processor.postprocess(fake_vae_output, output_type="np")
print(np_images.shape)  # (1, 1024, 1024, 3)

# Keep as PyTorch tensor (denormalized to [0, 1])
pt_images = processor.postprocess(fake_vae_output, output_type="pt")
print(pt_images.shape)  # torch.Size([1, 3, 1024, 1024])

Inside a Custom Pipeline

from diffusers import StableDiffusionXLPipeline
from diffusers.image_processor import VaeImageProcessor
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Generate in latent mode (skip automatic post-processing)
latents = pipe(
    "A photo of a cat",
    output_type="latent",
    num_inference_steps=30,
).images

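# Manually decode. The SDXL VAE can overflow in float16, so run the decode in float32.
pipe.vae.to(dtype=torch.float32)
latents = latents.to(torch.float32) / pipe.vae.config.scaling_factor
with torch.no_grad():
    image_tensor = pipe.vae.decode(latents, return_dict=False)[0]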

# Manually post-process with a fresh processor (default settings; pipe.image_processor would work as well)
processor = VaeImageProcessor(vae_scale_factor=pipe.vae_scale_factor)
pil_image = processor.postprocess(image_tensor, output_type="pil")[0]
pil_image.save("custom_pipeline_output.png")

Selective Denormalization

from diffusers.image_processor import VaeImageProcessor
import torch

processor = VaeImageProcessor(vae_scale_factor=8)

# Batch of 3 images, but only denormalize the first two
batch = torch.randn(3, 3, 512, 512).clamp(-1, 1)
result = processor.postprocess(
    batch,
    output_type="pt",
    do_denormalize=[True, True, False],
)
# result[0] and result[1] are in [0, 1], result[2] remains in [-1, 1]
