Implementation:Huggingface Diffusers VaeImageProcessor Postprocess

Knowledge Sources	Diffusers, Diffusers Docs
Domains	Diffusion_Models, Image_Processing, Tensor_Conversion
Last Updated	2026-02-13 21:00 GMT

Overview

A concrete tool, provided by the Diffusers library, for converting raw VAE decoder output tensors into user-friendly image formats (PIL images, NumPy arrays, or PyTorch tensors).

Description

VaeImageProcessor.postprocess is the final processing step in the diffusion inference pipeline. It takes the raw pixel-space tensor from the VAE decoder (values in [-1, 1], shape [B, C, H, W]) and converts it to the requested output format.

The method implements a progressive transformation chain:

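Input validation: Raises a ValueError if the input is not a PyTorch tensor, and checks that the output type is one of the supported formats ("latent", "pt", "np", "pil"). An unsupported output type triggers a deprecation warning and falls back to "np".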
Early return for latent: If output_type="latent", the tensor is returned as-is with no processing.
Conditional denormalization: Maps values from [-1, 1] to [0, 1] using the formula image / 2 + 0.5 and clamps the result to [0, 1]. Whether this happens is controlled by the do_denormalize parameter; if do_denormalize is None, the method falls back to the processor's do_normalize config setting.
Early return for pt: If output_type="pt", returns the denormalized PyTorch tensor.
NumPy conversion: Converts the tensor to NumPy format via pt_to_numpy, which handles CPU transfer, channel reordering ([B,C,H,W] to [B,H,W,C]), and float32 casting.
Early return for np: If output_type="np", returns the NumPy array.
PIL conversion: Converts the NumPy array to a list of PIL Image objects via numpy_to_pil, which scales to [0, 255] uint8.
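
The chain above can be sketched without PyTorch or Diffusers. The following is a minimal NumPy stand-in, not the library code: the name postprocess_sketch and its single denormalize flag are hypothetical, the "pt" branch returns a NumPy array rather than a tensor, and the "pil" branch stops at uint8 arrays that PIL.Image.fromarray could consume.

```python
import numpy as np


def postprocess_sketch(image, output_type="pil", denormalize=True):
    """Hypothetical NumPy stand-in for the chain described above."""
    if output_type == "latent":
        return image  # early return: no processing at all
    if denormalize:
        # Map [-1, 1] to [0, 1]; the clip keeps stray values inside the range.
        image = np.clip(image / 2 + 0.5, 0.0, 1.0)
    if output_type == "pt":
        return image  # still laid out as [B, C, H, W]
    # Channel reordering [B, C, H, W] -> [B, H, W, C] plus float32 cast.
    image = image.transpose(0, 2, 3, 1).astype(np.float32)
    if output_type == "np":
        return image
    # "pil" stage: scale to [0, 255] uint8, one array per image in the batch.
    return [(img * 255).round().astype(np.uint8) for img in image]


# One 2x2 image whose three channels hold -1, 0 and 1 respectively.
channels = np.array([-1.0, 0.0, 1.0], dtype=np.float32).reshape(1, 3, 1, 1)
decoded = np.broadcast_to(channels, (1, 3, 2, 2)).copy()

as_np = postprocess_sketch(decoded, "np")
as_u8 = postprocess_sketch(decoded, "pil")
print(as_np.shape, as_np.dtype)  # (1, 2, 2, 3) float32
print(as_u8[0][0, 0])            # first pixel after scaling to uint8
```

Each early return hands back the data at the stage the caller asked for, so later stages never run for "latent", "pt", or "np".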

Usage

Most image-generation pipelines call this method automatically in their __call__ methods after VAE decoding. Call it directly when building custom pipelines, when you have raw VAE output that needs format conversion, or when reprocessing intermediate results.

Code Reference

Source Location

Repository: diffusers
File: src/diffusers/image_processor.py
Lines: 738-810

Signature

def postprocess(
    self,
    image: torch.Tensor,
    output_type: str = "pil",
    do_denormalize: list[bool] | None = None,
) -> PIL.Image.Image | np.ndarray | torch.Tensor:

Import

from diffusers.image_processor import VaeImageProcessor

I/O Contract

Inputs

Name	Type	Required	Description
image	`torch.Tensor`	Yes	The image tensor from the VAE decoder. Expected shape: `[B, C, H, W]` (e.g., `[1, 3, 1024, 1024]`). Values typically in range `[-1, 1]`. Must be a PyTorch tensor; other types will raise a `ValueError`.
output_type	`str`	No	The desired output format. One of: `"pil"` (PIL Image objects), `"np"` (NumPy array), `"pt"` (PyTorch tensor), or `"latent"` (raw pass-through). Defaults to `"pil"`.
do_denormalize	`list[bool]` or `None`	No	Per-image flag controlling whether to denormalize from `[-1, 1]` to `[0, 1]`. Length must match batch size. If `None`, defaults to the processor's `do_normalize` config setting for all images in the batch.
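
How the do_denormalize flags apply per image can be illustrated with a small NumPy sketch. It is an illustration under stated assumptions, not the library implementation: denormalize_conditionally_sketch is a hypothetical name, its do_normalize argument stands in for the processor's config setting, and the explicit length check is an assumption of this sketch.

```python
import numpy as np


def denormalize_conditionally_sketch(images, do_denormalize=None, do_normalize=True):
    """Hypothetical stand-in; `do_normalize` plays the role of the processor config."""
    if do_denormalize is None:
        # Fall back to the config value for every image in the batch.
        do_denormalize = [do_normalize] * images.shape[0]
    if len(do_denormalize) != images.shape[0]:
        # Assumption of this sketch: fail loudly on a length mismatch.
        raise ValueError("do_denormalize needs one flag per image in the batch")
    return np.stack([
        np.clip(img / 2 + 0.5, 0.0, 1.0) if flag else img
        for img, flag in zip(images, do_denormalize)
    ])


batch = np.full((3, 3, 4, 4), -1.0, dtype=np.float32)
mixed = denormalize_conditionally_sketch(batch, [True, True, False])
default = denormalize_conditionally_sketch(batch)
print(mixed[0].min(), mixed[2].min())  # 0.0 -1.0
```

Images whose flag is False pass through untouched, so a single batch can mix value ranges; downstream code has to keep track of which images were denormalized.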

Outputs

Name	Type	Description
result	`PIL.Image.Image` or `list[PIL.Image.Image]`	When `output_type="pil"`: a list of PIL Image objects, one per batch item, each holding `H x W` pixels (PIL reports `size` as `(W, H)`) with uint8 values in `[0, 255]`.
result	`np.ndarray`	When `output_type="np"`: a NumPy array with shape `[B, H, W, C]` and float32 values, in `[0, 1]` when denormalization is applied.
result	`torch.Tensor`	When `output_type="pt"`: a PyTorch tensor with shape `[B, C, H, W]` and values in `[0, 1]` when denormalization is applied.
result	`torch.Tensor`	When `output_type="latent"`: the input tensor passed through unchanged.

Usage Examples

Basic Usage

from diffusers.image_processor import VaeImageProcessor
import torch

# Create a processor with default settings
processor = VaeImageProcessor(vae_scale_factor=8)

# Simulate VAE decoder output (range [-1, 1])
fake_vae_output = torch.randn(1, 3, 1024, 1024).clamp(-1, 1)

# Convert to PIL images
pil_images = processor.postprocess(fake_vae_output, output_type="pil")
pil_images[0].save("output.png")

# Convert to NumPy array
np_images = processor.postprocess(fake_vae_output, output_type="np")
print(np_images.shape)  # (1, 1024, 1024, 3)

# Keep as PyTorch tensor (denormalized to [0, 1])
pt_images = processor.postprocess(fake_vae_output, output_type="pt")
print(pt_images.shape)  # torch.Size([1, 3, 1024, 1024])

Inside a Custom Pipeline

from diffusers import StableDiffusionXLPipeline
from diffusers.image_processor import VaeImageProcessor
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Generate in latent mode (skip automatic post-processing)
latents = pipe(
    "A photo of a cat",
    output_type="latent",
    num_inference_steps=30,
).images

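# Manually decode. The SDXL VAE can overflow in float16, so decode in float32
# (the stock pipeline upcasts the VAE for the same reason).
pipe.vae.to(torch.float32)
latents = latents.to(torch.float32) / pipe.vae.config.scaling_factor
with torch.no_grad():
    image_tensor = pipe.vae.decode(latents, return_dict=False)[0]

# Manually post-process with a standalone processor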
processor = VaeImageProcessor(vae_scale_factor=pipe.vae_scale_factor)
pil_image = processor.postprocess(image_tensor, output_type="pil")[0]
pil_image.save("custom_pipeline_output.png")

Selective Denormalization

from diffusers.image_processor import VaeImageProcessor
import torch

processor = VaeImageProcessor(vae_scale_factor=8)

# Batch of 3 images, but only denormalize the first two
batch = torch.randn(3, 3, 512, 512).clamp(-1, 1)
result = processor.postprocess(
    batch,
    output_type="pt",
    do_denormalize=[True, True, False],
)
# result[0] and result[1] are in [0, 1], result[2] remains in [-1, 1]

Related Pages

Implements Principle

Principle:Huggingface_Diffusers_Post_Processing

Requires Environment

Environment:Huggingface_Diffusers_PyTorch_CUDA_Runtime
