Implementation:Huggingface Diffusers VaeImageProcessor Postprocess
| Knowledge Sources | |
|---|---|
| Domains | Diffusion_Models, Image_Processing, Tensor_Conversion |
| Last Updated | 2026-02-13 21:00 GMT |
Overview
Concrete tool, provided by the Diffusers library, for converting raw VAE decoder output tensors into user-facing image formats: PIL images, NumPy arrays, or PyTorch tensors.
Description
`VaeImageProcessor.postprocess` is the final processing step in the diffusion inference pipeline. It takes the raw pixel-space tensor from the VAE decoder (shape [B, C, H, W], values nominally in [-1, 1]) and converts it to the requested output format.
The method implements a progressive transformation chain and returns as soon as the requested format is reached:
- Input validation: raises a `ValueError` unless the input is a PyTorch tensor. An `output_type` other than `"latent"`, `"pt"`, `"np"`, or `"pil"` does not raise; it triggers a deprecation warning and falls back to `"np"`.
- Early return for latent: if `output_type="latent"`, the tensor is returned as-is with no processing.
- Conditional denormalization: maps values from [-1, 1] to [0, 1] using `image / 2 + 0.5` and clamps the result to [0, 1]. This is controlled per image by the `do_denormalize` parameter; if `do_denormalize` is `None`, the processor's `do_normalize` config setting decides for the whole batch.
- Early return for pt: if `output_type="pt"`, returns the (possibly denormalized) PyTorch tensor.
- NumPy conversion: converts the tensor to NumPy via `pt_to_numpy`, which moves it to the CPU, reorders channels ([B, C, H, W] to [B, H, W, C]), and casts to float32.
- Early return for np: if `output_type="np"`, returns the NumPy array.
- PIL conversion: converts the NumPy array to a list of PIL Image objects via `numpy_to_pil`, which scales values to [0, 255], rounds them, and casts to uint8.
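The branch order above can be mirrored in a few lines of NumPy. This is a minimal sketch, not the library code: `postprocess_sketch` is a hypothetical name, the input is assumed to already be a NumPy array in [B, C, H, W] layout (the real method requires a `torch.Tensor`), denormalization is a single pre-resolved flag, and the last step returns the uint8 arrays that would be handed to PIL rather than PIL images.

```python
import warnings

import numpy as np


def postprocess_sketch(image: np.ndarray, output_type: str = "pil", denormalize: bool = True):
    """Hypothetical NumPy mirror of the postprocess branch order (not the Diffusers code)."""
    if output_type not in ("latent", "pt", "np", "pil"):
        # Mirrors the library: warn and fall back to "np" instead of raising.
        warnings.warn(f"unsupported output_type {output_type!r}; falling back to 'np'")
        output_type = "np"

    # "latent": hand the input back untouched.
    if output_type == "latent":
        return image

    # Optional denormalization: [-1, 1] -> [0, 1], clamped.
    if denormalize:
        image = np.clip(image / 2 + 0.5, 0.0, 1.0)

    # "pt": still channels-first, [B, C, H, W].
    if output_type == "pt":
        return image

    # "np": move channels last ([B, H, W, C]) and cast to float32.
    image = image.transpose(0, 2, 3, 1).astype(np.float32)
    if output_type == "np":
        return image

    # "pil": scale to [0, 255], round, cast to uint8; one array per image.
    # (The real method wraps each array in a PIL Image at this point.)
    return list((image * 255).round().astype(np.uint8))
```

For a [2, 3, 4, 5] input, `"np"` yields an array of shape (2, 4, 5, 3) and `"pil"` yields two uint8 arrays of shape (4, 5, 3). Each early return means later, more expensive conversions (CPU transfer, permutation) are skipped when they are not needed.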
Usage
Diffusers image pipelines typically call this method automatically at the end of `__call__`, after VAE decoding (as `self.image_processor.postprocess(...)`). Call it directly when building custom pipelines, when you have raw VAE output that needs format conversion, or when reprocessing intermediate results.
Code Reference
Source Location
- Repository: diffusers
- File: `src/diffusers/image_processor.py`
- Lines: 738-810
Signature
```python
def postprocess(
    self,
    image: torch.Tensor,
    output_type: str = "pil",
    do_denormalize: list[bool] | None = None,
) -> PIL.Image.Image | np.ndarray | torch.Tensor:
```
Import
```python
from diffusers.image_processor import VaeImageProcessor
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| image | torch.Tensor | Yes | The image tensor from the VAE decoder. Expected shape: [B, C, H, W] (e.g., [1, 3, 1024, 1024]), with values typically in [-1, 1]. Must be a PyTorch tensor; any other type raises a `ValueError`. |
| output_type | str | No | The desired output format: `"pil"` (list of PIL Image objects), `"np"` (NumPy array), `"pt"` (PyTorch tensor), or `"latent"` (raw pass-through). Defaults to `"pil"`. Unsupported values fall back to `"np"` with a deprecation warning. |
| do_denormalize | list[bool] or None | No | Per-image flag controlling whether to denormalize from [-1, 1] to [0, 1]. Provide one entry per image in the batch. If `None`, the processor's `do_normalize` config setting applies to every image in the batch. |
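How `do_denormalize` is resolved can be sketched in NumPy. The function name and the `config_do_normalize` argument (standing in for the processor's `do_normalize` config value) are hypothetical, and the explicit length check is an addition for this sketch: recent Diffusers versions appear to index the list by batch position without validating its length, so a shorter list would surface as an `IndexError` and extra flags would be ignored.

```python
import numpy as np


def denormalize_conditionally_sketch(images, do_denormalize=None, config_do_normalize=True):
    """Hypothetical NumPy mirror of how do_denormalize is resolved (not the Diffusers code).

    images: array of shape [B, C, H, W] with values nominally in [-1, 1].
    """
    def denorm(x):
        # [-1, 1] -> [0, 1], clamped so out-of-range decoder values saturate.
        return np.clip(x * 0.5 + 0.5, 0.0, 1.0)

    if do_denormalize is None:
        # One decision for the whole batch, taken from the processor config.
        return denorm(images) if config_do_normalize else images

    if len(do_denormalize) != images.shape[0]:
        # Guard added for this sketch only.
        raise ValueError("do_denormalize needs exactly one flag per image")

    # Per-image decision, then re-stack along the batch axis.
    return np.stack([denorm(img) if flag else img for img, flag in zip(images, do_denormalize)])
```

Because the per-image path re-stacks the batch, a mixed list such as `[True, True, False]` produces one array in which different images live in different value ranges.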
Outputs
| Name | Type | Description |
|---|---|---|
| result | list[PIL.Image.Image] | When output_type="pil": a list of PIL Image objects (one per image, even when B = 1), each an 8-bit image of size (W, H) in mode "RGB" (mode "L" for single-channel input) with values in [0, 255]. |
| result | np.ndarray | When output_type="np": a float32 NumPy array of shape [B, H, W, C], with values in [0, 1] if denormalization was applied. |
| result | torch.Tensor | When output_type="pt": a PyTorch tensor of shape [B, C, H, W], with values in [0, 1] if denormalization was applied. |
| result | torch.Tensor | When output_type="latent": the input tensor, passed through unchanged. |
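A worked example of the value mapping behind the "np" and "pil" rows, assuming denormalization is applied and using plain NumPy in place of the Diffusers helpers: each decoder value v becomes v / 2 + 0.5 clamped to [0, 1], and the 8-bit pixel is round(255 × that). `to_unit_and_uint8` is a hypothetical helper.

```python
import numpy as np


def to_unit_and_uint8(decoder_values: np.ndarray):
    """Hypothetical helper: decoder range -> [0, 1] float -> 8-bit pixel values."""
    unit = np.clip(decoder_values / 2 + 0.5, 0.0, 1.0)  # "pt" / "np" value range
    pixels = (unit * 255).round().astype(np.uint8)      # "pil" value range
    return unit, pixels


# Decoder values spanning (and slightly exceeding) the nominal [-1, 1] range.
values = np.array([-1.2, -1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
unit, pixels = to_unit_and_uint8(values)
print(pixels.tolist())  # [0, 0, 64, 128, 191, 255]

# Layout change between the "pt" and "np" rows: [B, C, H, W] -> [B, H, W, C]
chw = np.zeros((1, 3, 4, 6), dtype=np.float32)
print(chw.transpose(0, 2, 3, 1).shape)  # (1, 4, 6, 3)
```

Note that -1.2 saturates to 0 rather than wrapping, and that a PIL image built from a (4, 6, 3) array reports `size == (6, 4)`, since PIL lists width first.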
Usage Examples
Basic Usage
```python
from diffusers.image_processor import VaeImageProcessor
import torch

# Create a processor with default settings
processor = VaeImageProcessor(vae_scale_factor=8)

# Simulate VAE decoder output (range [-1, 1])
fake_vae_output = torch.randn(1, 3, 1024, 1024).clamp(-1, 1)

# Convert to PIL images
pil_images = processor.postprocess(fake_vae_output, output_type="pil")
pil_images[0].save("output.png")

# Convert to NumPy array
np_images = processor.postprocess(fake_vae_output, output_type="np")
print(np_images.shape)  # (1, 1024, 1024, 3)

# Keep as PyTorch tensor (denormalized to [0, 1])
pt_images = processor.postprocess(fake_vae_output, output_type="pt")
print(pt_images.shape)  # torch.Size([1, 3, 1024, 1024])
```
Inside a Custom Pipeline
```python
from diffusers import StableDiffusionXLPipeline
from diffusers.image_processor import VaeImageProcessor
import torch

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# Generate in latent mode (skip automatic decoding and post-processing)
latents = pipe(
    "A photo of a cat",
    output_type="latent",
    num_inference_steps=30,
).images

# The SDXL VAE overflows in float16 (its config sets force_upcast=True),
# so decode in float32 to avoid NaN/black images.
if pipe.vae.dtype == torch.float16 and pipe.vae.config.force_upcast:
    pipe.vae.to(dtype=torch.float32)

# Manually decode: undo the latent scaling, then run the VAE decoder
latents = latents.to(pipe.vae.dtype) / pipe.vae.config.scaling_factor
with torch.no_grad():
    image_tensor = pipe.vae.decode(latents, return_dict=False)[0]

# Manually post-process with custom settings
processor = VaeImageProcessor(vae_scale_factor=pipe.vae_scale_factor)
pil_image = processor.postprocess(image_tensor, output_type="pil")[0]
pil_image.save("custom_pipeline_output.png")
```
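The manual decode above depends on shape arithmetic fixed by the VAE: each latent pixel expands to `vae_scale_factor` × `vae_scale_factor` image pixels, and the SDXL pipeline derives `pipe.vae_scale_factor` as `2 ** (len(vae.config.block_out_channels) - 1)`. Predicting the decoded shape is a cheap sanity check on `image_tensor` before post-processing. The helpers below are hypothetical, and the `block_out_channels` value `[128, 256, 512, 512]` is an assumption based on the SDXL VAE config.

```python
def vae_scale_factor_from_config(block_out_channels):
    # Every encoder down block except the last halves H and W.
    return 2 ** (len(block_out_channels) - 1)


def decoded_image_shape(latent_shape, vae_scale_factor, image_channels=3):
    # [B, latent_C, h, w] latents -> [B, image_C, h * f, w * f] pixels
    b, _, h, w = latent_shape
    return (b, image_channels, h * vae_scale_factor, w * vae_scale_factor)


# Assumed from the SDXL VAE config.
SDXL_BLOCK_OUT_CHANNELS = [128, 256, 512, 512]

factor = vae_scale_factor_from_config(SDXL_BLOCK_OUT_CHANNELS)
print(factor)  # 8
print(decoded_image_shape((1, 4, 128, 128), factor))  # (1, 3, 1024, 1024)
```

`postprocess` itself never resizes, so a shape surprise at this point comes from the decode step rather than from the processor.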
Selective Denormalization
```python
from diffusers.image_processor import VaeImageProcessor
import torch

processor = VaeImageProcessor(vae_scale_factor=8)

# Batch of 3 images, but only denormalize the first two
batch = torch.randn(3, 3, 512, 512).clamp(-1, 1)
result = processor.postprocess(
    batch,
    output_type="pt",
    do_denormalize=[True, True, False],
)
# result[0] and result[1] are in [0, 1], result[2] remains in [-1, 1]
```