Principle:Junyanz Pytorch CycleGAN and pix2pix Image Saving

From Leeroopedia


Metadata
Knowledge Sources: pytorch-CycleGAN-and-pix2pix
Domains: Image-to-Image Translation, Visualization, Tensor Processing
Last Updated: 2026-02-09

Overview

A utility pattern that converts model output tensors to image files and assembles them into browsable HTML result pages.

Description

The image saving pipeline bridges the gap between PyTorch model outputs (float tensors in the [-1, 1] range) and human-viewable image files on disk (encoded from uint8 arrays in the [0, 255] range). The pipeline consists of three stages:

Stage 1: Tensor to NumPy Conversion (tensor2im)

  • Takes a PyTorch tensor in the range [-1, 1]
  • Denormalizes by applying (tensor + 1) / 2.0 * 255.0
  • Converts to a NumPy uint8 array in [0, 255]
  • Handles both single-channel (grayscale) and three-channel (RGB) images
  • Handles batch dimension by extracting the first image
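The steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: it uses a NumPy array as a stand-in for the PyTorch tensor, the name tensor2im_sketch is hypothetical, and both the channel tiling for grayscale inputs and the clipping step are assumptions about one reasonable way to do it.

```python
import numpy as np

def tensor2im_sketch(batch, imtype=np.uint8):
    """Convert a (N, C, H, W) float array in [-1, 1] to an (H, W, 3) uint8 image."""
    image = np.asarray(batch, dtype=np.float32)[0]   # keep only the first image of the batch
    if image.shape[0] == 1:                          # grayscale: repeat the channel (assumption)
        image = np.tile(image, (3, 1, 1))
    image = np.transpose(image, (1, 2, 0))           # CHW -> HWC, the layout image libraries expect
    image = (image + 1) / 2.0 * 255.0                # [-1, 1] -> [0, 255]
    image = np.clip(image, 0, 255)                   # assumption: guard against out-of-range values
    return image.astype(imtype)                      # float -> uint8 (drops the fraction)

# Two grayscale images: the first is all -1 (black), the second all +1 (white).
batch = np.stack([np.full((1, 2, 2), -1.0), np.full((1, 2, 2), 1.0)])
img = tensor2im_sketch(batch)
print(img.shape, img.dtype, img.min(), img.max())
```

Only the first image of the batch is converted, so the second (white) image in this example is ignored unless it is passed on its own.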

Stage 2: Image Writing (save_image)

  • Takes a NumPy array and writes it to disk as a standard image file (PNG or JPEG)
  • Uses PIL (Pillow) for image I/O
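A minimal sketch of this stage, assuming Pillow and NumPy are installed. The function name save_image_sketch, the file name, and the temporary directory are illustrative only, and any resizing the real helper may perform is left out.

```python
import os
import tempfile

import numpy as np
from PIL import Image

def save_image_sketch(image_numpy, image_path):
    """Write an (H, W, 3) uint8 array to disk; the file extension picks the format."""
    Image.fromarray(image_numpy).save(image_path)

# A 4x6 RGB gradient as a stand-in for a converted model output.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[..., 0] = np.arange(6, dtype=np.uint8) * 40   # red ramps left to right

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "sample_fake_B.png")
save_image_sketch(image, png_path)

# PNG is lossless, so reading the file back should return the same pixels.
reloaded = np.array(Image.open(png_path))
print(reloaded.shape, np.array_equal(reloaded, image))
```

Saving as JPEG instead would add compression loss on top of the uint8 quantization, which is worth keeping in mind when results are compared pixel by pixel.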

Stage 3: Gallery Assembly (save_images)

  • Iterates over all visual outputs for a test sample (e.g., real_A, fake_B, rec_A)
  • Converts each tensor to a NumPy image via tensor2im
  • Optionally resizes images according to aspect_ratio
  • Saves each image to the webpage's image directory via save_image
  • Adds the full set of images as a row in the HTML gallery via webpage.add_images
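The gallery step can be illustrated with a standard-library-only sketch. Everything here is hypothetical scaffolding: GallerySketch stands in for the webpage object, the add_images(ims, txts, links) signature and the "name_label.png" naming scheme are assumptions, and the write_image callback replaces the tensor2im and save_image calls so the example runs without touching the disk.

```python
import html
import os

class GallerySketch:
    """Hypothetical stand-in for the webpage object; collects one table row per sample."""

    def __init__(self, image_dir="images"):
        self.image_dir = image_dir
        self.rows = []

    def add_images(self, ims, txts, links):
        cells = []
        for im, txt, link in zip(ims, txts, links):
            src = html.escape(f"{self.image_dir}/{im}", quote=True)
            href = html.escape(f"{self.image_dir}/{link}", quote=True)
            cells.append(
                f'<td><a href="{href}"><img src="{src}"></a><br>{html.escape(txt)}</td>'
            )
        self.rows.append("<tr>" + "".join(cells) + "</tr>")

    def render(self):
        return "<table>\n" + "\n".join(self.rows) + "\n</table>"


def save_images_sketch(webpage, visuals, image_path, write_image):
    """Save every visual for one sample and add them as a single gallery row."""
    name = os.path.splitext(os.path.basename(image_path))[0]   # "a/b/0001.jpg" -> "0001"
    ims, txts, links = [], [], []
    for label, image in visuals.items():
        image_name = f"{name}_{label}.png"                      # assumed naming scheme
        write_image(image, os.path.join(webpage.image_dir, image_name))
        ims.append(image_name)
        txts.append(label)
        links.append(image_name)
    webpage.add_images(ims, txts, links)


written = []                                   # record writes instead of touching the disk
page = GallerySketch()
visuals = {"real_A": "tensor-1", "fake_B": "tensor-2", "rec_A": "tensor-3"}
save_images_sketch(page, visuals, "datasets/testA/0001.jpg",
                   write_image=lambda image, path: written.append(path))
print(page.render())
```

Grouping all visuals for one sample into a single row is what makes the input, output, and reconstruction appear side by side in the browser.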

Usage

save_images is called in the main test loop of test.py after each forward pass. For each test sample, it receives the model's dictionary of visual outputs and the sample's file path, then handles all conversion, saving, and gallery integration.

Theoretical Basis

GAN generators typically output tensors normalized to the [-1, 1] range (matching the tanh activation function commonly used at the generator's output layer). The tanh range is preferred during training because:

  • It is symmetric around zero, matching normalized input distributions
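  • It provides stronger gradients than sigmoid around zero (maximum slope of 1 versus 0.25); both functions still saturate, with gradients vanishing toward the extremes
  • Its output range matches the standard image normalization (image / 255.0 - 0.5) / 0.5, which maps [0, 255] pixel values to [-1, 1]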

The denormalization step (tensor + 1) / 2.0 * 255.0 reverses this normalization to produce valid pixel values. This is a lossy operation (float to uint8 quantization), but the 256 discrete levels provide sufficient precision for visual display.
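The round trip between the two ranges, and the quantization loss it implies, can be checked with a few lines of plain Python. The helper names normalize and denormalize are illustrative; the formulas are the ones quoted above. Rounding is shown next to truncation only for comparison, since a plain float-to-integer cast drops the fraction.

```python
def normalize(pixel):
    """Map an 8-bit pixel value in [0, 255] to [-1, 1]."""
    return (pixel / 255.0 - 0.5) / 0.5

def denormalize(value):
    """Map a value in [-1, 1] back to the [0, 255] scale (still a float)."""
    return (value + 1) / 2.0 * 255.0

# Every 8-bit level survives the round trip once the float is rounded to the nearest integer.
round_trip_ok = all(round(denormalize(normalize(p))) == p for p in range(256))

# An arbitrary generator output falls between two levels and must be quantized.
raw = denormalize(0.3)          # about 165.75
truncated = int(raw)            # dropping the fraction, as a plain float-to-integer cast does
rounded = round(raw)            # rounding to the nearest level
print(round_trip_ok, raw, truncated, rounded)
```

With truncation the error is just under one level out of 255 at worst, and with rounding at most half a level. Either is far below what is visible on a display.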

Assembling results into HTML galleries rather than simply saving individual images provides practical benefits: researchers can quickly scan through many results, compare input/output pairs side by side, and share results as an HTML page, together with its image directory, that opens in any browser.
