Workflow:Junyanz Pytorch CycleGAN and pix2pix Pix2pix Training

Knowledge Sources	pytorch-CycleGAN-and-pix2pix pix2pix Training Tips
Domains	Computer_Vision, GANs, Image_Translation
Last Updated	2026-02-09 16:00 GMT

Overview

End-to-end process for training a pix2pix conditional GAN model to perform paired image-to-image translation.

Description

This workflow covers the full procedure for training a pix2pix model that learns a mapping from input images to output images using paired training data. The model uses a single U-Net generator conditioned on the input image and a PatchGAN discriminator that evaluates image patches. The training objective combines adversarial loss with a weighted L1 reconstruction loss (lambda_L1=100 by default) to produce sharp outputs faithful to the ground truth. Unlike CycleGAN, pix2pix requires aligned image pairs and trains only a single directional mapping. It also supports a colorization variant that operates in Lab color space to convert grayscale images to color.
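
For reference, the objective described above can be written out as follows. This is a summary sketch in the notation of the original pix2pix formulation, where x is the input image, y is the paired ground-truth image, and λ corresponds to lambda_L1. The explicit noise input is omitted here for brevity, which is a simplification made for this sketch.

```latex
\begin{aligned}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\,\mathcal{L}_{L1}(G)
\end{aligned}
```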

Usage

Execute this workflow when you have a dataset of paired images (input and corresponding output) and want to learn a deterministic mapping between them. Common applications include semantic labels to photographs, edges to objects, architectural labels to building facades, day-to-night conversion, and image colorization. The paired images must be spatially aligned. They can be supplied already concatenated side by side, or kept in separate A and B folders and combined with the provided preparation script.

Execution Steps

Step 1: Environment Setup

Install the required dependencies, either by creating a Conda environment from the provided specification file or by installing PyTorch and its dependencies manually. The dependencies are the same as for CycleGAN: PyTorch (2.4+), torchvision, Pillow, and either visdom or wandb for visualization. For the colorization variant, scikit-image is additionally required for Lab color space conversions.

Key considerations:

Python 3.11 is recommended
scikit-image is required only for the colorization model variant
CUDA-enabled GPU is recommended for practical training speeds
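
A quick way to confirm the environment before training is to check which of the packages above can be imported. The sketch below uses only the standard library; the import names (PIL for Pillow, skimage for scikit-image) and the helper names missing_modules and environment_report are chosen for this illustration and are not part of the repository.

```python
import importlib.util
import sys

# Import names assumed for the packages named in this step
# (Pillow imports as "PIL", scikit-image as "skimage").
REQUIRED = ["torch", "torchvision", "PIL"]
OPTIONAL = ["visdom", "wandb", "skimage"]


def missing_modules(names):
    """Return the subset of top-level import names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def environment_report():
    """Summarize the interpreter version and any missing packages."""
    return {
        "python": "%d.%d" % sys.version_info[:2],
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }


if __name__ == "__main__":
    print(environment_report())
```

An empty missing_required list means the core dependencies are importable; skimage only matters if the colorization variant is used.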

Step 2: Dataset Preparation

Obtain or create a paired image dataset. For built-in datasets, use the download script to fetch datasets like facades, maps, or edges2shoes. For custom datasets, prepare aligned image pairs and combine them into side-by-side format using the provided combination script. The script concatenates each pair of images (A and B) horizontally into a single image. The final dataset must be organized with a train and test split, with each image containing both domains concatenated side-by-side.

Key considerations:

Built-in datasets are downloaded from the Berkeley EECS server
The combine_A_and_B script supports multiprocessing to speed up large datasets
Images must be spatially aligned and of the same dimensions
The --direction flag controls which side is input vs. output (AtoB or BtoA)
For colorization, standard RGB images are automatically converted to Lab color space
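
To make the side-by-side layout concrete, the following sketch computes the crop boxes for the two halves of a combined image and orders them according to the translation direction. It is a conceptual illustration rather than the repository's loader; the function names are hypothetical, and the boxes use the (left, upper, right, lower) convention.

```python
def split_boxes(width, height):
    """Return (left, upper, right, lower) crop boxes for the A and B halves."""
    half = width // 2
    box_a = (0, 0, half, height)      # left half holds domain A
    box_b = (half, 0, width, height)  # right half holds domain B
    return box_a, box_b


def input_and_target_boxes(width, height, direction="AtoB"):
    """Order the two halves as (input, target) for the given direction."""
    if direction not in ("AtoB", "BtoA"):
        raise ValueError("direction must be 'AtoB' or 'BtoA'")
    box_a, box_b = split_boxes(width, height)
    return (box_a, box_b) if direction == "AtoB" else (box_b, box_a)


# A 512x256 combined image holds two 256x256 halves.
print(input_and_target_boxes(512, 256, "BtoA"))
```

With BtoA, the right half becomes the generator input and the left half becomes the target.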

Step 3: Configure Training Options

Set the training hyperparameters through command-line arguments. Essential parameters include the data root path, experiment name, model type (pix2pix), generator architecture (unet_256 by default), translation direction (AtoB or BtoA), and the L1 loss weight. Pix2pix defaults differ from CycleGAN: it uses batch normalization, vanilla GAN loss, a U-Net generator, no image pool (pool_size=0), and aligned dataset mode.
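
As an illustration, the essential options can be assembled into an argument list before launching training. This is a minimal sketch: the helper train_argv is hypothetical, the dataset path and experiment name are placeholders, and it assumes the script is run from the repository root where train.py lives.

```python
import shlex
import sys


def train_argv(dataroot, name, direction="AtoB", lambda_l1=100.0, netG="unet_256"):
    """Build the argument list for a pix2pix training run."""
    return [
        sys.executable, "train.py",
        "--dataroot", dataroot,
        "--name", name,
        "--model", "pix2pix",
        "--netG", netG,
        "--direction", direction,
        "--lambda_L1", str(lambda_l1),
    ]


argv = train_argv("./datasets/facades", "facades_pix2pix", direction="BtoA")
print(shlex.join(argv[1:]))
# To launch from the repository root: subprocess.run(argv, check=True)
```

Building the command as a list rather than a string avoids shell-quoting problems when paths contain spaces.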

Key considerations:

Default generator is unet_256 with batch normalization and dropout
Default discriminator is a 70x70 PatchGAN (basic)
Image pool is disabled (pool_size=0), so the discriminator always scores the current generator output paired with its own input
The --direction BtoA flag is needed when B is the input domain (e.g., labels to photos for facades)
For colorization, add --model colorization --dataset_mode colorization
lambda_L1=100 by default, heavily weighting reconstruction fidelity
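
The "70x70" in the discriminator's name refers to the receptive field of each output unit. The arithmetic can be checked with the short sketch below, which assumes the basic discriminator stacks three stride-2 convolutions followed by two stride-1 convolutions, all with 4x4 kernels. That layer layout is an assumption stated for this example, and receptive_field is a hypothetical helper.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) convs."""
    rf = 1
    # Walk from the last layer back to the input.
    for kernel, stride in reversed(layers):
        rf = (rf - 1) * stride + kernel
    return rf


# Assumed layout of the basic discriminator: three stride-2 convolutions
# followed by two stride-1 convolutions, all with 4x4 kernels.
BASIC_PATCHGAN = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]

print(receptive_field(BASIC_PATCHGAN))  # 70
```

Each value in the discriminator's output map therefore judges one 70x70 patch of the input pair rather than the whole image.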

Step 4: Train the Model

Launch the training script, which runs the full pix2pix training loop. The script parses options, creates the aligned dataset loader, instantiates a single generator and conditional discriminator, and initializes weights. Each iteration processes a batch of paired images: the generator produces a fake output from the input, and the discriminator sees the input concatenated with either the real or the fake output. The discriminator is then updated with the adversarial loss alone, and the generator with the adversarial loss plus the weighted L1 loss.

What happens each iteration:

Forward pass: Generator produces fake_B = G(real_A)
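Discriminator update: Evaluate the real pair (A, B) and the fake pair (A, G(A)), with G(A) detached so no gradient reaches the generator; compute binary cross-entropy loss on each and average the two
Generator update: Compute adversarial loss to fool the discriminator plus lambda_L1 times the L1 loss between G(A) and real B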
Periodically save results and checkpoints
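
The loss arithmetic in the steps above can be sketched with plain Python scalars. This is not the repository's PyTorch code: it assumes the discriminator emits raw (pre-sigmoid) scores, treats images as flat lists of pixel values, and uses hypothetical function names. It shows only how the terms combine.

```python
import math


def bce_with_logits(logit, target):
    """Binary cross-entropy for a single raw (pre-sigmoid) score."""
    # Numerically stable form: max(x, 0) - x*t + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def discriminator_loss(real_logits, fake_logits):
    """Average of the losses on real pairs (label 1) and fake pairs (label 0)."""
    d_real = mean(bce_with_logits(x, 1.0) for x in real_logits)
    d_fake = mean(bce_with_logits(x, 0.0) for x in fake_logits)
    return 0.5 * (d_real + d_fake)


def generator_loss(fake_logits, fake_pixels, real_pixels, lambda_l1=100.0):
    """Adversarial term (fake pairs labelled real) plus the weighted L1 term."""
    g_gan = mean(bce_with_logits(x, 1.0) for x in fake_logits)
    g_l1 = mean(abs(f - r) for f, r in zip(fake_pixels, real_pixels))
    return g_gan + lambda_l1 * g_l1
```

With lambda_l1=100, a mean absolute pixel error of 0.1 contributes 10 to the generator loss, far more than the adversarial term of an undecided discriminator (about 0.69), which is why reconstruction fidelity dominates early training.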

Step 5: Monitor Training

Track training progress through loss values and visual results. The key losses are G_GAN (generator adversarial loss), G_L1 (generator reconstruction loss), D_real (discriminator on real pairs), and D_fake (discriminator on generated pairs). Visual results showing input, generated output, and ground truth are saved to HTML galleries.

Key considerations:

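The L1 loss should trend downward as reconstruction improves, though GAN loss curves are noisy and the visual results are the more reliable signal
D_real and D_fake losses should stay in a similar range; if both collapse toward zero, the discriminator is likely overpowering the generator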
Visdom or WandB dashboards provide real-time monitoring
HTML results are saved at checkpoints/{name}/web/index.html
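
When a dashboard is not available, the logged loss values can be parsed for offline plotting. The sketch below assumes each log line starts with a parenthesized header containing "epoch: N, iters: M" followed by "name: value" pairs; that line format, the sample line, and the helper name parse_loss_line are assumptions for illustration, so check them against your own log output.

```python
import re

# Assumed line format, for example:
# (epoch: 1, iters: 100, time: 0.050, data: 0.200) G_GAN: 1.200 G_L1: 30.500 D_real: 0.600 D_fake: 0.500
HEADER = re.compile(r"\(epoch: (\d+), iters: (\d+)")
LOSS = re.compile(r"(\w+): ([-+]?\d+(?:\.\d+)?)")


def parse_loss_line(line):
    """Return a dict with epoch, iters, and every named loss, or None."""
    header = HEADER.search(line)
    if header is None:
        return None
    record = {"epoch": int(header.group(1)), "iters": int(header.group(2))}
    # Only the text after the closing parenthesis holds loss values.
    tail = line.partition(")")[2]
    for name, value in LOSS.findall(tail):
        record[name] = float(value)
    return record


sample = ("(epoch: 1, iters: 100, time: 0.050, data: 0.200) "
          "G_GAN: 1.200 G_L1: 30.500 D_real: 0.600 D_fake: 0.500")
print(parse_loss_line(sample))
```

Lines that do not match the header pattern return None, so the function can be mapped over a whole log file and the None entries filtered out.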

Step 6: Test and Evaluate

Run the test script to generate output images from the test set. The script loads the trained generator, processes each test input, and saves the results as an HTML gallery. For each test image, the output shows the input (real_A), generated output (fake_B), and ground truth (real_B) side by side for visual comparison.

Key considerations:

Ensure --netG, --norm, and --direction match the training configuration
Test script forces batch_size=1, serial_batches, and no_flip
Results are saved to results/{name}/{phase}_{epoch}/index.html
For colorization, output images are converted from Lab back to RGB for display
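
One way to keep the test options consistent with training is to derive the test command from the same option values. The sketch below is illustrative only: build_test_argv and results_index are hypothetical helpers, the option dictionary is a placeholder, and the default phase ("test") and epoch ("latest") are assumed values for the results path pattern given above.

```python
import sys
from pathlib import Path

# Options that must agree between training and testing.
SHARED_KEYS = ("netG", "norm", "direction")


def build_test_argv(dataroot, name, train_opts):
    """Build a test.py argument list that reuses the training-time options."""
    argv = [sys.executable, "test.py", "--dataroot", dataroot,
            "--name", name, "--model", "pix2pix"]
    for key in SHARED_KEYS:
        argv += ["--" + key, str(train_opts[key])]
    return argv


def results_index(name, phase="test", epoch="latest"):
    """Path of the HTML gallery, following results/{name}/{phase}_{epoch}/."""
    return Path("results") / name / f"{phase}_{epoch}" / "index.html"


opts = {"netG": "unet_256", "norm": "batch", "direction": "BtoA"}
print(build_test_argv("./datasets/facades", "facades_pix2pix", opts)[1:])
print(results_index("facades_pix2pix").as_posix())
```

Keeping the shared options in one dictionary means a change to the training configuration cannot silently diverge from the test command.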
