Principle:Junyanz Pytorch CycleGAN and pix2pix Dataset Acquisition
| Knowledge Sources | pytorch-CycleGAN-and-pix2pix |
|---|---|
| Domains | Image-to-Image Translation, Data Preparation, Benchmark Datasets |
| Last Updated | 2026-02-09 |
Overview
A data preparation step that downloads and unpacks standard benchmark datasets for image-to-image translation from remote servers.
Description
Shell scripts use wget to download archives from the Berkeley server (http://efrosgans.eecs.berkeley.edu/) and then unpack them into the expected directory structure under ./datasets/. CycleGAN datasets are distributed as zip archives and pix2pix datasets as tar.gz archives.
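The download-and-unpack procedure can be sketched in Python with only the standard library, using `urllib.request`, `zipfile`, and `tarfile` in place of `wget`, `unzip`, and `tar`. This is a minimal, non-authoritative sketch. The host comes from the text above, but the per-family path layout (`/cyclegan/datasets/<name>.zip` and `/pix2pix/datasets/<name>.tar.gz`) is our assumption. The helper names `archive_url`, `unpack`, and `acquire` are hypothetical, and each archive is assumed to contain a top-level `<name>/` directory.

```python
import os
import tarfile
import urllib.request
import zipfile

# Host from the text above; the per-family path layout below is an assumption.
BASE_URL = "http://efrosgans.eecs.berkeley.edu"


def archive_url(family: str, name: str) -> str:
    """Build the (assumed) archive URL: CycleGAN uses .zip, pix2pix uses .tar.gz."""
    if family not in ("cyclegan", "pix2pix"):
        raise ValueError(f"unknown model family: {family}")
    ext = "zip" if family == "cyclegan" else "tar.gz"
    return f"{BASE_URL}/{family}/datasets/{name}.{ext}"


def unpack(archive_path: str, dest_root: str) -> None:
    """Extract a .zip or .tar.gz archive into dest_root (e.g. ./datasets)."""
    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(dest_root)
    elif archive_path.endswith(".tar.gz"):
        with tarfile.open(archive_path, "r:gz") as tf:
            tf.extractall(dest_root)
    else:
        raise ValueError(f"unsupported archive type: {archive_path}")


def acquire(family: str, name: str, dest_root: str = "./datasets") -> str:
    """Download one archive, unpack it under dest_root, then delete the archive."""
    url = archive_url(family, name)
    os.makedirs(dest_root, exist_ok=True)
    archive_path = os.path.join(dest_root, url.rsplit("/", 1)[-1])
    urllib.request.urlretrieve(url, archive_path)  # requires network access
    try:
        unpack(archive_path, dest_root)
    finally:
        os.remove(archive_path)  # keep only the unpacked directory
    return os.path.join(dest_root, name)
```

Calling `acquire("cyclegan", "horse2zebra")` would, under these assumptions, leave the images in `./datasets/horse2zebra/`. Deleting the archive afterwards keeps the `datasets/` directory limited to unpacked data.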
CycleGAN datasets are organized into four subdirectories:
- trainA/ -- Training images from domain A
- trainB/ -- Training images from domain B
- testA/ -- Test images from domain A
- testB/ -- Test images from domain B
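A small check like the one below can confirm that an unpacked CycleGAN dataset has the four-directory layout before training starts. `check_cyclegan_layout` is a hypothetical helper, not part of the repository. The set of image extensions it counts is also an assumption and should be adjusted to the files actually present.

```python
import os

# Subdirectory names required for unpaired (CycleGAN) datasets.
CYCLEGAN_SPLITS = ("trainA", "trainB", "testA", "testB")
# Assumed image extensions; adjust to the files you actually have.
IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".bmp")


def check_cyclegan_layout(root: str) -> dict:
    """Return {split: image count} for a CycleGAN dataset root.

    Raises FileNotFoundError if any of the four expected subdirectories is missing.
    """
    missing = [s for s in CYCLEGAN_SPLITS if not os.path.isdir(os.path.join(root, s))]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    counts = {}
    for split in CYCLEGAN_SPLITS:
        split_dir = os.path.join(root, split)
        # Count regular files whose extension looks like an image.
        counts[split] = sum(
            1
            for f in os.listdir(split_dir)
            if f.lower().endswith(IMAGE_EXTS) and os.path.isfile(os.path.join(split_dir, f))
        )
    return counts
```

Because the two domains are unpaired, the counts in trainA/ and trainB/ do not need to match.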
pix2pix datasets are organized into:
- train/ -- Training image pairs (A and B concatenated side-by-side)
- test/ -- Test image pairs
- val/ -- Validation image pairs (optional)
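The sketch below illustrates the paired layout and the side-by-side convention. To stay dependency-free, it represents an image as a list of pixel rows instead of a PIL image or tensor. In practice an image library would perform the crop. Splitting at `w // 2`, with A on the left and B on the right, reflects our understanding of how AlignedDataset cuts each pair. The helper names `check_pix2pix_layout` and `split_pair` are hypothetical.

```python
import os
from typing import List, Tuple

Row = List[int]  # one row of pixel values (a stand-in for real image data)


def check_pix2pix_layout(root: str) -> List[str]:
    """Return the paired splits present under root: train/ and test/ required, val/ optional."""
    required = ["train", "test"]
    missing = [s for s in required if not os.path.isdir(os.path.join(root, s))]
    if missing:
        raise FileNotFoundError(f"{root} is missing: {', '.join(missing)}")
    optional = [s for s in ["val"] if os.path.isdir(os.path.join(root, s))]
    return required + optional


def split_pair(image: List[Row]) -> Tuple[List[Row], List[Row]]:
    """Split a side-by-side A|B image into halves: columns [0, w//2) are A, the rest are B."""
    w2 = len(image[0]) // 2
    a = [row[:w2] for row in image]
    b = [row[w2:] for row in image]
    return a, b
```

With an odd width, this split gives the extra column to the B half. Concatenated pairs are normally produced with equal-width halves, so in practice this edge case should not occur.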
Available CycleGAN datasets:
- apple2orange, horse2zebra, summer2winter_yosemite, monet2photo
- cezanne2photo, ukiyoe2photo, vangogh2photo
- maps, cityscapes, facades
- iphone2dslr_flower, ae_photos
Available pix2pix datasets:
- facades, cityscapes, maps
- edges2shoes, edges2handbags
- night2day
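Storing both lists as sets lets a dataset name be validated before any download starts. To our understanding, the upstream scripts perform a similar check and print the available names when given an unknown one. In the sketch below, `validate_dataset` and the constant names are hypothetical, and the dataset names are copied from the lists above.

```python
# Dataset names copied from the two lists above.
CYCLEGAN_DATASETS = frozenset({
    "apple2orange", "horse2zebra", "summer2winter_yosemite", "monet2photo",
    "cezanne2photo", "ukiyoe2photo", "vangogh2photo",
    "maps", "cityscapes", "facades",
    "iphone2dslr_flower", "ae_photos",
})
PIX2PIX_DATASETS = frozenset({
    "facades", "cityscapes", "maps",
    "edges2shoes", "edges2handbags",
    "night2day",
})
REGISTRY = {"cyclegan": CYCLEGAN_DATASETS, "pix2pix": PIX2PIX_DATASETS}


def validate_dataset(family: str, name: str) -> None:
    """Raise ValueError, listing the valid names, if name is not available for family."""
    if family not in REGISTRY:
        raise ValueError(f"unknown family {family!r}; expected one of {sorted(REGISTRY)}")
    available = REGISTRY[family]
    if name not in available:
        raise ValueError(
            f"{name!r} is not a {family} dataset; available: {', '.join(sorted(available))}"
        )
```

facades, cityscapes, and maps appear in both lists, so the model family must be given explicitly. The name alone does not determine whether the unpaired or the paired version is wanted.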
Usage
Before training, run the download script for the relevant model family (CycleGAN or pix2pix), passing the dataset name as its argument. Each script is a one-time setup step that populates the datasets/ directory.
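Because each dataset only needs to be fetched once, a training launcher can skip the download when the target directory already exists. In this sketch, the script paths `./datasets/download_cyclegan_dataset.sh` and `./datasets/download_pix2pix_dataset.sh` are our assumption about the repository layout. The paths are relative, so the sketch assumes the working directory is the repository root. The helper names are hypothetical. Treating a non-empty `./datasets/<name>/` directory as a completed download is a heuristic, since a partially extracted archive would also pass.

```python
import os
import subprocess
from typing import List


def download_command(family: str, name: str) -> List[str]:
    """Build the argv for the per-family download script (script paths are assumed)."""
    script = f"./datasets/download_{family}_dataset.sh"
    return ["bash", script, name]


def already_present(name: str, dest_root: str = "./datasets") -> bool:
    """Heuristic: a non-empty <dest_root>/<name>/ directory counts as a finished download."""
    target = os.path.join(dest_root, name)
    return os.path.isdir(target) and bool(os.listdir(target))


def ensure_dataset(family: str, name: str, dest_root: str = "./datasets") -> None:
    """One-time setup: run the download script only if the dataset is not there yet."""
    if already_present(name, dest_root):
        return
    subprocess.run(download_command(family, name), check=True)
```

Passing `check=True` makes a failed download raise `subprocess.CalledProcessError` instead of letting training start on a missing dataset.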
Theoretical Basis
Standard benchmark datasets are essential for reproducible research in image-to-image translation. The CycleGAN datasets feature unpaired images from two domains, while pix2pix datasets provide spatially aligned image pairs. Using consistent datasets allows direct comparison between methods and across papers.
The directory naming convention (trainA/trainB and testA/testB for unpaired data; train/, test/, and optionally val/ for paired data) is tightly coupled to the dataset loader classes, UnalignedDataset and AlignedDataset, which expect these exact subdirectory names.
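This coupling becomes concrete when the directory names are written out as a function of a dataset root and a phase (train, test, or val). The sketch below reflects our reading of UnalignedDataset and AlignedDataset, which, as we understand it, build these paths from the `--dataroot` and `--phase` options. The function names are hypothetical, and the real classes do considerably more, including file listing, pairing, and transforms.

```python
import os
from typing import Tuple


def unaligned_dirs(dataroot: str, phase: str) -> Tuple[str, str]:
    """Unpaired loader: domain A from <dataroot>/<phase>A, domain B from <dataroot>/<phase>B."""
    return os.path.join(dataroot, phase + "A"), os.path.join(dataroot, phase + "B")


def aligned_dir(dataroot: str, phase: str) -> str:
    """Paired loader: concatenated A|B images from <dataroot>/<phase>."""
    return os.path.join(dataroot, phase)
```

Because the phase string is concatenated directly into the path, renaming a subdirectory (for example, `training/` instead of `train/`) breaks loading even when the images themselves are intact.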