Heuristic:Junyanz Pytorch CycleGAN and pix2pix High Res Crop Training
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Computer_Vision |
| Last Updated | 2026-02-09 16:00 GMT |
Overview
Train on cropped patches at the target scale, then test at full resolution, so that high-resolution CycleGAN training fits within GPU memory limits.
Description
CycleGAN is particularly memory-intensive because training loads four networks (two generators and two discriminators) onto one GPU at the same time. As a result, high-resolution images often cannot be processed at full size during training. The recommended approach is to train on randomly cropped patches taken at the target scale (e.g., 360x360 crops from images scaled to 1024 px wide) and to test at full resolution, loading only one generator. Cropping does not rescale the image, so training and testing see objects at the same pixel scale, while training still fits in GPU memory.
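The geometry of the two preprocessing modes can be sketched in plain Python. The helpers `scale_width_size` and `random_crop_box` are hypothetical names, not the repository's functions. The sketch assumes that an image is first resized so its width equals `load_size` with the aspect ratio preserved, and that training then cuts a random `crop_size` x `crop_size` window from the resized image. The height clamp is an assumption of this sketch that guarantees the square crop fits.

```python
import random


def scale_width_size(width, height, load_size, crop_size):
    """Return the (w, h) an image is resized to so that its width equals load_size.

    The aspect ratio is preserved; the height is clamped to at least crop_size
    so that a square crop always fits (an assumption of this sketch).
    """
    new_h = int(max(load_size * height / width, crop_size))
    return load_size, new_h


def random_crop_box(width, height, crop_size, rng=random):
    """Pick a random crop_size x crop_size window inside a width x height image.

    Assumes width and height are both >= crop_size.
    Returns (left, top, right, bottom) in pixel coordinates.
    """
    left = rng.randint(0, max(0, width - crop_size))
    top = rng.randint(0, max(0, height - crop_size))
    return left, top, left + crop_size, top + crop_size


# Training: a 2048x1536 photo is scaled to width 1024, then a 360x360 patch is cut out.
w, h = scale_width_size(2048, 1536, load_size=1024, crop_size=360)
print((w, h))  # (1024, 768)
print(random_crop_box(w, h, crop_size=360, rng=random.Random(0)))

# Testing: the same photo is only scaled to width 1024, so objects keep the same pixel size.
print(scale_width_size(2048, 1536, load_size=1024, crop_size=360))  # (1024, 768)
```

Only the crop window is random; the resize step is identical at train and test time, which is what keeps the scale consistent.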
Usage
Apply this heuristic when training on images larger than roughly 512x512, or when CycleGAN training fails with CUDA out-of-memory errors. pix2pix uses only one generator and one discriminator, so its memory pressure is lower and this technique may not be needed.
The Insight (Rule of Thumb)
- Action: Use `--preprocess scale_width_and_crop` for training and `--preprocess scale_width` for testing.
- Value: Training example: `--preprocess scale_width_and_crop --load_size 1024 --crop_size 360`. Testing example: `--preprocess scale_width --load_size 1024`.
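- Trade-off: The model sees only local patches during training but generates full images at test time, so it cannot learn structure that spans more than one crop. This works for local appearance changes because the generator is fully convolutional and approximately translation-equivariant: the same filters are applied at every position of the image.
- Key constraint: Training and testing must use the same scale (the same `--load_size`) to avoid a train/test gap.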
Reasoning
The generators are fully convolutional, so at test time they can process inputs larger than the patches they were trained on. During training, the model learns local texture transformations from the cropped patches. At test time, only one generator is needed rather than both generators and both discriminators, and no gradients or optimizer state have to be stored. This substantially reduces memory use and allows larger images to be processed.
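The claim that a mapping learned on patches carries over to the full image can be checked with a toy example. The sketch below uses a pure-Python 1-D "valid" convolution with a made-up kernel as an illustrative stand-in for a generator. It confirms that convolving a crop yields exactly the values obtained by cropping the full-image result. Real generators pad their inputs, so this equality is only approximate near image borders.

```python
def conv1d_valid(signal, kernel):
    """Cross-correlate signal with kernel without padding ('valid' mode), as conv layers do."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]


full_image = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7]  # 1-D stand-in for a large image
kernel = [0.25, 0.5, 0.25]                              # hypothetical learned filter

start, size = 4, 6                                      # a 6-pixel "training crop"
crop = full_image[start:start + size]

patch_out = conv1d_valid(crop, kernel)                  # what the model sees in training
full_out = conv1d_valid(full_image, kernel)             # what it produces at test time
matching = full_out[start:start + len(patch_out)]       # same region of the full output

print(patch_out == matching)  # True
```

Because the same kernel is applied at every position, nothing in the computation depends on where the crop was taken. This is why filters trained on 360x360 patches remain valid on a 1024-px-wide image.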
From `docs/tips.md`:
"CycleGAN is quite memory-intensive as four networks (two generators and two discriminators) need to be loaded on one GPU, so a large image cannot be entirely loaded. In this case, we recommend training with cropped images. For example, to generate 1024px results, you can train with --preprocess scale_width_and_crop --load_size 1024 --crop_size 360, and test with --preprocess scale_width --load_size 1024."
From `docs/qa.md`:
"During training, train CycleGAN on cropped images of the training set. Please be careful not to change the aspect ratio or the scale of the original image [...] Then at test time, you can load only one generator to produce the results in a single direction. This greatly saves GPU memory."
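To see why the FAQ warns against changing the scale, compare the pixel scale objects have under different preprocessing choices. The sketch below is plain arithmetic with a hypothetical helper. It assumes a 2048-px-wide source photo and a 1024-px test width. Scaling to 1024 and then cropping keeps the test-time scale. Shrinking the whole photo to 360 px instead would make every object about 2.84 times smaller in training than at test.

```python
def pixel_scale(source_width, resized_width):
    """Size of one source pixel after resizing; cropping leaves this unchanged."""
    return resized_width / source_width


SOURCE_WIDTH = 2048  # hypothetical width of an original photo

test_scale = pixel_scale(SOURCE_WIDTH, 1024)          # scale_width to 1024 at test time
crop_train_scale = pixel_scale(SOURCE_WIDTH, 1024)    # scale to 1024 wide, then crop 360x360
resize_train_scale = pixel_scale(SOURCE_WIDTH, 360)   # squash the whole photo to 360 wide instead

print(crop_train_scale == test_scale)                 # True: no train/test gap
print(round(test_scale / resize_train_scale, 2))      # 2.84: objects 2.84x larger at test
```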