Heuristic:Junyanz Pytorch CycleGAN and pix2pix CuDNN Benchmark Scale Width
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Debugging |
| Last Updated | 2026-02-09 16:00 GMT |
Overview
Leave the cuDNN auto-tuner (`torch.backends.cudnn.benchmark`) disabled when using `scale_width` preprocessing, because variable input sizes force repeated re-tuning that slows training.
Description
Setting `torch.backends.cudnn.benchmark = True` in PyTorch enables the cuDNN auto-tuner, which benchmarks the available convolution algorithms and selects the fastest one for a given input size. This optimization pays off only when input dimensions stay fixed. With `--preprocess scale_width`, images retain their original aspect ratios, so input heights vary across the dataset. The auto-tuner must re-benchmark for each new size, which adds overhead instead of a speedup. The codebase therefore enables benchmarking only when `opt.preprocess` is not `scale_width`.
Usage
This heuristic is applied automatically by the codebase. Be aware of it when using `--preprocess scale_width`: cuDNN benchmarking is not enabled in that mode, so training may be slightly slower than with fixed-size inputs. The check compares against the exact string `scale_width`, so with `--preprocess resize_and_crop` (default), `--preprocess crop`, or `--preprocess scale_width_and_crop`, cuDNN benchmarking remains enabled.
The Insight (Rule of Thumb)
- Action: Set `torch.backends.cudnn.benchmark = False` when input image sizes vary across the dataset.
- Value: `BaseModel.__init__()` sets the flag to `True` only when `opt.preprocess != "scale_width"`; with `scale_width` the flag is left at PyTorch's default of `False`.
- Trade-off: Disabling the benchmark avoids re-tuning overhead for variable sizes but loses the auto-tuning speedup for fixed-size workloads.
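The rule above can be sketched as a small predicate. This is a minimal illustration, not the codebase's own helper: the function name `should_enable_cudnn_benchmark` is hypothetical, and `SimpleNamespace` merely stands in for the parsed options object. Only the string comparison mirrors the check quoted under Reasoning.

```python
from types import SimpleNamespace


def should_enable_cudnn_benchmark(preprocess: str) -> bool:
    """Return True when the auto-tuner should be switched on.

    Hypothetical helper: it mirrors the exact-string comparison
    `opt.preprocess != "scale_width"` and nothing more.
    """
    return preprocess != "scale_width"


# Stand-ins for parsed command-line options (assumed shape: an object
# with a `preprocess` attribute).
modes = ["resize_and_crop", "crop", "scale_width", "scale_width_and_crop"]
decisions = {}
for mode in modes:
    opt = SimpleNamespace(preprocess=mode)
    decisions[mode] = should_enable_cudnn_benchmark(opt.preprocess)
    print(f"{mode}: benchmark enabled = {decisions[mode]}")

# In a real training script the result would be applied with:
#   torch.backends.cudnn.benchmark = should_enable_cudnn_benchmark(opt.preprocess)
```

Only the exact value `scale_width` skips the flag; `scale_width_and_crop` does not match the comparison and so keeps benchmarking on.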
Reasoning
The cuDNN auto-tuner caches the optimal algorithm for each unique (input_size, kernel_size, padding, stride) combination. When all images have the same dimensions (as with `resize_and_crop`), the algorithm is selected once and reused for the entire training run, providing a measurable speedup. With `scale_width`, each image may have a different height, so the cache misses repeatedly and the tuner re-runs for every new shape, which is slower than using the default algorithm.
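The cache-miss argument can be illustrated with a toy model. The sketch below does not call cuDNN; it only counts how often a cache keyed on input shape would miss, assuming a fixed width of 256 and heights drawn at random between 128 and 512 in steps of 4. The function name, the shape ranges, and the sample count are all illustrative assumptions.

```python
import random


def count_tuning_runs(input_shapes):
    """Count cache misses for a per-shape algorithm cache (toy model).

    Each miss stands for one re-benchmarking pass by the auto-tuner.
    """
    cache = {}
    misses = 0
    for shape in input_shapes:
        if shape not in cache:
            misses += 1
            cache[shape] = "chosen-algorithm"  # placeholder value
    return misses


# Fixed-size inputs, as produced by a resize followed by a fixed crop.
fixed_shapes = [(1, 3, 256, 256)] * 1000

# Variable-height inputs, as scale_width might produce (assumed range).
rng = random.Random(0)
variable_shapes = [
    (1, 3, rng.randrange(128, 513, 4), 256) for _ in range(1000)
]

fixed_misses = count_tuning_runs(fixed_shapes)
variable_misses = count_tuning_runs(variable_shapes)
print("fixed-size tuning runs:", fixed_misses)
print("variable-size tuning runs:", variable_misses)
```

In this model the fixed-size run tunes once, while the variable-size run tunes once per distinct height it encounters. The real cost per miss depends on the network and GPU and is not modeled here.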
Code evidence from `models/base_model.py:38-40`:
    # with [scale_width], input images might have different sizes,
    # which hurts the performance of cudnn.benchmark.
    if opt.preprocess != "scale_width":
        torch.backends.cudnn.benchmark = True