Implementation:Junyanz Pytorch CycleGAN and pix2pix Combine A and B
| Knowledge Sources | pytorch-CycleGAN-and-pix2pix |
|---|---|
| Domains | Image-to-Image Translation, Data Preparation, Paired Image Translation |
| Last Updated | 2026-02-09 |
Overview
The datasets/combine_A_and_B.py script is a standalone command-line utility that concatenates corresponding image pairs from two separate directories into single side-by-side images, the paired format used for pix2pix training. By default it processes pairs in parallel with Python multiprocessing, which helps on large datasets.
Description
The script iterates over the phase subdirectories (for example train, test, and val) of the source folders. For each phase, it lists the images in the A directory, looks up the image with the matching filename in the B directory, and concatenates the two horizontally, with A on the left and B on the right. The combined images are written to the output directory under the same phase subdirectory names.
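The horizontal concatenation step can be sketched with NumPy. This is a minimal illustration, not the script's code: `combine_pair` is a hypothetical helper, the arrays are synthetic stand-ins for loaded images, and the sketch assumes both images use a (height, width, channels) layout with equal heights.

```python
import numpy as np


def combine_pair(im_a: np.ndarray, im_b: np.ndarray) -> np.ndarray:
    """Place two H x W x C images side by side (A on the left, B on the right)."""
    if im_a.shape[0] != im_b.shape[0] or im_a.shape[2:] != im_b.shape[2:]:
        raise ValueError(f"incompatible shapes: {im_a.shape} vs {im_b.shape}")
    # Axis 1 is the width axis for arrays laid out as (height, width, channels).
    return np.concatenate([im_a, im_b], axis=1)


# Two synthetic 4x6 RGB images stand in for files read from the A and B folders.
im_a = np.zeros((4, 6, 3), dtype=np.uint8)
im_b = np.full((4, 6, 3), 255, dtype=np.uint8)
im_ab = combine_pair(im_a, im_b)
print(im_ab.shape)  # (4, 12, 3)
```

The explicit shape check is a choice made for this sketch: concatenating images of different heights fails, so A and B images are best kept at identical dimensions.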
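The script supports two filename-matching modes:
- Default mode (--use_AB not set): an image in --fold_A is paired with the image of the same filename in --fold_B, and the combined image keeps that filename
- AB mode (--use_AB set): only files whose names contain _A. are processed; each is paired with its _B. counterpart in --fold_B (for example 0001_A.jpg with 0001_B.jpg), and the combined image is saved with an _AB. suffix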
Multiprocessing is enabled by default and can be disabled with --no_multiprocessing for debugging.
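The filename pairing rule can be sketched as a small pure function. This is an illustration under stated assumptions rather than the script's own code: `pair_names` is a hypothetical helper, and the sketch assumes that --use_AB matches files by an `_A.` / `_B.` marker in the filename and names the output with `_AB.`.

```python
from typing import Optional, Tuple


def pair_names(name_a: str, use_ab: bool) -> Optional[Tuple[str, str]]:
    """Return (name_B, name_AB) for a file found in the A folder, or None to skip it."""
    if not use_ab:
        # Default mode: B and the combined output reuse A's filename unchanged.
        return name_a, name_a
    if "_A." not in name_a:
        # Assumed AB-mode filter: only files carrying the "_A." marker are processed.
        return None
    return name_a.replace("_A.", "_B."), name_a.replace("_A.", "_AB.")


print(pair_names("0001.jpg", use_ab=False))   # ('0001.jpg', '0001.jpg')
print(pair_names("0001_A.jpg", use_ab=True))  # ('0001_B.jpg', '0001_AB.jpg')
print(pair_names("0001.jpg", use_ab=True))    # None
```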
Usage
Run as a command-line tool before pix2pix training when preparing custom paired datasets.
Code Reference
Source Location
| File | Lines |
|---|---|
| datasets/combine_A_and_B.py | L1-68 |
Signature
# Command-line arguments
parser.add_argument("--fold_A", help="Input directory for domain A images",
                    type=str, required=True)
parser.add_argument("--fold_B", help="Input directory for domain B images",
                    type=str, required=True)
parser.add_argument("--fold_AB", help="Output directory for combined AB images",
                    type=str, required=True)
parser.add_argument("--num_imgs", help="Max number of images to process",
                    type=int, default=1000000)
parser.add_argument("--use_AB", help="If set, pair (0001_A, 0001_B) into (0001_AB)",
                    action='store_true')
parser.add_argument("--no_multiprocessing",
                    help="Disable multiprocessing for debugging",
                    action='store_true')
Import
N/A -- This is a standalone command-line script, not an importable Python module.
I/O Contract
| Parameter | Type | Description |
|---|---|---|
| --fold_A | str (CLI arg) | Path to directory containing domain A images (with train/, test/, val/ subdirectories) |
| --fold_B | str (CLI arg) | Path to directory containing domain B images (matching structure) |
| --fold_AB | str (CLI arg) | Path to output directory for combined images |
| --num_imgs | int (CLI arg) | Maximum number of images to process per phase (default: 1000000) |
| --use_AB | flag (CLI arg) | If set, pairs files by filename suffix: name_A.ext in --fold_A with name_B.ext in --fold_B, saved as name_AB.ext |
| --no_multiprocessing | flag (CLI arg) | If set, disables parallel processing |

| Output | Type | Description |
|---|---|---|
| <fold_AB>/train/ | directory | Combined AB training images (horizontally concatenated pairs) |
| <fold_AB>/test/ | directory | Combined AB test images |
| <fold_AB>/val/ | directory | Combined AB validation images (if val/ phase exists in source) |
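Each combined file is twice as wide as its inputs, so the two halves can be recovered by cutting at the horizontal midpoint. The sketch below illustrates that idea on a synthetic array; `split_pair` is a hypothetical helper, and it assumes A occupies the left half and B the right half of an even-width image.

```python
import numpy as np


def split_pair(im_ab: np.ndarray):
    """Split a side-by-side H x 2W x C image into its left (A) and right (B) halves."""
    width = im_ab.shape[1]
    if width % 2 != 0:
        raise ValueError(f"expected an even width, got {width}")
    half = width // 2
    return im_ab[:, :half], im_ab[:, half:]


# Synthetic combined image: left half all zeros, right half all 255.
combined = np.concatenate(
    [np.zeros((4, 6, 3), dtype=np.uint8), np.full((4, 6, 3), 255, dtype=np.uint8)],
    axis=1,
)
left, right = split_pair(combined)
print(left.shape, right.shape)  # (4, 6, 3) (4, 6, 3)
```

A check like this is a quick way to confirm that a combined file has the expected layout before training.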
Usage Examples
# Combine separate A and B directories into pix2pix format
python datasets/combine_A_and_B.py \
    --fold_A ./datasets/my_dataset/A \
    --fold_B ./datasets/my_dataset/B \
    --fold_AB ./datasets/my_dataset
# Limit to first 100 images per phase
python datasets/combine_A_and_B.py \
    --fold_A ./datasets/my_dataset/A \
    --fold_B ./datasets/my_dataset/B \
    --fold_AB ./datasets/my_dataset \
    --num_imgs 100
# Disable multiprocessing for debugging
python datasets/combine_A_and_B.py \
    --fold_A ./datasets/my_dataset/A \
    --fold_B ./datasets/my_dataset/B \
    --fold_AB ./datasets/my_dataset \
    --no_multiprocessing
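Before running any of the commands above, it can help to check that every image in A has a same-named partner in B. The following is a minimal pre-flight sketch for the default (same-filename) mode; `unmatched_files` is a hypothetical helper that is not part of the repository, and the temporary directory layout exists only for the demonstration.

```python
import os
import tempfile


def unmatched_files(fold_a: str, fold_b: str) -> dict:
    """Map each phase in fold_a to the sorted filenames with no same-named file in fold_b."""
    report = {}
    for phase in sorted(os.listdir(fold_a)):
        dir_a = os.path.join(fold_a, phase)
        dir_b = os.path.join(fold_b, phase)
        if not os.path.isdir(dir_a):
            continue  # ignore stray files next to the phase directories
        names_b = set(os.listdir(dir_b)) if os.path.isdir(dir_b) else set()
        report[phase] = sorted(n for n in os.listdir(dir_a) if n not in names_b)
    return report


# Build a throwaway layout: A/train has two files, B/train has only one of them.
with tempfile.TemporaryDirectory() as root:
    for rel in ("A/train/1.jpg", "A/train/2.jpg", "B/train/1.jpg"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        open(path, "wb").close()
    missing = unmatched_files(os.path.join(root, "A"), os.path.join(root, "B"))
    print(missing)  # {'train': ['2.jpg']}
```

An empty list for every phase suggests that each A image has a partner; any filenames reported are worth fixing or removing before combining.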