Heuristic:PeterL1n BackgroundMattingV2 Data Augmentation Strategy
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Deep_Learning |
| Last Updated | 2026-02-09 02:00 GMT |
Overview
Training augmentation strategy with tuned probabilities: shadow overlay 30%, noise injection 40%, background color jitter 80%, and background affine perturbation 30%.
Description
Both training scripts apply a multi-stage augmentation pipeline directly in the training loop (not in the dataset transforms). These augmentations simulate real-world imperfections: shadows cast by the subject onto the background, camera noise, background color/lighting changes, and slight camera movement between capturing the background and the source frame. Each augmentation is applied stochastically with a specific probability per batch element.
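The per-element selection can be sketched in NumPy. `apply_to_random_subset` is a hypothetical helper and is not part of either script. It mirrors the `torch.rand(len(batch)) < p` pattern in `train_base.py`: each element draws its own uniform number and is augmented only when that number is below `p`. On average, a fraction `p` of each batch is affected.

```python
import numpy as np


def apply_to_random_subset(batch, p, fn, rng):
    """Apply `fn` to each element of `batch` independently with probability `p`.

    `batch` has shape (B, C, H, W); `fn` maps a sub-batch (n, C, H, W) to an
    array of the same shape. Returns a new array and the boolean selection mask.
    """
    mask = rng.random(len(batch)) < p  # one Bernoulli(p) draw per element
    out = batch.copy()
    if mask.any():
        out[mask] = fn(out[mask])
    return out, mask


rng = np.random.default_rng(0)
images = np.full((8, 3, 4, 4), 0.5)
darkened, mask = apply_to_random_subset(images, 0.3, lambda x: x * 0.5, rng)
```

Selection happens inside the training loop, so each epoch augments a different subset of the same samples.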
Usage
Use this heuristic when training MattingBase or MattingRefine, or when modifying the training pipeline. The probabilities and parameter ranges below are the authors' settings. They were chosen to make the model robust to real-world input variability. Both training scripts (`train_base.py` and `train_refine.py`) use identical augmentation code.
The Insight (Rule of Thumb)
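- Shadow augmentation (30% probability):
  - Simulates shadows cast by the foreground subject onto the background.
  - Shadow mask: the alpha matte scaled by `0.3 * random.random()`, so at most 0.3 is subtracted from any pixel.
  - The mask is first randomly affine-transformed: rotation ±5°, translation up to 20% of the image width/height, scale 0.5 to 1.5, and shear ±5°.
  - It is then blurred with a box filter whose kernel size is drawn from `range(20, 40)` (20 to 39 px).
  - Finally, it is subtracted from the source image, which is clamped to [0, 1].
  - The strength, affine parameters, and kernel size are each drawn once per batch and shared by all selected elements.
- Noise augmentation (40% probability):
  - Adds Gaussian noise with standard deviation `0.03 * random.random()` to both the source and the background images.
  - The noise is drawn independently for the source and the background.
- Background color jitter (80% probability):
  - Simulates lighting changes between background capture and inference.
  - Uses Kornia `ColorJitter` with brightness=0.18, contrast=0.18, saturation=0.18, and hue=0.1, applied to the background image only.
- Background affine perturbation (30% probability):
  - Simulates slight camera movement between capturing the background and the source frame.
  - Rotation within ±1°, translation within ±1% of the image width/height.
- Trade-off: higher augmentation probabilities generally improve robustness but can slow convergence. The values above are the settings used in the authors' scripts.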
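The shadow step can be sketched in NumPy. This is a framework-agnostic illustration, not the authors' PyTorch/Kornia code, and it rests on several assumptions:

- The random affine transform is replaced by a random integer translation of up to 20% of each dimension, with no rotation, scale, or shear.
- `box_blur`, `shift`, and `add_shadow` are hypothetical helper names.
- The shadow is subtracted from the background of the source frame before the foreground is composited over it.

As in the scripts, the strength and kernel size are drawn once per call rather than per element.

```python
import numpy as np


def box_blur(img, k):
    """Normalized k x k box blur over the last two axes, with reflect padding."""
    lo, hi = k // 2, k - 1 - k // 2
    pad = [(0, 0)] * (img.ndim - 2) + [(lo, hi), (lo, hi)]
    x = np.pad(img, pad, mode="reflect")
    # Sliding-window sums via cumulative sums: first along rows, then columns.
    c = np.cumsum(x, axis=-2)
    c = np.concatenate([np.zeros_like(c[..., :1, :]), c], axis=-2)
    x = c[..., k:, :] - c[..., :-k, :]
    c = np.cumsum(x, axis=-1)
    c = np.concatenate([np.zeros_like(c[..., :1]), c], axis=-1)
    return (c[..., k:] - c[..., :-k]) / (k * k)


def shift(img, dy, dx):
    """Translate the last two axes by (dy, dx) pixels, filling uncovered pixels with 0."""
    h, w = img.shape[-2:]
    out = np.zeros_like(img)
    ys, yd = (slice(0, h - dy), slice(dy, h)) if dy >= 0 else (slice(-dy, h), slice(0, h + dy))
    xs, xd = (slice(0, w - dx), slice(dx, w)) if dx >= 0 else (slice(-dx, w), slice(0, w + dx))
    out[..., yd, xd] = img[..., ys, xs]
    return out


def add_shadow(bgr, pha, rng, max_strength=0.3, max_shift=0.2, kernel_range=(20, 40)):
    """Darken bgr (n, 3, H, W) with a displaced, blurred copy of pha (n, 1, H, W)."""
    h, w = bgr.shape[-2:]
    shadow = pha * (max_strength * rng.random())  # one strength per call
    my, mx = int(max_shift * h), int(max_shift * w)
    shadow = shift(shadow, int(rng.integers(-my, my + 1)), int(rng.integers(-mx, mx + 1)))
    k = int(rng.integers(*kernel_range))  # 20 to 39, like random.choice(range(20, 40))
    shadow = box_blur(shadow, k)
    return np.clip(bgr - shadow, 0.0, 1.0)  # (n, 1, H, W) broadcasts over channels


rng = np.random.default_rng(0)
bgr = np.full((2, 3, 64, 64), 0.8)
pha = np.zeros((2, 1, 64, 64))
pha[..., 20:44, 24:40] = 1.0  # a rectangular stand-in for the subject's matte
shadowed = add_shadow(bgr, pha, rng)
```

The box blur turns the hard alpha boundary into a soft, diffuse region. Without it, the darkened area would keep the subject's sharp silhouette edges.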
Reasoning
Background matting requires a pre-captured background image. At inference time, this image may differ from the actual background behind the subject because of lighting changes, camera movement, and shadows cast by the foreground. Without these augmentations, the model would likely overfit to perfectly matched source-background pairs and generalize poorly to real captures. The shadow augmentation is particularly important. In a real capture, the subject casts shadows onto the scene that are absent from the background-only image, whereas a plain synthetic composite contains no such shadows. Training only on shadow-free composites would therefore leave a domain gap between the training data and real captures.
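Code evidence from `train_base.py:153-181` (abridged):
```python
import random
import kornia
import torch
from torchvision import transforms as T

# Inside the training loop; true_src/true_bgr: (B, 3, H, W), true_pha: (B, 1, H, W), values in [0, 1].
# Augment with shadow (~30% of the batch)
aug_shadow_idx = torch.rand(len(true_src)) < 0.3
if aug_shadow_idx.any():
    # One random strength in [0, 0.3), shared by all selected elements
    aug_shadow = true_pha[aug_shadow_idx].mul(0.3 * random.random())
    # Displace and deform the shadow relative to the subject
    aug_shadow = T.RandomAffine(degrees=(-5, 5), translate=(0.2, 0.2), scale=(0.5, 1.5), shear=(-5, 5))(aug_shadow)
    # Soften the edges with a 20 to 39 px box blur
    aug_shadow = kornia.filters.box_blur(aug_shadow, (random.choice(range(20, 40)),) * 2)
    # Darken the selected images and keep values in [0, 1]
    true_src[aug_shadow_idx] = true_src[aug_shadow_idx].sub_(aug_shadow).clamp_(0, 1)
# ... (lines omitted)
# Augment with noise (~40% of the batch; body omitted)
aug_noise_idx = torch.rand(len(true_src)) < 0.4
# Augment background with jitter (~80% of the batch; body omitted)
aug_jitter_idx = torch.rand(len(true_src)) < 0.8
# Augment background with affine (~30% of the batch; body omitted)
aug_affine_idx = torch.rand(len(true_bgr)) < 0.3
```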
The identical augmentation pipeline is also used in `train_refine.py:180-209`.
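The excerpt shows only the selection masks for the noise, jitter, and affine branches. The NumPy sketch below illustrates what those branches do under simplifying assumptions. It is not the authors' Kornia/torchvision code:

- The color jitter is reduced to brightness and contrast, with no saturation or hue.
- The background affine perturbation is reduced to an integer-pixel translation of at most 1% of each dimension, with no ±1° rotation.
- `add_noise`, `jitter_brightness_contrast`, `small_translate`, and `augment_pair` are hypothetical names.

Following the description above, the source and background share one selection mask for noise, but the noise itself is drawn separately for each.

```python
import numpy as np


def add_noise(img, rng, max_std=0.03):
    """Add zero-mean Gaussian noise with std = max_std * U(0, 1), then clamp to [0, 1]."""
    std = max_std * rng.random()  # one noise level per call
    return np.clip(img + rng.standard_normal(img.shape) * std, 0.0, 1.0)


def jitter_brightness_contrast(img, rng, brightness=0.18, contrast=0.18):
    """Simplified color jitter: per-image brightness and contrast factors.

    Factors are drawn from [1 - b, 1 + b]; saturation and hue are not modeled.
    """
    shape = (img.shape[0], 1, 1, 1)
    b = rng.uniform(1 - brightness, 1 + brightness, size=shape)
    c = rng.uniform(1 - contrast, 1 + contrast, size=shape)
    out = np.clip(img * b, 0.0, 1.0)
    mean = out.mean(axis=(1, 2, 3), keepdims=True)  # scale deviations from the image mean
    return np.clip((out - mean) * c + mean, 0.0, 1.0)


def small_translate(img, rng, max_frac=0.01):
    """Shift each image by at most max_frac of its height/width (edge-replicated border)."""
    h, w = img.shape[-2:]
    my, mx = int(max_frac * h), int(max_frac * w)
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        dy, dx = rng.integers(-my, my + 1), rng.integers(-mx, mx + 1)
        padded = np.pad(img[i], ((0, 0), (my, my), (mx, mx)), mode="edge")
        out[i] = padded[:, my - dy : my - dy + h, mx - dx : mx - dx + w]
    return out


def augment_pair(src, bgr, rng):
    """Noise on both images (p=0.4), then jitter (p=0.8) and shift (p=0.3) on bgr only."""
    src, bgr = src.copy(), bgr.copy()
    idx = rng.random(len(src)) < 0.4
    if idx.any():
        src[idx] = add_noise(src[idx], rng)  # separate noise draw for the source...
        bgr[idx] = add_noise(bgr[idx], rng)  # ...and for the background
    idx = rng.random(len(bgr)) < 0.8
    if idx.any():
        bgr[idx] = jitter_brightness_contrast(bgr[idx], rng)
    idx = rng.random(len(bgr)) < 0.3
    if idx.any():
        bgr[idx] = small_translate(bgr[idx], rng)
    return src, bgr


rng = np.random.default_rng(0)
src = rng.random((16, 3, 32, 32))
bgr = rng.random((16, 3, 32, 32))
aug_src, aug_bgr = augment_pair(src, bgr, rng)
```

Only the background is jittered and shifted. As a result, the model sees source/background pairs whose lighting and alignment disagree slightly, which is the mismatch described under Reasoning. For 32 px images, the 1% limit rounds down to a zero-pixel shift, so the translation only has an effect at larger resolutions.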