Heuristic:PeterL1n BackgroundMattingV2 Data Augmentation Strategy

Knowledge Sources	BackgroundMattingV2
Domains	Computer_Vision, Deep_Learning
Last Updated	2026-02-09 02:00 GMT

Overview

Training augmentation strategy with tuned probabilities: shadow overlay 30%, noise injection 40%, background color jitter 80%, and background affine perturbation 30%.

Description

Both training scripts (`train_base.py` and `train_refine.py`) apply a multi-stage augmentation pipeline directly in the training loop rather than in the dataset transforms. The augmentations simulate real-world imperfections: shadows cast by the subject onto the background, camera noise, background color and lighting changes, and slight camera movement between capturing the background and the source frame. Each augmentation is applied stochastically: a random mask selects the batch elements it affects, each with the stated probability.
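The per-element selection can be sketched as follows. This is a minimal illustration, not the repository's code: the helper name `augmentation_mask`, the toy batch, and the `+ 1.0` stand-in augmentation are assumptions made for the example. Only the `torch.rand(...) < p` pattern is taken from the training scripts.

```python
import torch


def augmentation_mask(batch_size, prob, generator=None):
    """Return a boolean mask; each entry is True with probability `prob`."""
    # torch.rand draws from [0, 1), so prob=0.0 selects nothing and prob=1.0 selects all.
    return torch.rand(batch_size, generator=generator) < prob


gen = torch.Generator().manual_seed(0)      # seeded only to make the demo repeatable
batch = torch.zeros(8, 3, 4, 4)             # hypothetical batch of 8 RGB images
mask = augmentation_mask(len(batch), 0.3, gen)

# Apply a stand-in "augmentation" to the selected elements only.
if mask.any():
    batch[mask] = batch[mask] + 1.0
```

Elements where `mask` is False pass through unchanged, which is how a single batch can mix augmented and clean samples.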

Usage

Use this heuristic when training MattingBase or MattingRefine, or when modifying the training pipeline. The probabilities and parameter ranges below are the values used in the repository's training scripts, chosen so that the trained models tolerate real-world input variability. Both scripts use the same augmentation code.

The Insight (Rule of Thumb)

Shadow augmentation (30% probability):
- Simulates shadows that the foreground casts onto the background.
- Shadow intensity: the alpha matte scaled by `0.3 * random()`, so always below 0.3.
- Randomly affine-transformed (rotation ±5°, translation up to 20%, scale 0.5-1.5, shear ±5°).
- Blurred with a box filter whose kernel size is drawn from `range(20, 40)`, i.e. 20-39 px.
- Subtracted from the source image, with the result clamped to [0, 1].

Noise augmentation (40% probability):
- Adds Gaussian noise to both source and background images.
- Noise scale: `0.03 * random()`.
- Applied independently to source and background.

Background color jitter (80% probability):
- Simulates lighting changes between background capture and inference.
- Kornia `ColorJitter` with brightness=0.18, contrast=0.18, saturation=0.18, hue=0.1.

Background affine perturbation (30% probability):
- Simulates slight camera movement.
- Small rotation (±1°), small translation (±1%).

Trade-off: Higher augmentation probabilities generally improve robustness to mismatched inputs but can slow convergence. The values above are the authors' tuned settings.
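As a rough way to reason about the trade-off, the four probabilities can be combined under the assumption that the selections are independent (the scripts draw a separate random mask for each step). The dictionary keys below are illustrative labels, not identifiers from the repository.

```python
from math import prod

# Per-sample selection probabilities from the rule of thumb.
probs = {"shadow": 0.3, "noise": 0.4, "bg_jitter": 0.8, "bg_affine": 0.3}

# Probability that a sample is skipped by all four steps (independence assumed).
p_untouched = prod(1.0 - p for p in probs.values())

# Expected number of augmentations applied to a single sample.
expected_count = sum(probs.values())

print(f"P(no augmentation) = {p_untouched:.4f}")                     # 0.0588
print(f"Expected augmentations per sample = {expected_count:.1f}")   # 1.8
```

Under that assumption, roughly 6% of samples pass through unchanged and a sample receives 1.8 augmentations on average. Raising any probability shrinks the share of clean source-background pairs the model sees.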

Reasoning

Background matting requires a pre-captured background image that may differ from the actual background at inference time due to lighting changes, camera movement, and foreground shadows. Without these augmentations, the model would overfit to perfectly matched source-background pairs and fail on real-world inputs. The shadow augmentation is particularly important because foreground objects cast shadows that don't appear in the background-only capture, creating a domain gap between training composites and real captures.
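To make the "perfectly matched pair" problem concrete, the sketch below starts from an identical source and background and breaks the match with the noise step from the rule of thumb. It is an illustration under stated assumptions rather than the repository's implementation: the function name `add_gaussian_noise`, the toy tensors, and the final clamp to [0, 1] are choices made for this example.

```python
import random

import torch


def add_gaussian_noise(images, max_scale=0.03):
    """Add zero-mean Gaussian noise with a randomly drawn scale below `max_scale`."""
    scale = max_scale * random.random()          # one noise scale per call
    noisy = images + torch.randn_like(images) * scale
    return noisy.clamp(0, 1)                     # assumption: keep values in image range


random.seed(0)
torch.manual_seed(0)

true_bgr = torch.full((2, 3, 8, 8), 0.5)         # hypothetical mid-gray backgrounds
true_src = true_bgr.clone()                      # a perfectly matched source

# Separate calls draw separate scales and separate noise fields,
# so the source and background no longer agree pixel for pixel.
noisy_src = add_gaussian_noise(true_src)
noisy_bgr = add_gaussian_noise(true_bgr)
```

After this step the model can no longer rely on exact pixel equality between source and background to identify background regions.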

Code evidence from `train_base.py:153-181` (abridged: the noise, jitter, and affine steps are shown only up to their selection masks):

# Augment with shadow
aug_shadow_idx = torch.rand(len(true_src)) < 0.3
if aug_shadow_idx.any():
    aug_shadow = true_pha[aug_shadow_idx].mul(0.3 * random.random())
    aug_shadow = T.RandomAffine(degrees=(-5, 5), translate=(0.2, 0.2), scale=(0.5, 1.5), shear=(-5, 5))(aug_shadow)
    aug_shadow = kornia.filters.box_blur(aug_shadow, (random.choice(range(20, 40)),) * 2)
    true_src[aug_shadow_idx] = true_src[aug_shadow_idx].sub_(aug_shadow).clamp_(0, 1)

# Augment with noise
aug_noise_idx = torch.rand(len(true_src)) < 0.4

# Augment background with jitter
aug_jitter_idx = torch.rand(len(true_src)) < 0.8

# Augment background with affine
aug_affine_idx = torch.rand(len(true_bgr)) < 0.3

The identical augmentation pipeline is also used in `train_refine.py:180-209`.
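One detail of the shadow step is easy to miss: `random.random()` is called once per batch, so every selected sample shares the same shadow strength even though the selection itself is per sample. The sketch below reproduces that line on toy data; the tensor shapes and the fixed mask are assumptions for the demonstration.

```python
import random

import torch

random.seed(0)                                   # seeded only for repeatability

true_pha = torch.ones(4, 1, 2, 2)                # hypothetical fully opaque alpha mattes
aug_shadow_idx = torch.tensor([True, False, True, True])

# Same expression as in the excerpt: a single Python float scales all selected mattes.
aug_shadow = true_pha[aug_shadow_idx].mul(0.3 * random.random())

# Peak shadow value of each selected sample; all entries are identical.
per_sample_strength = aug_shadow.amax(dim=(1, 2, 3))
```

If per-sample strengths were wanted, one hypothetical variant would multiply by a tensor such as `0.3 * torch.rand(n, 1, 1, 1)` instead of a Python float. That would be a change to the published pipeline, not a description of it.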
