Principle:PeterL1n BackgroundMattingV2 Coarse matting prediction
| Knowledge Sources | |
|---|---|
| Domains | Image_Matting, Computer_Vision, Deep_Learning |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
An encoder-decoder neural network that predicts alpha mattes, foreground colors, and error maps from source-background image pairs at reduced resolution.
Description
Coarse matting prediction is the first stage of the two-stage BackgroundMattingV2 pipeline. Given a source image (containing the subject) and a captured background image (without the subject), the network predicts:
- Alpha matte (pha): Per-pixel opacity indicating subject vs. background
- Foreground (fgr): The subject's true RGB color (predicted as a residual added to the source image)
- Error map (err): Per-pixel predicted error indicating where the alpha prediction is likely to be inaccurate
- Hidden features (hid): Intermediate features passed to the refinement stage
The architecture concatenates source and background images along the channel dimension (6 input channels) and passes them through a ResNet or MobileNetV2 encoder, an ASPP module for multi-scale context, and a multi-scale decoder. The foreground is predicted as a residual to the source image rather than absolute RGB values, which simplifies learning.
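The input concatenation and the split into four outputs can be sketched with NumPy arrays. This is a minimal illustration, not the repository's code: the helper names `prepare_input` and `split_heads`, the 37-channel output width, and the channel ordering are assumptions made for the example, and a random array stands in for the encoder, ASPP module, and decoder.

```python
import numpy as np

def prepare_input(src, bgr):
    """Concatenate source and background (each N x 3 x H x W) along channels."""
    if src.shape != bgr.shape or src.shape[1] != 3:
        raise ValueError("src and bgr must both have shape (N, 3, H, W)")
    return np.concatenate([src, bgr], axis=1)  # -> (N, 6, H, W)

def split_heads(raw):
    """Split a raw decoder output into the four coarse outputs.

    Assumed (hypothetical) channel layout: 1 alpha channel, 3 foreground
    residual channels, 1 error channel, and the remaining hidden channels.
    """
    pha = np.clip(raw[:, 0:1], 0.0, 1.0)   # alpha constrained to [0, 1]
    fgr_residual = raw[:, 1:4]             # added to the source image later
    err = np.clip(raw[:, 4:5], 0.0, 1.0)   # predicted error constrained to [0, 1]
    hid = np.maximum(raw[:, 5:], 0.0)      # ReLU on hidden features
    return pha, fgr_residual, err, hid

rng = np.random.default_rng(0)
src = rng.random((2, 3, 8, 8))
bgr = rng.random((2, 3, 8, 8))

x = prepare_input(src, bgr)
raw = rng.standard_normal((2, 37, 8, 8))   # stand-in for the network's output
pha, fgr_residual, err, hid = split_heads(raw)

print(x.shape, pha.shape, fgr_residual.shape, err.shape, hid.shape)
```

Keeping the foreground head as an unclamped residual at this point leaves the addition to the source image as a separate, explicit step.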
Usage
Use this principle when you need fast, global matting predictions. The coarse model operates at full or reduced resolution and provides the foundation for the refinement stage. For training, all four outputs (pha, fgr, err, hid) are used. For standalone inference without refinement, only the first two outputs (pha, fgr) are needed.
Theoretical Basis
The matting equation defines the compositing operation:

$$I = \alpha F + (1 - \alpha) B$$
Where I is the observed image, F is the foreground, B is the background, and α is the alpha matte. Given known B, the network learns to predict α and F simultaneously.
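As a small worked example, the compositing relation I = αF + (1 − α)B can be evaluated directly on arrays. The function name `composite` and the pixel values are illustrative assumptions; the alpha array has a single channel and broadcasts across the three RGB channels.

```python
import numpy as np

def composite(pha, fgr, bgr):
    """Apply I = alpha * F + (1 - alpha) * B; pha broadcasts over RGB."""
    return pha * fgr + (1.0 - pha) * bgr

fgr = np.full((1, 3, 2, 2), 0.8)                 # uniform foreground color
bgr = np.full((1, 3, 2, 2), 0.2)                 # uniform background color
pha = np.array([[[[1.0, 0.0], [0.5, 0.25]]]])    # shape (1, 1, 2, 2)

img = composite(pha, fgr, bgr)
print(img[0, 0])
```

Where alpha is 1 the result equals the foreground, where alpha is 0 it equals the background, and intermediate values blend the two linearly.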
The foreground is predicted as a residual:

$$F = I + R$$

where R is the residual output by the network.
This residual formulation exploits the fact that foreground pixels are close to the source image values, making the learning target smaller in magnitude.
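The following sketch shows how a residual might be turned into a foreground estimate and why the residual target tends to be small. The function name `foreground_from_residual`, the clamp to [0, 1], and the pixel values are assumptions chosen for illustration.

```python
import numpy as np

def foreground_from_residual(src, residual):
    """Add the predicted residual to the source and clamp to valid colors.

    The clamp to [0, 1] is an assumption of this sketch.
    """
    return np.clip(src + residual, 0.0, 1.0)

src = np.array([0.60, 0.40, 0.90])        # observed RGB at an opaque pixel
true_fgr = np.array([0.65, 0.38, 0.95])   # hypothetical true foreground RGB

target_residual = true_fgr - src          # what the network would need to output
fgr = foreground_from_residual(src, target_residual)

# The residual target is much smaller in magnitude than the absolute target.
print(np.abs(target_residual).max(), np.abs(true_fgr).max())
```

For this pixel the network only needs to output a correction of a few hundredths, rather than regress the full color value.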
The loss function combines multiple terms:
# Abstract loss formulation
# mask: pixels where true_pha > 0; actual_error: |pred_pha - true_pha|
loss = (
    L1(pred_pha, true_pha)                       # Alpha L1 loss
    + L1(sobel(pred_pha), sobel(true_pha))       # Alpha gradient loss (edge sharpness)
    + L1(pred_fgr * mask, true_fgr * mask)       # Foreground L1 loss (masked)
    + MSE(pred_err, actual_error)                # Error map supervision
)
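A runnable NumPy sketch of the four loss terms is given below. It makes several assumptions that the abstract formulation leaves open: `mask` is taken as the pixels where the true alpha is nonzero, `actual_error` is the absolute alpha error, and `sobel_magnitude` is a hand-written Sobel gradient magnitude with edge padding. A training implementation would use an autograd framework instead, and would typically stop gradients from flowing through the error target.

```python
import numpy as np

def l1(a, b):
    return np.mean(np.abs(a - b))

def mse(a, b):
    return np.mean((a - b) ** 2)

def sobel_magnitude(x):
    """Sobel gradient magnitude over the last two axes (edge-padded)."""
    h, w = x.shape[-2:]
    pad = [(0, 0)] * (x.ndim - 2) + [(1, 1), (1, 1)]
    p = np.pad(x, pad, mode="edge")

    def s(di, dj):
        # View of the padded array shifted by (di, dj) relative to x.
        return p[..., 1 + di:1 + di + h, 1 + dj:1 + dj + w]

    gx = (s(-1, 1) + 2 * s(0, 1) + s(1, 1)) - (s(-1, -1) + 2 * s(0, -1) + s(1, -1))
    gy = (s(1, -1) + 2 * s(1, 0) + s(1, 1)) - (s(-1, -1) + 2 * s(-1, 0) + s(-1, 1))
    return np.sqrt(gx ** 2 + gy ** 2)

def coarse_loss(pred_pha, pred_fgr, pred_err, true_pha, true_fgr):
    mask = true_pha > 0                          # assumption: supervise fgr where alpha > 0
    actual_error = np.abs(pred_pha - true_pha)   # assumption: target for the error map
    return (
        l1(pred_pha, true_pha)                                      # alpha L1
        + l1(sobel_magnitude(pred_pha), sobel_magnitude(true_pha))  # alpha gradient
        + l1(pred_fgr * mask, true_fgr * mask)                      # masked foreground L1
        + mse(pred_err, actual_error)                               # error map supervision
    )

rng = np.random.default_rng(0)
true_pha = rng.random((1, 1, 8, 8))
true_fgr = rng.random((1, 3, 8, 8))
zero_err = np.zeros_like(true_pha)

perfect = coarse_loss(true_pha, true_fgr, zero_err, true_pha, true_fgr)
noisy = coarse_loss(np.clip(true_pha + 0.1, 0, 1), true_fgr, zero_err, true_pha, true_fgr)
print(perfect, noisy)
```

A perfect prediction with a zero error map gives zero loss, while perturbing the alpha raises both the alpha term and the error-map term, since the error map no longer matches the actual error.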