Principle:PeterL1n BackgroundMattingV2 Coarse matting prediction

Knowledge Sources	Real-Time High-Resolution Background Matting, DeepLabV3+, BackgroundMattingV2
Domains	Image_Matting, Computer_Vision, Deep_Learning
Last Updated	2026-02-09 00:00 GMT

Overview

An encoder-decoder neural network that predicts alpha mattes, foreground colors, and error maps from source-background image pairs at reduced resolution.

Description

Coarse matting prediction is the first stage of the two-stage BackgroundMattingV2 pipeline. Given a source image (containing the subject) and a captured background image (without the subject), the network predicts:

Alpha matte (pha): Per-pixel opacity in [0, 1], where 1 is fully subject and 0 is fully background
Foreground (fgr): The subject's true RGB color (predicted as a residual added to the source image)
Error map (err): Per-pixel predicted alpha error, indicating where the coarse prediction is likely to be inaccurate
Hidden features (hid): Intermediate features passed to the refinement stage

The architecture concatenates source and background images along the channel dimension (6 input channels) and passes them through a ResNet or MobileNetV2 encoder, an ASPP module for multi-scale context, and a multi-scale decoder. The foreground is predicted as a residual to the source image rather than absolute RGB values, which simplifies learning.
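
The input and output handling described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the network itself is replaced by a random stand-in array, and the channel layout (1 alpha, 3 foreground residual, 1 error, then hidden features), the hidden width of 32, and the helper names `make_network_input` and `split_outputs` are assumptions made for the example. The real coarse outputs are typically at reduced resolution; the sketch keeps one resolution for simplicity.

```python
import numpy as np

HIDDEN_CHANNELS = 32  # assumed width of the hidden feature output


def make_network_input(src, bgr):
    """Concatenate source and background along the channel axis: (N, 6, H, W)."""
    return np.concatenate([src, bgr], axis=1)


def split_outputs(raw, src):
    """Split a raw decoder array (N, 5 + hidden, H, W) into the four outputs."""
    pha = np.clip(raw[:, 0:1], 0.0, 1.0)        # alpha matte
    fgr = np.clip(raw[:, 1:4] + src, 0.0, 1.0)  # residual added to the source
    err = np.clip(raw[:, 4:5], 0.0, 1.0)        # predicted error map
    hid = np.maximum(raw[:, 5:], 0.0)           # hidden features (ReLU)
    return pha, fgr, err, hid


rng = np.random.default_rng(0)
src = rng.random((1, 3, 8, 8))  # source image with the subject
bgr = rng.random((1, 3, 8, 8))  # captured background without the subject

x = make_network_input(src, bgr)
# Stand-in for encoder -> ASPP -> decoder: random values with the assumed channel count.
raw = rng.standard_normal((1, 5 + HIDDEN_CHANNELS, 8, 8))
pha, fgr, err, hid = split_outputs(raw, src)
print(x.shape, pha.shape, fgr.shape, err.shape, hid.shape)
```

Adding the source image before clamping is what makes the foreground head a residual predictor: a raw output of zero reproduces the source pixel unchanged.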

Usage

Use this principle when you need fast, global matting predictions. The coarse model operates at full or reduced resolution and provides the foundation for the refinement stage. For training, all four outputs (pha, fgr, err, hid) are used. For standalone inference without refinement, only the first two outputs (pha, fgr) are needed.

Theoretical Basis

The matting equation defines the compositing operation:

$I = \alpha F + (1 - \alpha) B$

Where I is the observed image, F is the foreground, B is the background, and α is the alpha matte. Given known B, the network learns to predict α and F simultaneously.
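
As a small worked example of the matting equation, the sketch below composites three pixels (fully foreground, half transparent, fully background) and then reuses the same α and F over a new background. The pixel values and the helper name `composite` are illustrative assumptions.

```python
import numpy as np


def composite(pha, fgr, bgr):
    """Apply I = alpha * F + (1 - alpha) * B, broadcasting alpha over channels."""
    return pha * fgr + (1.0 - pha) * bgr


# One row of three pixels in (C, H, W) layout.
pha = np.array([[[1.0, 0.5, 0.0]]])  # (1, 1, 3): opaque, half transparent, empty
fgr = np.full((3, 1, 3), 0.8)        # constant foreground color
bgr = np.full((3, 1, 3), 0.2)        # constant captured background
new_bgr = np.zeros((3, 1, 3))        # replacement background (black)

observed = composite(pha, fgr, bgr)          # what the camera would see
recomposited = composite(pha, fgr, new_bgr)  # subject over the new background
print(observed[0, 0], recomposited[0, 0])
```

With these values the observed row is 0.8, 0.5, 0.2 and the recomposited row is 0.8, 0.4, 0.0: the half-transparent pixel mixes foreground and background equally, which is why both α and F are needed to replace the background cleanly.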

The foreground is predicted as a residual relative to the source image: $F = I + \Delta F$, where $\Delta F$ is the network's foreground output.

This residual formulation exploits the fact that foreground pixels are close to the source image values, making the learning target smaller in magnitude.
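
Why the residual is small follows directly from the matting equation. Substituting $I = \alpha F + (1 - \alpha) B$ into the residual definition, using only the symbols already defined here, gives:

```latex
\begin{aligned}
\Delta F &= F - I \\
         &= F - \alpha F - (1 - \alpha) B \\
         &= (1 - \alpha)(F - B)
\end{aligned}
```

The residual is therefore exactly zero wherever $\alpha = 1$ (fully opaque subject pixels) and is scaled down by $(1 - \alpha)$ elsewhere, so the target is only sizable in partially transparent regions such as hair and edges.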

The loss function combines multiple terms:

# Abstract loss formulation
# mask = (true_pha > 0); actual_error = |pred_pha - true_pha|, treated as a constant target
loss = (
    L1(pred_pha, true_pha)                    # Alpha L1 loss
    + L1(sobel(pred_pha), sobel(true_pha))    # Alpha gradient loss (edge sharpness)
    + L1(pred_fgr * mask, true_fgr * mask)    # Foreground L1 loss (masked to true_pha > 0)
    + MSE(pred_err, actual_error)             # Error map supervision
)
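
The abstract formulation above can be made concrete with a short NumPy sketch. It is an illustration under stated assumptions rather than the training code: the terms are summed with equal weights as in the formulation above, the mask is taken as `true_pha > 0`, the error target is the absolute alpha error, and the hand-written `sobel`, `l1`, `mse`, and `coarse_loss` helpers are hypothetical names for this example. In an autodiff framework the error target would also be detached from the gradient graph.

```python
import numpy as np


def sobel(img):
    """Sobel gradient magnitude of a 2-D array, using edge-replicated padding."""
    p = np.pad(img, 1, mode="edge")
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[1:-1, :-2] - p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]
          - p[:-2, :-2] - 2 * p[:-2, 1:-1] - p[:-2, 2:])
    return np.sqrt(gx ** 2 + gy ** 2)


def l1(a, b):
    """Mean absolute error."""
    return np.mean(np.abs(a - b))


def mse(a, b):
    """Mean squared error."""
    return np.mean((a - b) ** 2)


def coarse_loss(pred_pha, pred_fgr, pred_err, true_pha, true_fgr):
    """Sum of the four terms; pha and err are (H, W), fgr is (3, H, W)."""
    mask = true_pha > 0                          # supervise foreground only where alpha > 0
    actual_error = np.abs(pred_pha - true_pha)   # target for the error map
    return (l1(pred_pha, true_pha)
            + l1(sobel(pred_pha), sobel(true_pha))
            + l1(pred_fgr * mask, true_fgr * mask)
            + mse(pred_err, actual_error))


# Ground truth: left half background, right half opaque subject.
true_pha = np.zeros((4, 4))
true_pha[:, 2:] = 1.0
true_fgr = np.full((3, 4, 4), 0.5)
no_error = np.zeros((4, 4))

perfect = coarse_loss(true_pha, true_fgr, no_error, true_pha, true_fgr)
worse = coarse_loss(true_pha * 0.5, true_fgr, no_error, true_pha, true_fgr)
print(perfect, worse)
```

A perfect prediction with a zero error map gives a loss of zero, while halving the predicted alpha raises the alpha, gradient, and error-map terms at once.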

Related Pages

Implemented By

Implementation:PeterL1n_BackgroundMattingV2_MattingBase
