Workflow:PeterL1n BackgroundMattingV2 Image matting inference
| Knowledge Sources | |
|---|---|
| Domains | Computer_Vision, Image_Matting, Inference |
| Last Updated | 2026-02-09 02:30 GMT |
Overview
Batch image matting pipeline that extracts alpha mattes, foreground layers, and composites from a directory of source images using a pre-captured background.
Description
This workflow performs background matting on a directory of still images. Given a set of source images (with a subject in front of a known background) and corresponding background images (the same scene without the subject), the model predicts per-pixel alpha mattes and foreground colors. These outputs can be used to composite the subject onto any new background.
The pipeline supports both the base model (MattingBase, coarse output only) and the refined model (MattingRefine, with selective high-resolution patch refinement). An optional homographic alignment preprocessing step can correct minor camera shifts between source and background captures using ORB feature matching and RANSAC homography estimation.
Output types include alpha matte (pha), foreground (fgr), composite with transparency (com), error map (err), and refinement region map (ref).
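As a minimal sketch of how the predicted alpha matte and foreground can be placed on a new background, the standard alpha-compositing rule (alpha * foreground + (1 - alpha) * background) can be written per pixel as below. The function names and the nested-list image layout are illustrative assumptions, not part of the repository.

```python
def composite_pixel(alpha, fgr, new_bgr):
    """Blend one foreground pixel over a new background pixel.

    alpha is a float in [0, 1]; fgr and new_bgr are (r, g, b) tuples
    with channel values in [0, 1].
    """
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fgr, new_bgr))


def composite_image(pha, fgr, new_bgr):
    """Apply composite_pixel to images stored as nested lists indexed [row][col].

    This layout is a hypothetical stand-in for the tensors the model produces.
    """
    return [
        [composite_pixel(a, f, b) for a, f, b in zip(pha_row, fgr_row, bgr_row)]
        for pha_row, fgr_row, bgr_row in zip(pha, fgr, new_bgr)
    ]
```

A fully opaque pixel (alpha = 1) keeps the foreground color, a fully transparent pixel (alpha = 0) shows only the new background, and intermediate values blend the two, which is what preserves soft edges such as hair.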
Usage
Execute this workflow when you have a directory of images taken against a known background and need to extract the subject with transparency. Typical use cases include product photography, portrait extraction, batch processing of photo shoots where a clean background plate was captured, and any scenario requiring high-quality alpha mattes from image pairs.
Execution Steps
Step 1: Model loading
Instantiate either MattingBase or MattingRefine with the chosen backbone architecture (ResNet50, ResNet101, or MobileNetV2). For MattingRefine, configure the refinement mode: sampling for a fixed amount of computation, thresholding for quality-adaptive refinement, or full to refine the whole image (mainly useful for debugging). Load the trained checkpoint weights, move the model to the target device (CPU or CUDA), and put it in evaluation mode.
Key considerations:
- MattingRefine with sampling mode provides predictable computation per frame
- MattingRefine with thresholding mode adapts computation to image complexity
- Recommended settings: backbone_scale=0.25 and refine_sample_pixels=80000 for HD resolution
- The checkpoint must match the backbone architecture
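The options above can be collected and sanity-checked before any model is built. The following is only a sketch: the MattingConfig class, its lowercase option strings, and its helper methods are hypothetical and not part of the repository. It also assumes that refine_sample_pixels is a budget counted in full-resolution pixels and that the coarse pass runs at roughly backbone_scale times the input size.

```python
from dataclasses import dataclass

# Hypothetical option labels mirroring the choices described in the text.
MODEL_TYPES = ("mattingbase", "mattingrefine")
BACKBONES = ("resnet50", "resnet101", "mobilenetv2")
REFINE_MODES = ("sampling", "thresholding", "full")


@dataclass(frozen=True)
class MattingConfig:
    model_type: str = "mattingrefine"
    backbone: str = "resnet50"
    backbone_scale: float = 0.25        # recommended for HD input
    refine_mode: str = "sampling"
    refine_sample_pixels: int = 80_000  # recommended for HD input
    device: str = "cuda"

    def __post_init__(self):
        # Reject unknown options early instead of failing at model build time.
        if self.model_type not in MODEL_TYPES:
            raise ValueError(f"unknown model type: {self.model_type}")
        if self.backbone not in BACKBONES:
            raise ValueError(f"unknown backbone: {self.backbone}")
        if self.refine_mode not in REFINE_MODES:
            raise ValueError(f"unknown refine mode: {self.refine_mode}")
        if not 0 < self.backbone_scale <= 1:
            raise ValueError("backbone_scale must be in (0, 1]")

    def coarse_size(self, width, height):
        """Approximate resolution seen by the backbone (assumption: simple scaling)."""
        return round(width * self.backbone_scale), round(height * self.backbone_scale)

    def sampled_fraction(self, width, height):
        """Share of full-resolution pixels covered by the sampling budget."""
        return self.refine_sample_pixels / (width * height)
```

With the recommended HD settings, a 1920x1080 frame gives a coarse pass at about 480x270, and the 80,000-pixel budget covers a little under 4% of the frame, which is why sampling mode keeps the cost per image predictable.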
Step 2: Image dataset loading
Load source and background images as a paired dataset using ZipDataset, which combines the two image datasets so that each item is a (source, background) pair. Source and background directories must contain the same number of images with matching filenames. Images are loaded as PIL images and converted to normalized tensors (0-1 range, RGB channels). Optionally apply homographic alignment to correct camera movement between source and background captures.
Key considerations:
- Source and background directories must have matching file structures
- Homographic alignment uses ORB feature detection and RANSAC-based homography
- Images are processed one at a time (batch size 1) to handle varying resolutions
- DataLoader handles image loading, optionally with multiple worker processes
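The matching-structure requirement can be checked up front with a small standard-library helper. This is a sketch only: list_images and pair_images are hypothetical names, the accepted extensions are an assumption, and the repository's own dataset classes may discover and order files differently.

```python
import os


def list_images(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted image paths under root, expressed relative to root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(extensions):
                full = os.path.join(dirpath, name)
                found.append(os.path.relpath(full, root))
    return sorted(found)


def pair_images(src_root, bgr_root):
    """Pair each source image with the background at the same relative path."""
    src = list_images(src_root)
    bgr = list_images(bgr_root)
    if len(src) != len(bgr):
        raise ValueError(f"{len(src)} source images but {len(bgr)} backgrounds")
    mismatched = [s for s, b in zip(src, bgr) if s != b]
    if mismatched:
        raise ValueError(f"no matching background for: {mismatched[0]}")
    return [
        (os.path.join(src_root, s), os.path.join(bgr_root, b))
        for s, b in zip(src, bgr)
    ]
```

Running a check like this before inference turns a silent mispairing (the wrong background plate for a given shot) into an immediate error.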
Step 3: Inference execution
Run the model in no-gradient mode on each source-background pair. The model produces alpha matte (pha), foreground (fgr), and additional intermediate outputs. For MattingRefine, the refiner selectively upsamples patches at error-prone regions to produce full-resolution output.
What happens:
- Source and background tensors are moved to the target device with non-blocking transfer
- Forward pass produces alpha (B,1,H,W) and foreground (B,3,H,W) at full resolution
- Error and refinement maps are optionally retained for visualization
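- Composite is assembled as an RGBA tensor: com = cat(fgr * (pha != 0), pha), i.e. the foreground with fully transparent pixels zeroed and the alpha matte appended as the fourth channel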
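The difference between the sampling and thresholding refinement modes comes down to how error-prone locations are chosen from the error map. The sketch below is a simplified illustration on a nested-list error map, not the repository's implementation (which works on tensors and image patches); both function names are hypothetical.

```python
def select_by_sampling(err, k):
    """Pick the k highest-error locations: a fixed refinement budget."""
    locations = [
        (value, row, col)
        for row, values in enumerate(err)
        for col, value in enumerate(values)
    ]
    # Sort by error value only, largest first.
    locations.sort(key=lambda item: item[0], reverse=True)
    return {(row, col) for _value, row, col in locations[:k]}


def select_by_thresholding(err, threshold):
    """Pick every location whose error exceeds the threshold: an adaptive budget."""
    return {
        (row, col)
        for row, values in enumerate(err)
        for col, value in enumerate(values)
        if value > threshold
    }
```

Sampling always returns at most k locations regardless of image content, so its cost is predictable. Thresholding returns more locations for images with many uncertain pixels and fewer for easy ones, so its cost follows image complexity.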
Step 4: Output writing
Save the selected output types to disk. Each output type is written to its own subdirectory within the output directory. Writing is performed asynchronously using separate threads to overlap I/O with GPU computation. Alpha and foreground outputs are saved as JPEG; composites with transparency are saved as PNG.
Key considerations:
- Output filenames preserve the original source image directory structure
- Composite output includes RGBA channels for transparency support
- Error and refinement maps are upsampled to full resolution before saving
- Thread-based async writing overlaps disk I/O with model inference
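The output layout and threaded writing described above can be sketched with the standard library alone. The helper names are hypothetical, the payload is raw bytes rather than an encoded image, and the JPEG extension for the err and ref outputs is an assumption (the text only states the formats for pha, fgr, and com).

```python
import os
import threading

# Extension per output type. pha/fgr/com follow the text above;
# err and ref are assumed to be JPEG as well.
EXTENSIONS = {"com": ".png", "pha": ".jpg", "fgr": ".jpg", "err": ".jpg", "ref": ".jpg"}


def output_path(output_dir, output_type, src_path, src_root):
    """Mirror the source file's relative path under output_dir/<output_type>/."""
    rel = os.path.relpath(src_path, src_root)
    stem = os.path.splitext(rel)[0]
    return os.path.join(output_dir, output_type, stem + EXTENSIONS[output_type])


def write_async(data, path):
    """Write bytes to path on a background thread and return the thread."""
    def _write():
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    thread = threading.Thread(target=_write)
    thread.start()
    return thread
```

Returning the thread lets the caller join all pending writes once the loop finishes, so the process does not report completion while files are still being flushed to disk.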