Workflow:PeterL1n BackgroundMattingV2 Image matting inference

From Leeroopedia



Knowledge Sources
Domains Computer_Vision, Image_Matting, Inference
Last Updated 2026-02-09 02:30 GMT

Overview

Batch image matting pipeline that extracts alpha mattes, foreground layers, and composites from a directory of source images using a pre-captured background.

Description

This workflow performs background matting on a directory of still images. Given a set of source images (with a subject in front of a known background) and corresponding background images (the same scene without the subject), the model predicts per-pixel alpha mattes and foreground colors. These outputs can be used to composite the subject onto any new background.
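The compositing step mentioned above is ordinary alpha blending: each output pixel is pha * fgr + (1 - pha) * new_bgr. The sketch below illustrates that rule for a single pixel in pure Python, assuming all values lie in the 0-1 range. The name `composite_pixel` is hypothetical and is not part of the repository.

```python
def composite_pixel(fgr, pha, new_bgr):
    """Blend one foreground pixel over a new background pixel.

    fgr and new_bgr are (r, g, b) tuples in [0, 1]; pha is the alpha in [0, 1].
    """
    return tuple(pha * f + (1.0 - pha) * b for f, b in zip(fgr, new_bgr))


# A quarter-opaque orange pixel over a pure blue background.
pixel = composite_pixel((1.0, 0.5, 0.0), 0.25, (0.0, 0.0, 1.0))
print(pixel)  # (0.25, 0.125, 0.75)
```

In a tensor pipeline the same expression would be applied to whole images at once, with pha broadcast across the three color channels.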

The pipeline supports both the base model (MattingBase, coarse output only) and the refined model (MattingRefine, with selective high-resolution patch refinement). An optional homographic alignment preprocessing step can correct minor camera shifts between source and background captures using ORB feature matching and RANSAC homography estimation.

Output types include alpha matte (pha), foreground (fgr), composite with transparency (com), error map (err), and refinement region map (ref).

Usage

Execute this workflow when you have a directory of images taken against a known background and need to extract the subject with transparency. Typical use cases include product photography, portrait extraction, batch processing of photo shoots where a clean background plate was captured, and any scenario requiring high-quality alpha mattes from image pairs.

Execution Steps

Step 1: Model loading

Instantiate either MattingBase or MattingRefine with the chosen backbone architecture (ResNet50, ResNet101, or MobileNetV2). For MattingRefine, also choose the refinement mode: sampling refines a fixed number of pixels for predictable cost, thresholding refines every region whose predicted error exceeds a threshold so cost adapts to the image, and full refines the whole image and is intended for debugging. Load the trained checkpoint weights, move the model to the target device (CPU or CUDA), and put it in evaluation mode.

Key considerations:

  • MattingRefine with sampling mode provides predictable computation per frame
  • MattingRefine with thresholding mode adapts computation to image complexity
  • Recommended settings: backbone_scale=0.25 and refine_sample_pixels=80000 for HD resolution
  • The checkpoint must match the backbone architecture
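The settings above can be collected and validated before the model is built. The following is a minimal sketch, not the repository's code: `refine_kwargs` is a hypothetical helper, the lowercase backbone strings and the `refine_mode` key are assumed spellings, and only `backbone_scale` and `refine_sample_pixels` are named in the text above.

```python
VALID_BACKBONES = {"resnet50", "resnet101", "mobilenetv2"}  # assumed identifier spelling
VALID_REFINE_MODES = {"sampling", "thresholding", "full"}


def refine_kwargs(backbone, refine_mode="sampling",
                  backbone_scale=0.25, refine_sample_pixels=80_000):
    """Validate and collect settings for a MattingRefine-style constructor."""
    if backbone not in VALID_BACKBONES:
        raise ValueError(f"unknown backbone: {backbone!r}")
    if refine_mode not in VALID_REFINE_MODES:
        raise ValueError(f"unknown refine mode: {refine_mode!r}")
    if not 0.0 < backbone_scale <= 1.0:
        raise ValueError("backbone_scale must be in (0, 1]")
    kwargs = {"backbone": backbone, "backbone_scale": backbone_scale,
              "refine_mode": refine_mode}
    # The pixel budget only matters when refinement uses fixed-size sampling.
    if refine_mode == "sampling":
        kwargs["refine_sample_pixels"] = refine_sample_pixels
    return kwargs


# Defaults correspond to the HD recommendation quoted above.
hd_settings = refine_kwargs("resnet50")
print(hd_settings)
```

The resulting dictionary could then be unpacked into the model constructor, after which the checkpoint that matches the chosen backbone is loaded. Failing early on an unknown backbone name is cheaper than discovering a checkpoint mismatch at load time.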

Step 2: Image dataset loading

Load source images and background images as paired datasets using the ZipDataset loader. Source and background directories must contain the same number of images with matching filenames. Images are loaded as PIL images and converted to normalized tensors (0-1 range, RGB channels). Optionally apply homographic alignment to correct camera movement between source and background captures.

Key considerations:

  • Source and background directories must have matching file structures
  • Homographic alignment uses ORB feature detection and RANSAC-based homography
  • Images are processed one at a time (batch size 1) to handle varying resolutions
  • DataLoader handles image loading, optionally in parallel worker processes
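The matching-file-structure requirement can be checked before any image is decoded. The sketch below is one possible pre-flight check using only the standard library. `list_images` and `pair_images` are hypothetical helpers, and pairing by identical relative path is an assumption about how the two directories correspond, not a description of the ZipDataset internals.

```python
import os
import tempfile


def list_images(root, exts=(".jpg", ".jpeg", ".png")):
    """Return sorted image paths under root, relative to root."""
    found = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(exts):
                found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def pair_images(src_dir, bgr_dir):
    """Pair source and background images that share the same relative path."""
    src, bgr = list_images(src_dir), list_images(bgr_dir)
    if src != bgr:
        missing = sorted(set(src) ^ set(bgr))
        raise ValueError(f"unmatched files: {missing}")
    return [(os.path.join(src_dir, p), os.path.join(bgr_dir, p)) for p in src]


# Demo on empty placeholder files in a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("src", "bgr"):
        os.makedirs(os.path.join(tmp, sub, "shoot1"))
        open(os.path.join(tmp, sub, "shoot1", "0001.jpg"), "w").close()
    pairs = pair_images(os.path.join(tmp, "src"), os.path.join(tmp, "bgr"))
    print(len(pairs))  # 1
```

Reporting the unmatched paths up front avoids a silent misalignment in which a source image is matted against the wrong background plate.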

Step 3: Inference execution

Run the model in no-gradient mode on each source-background pair. The model produces alpha matte (pha), foreground (fgr), and additional intermediate outputs. For MattingRefine, the refiner selectively upsamples patches at error-prone regions to produce full-resolution output.
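The difference between the sampling and thresholding refinement modes comes down to how error-prone locations are chosen. The toy sketch below shows the two selection rules on a flattened error map. It is a simplified illustration under stated assumptions, not the refiner's actual implementation, and both function names are hypothetical.

```python
def select_by_sampling(err, k):
    """Return indices of the k largest-error locations (fixed work per image)."""
    order = sorted(range(len(err)), key=lambda i: err[i], reverse=True)
    return sorted(order[:k])


def select_by_threshold(err, threshold):
    """Return indices of every location whose error exceeds the threshold."""
    return [i for i, e in enumerate(err) if e > threshold]


# Flattened toy error map: one value per coarse location.
err = [0.05, 0.80, 0.10, 0.60, 0.02, 0.30]
print(select_by_sampling(err, 2))      # [1, 3]
print(select_by_threshold(err, 0.25))  # [1, 3, 5]
```

Sampling always returns exactly k locations, so the refinement cost is constant per image. Thresholding returns more locations for harder images and fewer for easy ones.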

What happens:

  • Source and background tensors are moved to the target device with non-blocking transfer
  • Forward pass produces alpha (B,1,H,W) and foreground (B,3,H,W) at full resolution
  • Error and refinement maps are optionally retained for visualization
  • Composite is an RGBA tensor: com = cat(fgr * (pha != 0), pha), i.e. the foreground zeroed wherever alpha is exactly 0, with the alpha channel appended

Step 4: Output writing

Save the selected output types to disk. Each output type is written to its own subdirectory within the output directory. Writing is performed asynchronously using separate threads to overlap I/O with GPU computation. Alpha and foreground outputs are saved as JPEG; composites with transparency are saved as PNG.

Key considerations:

  • Output filenames preserve the original source image directory structure
  • Composite output includes RGBA channels for transparency support
  • Error and refinement maps are upsampled to full resolution before saving
  • Thread-based async writing overlaps disk I/O with model inference
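The path mapping and threaded writing described above can be sketched with the standard library alone. This is a sketch under stated assumptions, not the workflow's actual writer: `output_path`, `write_bytes`, and `write_async` are hypothetical names, the writer stores raw bytes instead of encoding an image, and the rule that every type other than com uses .jpg is an assumption that goes beyond the three formats named in this step.

```python
import os
import tempfile
from threading import Thread

EXTENSIONS = {"com": ".png"}  # assumption: every other output type uses .jpg


def output_path(src_root, src_file, out_dir, out_type):
    """Map a source image path to its output path, keeping subdirectories."""
    rel = os.path.relpath(src_file, src_root)
    stem = os.path.splitext(rel)[0]
    return os.path.join(out_dir, out_type, stem + EXTENSIONS.get(out_type, ".jpg"))


def write_bytes(data, path):
    """Stand-in writer: a real pipeline would encode an image tensor here."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)


def write_async(data, path):
    """Start the write on its own thread so the caller can keep computing."""
    thread = Thread(target=write_bytes, args=(data, path))
    thread.start()
    return thread


with tempfile.TemporaryDirectory() as out_dir:
    path = output_path("/data/src", "/data/src/shoot1/0001.jpg", out_dir, "com")
    worker = write_async(b"fake image bytes", path)
    worker.join()  # join before the temporary directory is cleaned up
    print(os.path.exists(path))  # True
```

Returning the thread lets the caller join all outstanding writes before the process exits, so no output is lost if the main loop finishes first.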
