Implementation:PeterL1n BackgroundMattingV2 VideoDataset

Knowledge Sources	BackgroundMattingV2
Domains	Data_Loading, Video_Processing
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool for frame-by-frame video reading through PyTorch's Dataset interface, provided by dataset/video.py.

Description

VideoDataset wraps OpenCV's cv2.VideoCapture to provide random-access frame reading from video files. Frames are converted from BGR to RGB color space and returned as PIL Images (or as whatever the optional transforms callable returns). The dataset exposes video metadata as attributes: width, height, frame_rate, and frame_count. It supports both sequential and random access: before each read it seeks automatically if the requested index differs from the capture's current position. The class also implements the context manager protocol (__enter__/__exit__) for proper resource cleanup.
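
The seek-on-demand behavior described above can be sketched as follows. This is an illustrative reconstruction, not the code in dataset/video.py: FakeCapture is a hypothetical stand-in that mimics the get/set/read methods of cv2.VideoCapture so the example runs without OpenCV or a video file, and read_frame is a name chosen for this sketch.

```python
# Property id for the index of the next frame to be decoded. In OpenCV this is
# cv2.CAP_PROP_POS_FRAMES (numeric value 1); it is redefined here so the
# sketch has no third-party dependencies.
CAP_PROP_POS_FRAMES = 1


class FakeCapture:
    """Hypothetical stand-in for cv2.VideoCapture, backed by a list of frames."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.pos = 0    # index of the next frame that read() will return
        self.seeks = 0  # number of explicit seeks, kept only for demonstration

    def get(self, prop):
        if prop != CAP_PROP_POS_FRAMES:
            raise ValueError("only the frame position is modelled")
        return self.pos

    def set(self, prop, value):
        if prop != CAP_PROP_POS_FRAMES:
            raise ValueError("only the frame position is modelled")
        self.pos = int(value)
        self.seeks += 1
        return True

    def read(self):
        # Like VideoCapture.read(), return a (success, frame) pair.
        if not 0 <= self.pos < len(self.frames):
            return False, None
        frame = self.frames[self.pos]
        self.pos += 1
        return True, frame


def read_frame(cap, idx):
    """Return frame idx, seeking only if the capture is not already there."""
    if cap.get(CAP_PROP_POS_FRAMES) != idx:
        cap.set(CAP_PROP_POS_FRAMES, idx)
    ok, frame = cap.read()
    if not ok:
        raise IndexError(f"frame index {idx} could not be read")
    return frame


cap = FakeCapture(["f0", "f1", "f2", "f3"])
sequential = [read_frame(cap, i) for i in range(3)]  # position advances by itself
jumped = read_frame(cap, 0)                          # position is 3, so one seek
```

In this sketch, sequential iteration never triggers a seek; only the jump back to index 0 does. With a real decoder, seeking is typically more expensive than reading the next frame, which is why in-order access is the cheaper pattern.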

Usage

Use for loading source video files in video matting inference. Combine with a background source via ZipDataset and feed through a DataLoader.

Code Reference

Source Location

Repository: BackgroundMattingV2
File: dataset/video.py
Lines: 6-38

Signature

class VideoDataset(Dataset):
    def __init__(self, path: str, transforms: any = None):
        """
        Args:
            path: Path to video file
            transforms: Optional transform applied to each frame

        Attributes:
            width: int - Video frame width
            height: int - Video frame height
            frame_rate: float - Video FPS
            frame_count: int - Total number of frames
        """

    def __len__(self) -> int: ...
    def __getitem__(self, idx: int) -> PIL.Image: ...
    def __enter__(self) -> 'VideoDataset': ...
    def __exit__(self, exc_type, exc_value, exc_traceback) -> None: ...
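
A minimal sketch of how the context manager methods in the signature above are meant to be used. StubVideo is a hypothetical stand-in with the same methods, so the example runs without OpenCV or a video file; with the real class, __exit__ is presumably where the underlying cv2.VideoCapture gets released.

```python
class StubVideo:
    """Hypothetical stand-in exposing the methods listed in the signature."""

    def __init__(self, frames, frame_rate=30.0):
        self.frames = list(frames)
        self.frame_rate = frame_rate
        self.frame_count = len(self.frames)
        self.released = False

    def __len__(self):
        return self.frame_count

    def __getitem__(self, idx):
        if self.released:
            raise RuntimeError("video has already been released")
        return self.frames[idx]

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # Runs on normal exit and when the body raises. Returning None lets
        # any exception propagate to the caller.
        self.released = True


# Normal use: frames are read inside the block, cleanup happens on exit.
with StubVideo(["f0", "f1", "f2"]) as ok_vid:
    first, last = ok_vid[0], ok_vid[len(ok_vid) - 1]

# Cleanup also happens if the body fails part-way through.
failing = StubVideo(["f0"])
try:
    with failing as bad_vid:
        bad_vid[5]  # IndexError from the list lookup
except IndexError:
    pass
```

Wrapping the dataset in a with block avoids having to remember an explicit cleanup call at every exit path of an inference script.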

Import

from dataset import VideoDataset

I/O Contract

Inputs

Name	Type	Required	Description
path	str	Yes	Path to video file (any format supported by OpenCV)
transforms	callable	No	Transform applied to each frame (receives PIL.Image)

Outputs

Name	Type	Description
__getitem__	PIL.Image or Tensor	Single video frame (RGB)
__len__	int	Total frame count
.width	int	Frame width in pixels
.height	int	Frame height in pixels
.frame_rate	float	Video frames per second
.frame_count	int	Total number of frames
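
The metadata attributes are enough to derive a clip duration and per-frame timestamps. The helper names below are illustrative and not part of the repository. The sketch assumes a constant frame rate and treats frame_count as exact, although the count reported by OpenCV can be an estimate for some containers.

```python
def clip_duration(frame_count: int, frame_rate: float) -> float:
    """Duration in seconds, assuming a constant frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return frame_count / frame_rate


def frame_timestamp(idx: int, frame_rate: float) -> float:
    """Start time in seconds of frame idx (frame 0 starts at 0.0)."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return idx / frame_rate


# Example values for a hypothetical 300-frame clip at 30 fps.
duration = clip_duration(300, 30.0)
halfway = frame_timestamp(150, 30.0)
```

With a real dataset the same helpers would be called as clip_duration(vid.frame_count, vid.frame_rate).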

Usage Examples

Video Matting Inference

import torch
from dataset import VideoDataset, ZipDataset
from dataset.augmentation import PairCompose, PairApply
from torchvision import transforms as T
from torch.utils.data import DataLoader
from PIL import Image

# Load video and background
vid = VideoDataset('source_video.mp4')
bgr = [Image.open('background.jpg').convert('RGB')]

# Combine with background (single image repeated for all frames)
dataset = ZipDataset([vid, bgr], transforms=PairCompose([
    PairApply(T.ToTensor())
]))

# `model` is assumed to be a matting network that has already been loaded,
# moved to the GPU, and put in eval mode; loading it is not shown here.

# Process frame by frame, without tracking gradients during inference
with torch.no_grad():
    for src, bgr_tensor in DataLoader(dataset, batch_size=1):
        pha, fgr = model(src.cuda(), bgr_tensor.cuda())[:2]

# Report the video metadata exposed by the dataset
print(f"Video: {vid.width}x{vid.height} @ {vid.frame_rate}fps, {vid.frame_count} frames")

Related Pages

Implements Principle

Principle:PeterL1n_BackgroundMattingV2_Video_dataset_loading
