Implementation:PeterL1n BackgroundMattingV2 VideoDataset

From Leeroopedia


Knowledge Sources
Domains: Data_Loading, Video_Processing
Last Updated: 2026-02-09 00:00 GMT

Overview

VideoDataset, defined in dataset/video.py, is a concrete tool for reading a video frame by frame through PyTorch's Dataset interface.

Description

VideoDataset wraps OpenCV's cv2.VideoCapture to provide random-access frame reading from video files. Each frame is converted from BGR to RGB and returned as a PIL Image, or as the output of transforms when one is supplied. The dataset exposes video metadata as attributes: width, height, frame_rate, and frame_count. It supports both sequential and random access, seeking automatically when the requested index differs from the reader's current position. The class implements the context manager protocol (__enter__/__exit__) for resource cleanup.
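
The seek-on-mismatch behaviour described above can be sketched without OpenCV. This is an illustration of the idea under stated assumptions, not the code in dataset/video.py: FakeCapture is a hypothetical stand-in that mimics the get/set/read shape of cv2.VideoCapture, POS_FRAMES is a placeholder for the real position property, and frames are plain integers.

```python
POS_FRAMES = "pos_frames"  # hypothetical placeholder for the capture's position property


class FakeCapture:
    """Hypothetical stand-in for a video capture; each frame is just its index."""

    def __init__(self, frame_count):
        self.frame_count = frame_count
        self.position = 0  # index of the frame the next read() will return
        self.seeks = 0     # number of explicit seeks issued so far

    def get(self, prop):
        return self.position

    def set(self, prop, value):
        self.position = int(value)
        self.seeks += 1

    def read(self):
        if not 0 <= self.position < self.frame_count:
            return False, None
        frame = self.position
        self.position += 1  # reading advances the position by one frame
        return True, frame


def read_frame(cap, idx):
    """Return frame idx, seeking only when the reader is not already there."""
    if cap.get(POS_FRAMES) != idx:
        cap.set(POS_FRAMES, idx)
    ok, frame = cap.read()
    if not ok:
        raise IndexError(f"frame {idx} is out of range")
    return frame


cap = FakeCapture(frame_count=10)
sequential = [read_frame(cap, i) for i in range(3)]  # in-order reads, no seek
jumped = read_frame(cap, 7)                          # out-of-order read, one seek
```

In this sketch, in-order iteration never triggers a seek, which is one reason an unshuffled DataLoader fits this access pattern well.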

Usage

Use for loading source video files in video matting inference. Combine with a background source via ZipDataset and feed through a DataLoader.

Code Reference

Source Location

Signature

class VideoDataset(Dataset):
    def __init__(self, path: str, transforms: any = None):
        """
        Args:
            path: Path to video file
            transforms: Optional transform applied to each frame

        Attributes:
            width: int - Video frame width
            height: int - Video frame height
            frame_rate: float - Video FPS
            frame_count: int - Total number of frames
        """

    def __len__(self) -> int: ...
    def __getitem__(self, idx: int) -> PIL.Image: ...
    def __enter__(self) -> 'VideoDataset': ...
    def __exit__(self, exc_type, exc_value, exc_traceback) -> None: ...
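
The __enter__/__exit__ pair in the signature follows the usual Python context manager shape. The sketch below shows that shape with a hypothetical ReleasingSource class; the released flag and the RuntimeError on use after release are assumptions of this example, not documented behaviour of VideoDataset.

```python
class ReleasingSource:
    """Hypothetical frame source that releases its handle on exit."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.released = False

    def __len__(self):
        return len(self.frames)

    def __getitem__(self, idx):
        if self.released:
            raise RuntimeError("source has been released")
        return self.frames[idx]

    def __enter__(self):
        # Hand the object itself to the `with` block.
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # Release the underlying resource; returning None lets exceptions propagate.
        self.released = True


with ReleasingSource(["f0", "f1", "f2"]) as src:
    first = src[0]
    count = len(src)
# Leaving the block has run __exit__, so src.released is now True.
```

With the real class, the same pattern would read `with VideoDataset('source_video.mp4') as vid:`, going by the signature above.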

Import

from dataset import VideoDataset

I/O Contract

Inputs

Name | Type | Required | Description
path | str | Yes | Path to video file (any format supported by OpenCV)
transforms | callable | No | Transform applied to each frame (receives PIL.Image)

Outputs

Name | Type | Description
__getitem__ | PIL.Image, or the transform's output (e.g. Tensor) | Single video frame (RGB)
__len__ | int | Total frame count
.width | int | Frame width in pixels
.height | int | Frame height in pixels
.frame_rate | float | Video frames per second
.frame_count | int | Total number of frames
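
The frame_count and frame_rate attributes together give a rough clip duration and per-frame timestamps. The helpers below are hypothetical (they are not part of dataset/video.py) and take plain numbers so they can be tried without a video file. Container metadata may be approximate or missing, so the results are best treated as estimates.

```python
def clip_duration_seconds(frame_count, frame_rate):
    """Estimate clip length in seconds; None when the frame rate is unusable."""
    if frame_rate <= 0:
        return None
    return frame_count / frame_rate


def frame_timestamp_seconds(idx, frame_rate):
    """Estimate the presentation time of frame idx, assuming a constant frame rate."""
    if frame_rate <= 0:
        return None
    return idx / frame_rate


duration = clip_duration_seconds(frame_count=300, frame_rate=30.0)
timestamp = frame_timestamp_seconds(idx=45, frame_rate=30.0)
```

With a VideoDataset instance named vid, the call would be clip_duration_seconds(vid.frame_count, vid.frame_rate).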

Usage Examples

Video Matting Inference

import torch
from dataset import VideoDataset, ZipDataset
from dataset.augmentation import PairCompose, PairApply
from torchvision import transforms as T
from torch.utils.data import DataLoader
from PIL import Image

# Load video and background
vid = VideoDataset('source_video.mp4')
bgr = [Image.open('background.jpg').convert('RGB')]

# Combine with background (single image repeated for all frames)
dataset = ZipDataset([vid, bgr], transforms=PairCompose([
    PairApply(T.ToTensor())
]))

# `model` is assumed to be an already-loaded matting network in eval mode on the GPU;
# it is not defined in this snippet.
# Process frame by frame, in order, without tracking gradients
with torch.no_grad():
    for src, bgr_tensor in DataLoader(dataset, batch_size=1):
        pha, fgr = model(src.cuda(), bgr_tensor.cuda())[:2]

# Report the video metadata exposed by the dataset
print(f"Video: {vid.width}x{vid.height} @ {vid.frame_rate}fps, {vid.frame_count} frames")

Related Pages

Implements Principle
