Implementation:PeterL1n BackgroundMattingV2 VideoDataset
| Knowledge Sources | |
|---|---|
| Domains | Data_Loading, Video_Processing |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool for reading video files frame by frame through PyTorch's Dataset interface, provided by dataset/video.py.
Description
VideoDataset wraps OpenCV's cv2.VideoCapture to provide random-access frame reading from video files. Each frame is converted from BGR to RGB and returned as a PIL Image. The dataset exposes video metadata as attributes: width, height, frame_rate, and frame_count. It supports both sequential and random access, seeking automatically when the requested index differs from the current read position. The class implements the context manager protocol (__enter__/__exit__) for proper resource cleanup.
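The seek-on-mismatch behaviour described above can be illustrated without OpenCV. The following is a minimal sketch, not the repository's code: `FakeCapture` is a hypothetical in-memory stand-in for cv2.VideoCapture, and `LazySeekReader` is an illustrative name. It assumes the reader seeks only when the requested index is not the next frame to be decoded.

```python
class FakeCapture:
    """Hypothetical in-memory stand-in for cv2.VideoCapture (illustration only)."""

    def __init__(self, frames):
        self.frames = list(frames)
        self.pos = 0      # index of the next frame that read() will return
        self.seeks = 0    # number of explicit seeks requested so far

    def tell(self):
        return self.pos

    def seek(self, idx):
        self.pos = idx
        self.seeks += 1

    def read(self):
        if not 0 <= self.pos < len(self.frames):
            return False, None
        frame = self.frames[self.pos]
        self.pos += 1     # reading advances the position, like a real decoder
        return True, frame


class LazySeekReader:
    """Indexable reader that seeks only when the requested index is not next in line."""

    def __init__(self, cap, frame_count):
        self.cap = cap
        self.frame_count = frame_count

    def __len__(self):
        return self.frame_count

    def __getitem__(self, idx):
        if self.cap.tell() != idx:
            self.cap.seek(idx)
        ok, frame = self.cap.read()
        if not ok:
            raise IndexError(f"index {idx} out of range for {len(self)} frames")
        return frame

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        self.cap = None   # a real implementation would release the capture here


cap = FakeCapture(["f0", "f1", "f2", "f3"])
with LazySeekReader(cap, frame_count=4) as reader:
    sequential = [reader[i] for i in range(3)]   # in order, so no seek is needed
    seeks_after_sequential = cap.seeks
    jumped = reader[0]                           # position is 3, so this seeks back
    seeks_after_jump = cap.seeks
```

In this sketch, reading indices 0 to 2 in order triggers no seek, while jumping back to index 0 triggers exactly one. Skipping the seek on sequential reads is likely worthwhile because seeking in a compressed video is typically much more expensive than decoding the next frame.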
Usage
Use it to load source video files for video matting inference. Combine it with a background source via ZipDataset and feed the result through a DataLoader.
Code Reference
Source Location
- Repository: BackgroundMattingV2
- File: dataset/video.py
- Lines: 6-38
Signature
class VideoDataset(Dataset):
    def __init__(self, path: str, transforms: any = None):
        """
        Args:
            path: Path to video file
            transforms: Optional transform applied to each frame
        Attributes:
            width: int - Video frame width
            height: int - Video frame height
            frame_rate: float - Video FPS
            frame_count: int - Total number of frames
        """
    def __len__(self) -> int: ...
    def __getitem__(self, idx: int) -> PIL.Image: ...
    def __enter__(self) -> 'VideoDataset': ...
    def __exit__(self, exc_type, exc_value, exc_traceback) -> None: ...
Import
from dataset import VideoDataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| path | str | Yes | Path to video file (any format supported by OpenCV) |
| transforms | callable | No | Transform applied to each frame (receives PIL.Image) |
Outputs
| Name | Type | Description |
|---|---|---|
| __getitem__ | PIL.Image or Tensor | Single RGB video frame; a PIL.Image unless transforms converts it (e.g. to a Tensor) |
| __len__ | int | Total frame count |
| .width | int | Frame width in pixels |
| .height | int | Frame height in pixels |
| .frame_rate | float | Video frames per second |
| .frame_count | int | Total number of frames |
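The metadata attributes lend themselves to simple timing arithmetic. The sketch below is illustrative only: the helper names `clip_duration_seconds` and `frame_timestamp_seconds` are hypothetical and not part of the repository, and the numbers stand in for values that would be read from `.frame_count` and `.frame_rate`. It assumes a constant frame rate.

```python
def clip_duration_seconds(frame_count: int, frame_rate: float) -> float:
    """Approximate clip length, assuming a constant frame rate."""
    if frame_rate <= 0:
        # A non-positive rate means the FPS is unknown, so no duration can be derived.
        raise ValueError("frame_rate must be positive to compute a duration")
    return frame_count / frame_rate


def frame_timestamp_seconds(idx: int, frame_rate: float) -> float:
    """Approximate presentation time of frame `idx`, assuming a constant frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive to compute a timestamp")
    return idx / frame_rate


# Example values standing in for vid.frame_count and vid.frame_rate.
duration = clip_duration_seconds(frame_count=300, frame_rate=30.0)   # 10.0 seconds
timestamp = frame_timestamp_seconds(idx=45, frame_rate=30.0)         # 1.5 seconds
```

These figures are best treated as estimates, since the frame count and frame rate reported for a file may be approximate for some containers and codecs.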
Usage Examples
Video Matting Inference
from dataset import VideoDataset, ZipDataset
from dataset.augmentation import PairCompose, PairApply
from torchvision import transforms as T
from torch.utils.data import DataLoader
from PIL import Image

# Load video and background
vid = VideoDataset('source_video.mp4')
bgr = [Image.open('background.jpg').convert('RGB')]

# Combine with background (single image repeated for all frames)
dataset = ZipDataset([vid, bgr], transforms=PairCompose([
    PairApply(T.ToTensor())
]))

# Process frame by frame
# `model` is assumed to be a matting model already loaded onto the GPU
for src, bgr_tensor in DataLoader(dataset, batch_size=1):
    pha, fgr = model(src.cuda(), bgr_tensor.cuda())[:2]

print(f"Video: {vid.width}x{vid.height} @ {vid.frame_rate}fps, {vid.frame_count} frames")
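The "single image repeated for all frames" pairing in the example above can be pictured with a small pure-Python sketch. This is an assumption about the behaviour rather than the repository's ZipDataset: `WrapAroundZip` is a hypothetical class that wraps shorter sources with modulo indexing, so a one-element background list is reused for every frame.

```python
class WrapAroundZip:
    """Illustrative stand-in for the pairing behaviour; not the repository's ZipDataset."""

    def __init__(self, sources):
        self.sources = sources

    def __len__(self):
        # The combined length follows the longest source (here, the video).
        return max(len(s) for s in self.sources)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Shorter sources wrap around, so a single background repeats for every frame.
        return tuple(s[idx % len(s)] for s in self.sources)


frames = ["frame0", "frame1", "frame2"]   # stands in for a VideoDataset
background = ["bg"]                       # stands in for the one-image list
zipped = WrapAroundZip([frames, background])
pairs = [zipped[i] for i in range(len(zipped))]
```

Under this assumption, each item is a (frame, background) tuple, and the background never needs to be duplicated in memory to match the video length.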