Implementation:PeterL1n BackgroundMattingV2 ZipDataset
| Knowledge Sources | |
|---|---|
| Domains | Data_Loading, Data_Augmentation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Concrete tool, provided by dataset/zip.py, for combining multiple PyTorch datasets into synchronized tuples.
Description
ZipDataset wraps a list of datasets and returns tuples of corresponding elements. When the datasets differ in length, the shorter ones cycle via modular indexing (idx % len(d)), so the combined length is the maximum length across all datasets. An optional transforms callable receives the items of each tuple as positional arguments, which enables synchronized pair augmentations (e.g., applying the same random crop to both foreground and alpha).
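The indexing rule can be illustrated without PyTorch. The following is a minimal sketch, not the code in dataset/zip.py: `ZipSketch` is a hypothetical stand-in that works on plain Python sequences and mirrors the behaviour described above (maximum length, `idx % len(d)` cycling, and an optional transform called with the items unpacked).

```python
from typing import Callable, Optional, Sequence


class ZipSketch:
    """Hypothetical stand-in for ZipDataset, operating on plain sequences."""

    def __init__(self, datasets: Sequence[Sequence], transforms: Optional[Callable] = None):
        self.datasets = datasets
        self.transforms = transforms

    def __len__(self) -> int:
        # Combined length is the length of the longest dataset.
        return max(len(d) for d in self.datasets)

    def __getitem__(self, idx: int):
        # Shorter datasets wrap around via modular indexing.
        items = tuple(d[idx % len(d)] for d in self.datasets)
        if self.transforms is not None:
            # Assumption: the transform is called with the items unpacked.
            items = self.transforms(*items)
        return items


frames = ["f0", "f1", "f2", "f3", "f4"]  # 5 items
backgrounds = ["b0", "b1"]               # 2 items, will cycle

zipped = ZipSketch([frames, backgrounds])
samples = [zipped[i] for i in range(len(zipped))]
```

Here the two backgrounds repeat in order across the five frames, so every frame index still yields a complete pair.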
Usage
Use ZipDataset to pair foreground and alpha datasets, to combine foreground-alpha pairs with backgrounds, or to pair video frames with static background images. It is used both in the training data pipeline and in inference dataset setup.
Code Reference
Source Location
- Repository: BackgroundMattingV2
- File: dataset/zip.py
- Lines: 4-20
Signature
class ZipDataset(Dataset):
    def __init__(
        self,
        datasets: List[Dataset],
        transforms: Optional[Callable] = None,
        assert_equal_length: bool = False
    ):
        """
        Args:
            datasets: List of datasets to zip together
            transforms: Optional transform applied to the tuple (*items)
            assert_equal_length: Raise error if datasets differ in length
        """

    def __len__(self) -> int:
        """Returns max length across all datasets."""

    def __getitem__(self, idx: int) -> Tuple:
        """Returns tuple of items, cycling shorter datasets."""
Import
from dataset import ZipDataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| datasets | List[Dataset] | Yes | Datasets to combine |
| transforms | Optional[Callable] | No | Joint transform called with the tuple's items as positional arguments (default None) |
| assert_equal_length | bool | No | Enforce equal dataset lengths (default False) |
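To make the idea of a joint transform concrete, here is a minimal sketch of a callable that could be passed as `transforms`. It is illustrative only: `pair_random_hflip` is a hypothetical name that does not come from the repository, and nested lists stand in for images. The random decision is drawn once and applied to both items, which is what keeps foreground and alpha aligned.

```python
import random
from typing import List, Tuple

Grid = List[List[int]]  # toy image: rows of pixel values


def pair_random_hflip(fgr: Grid, pha: Grid, p: float = 0.5) -> Tuple[Grid, Grid]:
    """Hypothetical joint transform: flip both inputs, or neither."""
    # Draw the random decision once so both items get the same outcome.
    if random.random() < p:
        fgr = [row[::-1] for row in fgr]
        pha = [row[::-1] for row in pha]
    return fgr, pha


fgr = [[1, 2, 3], [4, 5, 6]]
pha = [[0, 0, 1], [0, 1, 1]]

# Repeated calls: each result is either (flipped, flipped) or (same, same).
results = [pair_random_hflip(fgr, pha) for _ in range(20)]
```

Applying two independent random transforms, one per dataset, would not give this guarantee, because each would draw its own random decision.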
Outputs
| Name | Type | Description |
|---|---|---|
| __getitem__ | Tuple | Tuple of elements from each dataset at index (with cycling) |
| __len__ | int | Maximum length across all datasets |
Usage Examples
Pairing Foreground and Alpha
from dataset import ImagesDataset, ZipDataset
from dataset.augmentation import PairCompose, PairRandomAffineAndResize, PairApply
from torchvision import transforms as T

fgr = ImagesDataset('/data/train/fgr')
pha = ImagesDataset('/data/train/pha')

# Zip with synchronized augmentation
fgr_pha = ZipDataset(
    [fgr, pha],
    transforms=PairCompose([
        PairRandomAffineAndResize((512, 512)),
        PairApply(T.ToTensor()),
    ]),
    assert_equal_length=True
)

fgr_img, pha_img = fgr_pha[0]  # Synchronized pair
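Because assert_equal_length=True makes construction fail when the two folders differ in size, it can help to report the sizes beforehand. The helper below is a sketch under that assumption: `check_equal_length` is a hypothetical function, not part of the repository, and the choice of `ValueError` is this sketch's own, since the page does not state which error ZipDataset raises.

```python
from typing import Dict, Sequence


def check_equal_length(named: Dict[str, Sequence]) -> int:
    """Hypothetical helper: return the shared length or raise with details."""
    if not named:
        raise ValueError("No datasets given.")
    lengths = {name: len(ds) for name, ds in named.items()}
    if len(set(lengths.values())) > 1:
        # Include every length so the mismatching folder is easy to spot.
        raise ValueError(f"Datasets differ in length: {lengths}")
    return next(iter(lengths.values()))


# Ranges stand in for real datasets; anything with __len__ works.
n = check_equal_length({"fgr": range(3), "pha": range(3)})
```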
Pairing with Backgrounds (Different Length)
# Reuses fgr_pha (the zipped foreground/alpha dataset) from the previous example
bgr = ImagesDataset('/data/backgrounds')

# Background dataset cycles if shorter than foreground/alpha
dataset = ZipDataset([fgr_pha, bgr])

(fgr_img, pha_img), bgr_img = dataset[0]
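If a flat triple is more convenient than the nested tuple, a small function passed as the outer `transforms` could unpack it. This is a sketch that relies on the I/O contract stated earlier, namely that the transform is called with the items unpacked. `flatten_pair` is a hypothetical name, and strings stand in for images.

```python
def flatten_pair(fgr_pha, bgr):
    """Hypothetical joint transform: ((fgr, pha), bgr) -> (fgr, pha, bgr)."""
    fgr, pha = fgr_pha
    return fgr, pha, bgr


# Stand-in for the items an outer zip would pass to the transform.
nested = (("fgr0", "pha0"), "bgr0")
flat = flatten_pair(*nested)
```

Under that assumption, `ZipDataset([fgr_pha, bgr], transforms=flatten_pair)` would be expected to yield `(fgr_img, pha_img, bgr_img)` directly.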