Implementation:PeterL1n BackgroundMattingV2 ZipDataset


Knowledge Sources
Domains: Data_Loading, Data_Augmentation
Last Updated: 2026-02-09 00:00 GMT

Overview

Concrete tool, provided by dataset/zip.py, for combining multiple PyTorch datasets into synchronized tuples.

Description

ZipDataset wraps a list of datasets and returns a tuple holding the element found at the same index in each one. When the datasets differ in length, shorter ones cycle via modular indexing (idx % len(d)), so the combined length is the maximum length across all datasets. Setting assert_equal_length=True rejects datasets of unequal length instead. An optional transforms callable receives the tuple's items as unpacked positional arguments, which enables synchronized pair augmentations (e.g., applying the same random crop to both foreground and alpha).
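The behavior described above can be sketched in a few lines. This is a minimal illustration, not the code in dataset/zip.py: the class name ZipDatasetSketch is hypothetical, it accepts plain Python sequences rather than torch Dataset objects so that it runs without PyTorch, and the use of an assertion for the length check is an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple


class ZipDatasetSketch:
    """Illustrative stand-in for ZipDataset (hypothetical name, not the repository code)."""

    def __init__(
        self,
        datasets: Sequence[Sequence],
        transforms: Optional[Callable] = None,
        assert_equal_length: bool = False,
    ):
        self.datasets = list(datasets)
        self.transforms = transforms
        if assert_equal_length:
            lengths = {len(d) for d in self.datasets}
            # Assumption: a plain assertion signals the length mismatch.
            assert len(lengths) <= 1, "Datasets are not equal in length."

    def __len__(self) -> int:
        # The combined length is the length of the longest member.
        return max(len(d) for d in self.datasets)

    def __getitem__(self, idx: int) -> Tuple:
        # Shorter datasets wrap around via modular indexing.
        items = tuple(d[idx % len(d)] for d in self.datasets)
        if self.transforms is not None:
            # The transform receives the items as separate positional arguments.
            items = self.transforms(*items)
        return items


# Toy data: strings stand in for images.
frames = ["f0", "f1", "f2", "f3", "f4"]
backgrounds = ["b0", "b1"]

zipped = ZipDatasetSketch([frames, backgrounds])
sample = zipped[3]  # frames[3] paired with backgrounds[3 % 2]
```

Here len(zipped) follows the longer list, and the two backgrounds repeat as the index grows past the end of the shorter list.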

Usage

Use ZipDataset to pair foreground and alpha datasets, to combine foreground-alpha pairs with backgrounds, or to pair video frames with static background images. It is used in both the training data pipeline and the inference dataset setup.

Code Reference

Source Location

dataset/zip.py

Signature

from typing import Callable, List, Optional, Tuple

from torch.utils.data import Dataset


class ZipDataset(Dataset):
    def __init__(
        self,
        datasets: List[Dataset],
        transforms: Optional[Callable] = None,
        assert_equal_length: bool = False
    ):
        """
        Args:
            datasets: List of datasets to zip together
            transforms: Optional transform applied to the tuple (*items)
            assert_equal_length: Raise error if datasets differ in length
        """

    def __len__(self) -> int:
        """Returns max length across all datasets."""

    def __getitem__(self, idx: int) -> Tuple:
        """Returns tuple of items, cycling shorter datasets."""

Import

from dataset import ZipDataset

I/O Contract

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| datasets | List[Dataset] | Yes | Datasets to combine |
| transforms | callable | No | Joint transform receiving unpacked tuple |
| assert_equal_length | bool | No | Enforce equal dataset lengths (default False) |
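To illustrate what a joint transform receiving the unpacked tuple can look like, the sketch below builds a paired horizontal flip that makes a single random draw and applies the result to every item. The helper name make_pair_random_hflip and the nested-list "images" are hypothetical stand-ins chosen so the example runs without PyTorch or PIL. It is not one of the transforms in dataset/augmentation.

```python
import random
from typing import Callable, List, Tuple

Image = List[List[int]]  # toy image: rows of pixel values


def make_pair_random_hflip(p: float, rng: random.Random) -> Callable[..., Tuple[Image, ...]]:
    """Build a joint transform that flips all items together with probability p."""

    def pair_hflip(*images: Image) -> Tuple[Image, ...]:
        # One random draw decides the flip for every item in the sample,
        # which keeps foreground and alpha spatially aligned.
        if rng.random() < p:
            return tuple([row[::-1] for row in img] for img in images)
        return tuple(images)

    return pair_hflip


fgr = [[1, 2, 3], [4, 5, 6]]
pha = [[0, 0, 9], [0, 9, 9]]

# p=1.0 forces the flip so the effect is visible on both items.
always_flip = make_pair_random_hflip(p=1.0, rng=random.Random(0))
flipped_fgr, flipped_pha = always_flip(fgr, pha)
```

Because the callable accepts *images, it matches the calling convention in which the zipped items are passed as separate positional arguments. Drawing the random decision per item instead would break the alignment between foreground and alpha.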

Outputs

| Name | Type | Description |
|---|---|---|
| __getitem__ | Tuple | Tuple of elements from each dataset at index (with cycling) |
| __len__ | int | Maximum length across all datasets |

Usage Examples

Pairing Foreground and Alpha

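from dataset import ImagesDataset, ZipDataset
from dataset.augmentation import PairCompose, PairRandomAffineAndResize, PairApply
from torchvision import transforms as T

fgr = ImagesDataset('/data/train/fgr', mode='RGB')
pha = ImagesDataset('/data/train/pha', mode='L')  # load alpha mattes as single-channel

# Zip with synchronized augmentation: the pair transforms receive both items
# together, so random parameters are shared between foreground and alpha.
fgr_pha = ZipDataset(
    [fgr, pha],
    transforms=PairCompose([
        # Affine parameters are illustrative values, not tuned recommendations.
        PairRandomAffineAndResize((512, 512), degrees=(-5, 5), translate=(0.1, 0.1),
                                  scale=(0.4, 1), shear=(-5, 5)),
        PairApply(T.ToTensor()),
    ]),
    assert_equal_length=True  # fail early if the fgr and pha counts differ
)

fgr_img, pha_img = fgr_pha[0]  # Synchronized pair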

Pairing with Backgrounds (Different Length)

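# Continues from the previous example (reuses ImagesDataset, ZipDataset, and fgr_pha).
bgr = ImagesDataset('/data/backgrounds')

# Background dataset cycles if shorter than foreground/alpha
dataset = ZipDataset([fgr_pha, bgr])
(fgr_img, pha_img), bgr_img = dataset[0]  # nested sample: inner pair, then background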
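When the background dataset is the shorter one, the modular indexing (idx % len(d)) makes each pairing deterministic: a given index always selects the same background, and backgrounds are reused unevenly unless the longer length is a multiple of the shorter. The sketch below works through that arithmetic for assumed toy lengths of 5 foreground-alpha pairs and 2 backgrounds. The helper names member_indices and reuse_counts are hypothetical and not part of the repository.

```python
from collections import Counter
from typing import Sequence, Tuple


def member_indices(idx: int, lengths: Sequence[int]) -> Tuple[int, ...]:
    """Index that each zipped dataset is read at for a given combined index."""
    return tuple(idx % n for n in lengths)


def reuse_counts(lengths: Sequence[int], member: int) -> Counter:
    """How often each element of one member is visited in a single pass."""
    epoch_len = max(lengths)  # combined length is the longest member
    return Counter(member_indices(i, lengths)[member] for i in range(epoch_len))


lengths = (5, 2)  # assumed: 5 foreground/alpha pairs, 2 backgrounds
pairing = member_indices(3, lengths)        # (3, 1): pair 3 with background 1
bgr_reuse = reuse_counts(lengths, member=1)  # background 0 is used 3 times, background 1 twice
```

Shuffling at the data loader level changes the order in which combined indices are visited, not which background a given index maps to.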
