# Implementation: OpenAI CLIP Dataset Preparation Wrapper
| Knowledge Sources | |
|---|---|
| Domains | Vision, Data_Engineering |
| Last Updated | 2026-02-13 22:00 GMT |
## Overview
Wrapper documentation for using torchvision datasets and PyTorch DataLoader with CLIP preprocessing transforms.
## Description
This wrapper documents how to use torchvision.datasets (e.g. CIFAR100, ImageNet) and torch.utils.data.DataLoader in the context of CLIP linear probe evaluation. The CLIP repository does not define its own dataset classes; instead, it relies on standard PyTorch data utilities with the CLIP preprocess transform injected as the dataset's transform parameter.
The pattern is demonstrated in the CLIP README (lines 141-191) using CIFAR-100 as the benchmark dataset.
## External Reference

## Usage
Use this wrapper whenever preparing a dataset for CLIP feature extraction. The key integration point is passing the preprocess transform from clip.load() to the dataset's transform parameter.
## Code Reference

### Source Location
- Repository: External (torchvision, PyTorch)
- Usage pattern: README.md (lines 141-191)
### Signature

```python
# torchvision dataset construction
torchvision.datasets.CIFAR100(
    root: str,
    train: bool = True,
    transform: Optional[Callable] = None,  # <- inject CLIP preprocess here
    target_transform: Optional[Callable] = None,
    download: bool = False
) -> Dataset

# PyTorch DataLoader
torch.utils.data.DataLoader(
    dataset: Dataset,
    batch_size: int = 1,
    shuffle: bool = False,
    num_workers: int = 0,
    pin_memory: bool = False
) -> DataLoader
```
### Import

```python
from torchvision.datasets import CIFAR100
from torch.utils.data import DataLoader
```
## I/O Contract

### Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| root | str | Yes | Download/cache directory for the dataset (e.g. os.path.expanduser("~/.cache")) |
| train | bool | Yes | True for training split, False for test split |
| transform | Callable | Yes | The preprocess transform returned by clip.load() |
| download | bool | No | Whether to download if not already present. Default: False |
| batch_size | int | No | Number of samples per batch for DataLoader. Default: 1 |
| num_workers | int | No | Number of parallel data loading workers. Default: 0 |
### Outputs
| Name | Type | Description |
|---|---|---|
| dataloader | DataLoader | Iterator yielding (images: torch.Tensor [B, 3, n_px, n_px], labels: torch.Tensor [B]) batches |
## Usage Examples

### CIFAR-100 for Linear Probe
```python
import os

import torch
import clip
from torchvision.datasets import CIFAR100
from torch.utils.data import DataLoader

# Load model and get preprocessing transform
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Create datasets with CLIP preprocessing
root = os.path.expanduser("~/.cache")
train_dataset = CIFAR100(root, download=True, train=True, transform=preprocess)
test_dataset = CIFAR100(root, download=True, train=False, transform=preprocess)

# Create dataloaders
train_loader = DataLoader(train_dataset, batch_size=100, num_workers=2)
test_loader = DataLoader(test_dataset, batch_size=100, num_workers=2)

# Iterate
for images, labels in train_loader:
    images = images.to(device)  # [100, 3, 224, 224]
    # ... extract features
```
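In the linear-probe workflow, the loop body typically passes each batch to model.encode_image under torch.no_grad() and stacks the results into a feature matrix. The sketch below uses a dummy encoder (StubCLIP, a hypothetical stand-in for the real CLIP model) and random tensors in place of DataLoader batches so it runs on CPU without weights; only the loop structure and the 512-dim ViT-B/32 embedding width are meant to carry over.

```python
import torch
from torch import nn

# Dummy stand-in for clip.load("ViT-B/32")[0]; in real use, call
# model.encode_image(images) on batches from the DataLoader above.
class StubCLIP(nn.Module):
    def __init__(self, embed_dim=512):  # ViT-B/32 image embedding width
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(3, embed_dim)

    def encode_image(self, images):
        return self.proj(self.pool(images).flatten(1))

model = StubCLIP()
features, labels_all = [], []
with torch.no_grad():  # no gradients needed for feature extraction
    for _ in range(2):  # stands in for `for images, labels in train_loader:`
        images = torch.rand(100, 3, 224, 224)
        labels = torch.randint(0, 100, (100,))
        features.append(model.encode_image(images))
        labels_all.append(labels)
features = torch.cat(features)    # [N, 512] feature matrix for the probe
labels_all = torch.cat(labels_all)
print(features.shape)  # torch.Size([200, 512])
```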