Principle:Norrrrrrr lyn WAInjectBench LLaVA Training Data Preparation
| Knowledge Sources | |
|---|---|
| Domains | Data_Engineering, Computer_Vision, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A PyTorch Dataset abstraction that loads image-label pairs from JSONL files for vision-language model fine-tuning with DataLoader batching.
Description
For LLaVA fine-tuning, training data is loaded through a custom Dataset subclass that reads a JSONL file of {"path": str, "label": int} entries. Each __getitem__ call opens the image with PIL and returns a (PIL.Image, int) tuple. A custom collate_fn groups items into (List[PIL.Image], Tensor[long]) batches suitable for the LlavaYesnoToken model's forward() method.
This pattern differs from the embedding trainer's load_jsonl in two ways: it loads lazily (each image is opened on access rather than all at once), and it integrates with PyTorch's DataLoader for batching, shuffling, and multi-worker loading.
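The flow described above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the repository's actual implementation: the class and function bodies are reconstructed from the description, make_demo_jsonl is a hypothetical helper that fabricates tiny images so the example runs on its own, and the RGB conversion, batch size, file name, and num_workers=0 are illustrative choices. The call into LlavaYesnoToken's forward() is not shown.

```python
import json
import os
import tempfile

import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class ImageLabelDataset(Dataset):
    """Reads {"path": str, "label": int} records; opens images lazily."""

    def __init__(self, jsonl_path):
        with open(jsonl_path, encoding="utf-8") as f:
            # Only the small metadata records are held in memory.
            self.items = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        item = self.items[i]
        # The image file is opened here, on access.
        # Converting to RGB is an assumption, not something the source states.
        with Image.open(item["path"]) as img:
            return img.convert("RGB"), int(item["label"])


def collate_fn(batch):
    """Keep images as a Python list; gather labels into a long tensor."""
    images = [img for img, _ in batch]
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return images, labels


def make_demo_jsonl(root, n=5):
    """Hypothetical helper: writes n tiny PNGs and a matching JSONL file."""
    jsonl_path = os.path.join(root, "train.jsonl")
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for i in range(n):
            path = os.path.join(root, f"img_{i}.png")
            Image.new("RGB", (8, 8), color=(i * 10, 0, 0)).save(path)
            f.write(json.dumps({"path": path, "label": i % 2}) + "\n")
    return jsonl_path


with tempfile.TemporaryDirectory() as root:
    dataset = ImageLabelDataset(make_demo_jsonl(root))
    loader = DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=0, collate_fn=collate_fn
    )
    # Record the batch size and label tensor of every batch.
    batches = [(len(images), labels) for images, labels in loader]
```

A custom collate_fn is needed because PyTorch's default collation cannot stack PIL images into a tensor; keeping them as a list leaves image preprocessing to the model side.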
Usage
Use this when preparing data for LLaVA fine-tuning. Both training and validation datasets use this class.
Theoretical Basis
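    # Lazy-loading Dataset pattern (pseudocode: parse_jsonl and load_image are placeholder helpers)
    class ImageLabelDataset(Dataset):
        def __init__(self, jsonl_path):
            # Read only the {"path", "label"} records; no image is opened here.
            self.items = parse_jsonl(jsonl_path)

        def __getitem__(self, i):
            # Open the image on access.
            return load_image(self.items[i]["path"]), self.items[i]["label"]

        def __len__(self):
            return len(self.items)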
The lazy-loading approach avoids loading all images into memory at initialization. Each image is opened only when the DataLoader requests its index while assembling a batch.
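The effect of lazy loading can be observed without any image files. The sketch below is an illustration only: LazyItems mirrors the shape of the pattern without depending on PyTorch, and fake_load_image is a hypothetical stand-in for a real image open that simply records which paths were requested.

```python
class LazyItems:
    """Stores only records; calls the loader when an item is indexed."""

    def __init__(self, records, loader):
        self.records = records
        self.loader = loader

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        record = self.records[i]
        return self.loader(record["path"]), record["label"]


opened = []  # paths passed to the stand-in loader, in call order


def fake_load_image(path):
    # Hypothetical stand-in for a real image open; it only records the call.
    opened.append(path)
    return f"<image {path}>"


records = [{"path": f"img_{i}.png", "label": i % 2} for i in range(1000)]
items = LazyItems(records, fake_load_image)

opened_after_init = len(opened)      # construction triggers no loads
batch = [items[i] for i in (3, 7)]   # indices one batch might request
opened_after_batch = list(opened)    # only the two requested paths
```

With 1000 records, construction performs no loads, and fetching a two-item batch performs exactly two, so memory use scales with the batch size rather than the dataset size.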