Principle:Norrrrrrr lyn WAInjectBench LLaVA Training Data Preparation

Knowledge Sources	PyTorch Dataset
Domains	Data_Engineering, Computer_Vision, Deep_Learning
Last Updated	2026-02-14 16:00 GMT

Overview

A PyTorch Dataset abstraction that loads image-label pairs from JSONL files for vision-language model fine-tuning with DataLoader batching.

Description

For LLaVA fine-tuning, training data is loaded through a custom Dataset subclass that reads a JSONL file of {"path": str, "label": int} entries. Each __getitem__ call opens the image with PIL and returns a (PIL.Image, int) tuple. A custom collate_fn groups items into (List[PIL.Image], Tensor[long]) batches suitable for the LlavaYesnoToken model's forward() method.
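The batching step described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name collate_image_label is hypothetical, and the only behavior assumed is what the description states (a list of images plus a long tensor of labels).

```python
from typing import Any, List, Tuple

import torch


def collate_image_label(batch: List[Tuple[Any, int]]) -> Tuple[List[Any], torch.Tensor]:
    """Group (image, label) pairs into (list of images, long tensor of labels).

    Images are kept as a plain Python list because PIL images of different
    sizes cannot be stacked into a single tensor by the default collate.
    """
    images = [image for image, _ in batch]
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return images, labels
```

Keeping the images as a list leaves resizing and tensor conversion to whatever consumes the batch, which is why a custom collate_fn is needed here instead of DataLoader's default one.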

This pattern differs from the embedding trainer's load_jsonl because it provides lazy loading (images are opened on access, not all at once) and integrates with PyTorch's DataLoader for batching, shuffling, and multi-worker loading.

Usage

Use this when preparing data for LLaVA fine-tuning. Both training and validation datasets use this class.
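A minimal end-to-end sketch of using one class for both splits is shown below. All names here (JsonlImageDatasetSketch, collate, make_split, build_loaders) are hypothetical stand-ins, the images are tiny generated placeholders, and the RGB conversion and batch size are assumptions rather than settings taken from the project.

```python
import json
import os
import tempfile

import torch
from PIL import Image
from torch.utils.data import DataLoader, Dataset


class JsonlImageDatasetSketch(Dataset):
    """Hypothetical stand-in for the project's dataset class."""

    def __init__(self, jsonl_path):
        # Only the JSONL records are parsed here; no image is opened.
        with open(jsonl_path, encoding="utf-8") as f:
            self.items = [json.loads(line) for line in f if line.strip()]

    def __len__(self):
        return len(self.items)

    def __getitem__(self, i):
        item = self.items[i]
        # Open lazily; convert() returns an in-memory copy (RGB is an assumption).
        with Image.open(item["path"]) as img:
            return img.convert("RGB"), int(item["label"])


def collate(batch):
    # Keep images as a list; turn labels into a long tensor.
    images = [img for img, _ in batch]
    labels = torch.tensor([lbl for _, lbl in batch], dtype=torch.long)
    return images, labels


def make_split(directory, name, labels):
    """Write placeholder images plus a JSONL index; return the JSONL path."""
    jsonl_path = os.path.join(directory, f"{name}.jsonl")
    with open(jsonl_path, "w", encoding="utf-8") as f:
        for i, label in enumerate(labels):
            img_path = os.path.join(directory, f"{name}_{i}.png")
            Image.new("RGB", (8, 8), color=(i * 40, 0, 0)).save(img_path)
            f.write(json.dumps({"path": img_path, "label": label}) + "\n")
    return jsonl_path


def build_loaders(train_jsonl, val_jsonl, batch_size=2):
    # Same class for both splits; only the training loader shuffles.
    train_loader = DataLoader(JsonlImageDatasetSketch(train_jsonl),
                              batch_size=batch_size, shuffle=True, collate_fn=collate)
    val_loader = DataLoader(JsonlImageDatasetSketch(val_jsonl),
                            batch_size=batch_size, shuffle=False, collate_fn=collate)
    return train_loader, val_loader


with tempfile.TemporaryDirectory() as tmp:
    train_loader, val_loader = build_loaders(
        make_split(tmp, "train", [0, 1, 1]),
        make_split(tmp, "val", [1, 0]),
    )
    val_images, val_labels = next(iter(val_loader))
    train_batch_sizes = [len(images) for images, _ in train_loader]
```

Passing num_workers greater than zero to DataLoader would move the image opening into worker processes; the sketch leaves it at the default so it runs in a single process.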

Theoretical Basis

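# Lazy-loading Dataset pattern (sketch; RGB conversion is an assumption)
import json
from PIL import Image
from torch.utils.data import Dataset

class ImageLabelDataset(Dataset):
    def __init__(self, jsonl_path):
        # Parse only the JSONL records here; no image is opened yet.
        with open(jsonl_path, encoding="utf-8") as f:
            self.items = [json.loads(line) for line in f if line.strip()]
    def __getitem__(self, i):
        # Open the image on access and return (PIL.Image, int).
        item = self.items[i]
        return Image.open(item["path"]).convert("RGB"), int(item["label"])
    def __len__(self):
        return len(self.items)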

The lazy-loading approach avoids loading all images into memory at initialization. Only the JSONL records are parsed up front; each image is opened when the DataLoader requests its index while assembling a batch.

Related Pages

Implemented By

Implementation:Norrrrrrr_lyn_WAInjectBench_JsonlImageDataset
