Implementation:Mlfoundations Open flamingo Eval datasets

From Leeroopedia


Overview

Concrete tool: four PyTorch Dataset classes in the OpenFlamingo evaluation module for loading vision-language evaluation benchmarks.

Description

Four dataset classes:

  1. CaptionDataset — Loads COCO/Flickr30K with Karpathy split annotations, returns {image, caption, image_id}.
  2. VQADataset — Loads VQAv2/OK-VQA/VizWiz/TextVQA questions and annotations, returns {image, question, question_id, answers}.
  3. ImageNetDataset — Extends torchvision.ImageFolder, returns {id, image, class_id, class_name}.
  4. HatefulMemesDataset — Loads JSONL annotations with OCR text, returns {id, image, ocr, class_name, class_id}.

Usage

Used by evaluation functions to load benchmark data for few-shot evaluation.
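Each __getitem__ returns a plain dict containing a PIL image and, for VQA, a variable-length answer list, so PyTorch's default batch collation cannot stack these samples directly. A minimal dict-of-lists collate sketch (the name dict_collate and the helper itself are hypothetical, not part of open_flamingo):

```python
# Hypothetical helper (not in open_flamingo): batch a list of sample
# dicts into a dict of lists, leaving PIL images and variable-length
# answer lists untouched instead of trying to tensor-stack them.
def dict_collate(samples):
    if not samples:
        return {}
    # Every sample is assumed to share the same keys.
    keys = samples[0].keys()
    return {key: [sample[key] for sample in samples] for key in keys}

batch = dict_collate([
    {"image": "img0", "caption": "a cat", "image_id": 0},
    {"image": "img1", "caption": "a dog", "image_id": 1},
])
print(batch["caption"])  # ['a cat', 'a dog']
```

A function like this can be passed to torch.utils.data.DataLoader via its collate_fn argument, deferring image preprocessing to the evaluation loop.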

Code Reference

Source: repository https://github.com/mlfoundations/open_flamingo, file open_flamingo/eval/eval_datasets.py, lines 1-157

Signatures:

class CaptionDataset(Dataset):
    def __init__(self, image_train_dir_path: str, annotations_path: str,
                 is_train: bool, dataset_name: str, image_val_dir_path: str = None):
        ...
    def __getitem__(self, idx) -> dict:  # {"image": PIL.Image, "caption": str, "image_id": int}
        ...

class VQADataset(Dataset):
    def __init__(self, image_dir_path: str, question_path: str,
                 annotations_path: str, is_train: bool, dataset_name: str):
        ...
    def __getitem__(self, idx) -> dict:  # {"image": PIL.Image, "question": str, "question_id": int, "answers": List[str]}
        ...

class ImageNetDataset(ImageFolder):
    def __init__(self, root: str, **kwargs):
        ...
    def __getitem__(self, idx) -> dict:  # {"id": int, "image": PIL.Image, "class_id": int, "class_name": str}
        ...

class HatefulMemesDataset(Dataset):
    def __init__(self, image_dir_path: str, annotations_path: str):
        ...
    def __getitem__(self, idx) -> dict:  # {"id": int, "image": PIL.Image, "ocr": str, "class_name": str, "class_id": int}
        ...

Import:

from open_flamingo.eval.eval_datasets import CaptionDataset, VQADataset, ImageNetDataset, HatefulMemesDataset

I/O Contract

Inputs

CaptionDataset(image_train_dir_path: str, annotations_path: str, is_train: bool, dataset_name: str, image_val_dir_path: str = None)
    Training images directory, Karpathy-split JSON annotations, train/test flag, dataset name (e.g. "coco" or "flickr"), and an optional validation images directory.

VQADataset(image_dir_path: str, question_path: str, annotations_path: str, is_train: bool, dataset_name: str)
    Images directory, questions JSON, annotations JSON, train/test flag, and dataset name (e.g. "vqav2", "ok_vqa", "vizwiz", "textvqa").

ImageNetDataset(root: str, **kwargs)
    Root directory of an ImageNet dataset organized in class-folder structure.

HatefulMemesDataset(image_dir_path: str, annotations_path: str)
    Images directory and JSONL annotations file.
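ImageNetDataset builds on torchvision's ImageFolder, which expects a root/<class_name>/<image file> layout and derives integer class ids from the sorted subdirectory names. A toy, self-contained sketch of that layout and mapping (directory names are illustrative; this mirrors, but is not, the torchvision implementation):

```python
import os
import tempfile

# Sketch of the class-folder layout torchvision.ImageFolder expects:
# root/<class_name>/<image>.  Class names are the subdirectory names,
# sorted and mapped to integer class ids.
root = tempfile.mkdtemp()
for cls in ["n01440764", "n01443537"]:  # WordNet-style folder names
    os.makedirs(os.path.join(root, cls), exist_ok=True)

classes = sorted(d.name for d in os.scandir(root) if d.is_dir())
class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
print(class_to_idx)  # {'n01440764': 0, 'n01443537': 1}
```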

Outputs

CaptionDataset.__getitem__ returns {"image": PIL.Image, "caption": str, "image_id": int}
VQADataset.__getitem__ returns {"image": PIL.Image, "question": str, "question_id": int, "answers": List[str]}
ImageNetDataset.__getitem__ returns {"id": int, "image": PIL.Image, "class_id": int, "class_name": str}
HatefulMemesDataset.__getitem__ returns {"id": int, "image": PIL.Image, "ocr": str, "class_name": str, "class_id": int}

Usage Examples

Creating a CaptionDataset for COCO evaluation:

from open_flamingo.eval.eval_datasets import CaptionDataset

coco_dataset = CaptionDataset(
    image_train_dir_path="/data/coco/train2014",
    image_val_dir_path="/data/coco/val2014",
    annotations_path="/data/coco/karpathy_coco.json",
    is_train=False,
    dataset_name="coco",
)

sample = coco_dataset[0]
# sample["image"]    -> PIL.Image of the COCO validation image
# sample["caption"]  -> "A man riding a skateboard down a street."
# sample["image_id"] -> 139

Creating a VQADataset for VQAv2 evaluation:

from open_flamingo.eval.eval_datasets import VQADataset

vqa_dataset = VQADataset(
    image_dir_path="/data/coco/val2014",
    question_path="/data/vqav2/v2_OpenEnded_mscoco_val2014_questions.json",
    annotations_path="/data/vqav2/v2_mscoco_val2014_annotations.json",
    is_train=False,
    dataset_name="vqav2",
)

sample = vqa_dataset[0]
# sample["image"]       -> PIL.Image of the associated COCO image
# sample["question"]    -> "What color is the cat?"
# sample["question_id"] -> 262148000
# sample["answers"]     -> ["white", "white", "white and brown", ...]
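HatefulMemesDataset reads its annotations from a JSONL file, one JSON object per line. A self-contained sketch of that format, using field names from the public Hateful Memes release ("id", "img", "text", "label") as an assumption about the annotation schema rather than a fact taken from the OpenFlamingo source:

```python
import json
import os
import tempfile

# Toy JSONL records in the shape of the public Hateful Memes
# annotations ("img" is a relative image path, "text" the OCR text,
# "label" 0/1); the field names are an assumption, not from the repo.
records = [
    {"id": 42953, "img": "img/42953.png", "text": "sample ocr text", "label": 0},
    {"id": 23058, "img": "img/23058.png", "text": "another meme", "label": 1},
]
path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Parse it back the way a JSONL loader would: one json.loads per line.
with open(path) as f:
    annotations = [json.loads(line) for line in f]
print([a["label"] for a in annotations])  # [0, 1]
```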
