Implementation: mlfoundations/open_flamingo — Eval datasets
Overview
Four PyTorch Dataset classes, provided by the OpenFlamingo evaluation module, for loading vision-language evaluation benchmarks.
Description
Four dataset classes:
- CaptionDataset — Loads COCO/Flickr30K with Karpathy split annotations; returns {image, caption, image_id}.
- VQADataset — Loads VQAv2/OK-VQA/VizWiz/TextVQA questions and annotations; returns {image, question, question_id, answers}.
- ImageNetDataset — Extends torchvision.datasets.ImageFolder; returns {id, image, class_id, class_name}.
- HatefulMemesDataset — Loads JSONL annotations with OCR text; returns {id, image, ocr, class_name, class_id}.
Usage
Used by evaluation functions to load benchmark data for few-shot evaluation.
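As a sketch of that few-shot usage (the sampling helper and the stand-in sample list below are hypothetical, not part of the OpenFlamingo API): an evaluation loop typically draws a handful of in-context demonstration samples from the dataset, excluding the query item itself.

```python
import random

# Stand-in samples; the real datasets return dicts of this shape from __getitem__.
dataset = [
    {"image": f"<PIL.Image {i}>", "caption": f"caption {i}", "image_id": i}
    for i in range(10)
]

def sample_few_shot(dataset, query_idx, num_shots, seed=0):
    """Pick num_shots in-context demonstration samples, excluding the query."""
    rng = random.Random(seed)
    candidates = [i for i in range(len(dataset)) if i != query_idx]
    demo_indices = rng.sample(candidates, num_shots)
    return [dataset[i] for i in demo_indices], dataset[query_idx]

demos, query = sample_few_shot(dataset, query_idx=3, num_shots=2)
# demos holds 2 samples distinct from the query; query is dataset[3]
```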
Code Reference
Source: Repository https://github.com/mlfoundations/open_flamingo, File: open_flamingo/eval/eval_datasets.py, Lines 1-157
Signatures:
class CaptionDataset(Dataset):
def __init__(self, image_train_dir_path: str, annotations_path: str,
is_train: bool, dataset_name: str, image_val_dir_path: str = None):
...
def __getitem__(self, idx) -> dict: # {"image": PIL.Image, "caption": str, "image_id": int}
class VQADataset(Dataset):
def __init__(self, image_dir_path: str, question_path: str,
annotations_path: str, is_train: bool, dataset_name: str):
...
def __getitem__(self, idx) -> dict: # {"image": PIL.Image, "question": str, "question_id": int, "answers": List[str]}
class ImageNetDataset(ImageFolder):
def __init__(self, root: str, **kwargs):
...
def __getitem__(self, idx) -> dict: # {"id": int, "image": PIL.Image, "class_id": int, "class_name": str}
class HatefulMemesDataset(Dataset):
def __init__(self, image_dir_path: str, annotations_path: str):
...
def __getitem__(self, idx) -> dict: # {"id": int, "image": PIL.Image, "ocr": str, "class_name": str, "class_id": int}
Import:
from open_flamingo.eval.eval_datasets import CaptionDataset, VQADataset, ImageNetDataset, HatefulMemesDataset
I/O Contract
Inputs
| Dataset | Constructor Parameters | Description |
|---|---|---|
| CaptionDataset | image_train_dir_path: str, annotations_path: str, is_train: bool, dataset_name: str, image_val_dir_path: str = None | Path to training images directory, Karpathy split JSON annotations, train/test flag, dataset name (e.g. "coco" or "flickr"), optional validation images directory |
| VQADataset | image_dir_path: str, question_path: str, annotations_path: str, is_train: bool, dataset_name: str | Path to images directory, questions JSON, annotations JSON, train/test flag, dataset name (e.g. "vqav2", "ok_vqa", "vizwiz", "textvqa") |
| ImageNetDataset | root: str, **kwargs | Root directory of ImageNet dataset organized in class-folder structure |
| HatefulMemesDataset | image_dir_path: str, annotations_path: str | Path to images directory, JSONL annotations file |
Outputs
| Dataset | __getitem__ Return Dict | Field Types |
|---|---|---|
| CaptionDataset | {"image", "caption", "image_id"} | PIL.Image, str, int |
| VQADataset | {"image", "question", "question_id", "answers"} | PIL.Image, str, int, List[str] |
| ImageNetDataset | {"id", "image", "class_id", "class_name"} | int, PIL.Image, int, str |
| HatefulMemesDataset | {"id", "image", "ocr", "class_name", "class_id"} | int, PIL.Image, str, str, int |
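Note that because every __getitem__ returns a dict containing an un-transformed PIL.Image, torch.utils.data.DataLoader's default collate cannot stack these samples into tensors. A common workaround (a minimal sketch, not necessarily the exact helper OpenFlamingo ships) is a dict-of-lists collate function that leaves images un-stacked for a downstream vision processor:

```python
def custom_collate_fn(batch):
    """Collate a list of sample dicts into a single dict of lists,
    leaving PIL images un-stacked so an image processor can batch them later."""
    keys = batch[0].keys()
    return {key: [sample[key] for sample in batch] for key in keys}

# Illustration with plain dicts standing in for CaptionDataset samples:
samples = [
    {"image": "<PIL.Image 0>", "caption": "a cat", "image_id": 0},
    {"image": "<PIL.Image 1>", "caption": "a dog", "image_id": 1},
]
batch = custom_collate_fn(samples)
# batch["caption"] -> ["a cat", "a dog"]
```

Pass this as collate_fn=custom_collate_fn when wrapping any of the four datasets in a DataLoader.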
Usage Examples
Creating a CaptionDataset for COCO evaluation:
from open_flamingo.eval.eval_datasets import CaptionDataset
coco_dataset = CaptionDataset(
image_train_dir_path="/data/coco/train2014",
image_val_dir_path="/data/coco/val2014",
annotations_path="/data/coco/karpathy_coco.json",
is_train=False,
dataset_name="coco",
)
sample = coco_dataset[0]
# sample["image"] -> PIL.Image of the COCO validation image
# sample["caption"] -> "A man riding a skateboard down a street."
# sample["image_id"] -> 139
Creating a VQADataset for VQAv2 evaluation:
from open_flamingo.eval.eval_datasets import VQADataset
vqa_dataset = VQADataset(
image_dir_path="/data/coco/val2014",
question_path="/data/vqav2/v2_OpenEnded_mscoco_val2014_questions.json",
annotations_path="/data/vqav2/v2_mscoco_val2014_annotations.json",
is_train=False,
dataset_name="vqav2",
)
sample = vqa_dataset[0]
# sample["image"] -> PIL.Image of the associated COCO image
# sample["question"] -> "What color is the cat?"
# sample["question_id"] -> 262148000
# sample["answers"] -> ["white", "white", "white and brown", ...]
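For HatefulMemesDataset, the JSONL annotations file holds one record per line with id, img, text, and label fields, which map onto the returned dict (text becomes "ocr", label becomes "class_id"). A stdlib-only sketch of that mapping, with image loading stubbed out as a path and with hypothetical record contents and the label-1-means-"yes" class mapping stated as assumptions:

```python
import io
import json

# Hypothetical two-record JSONL annotations file in the Hateful Memes format.
annotations_jsonl = io.StringIO(
    '{"id": 42953, "img": "img/42953.png", "text": "example meme text one", "label": 0}\n'
    '{"id": 23058, "img": "img/23058.png", "text": "example meme text two", "label": 1}\n'
)

def parse_hateful_memes(f, image_dir_path="/data/hateful_memes"):
    """Map each JSONL record to the field layout HatefulMemesDataset returns.
    The real class opens the image as a PIL.Image; here we keep only its path."""
    samples = []
    for line in f:
        record = json.loads(line)
        samples.append({
            "id": record["id"],
            "image_path": f"{image_dir_path}/{record['img']}",  # PIL.Image in the real class
            "ocr": record["text"],
            # Assumed mapping: label 1 -> "yes" (hateful), label 0 -> "no".
            "class_name": "yes" if record["label"] == 1 else "no",
            "class_id": record["label"],
        })
    return samples

meme_samples = parse_hateful_memes(annotations_jsonl)
# meme_samples[0]["class_name"] -> "no"; meme_samples[1]["class_id"] -> 1
```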