Implementation: mlfoundations/open_flamingo — Eval datasets
Overview
Four PyTorch Dataset classes, provided by the OpenFlamingo evaluation module, for loading vision-language evaluation benchmarks.
Description
Four dataset classes:
- CaptionDataset — Loads COCO/Flickr30K with Karpathy split annotations; returns {image, caption, image_id}.
- VQADataset — Loads VQAv2/OK-VQA/VizWiz/TextVQA questions and annotations; returns {image, question, question_id, answers}.
- ImageNetDataset — Extends torchvision.datasets.ImageFolder; returns {id, image, class_id, class_name}.
- HatefulMemesDataset — Loads JSONL annotations with OCR text; returns {id, image, ocr, class_name, class_id}.
Usage
Used by evaluation functions to load benchmark data for few-shot evaluation.
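As a sketch of that few-shot usage (the sampling helper and the stand-in sample list below are hypothetical, not part of the OpenFlamingo API): an evaluation loop typically draws a handful of in-context demonstration samples from the dataset, excluding the query item itself.

```python
import random

# Stand-in samples; the real datasets return dicts of this shape from __getitem__.
dataset = [
    {"image": f"<PIL.Image {i}>", "caption": f"caption {i}", "image_id": i}
    for i in range(10)
]

def sample_few_shot(dataset, query_idx, num_shots, seed=0):
    """Pick num_shots in-context demonstration samples, excluding the query."""
    rng = random.Random(seed)
    candidates = [i for i in range(len(dataset)) if i != query_idx]
    demo_indices = rng.sample(candidates, num_shots)
    return [dataset[i] for i in demo_indices], dataset[query_idx]

demos, query = sample_few_shot(dataset, query_idx=3, num_shots=2)
# demos holds 2 samples distinct from the query; query is dataset[3]
```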
Code Reference
Source: Repository https://github.com/mlfoundations/open_flamingo, File: open_flamingo/eval/eval_datasets.py, Lines 1-157
Signatures:
class CaptionDataset(Dataset):
def __init__(self, image_train_dir_path: str, annotations_path: str,
is_train: bool, dataset_name: str, image_val_dir_path: str = None):
...
def __getitem__(self, idx) -> dict: # {"image": PIL.Image, "caption": str, "image_id": int}
class VQADataset(Dataset):
def __init__(self, image_dir_path: str, question_path: str,
annotations_path: str, is_train: bool, dataset_name: str):
...
def __getitem__(self, idx) -> dict: # {"image": PIL.Image, "question": str, "question_id": int, "answers": List[str]}
class ImageNetDataset(ImageFolder):
def __init__(self, root: str, **kwargs):
...
def __getitem__(self, idx) -> dict: # {"id": int, "image": PIL.Image, "class_id": int, "class_name": str}
class HatefulMemesDataset(Dataset):
def __init__(self, image_dir_path: str, annotations_path: str):
...
def __getitem__(self, idx) -> dict: # {"id": int, "image": PIL.Image, "ocr": str, "class_name": str, "class_id": int}
Import:
from open_flamingo.eval.eval_datasets import CaptionDataset, VQADataset, ImageNetDataset, HatefulMemesDataset
I/O Contract
Inputs
| Dataset | Constructor Parameters | Description |
|---|---|---|
| CaptionDataset | image_train_dir_path: str, annotations_path: str, is_train: bool, dataset_name: str, image_val_dir_path: str = None | Path to training images directory, Karpathy split JSON annotations, train/test flag, dataset name (e.g. "coco" or "flickr"), optional validation images directory |
| VQADataset | image_dir_path: str, question_path: str, annotations_path: str, is_train: bool, dataset_name: str | Path to images directory, questions JSON, annotations JSON, train/test flag, dataset name (e.g. "vqav2", "ok_vqa", "vizwiz", "textvqa") |
| ImageNetDataset | root: str, **kwargs | Root directory of ImageNet dataset organized in class-folder structure |
| HatefulMemesDataset | image_dir_path: str, annotations_path: str | Path to images directory, JSONL annotations file |
Outputs
| Dataset | __getitem__ Return Dict | Field Types |
|---|---|---|
| CaptionDataset | {"image", "caption", "image_id"} | PIL.Image, str, int |
| VQADataset | {"image", "question", "question_id", "answers"} | PIL.Image, str, int, List[str] |
| ImageNetDataset | {"id", "image", "class_id", "class_name"} | int, PIL.Image, int, str |
| HatefulMemesDataset | {"id", "image", "ocr", "class_name", "class_id"} | int, PIL.Image, str, str, int |
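Note that because every __getitem__ returns a dict containing an un-transformed PIL.Image, torch.utils.data.DataLoader's default collate cannot stack these samples into tensors. A common workaround (a minimal sketch, not necessarily the exact helper OpenFlamingo ships) is a dict-of-lists collate function that leaves images un-stacked for a downstream vision processor:

```python
def custom_collate_fn(batch):
    """Collate a list of sample dicts into a single dict of lists,
    leaving PIL images un-stacked so an image processor can batch them later."""
    keys = batch[0].keys()
    return {key: [sample[key] for sample in batch] for key in keys}

# Illustration with plain dicts standing in for CaptionDataset samples:
samples = [
    {"image": "<PIL.Image 0>", "caption": "a cat", "image_id": 0},
    {"image": "<PIL.Image 1>", "caption": "a dog", "image_id": 1},
]
batch = custom_collate_fn(samples)
# batch["caption"] -> ["a cat", "a dog"]
```

Pass this as collate_fn=custom_collate_fn when wrapping any of the four datasets in a DataLoader.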
Usage Examples
Creating a CaptionDataset for COCO evaluation:
from open_flamingo.eval.eval_datasets import CaptionDataset
coco_dataset = CaptionDataset(
image_train_dir_path="/data/coco/train2014",
image_val_dir_path="/data/coco/val2014",
annotations_path="/data/coco/karpathy_coco.json",
is_train=False,
dataset_name="coco",
)
sample = coco_dataset[0]
# sample["image"] -> PIL.Image of the COCO validation image
# sample["caption"] -> "A man riding a skateboard down a street."
# sample["image_id"] -> 139
Creating a VQADataset for VQAv2 evaluation:
from open_flamingo.eval.eval_datasets import VQADataset
vqa_dataset = VQADataset(
image_dir_path="/data/coco/val2014",
question_path="/data/vqav2/v2_OpenEnded_mscoco_val2014_questions.json",
annotations_path="/data/vqav2/v2_mscoco_val2014_annotations.json",
is_train=False,
dataset_name="vqav2",
)
sample = vqa_dataset[0]
# sample["image"] -> PIL.Image of the associated COCO image
# sample["question"] -> "What color is the cat?"
# sample["question_id"] -> 262148000
# sample["answers"] -> ["white", "white", "white and brown", ...]
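For HatefulMemesDataset, the JSONL annotations file holds one record per line with id, img, text, and label fields, which map onto the returned dict (text becomes "ocr", label becomes "class_id"). A stdlib-only sketch of that mapping, with image loading stubbed out as a path and with hypothetical record contents and the label-1-means-"yes" class mapping stated as assumptions:

```python
import io
import json

# Hypothetical two-record JSONL annotations file in the Hateful Memes format.
annotations_jsonl = io.StringIO(
    '{"id": 42953, "img": "img/42953.png", "text": "example meme text one", "label": 0}\n'
    '{"id": 23058, "img": "img/23058.png", "text": "example meme text two", "label": 1}\n'
)

def parse_hateful_memes(f, image_dir_path="/data/hateful_memes"):
    """Map each JSONL record to the field layout HatefulMemesDataset returns.
    The real class opens the image as a PIL.Image; here we keep only its path."""
    samples = []
    for line in f:
        record = json.loads(line)
        samples.append({
            "id": record["id"],
            "image_path": f"{image_dir_path}/{record['img']}",  # PIL.Image in the real class
            "ocr": record["text"],
            # Assumed mapping: label 1 -> "yes" (hateful), label 0 -> "no".
            "class_name": "yes" if record["label"] == 1 else "no",
            "class_id": record["label"],
        })
    return samples

meme_samples = parse_hateful_memes(annotations_jsonl)
# meme_samples[0]["class_name"] -> "no"; meme_samples[1]["class_id"] -> 1
```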