Implementation:Haotian liu LLaVA Evaluation Data Download

Overview

Setup procedure for downloading and organizing LLaVA evaluation benchmark datasets into the standardized directory structure required by the evaluation pipeline.

Description

This page is a Pattern Doc: it describes a data setup procedure for LLaVA evaluation rather than a code API. The layout follows the directory structure documented in docs/Evaluation.md. Setup has two parts: download a base archive (eval.zip), then add each benchmark's image datasets and annotation files on top of it.

Source

docs/Evaluation.md:L1-167

Directory Structure

The complete evaluation data layout under ./playground/data/eval/:

playground/data/eval/
├── vqav2/                        # VQAv2 benchmark
│   ├── llava_vqav2_mscoco_test-dev2015.jsonl
│   ├── llava_vqav2_mscoco_test2015.jsonl
│   ├── test2015/                 # COCO test2015 images
│   ├── answers/                  # Model answer output directory
│   └── answers_upload/           # Formatted submission files
├── gqa/                          # GQA benchmark
│   └── data/                     # GQA data + evaluation scripts
├── textvqa/                      # TextVQA benchmark
│   ├── TextVQA_0.5.1_val.json
│   ├── llava_textvqa_val_v051_ocr.jsonl
│   └── train_val_images/         # TextVQA images
├── pope/                         # POPE hallucination evaluation
│   ├── coco_pope_adversarial.json
│   ├── coco_pope_popular.json
│   ├── coco_pope_random.json
│   └── llava_pope_test.jsonl
├── mme/                          # MME benchmark
│   ├── MME_Benchmark_release_version/
│   └── eval_tool/
├── mmbench/                      # MMBench benchmark
│   ├── mmbench_dev_20230712.tsv
│   ├── mmbench_dev_cn_20231003.tsv
│   └── answers_upload/
├── seed_bench/                   # SEED-Bench benchmark
│   ├── SEED-Bench-image/
│   ├── SEED-Bench-video-image/
│   └── llava-seed-bench.jsonl
├── llava-bench-in-the-wild/      # LLaVA-Bench qualitative eval
│   ├── questions.jsonl
│   ├── context.jsonl
│   ├── answers_gpt4.jsonl        # GPT-4 reference answers
│   ├── images/
│   └── reviews/
├── scienceqa/                    # ScienceQA benchmark
│   ├── images/
│   ├── pid_splits.json
│   ├── problems.json
│   └── llava_test_CQM-A.json
├── vizwiz/                       # VizWiz benchmark
│   ├── test.json
│   ├── llava_test.jsonl
│   ├── test/                     # VizWiz test images
│   └── answers_upload/
├── qbench/                       # Q-Bench benchmark
│   ├── llvisionqa_dev.json
│   ├── llvisionqa_test.json
│   └── images_llviqionqa/
└── mmvet/                        # MM-Vet benchmark
    ├── images/
    └── results/
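
The layout above can be checked mechanically before running any evaluation script. The following is a minimal sketch, not part of the LLaVA repository: the helper name find_missing and the EXPECTED list are hypothetical, and the list covers only a subset of the entries shown in the tree.

```python
from pathlib import Path

# Subset of the layout listed above, relative to the eval root.
# A trailing "/" marks a directory. Extend with the remaining entries as needed.
EXPECTED = [
    "vqav2/test2015/",
    "vqav2/llava_vqav2_mscoco_test-dev2015.jsonl",
    "gqa/data/",
    "textvqa/train_val_images/",
    "textvqa/llava_textvqa_val_v051_ocr.jsonl",
    "pope/llava_pope_test.jsonl",
    "mme/MME_Benchmark_release_version/",
    "scienceqa/images/",
    "mmvet/images/",
]


def find_missing(eval_root, expected=EXPECTED):
    """Return the expected entries that do not exist under eval_root."""
    root = Path(eval_root)
    missing = []
    for rel in expected:
        path = root / rel
        # Directories and files are checked separately so a stray file
        # named like a directory is still reported.
        ok = path.is_dir() if rel.endswith("/") else path.is_file()
        if not ok:
            missing.append(rel)
    return missing


# Example usage (hypothetical):
# for rel in find_missing("./playground/data/eval"):
#     print("missing:", rel)
```

An empty return value means every listed entry is present; it says nothing about whether the files inside each directory are complete.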

Setup Steps

Step 1: Download Base Archive

Download eval.zip from Google Drive and extract it to ./playground/data/eval/:

# Download eval.zip (contains annotations, scripts, LLaVA v1.5 predictions)
# URL: https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view
# Create the target directory first so extraction does not depend on it existing
mkdir -p ./playground/data/eval/
unzip eval.zip -d ./playground/data/eval/
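
Before extracting, it can help to confirm how the archive is rooted, since an archive that already contains a top-level folder would end up nested one level too deep. This is a small sketch using only the Python standard library; top_level_entries is a hypothetical helper, and nothing here assumes what eval.zip actually contains.

```python
import zipfile


def top_level_entries(zip_path):
    """Return the sorted first path components found inside the archive."""
    with zipfile.ZipFile(zip_path) as zf:
        # Member names inside a zip always use "/" as the separator.
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})


# Example usage (hypothetical):
# print(top_level_entries("eval.zip"))
```

If the result is a list of benchmark folders such as vqav2 and pope, extracting into ./playground/data/eval/ matches the layout above; if it is a single wrapper folder, adjust the extraction target accordingly.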

Step 2: Download Benchmark-Specific Data

Benchmark	Download Source	Target Directory (under ./playground/data/eval/)
VQAv2	http://images.cocodataset.org/zips/test2015.zip	`vqav2/test2015/`
GQA	https://cs.stanford.edu/people/dorarad/gqa/download.html	`gqa/data/`
TextVQA	https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip	`textvqa/train_val_images/`
POPE	https://github.com/AoiDragon/POPE/tree/main/output/coco	`pope/`
MME	https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models	`mme/`
MMBench	https://download.openmmlab.com/mmclassification/datasets/mmbench/	`mmbench/`
SEED-Bench	https://github.com/AILab-CVC/SEED-Bench	`seed_bench/`
LLaVA-Bench	https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild	`llava-bench-in-the-wild/`
ScienceQA	https://github.com/lupantech/ScienceQA	`scienceqa/`
VizWiz	https://vizwiz.cs.colorado.edu/	`vizwiz/`
Q-Bench	https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench	`qbench/`
MM-Vet	https://github.com/yuweihao/MM-Vet	`mmvet/`
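
For the rows whose download source is a direct zip URL (VQAv2 and TextVQA in the table above), the fetch-and-extract step can be scripted. The sketch below uses only the Python standard library; fetch_and_unpack is a hypothetical helper, not part of LLaVA, and it assumes each archive unpacks into a folder matching the target directory name, which should be verified after extraction. Rows that point to a project page or repository still need to be downloaded by following that source's own instructions.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unpack(url, dest_dir, archive_name=None):
    """Download a zip archive from url and extract it into dest_dir."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / (archive_name or url.rsplit("/", 1)[-1])
    if not archive.exists():  # reuse an archive that was already downloaded
        # Write to a temporary name first so an interrupted download
        # is never mistaken for a complete archive.
        tmp = archive.with_suffix(archive.suffix + ".part")
        with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
            shutil.copyfileobj(resp, out)
        tmp.replace(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return archive


# Example usage (hypothetical; the archive is large):
# fetch_and_unpack(
#     "http://images.cocodataset.org/zips/test2015.zip",
#     "./playground/data/eval/vqav2",
# )
```

The downloaded archive is left in place next to the extracted files so a rerun skips the download; delete it manually once the images are verified if disk space matters.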

Inputs

Download URLs for each benchmark (listed in docs/Evaluation.md)
Google Drive link for the base eval.zip archive

Outputs

Organized evaluation data directories under ./playground/data/eval/
Question JSONL files ready for consumption by model_vqa_loader.py
Annotation files ready for metric computation scripts
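
One way to confirm that a question file and its image folder belong together is to check that every image a question refers to exists on disk. This is a sketch under stated assumptions: load_questions and missing_images are hypothetical helpers, and the "image" key is an assumed field name that should be checked against the actual JSONL files before relying on the result.

```python
import json
from pathlib import Path


def load_questions(jsonl_path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(jsonl_path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def missing_images(questions, image_dir, image_key="image"):
    """Return image names referenced by questions but absent from image_dir."""
    root = Path(image_dir)
    return sorted(
        {
            q[image_key]
            for q in questions
            # Entries without the assumed key are skipped, not reported.
            if image_key in q and not (root / q[image_key]).is_file()
        }
    )


# Example usage (hypothetical):
# qs = load_questions("./playground/data/eval/vizwiz/llava_test.jsonl")
# print(len(qs), "questions;", len(missing_images(qs, "./playground/data/eval/vizwiz/test")), "images missing")
```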

Related Pages

implements Principle:Haotian_liu_LLaVA_Evaluation_Data_Setup

Metadata

Property	Value
last_updated	2026-02-13 14:00 GMT
page_type	Implementation (Pattern Doc)
workflow	Benchmark_Evaluation
source_file	docs/Evaluation.md
