Implementation:Haotian liu LLaVA Evaluation Data Download
Overview
Setup procedure for downloading and organizing LLaVA evaluation benchmark datasets into the standardized directory structure required by the evaluation pipeline.
Description
This Pattern Doc describes how evaluation data is organized for LLaVA. It documents a data setup procedure rather than a code API, and follows the directory structure in docs/Evaluation.md. The procedure downloads a base archive (eval.zip), then adds benchmark-specific image datasets and annotation files.
Source
docs/Evaluation.md:L1-167
Directory Structure
The complete evaluation data layout under ./playground/data/eval/:
playground/data/eval/
├── vqav2/ # VQAv2 benchmark
│ ├── llava_vqav2_mscoco_test-dev2015.jsonl
│ ├── llava_vqav2_mscoco_test2015.jsonl
│ ├── test2015/ # COCO test2015 images
│ ├── answers/ # Model answer output directory
│ └── answers_upload/ # Formatted submission files
├── gqa/ # GQA benchmark
│ └── data/ # GQA data + evaluation scripts
├── textvqa/ # TextVQA benchmark
│ ├── TextVQA_0.5.1_val.json
│ ├── llava_textvqa_val_v051_ocr.jsonl
│ └── train_val_images/ # TextVQA images
├── pope/ # POPE hallucination evaluation
│ ├── coco_pope_adversarial.json
│ ├── coco_pope_popular.json
│ ├── coco_pope_random.json
│ └── llava_pope_test.jsonl
├── mme/ # MME benchmark
│ ├── MME_Benchmark_release_version/
│ └── eval_tool/
├── mmbench/ # MMBench benchmark
│ ├── mmbench_dev_20230712.tsv
│ ├── mmbench_dev_cn_20231003.tsv
│ └── answers_upload/
├── seed_bench/ # SEED-Bench benchmark
│ ├── SEED-Bench-image/
│ ├── SEED-Bench-video-image/
│ └── llava-seed-bench.jsonl
├── llava-bench-in-the-wild/ # LLaVA-Bench qualitative eval
│ ├── questions.jsonl
│ ├── context.jsonl
│ ├── answers_gpt4.jsonl # GPT-4 reference answers
│ ├── images/
│ └── reviews/
├── scienceqa/ # ScienceQA benchmark
│ ├── images/
│ ├── pid_splits.json
│ ├── problems.json
│ └── llava_test_CQM-A.json
├── vizwiz/ # VizWiz benchmark
│ ├── test.json
│ ├── llava_test.jsonl
│ ├── test/ # VizWiz test images
│ └── answers_upload/
├── qbench/ # Q-Bench benchmark
│ ├── llvisionqa_dev.json
│ ├── llvisionqa_test.json
│ └── images_llviqionqa/
└── mmvet/ # MM-Vet benchmark
  ├── images/
  └── results/
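Before running any benchmark, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the LLaVA repository: `EXPECTED_ENTRIES` and `find_missing_entries` are hypothetical names, and the mapping covers only a few entries copied from the tree above.

```python
from pathlib import Path

# A few entries copied from the tree above, relative to the eval root.
# This mapping is illustrative; extend it with the remaining benchmarks as needed.
EXPECTED_ENTRIES = {
    "vqav2": ["llava_vqav2_mscoco_test-dev2015.jsonl", "test2015"],
    "textvqa": ["TextVQA_0.5.1_val.json", "llava_textvqa_val_v051_ocr.jsonl"],
    "pope": ["llava_pope_test.jsonl"],
    "scienceqa": ["pid_splits.json", "problems.json", "llava_test_CQM-A.json"],
}


def find_missing_entries(eval_root, expected=EXPECTED_ENTRIES):
    """Return the 'benchmark/name' entries that do not exist under eval_root."""
    root = Path(eval_root)
    missing = []
    for benchmark, names in expected.items():
        for name in names:
            if not (root / benchmark / name).exists():
                missing.append(f"{benchmark}/{name}")
    return missing


if __name__ == "__main__":
    for entry in find_missing_entries("./playground/data/eval"):
        print("missing:", entry)
```

The check only tests for existence, so it does not verify that image folders are fully populated.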
Setup Steps
Step 1: Download Base Archive
Download eval.zip from Google Drive and extract to ./playground/data/eval/:
# Download eval.zip (contains annotations, scripts, LLaVA v1.5 predictions)
# URL: https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view
unzip eval.zip -d ./playground/data/eval/
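The same extraction can be sketched in Python with the standard `zipfile` module. This page does not state whether eval.zip wraps its contents in a single top-level folder, so the hypothetical helper `top_level_names` lists the first path components so that accidental double nesting can be spotted before extracting. The archive location in the usage section is an assumption.

```python
import zipfile
from pathlib import Path


def top_level_names(archive_path):
    """Return the sorted set of first path components inside the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})


def extract_archive(archive_path, target_dir):
    """Extract every member of the archive into target_dir, creating it if needed."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive_path) as zf:
        zf.extractall(target)
    return target


if __name__ == "__main__":
    archive = Path("eval.zip")  # assumed to sit in the current directory
    if archive.exists():
        print("top-level entries:", top_level_names(archive))
        extract_archive(archive, "./playground/data/eval/")
```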
Step 2: Download Benchmark-Specific Data
| Benchmark | Download Source | Target Directory (under ./playground/data/eval/) |
|---|---|---|
| VQAv2 | http://images.cocodataset.org/zips/test2015.zip | vqav2/test2015/ |
| GQA | https://cs.stanford.edu/people/dorarad/gqa/download.html | gqa/data/ |
| TextVQA | https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip | textvqa/train_val_images/ |
| POPE | https://github.com/AoiDragon/POPE/tree/main/output/coco | pope/ |
| MME | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models | mme/ |
| MMBench | https://download.openmmlab.com/mmclassification/datasets/mmbench/ | mmbench/ |
| SEED-Bench | https://github.com/AILab-CVC/SEED-Bench | seed_bench/ |
| LLaVA-Bench | https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild | llava-bench-in-the-wild/ |
| ScienceQA | https://github.com/lupantech/ScienceQA | scienceqa/ |
| VizWiz | https://vizwiz.cs.colorado.edu/ | vizwiz/ |
| Q-Bench | https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench | qbench/ |
| MM-Vet | https://github.com/yuweihao/MM-Vet | mmvet/ |
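Only some of the sources above are direct archive links (the VQAv2 and TextVQA rows end in .zip); the rest point to project or download pages that need manual steps. For the direct links, a download-and-extract step might look like the sketch below. `download_and_extract` is a hypothetical helper, and the right extraction directory depends on whether the archive already contains its own top-level image folder, which this page does not state.

```python
import tempfile
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, extract_to):
    """Download a .zip from url into a temporary file and extract it into extract_to.

    Returns the list of member names found in the archive.
    """
    destination = Path(extract_to)
    destination.mkdir(parents=True, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp_dir:
        archive_path = Path(tmp_dir) / "download.zip"
        urllib.request.urlretrieve(url, archive_path)
        with zipfile.ZipFile(archive_path) as zf:
            zf.extractall(destination)
            return zf.namelist()


# Example call (not executed here; the extraction directory is an assumption):
# download_and_extract(
#     "http://images.cocodataset.org/zips/test2015.zip",
#     "./playground/data/eval/vqav2",
# )
```

The image archives are large, so the default temporary directory may not have enough space; downloading to a known location on the data disk is a reasonable alternative.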
Inputs
- Download URLs for each benchmark (listed in docs/Evaluation.md)
- Google Drive link for the base eval.zip archive
Outputs
- Organized evaluation data directories under ./playground/data/eval/
- Question JSONL files ready for consumption by model_vqa_loader.py
- Annotation files ready for metric computation scripts
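A quick way to sanity-check a question file after setup is to parse it and look at the record count and field names. The sketch below assumes only that each non-blank line is one JSON object; it makes no claim about which fields model_vqa_loader.py reads, and `load_jsonl` and `summarize` are hypothetical helpers.

```python
import json


def load_jsonl(path):
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def summarize(records):
    """Return the record count and the sorted union of keys seen across records."""
    keys = set()
    for record in records:
        keys.update(record.keys())
    return len(records), sorted(keys)
```

For example, `summarize(load_jsonl("./playground/data/eval/pope/llava_pope_test.jsonl"))` would report how many questions the POPE file holds and which fields they carry, once that file is in place.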
Metadata
| Property | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| page_type | Implementation (Pattern Doc) |
| workflow | Benchmark_Evaluation |
| source_file | docs/Evaluation.md |