

Implementation:Haotian liu LLaVA Evaluation Data Download


Overview

Setup procedure for downloading and organizing LLaVA evaluation benchmark datasets into the standardized directory structure required by the evaluation pipeline.

Description

This is a Pattern Doc describing how LLaVA evaluation data is organized. It is not a code API but a data setup procedure that follows the directory structure documented in docs/Evaluation.md: download a base archive (eval.zip), then supplement it with the benchmark-specific image datasets and annotation files.

Source

docs/Evaluation.md:L1-167

Directory Structure

The complete evaluation data layout under ./playground/data/eval/:

playground/data/eval/
├── vqav2/                        # VQAv2 benchmark
│   ├── llava_vqav2_mscoco_test-dev2015.jsonl
│   ├── llava_vqav2_mscoco_test2015.jsonl
│   ├── test2015/                 # COCO test2015 images
│   ├── answers/                  # Model answer output directory
│   └── answers_upload/           # Formatted submission files
├── gqa/                          # GQA benchmark
│   └── data/                     # GQA data + evaluation scripts
├── textvqa/                      # TextVQA benchmark
│   ├── TextVQA_0.5.1_val.json
│   ├── llava_textvqa_val_v051_ocr.jsonl
│   └── train_val_images/         # TextVQA images
├── pope/                         # POPE hallucination evaluation
│   ├── coco_pope_adversarial.json
│   ├── coco_pope_popular.json
│   ├── coco_pope_random.json
│   └── llava_pope_test.jsonl
├── mme/                          # MME benchmark
│   ├── MME_Benchmark_release_version/
│   └── eval_tool/
├── mmbench/                      # MMBench benchmark
│   ├── mmbench_dev_20230712.tsv
│   ├── mmbench_dev_cn_20231003.tsv
│   └── answers_upload/
├── seed_bench/                   # SEED-Bench benchmark
│   ├── SEED-Bench-image/
│   ├── SEED-Bench-video-image/
│   └── llava-seed-bench.jsonl
├── llava-bench-in-the-wild/      # LLaVA-Bench qualitative eval
│   ├── questions.jsonl
│   ├── context.jsonl
│   ├── answers_gpt4.jsonl        # GPT-4 reference answers
│   ├── images/
│   └── reviews/
├── scienceqa/                    # ScienceQA benchmark
│   ├── images/
│   ├── pid_splits.json
│   ├── problems.json
│   └── llava_test_CQM-A.json
├── vizwiz/                       # VizWiz benchmark
│   ├── test.json
│   ├── llava_test.jsonl
│   ├── test/                     # VizWiz test images
│   └── answers_upload/
├── qbench/                       # Q-Bench benchmark
│   ├── llvisionqa_dev.json
│   ├── llvisionqa_test.json
│   └── images_llviqionqa/
└── mmvet/                        # MM-Vet benchmark
    ├── images/
    └── results/
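Once setup is done, the layout above can be sanity-checked with a small Bash function. This is a hypothetical helper (`check_eval_layout`), not part of the LLaVA repository. It checks only a representative subset of the paths in the tree, copied verbatim, so adjust the list if an archive extracts under a different folder name.

```bash
# Hypothetical helper: report entries from the evaluation tree that are missing.
# The paths are a representative subset copied from the layout above.
check_eval_layout() {
  local root="${1:-./playground/data/eval}"
  local expected=(
    vqav2/llava_vqav2_mscoco_test-dev2015.jsonl
    vqav2/test2015
    gqa/data
    textvqa/TextVQA_0.5.1_val.json
    textvqa/llava_textvqa_val_v051_ocr.jsonl
    textvqa/train_val_images
    pope/llava_pope_test.jsonl
    mme/MME_Benchmark_release_version
    mmbench/mmbench_dev_20230712.tsv
    seed_bench/llava-seed-bench.jsonl
    llava-bench-in-the-wild/questions.jsonl
    scienceqa/llava_test_CQM-A.json
    vizwiz/llava_test.jsonl
    qbench/llvisionqa_dev.json
    mmvet/images
  )
  local rel missing=0
  for rel in "${expected[@]}"; do
    if [[ ! -e "$root/$rel" ]]; then
      echo "MISSING: $rel"
      missing=$((missing + 1))
    fi
  done
  echo "$missing missing"
  # Exit status is 0 only when every listed path exists.
  [[ $missing -eq 0 ]]
}
```

Usage: check_eval_layout ./playground/data/eval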

Setup Steps

Step 1: Download Base Archive

Download eval.zip from Google Drive and extract to ./playground/data/eval/:

# Download eval.zip (contains annotations, scripts, LLaVA v1.5 predictions)
# URL: https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view
unzip eval.zip -d ./playground/data/eval/
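Step 1 can be scripted roughly as follows. This is a sketch under assumptions: it relies on the third-party `gdown` CLI (installed with `pip install gdown`), whose `--fuzzy` flag extracts the file ID from a Drive share URL, and the function name `fetch_eval_base` and the `EVAL_ROOT`/`EVAL_ZIP_URL` variables are hypothetical. Passing `-n` to `unzip` skips files that already exist, so rerunning the function does not overwrite an earlier extraction.

```bash
# Hypothetical helper for Step 1. Assumes the third-party gdown CLI
# (pip install gdown) and Info-ZIP unzip are on PATH.
EVAL_ROOT="${EVAL_ROOT:-./playground/data/eval}"
EVAL_ZIP_URL="https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view"

fetch_eval_base() {
  local zip="${1:-eval.zip}"
  # Skip the download when the archive is already present.
  if [[ ! -f "$zip" ]]; then
    # --fuzzy lets gdown pull the file ID out of a /file/d/<id>/view share URL.
    gdown --fuzzy "$EVAL_ZIP_URL" -O "$zip" || return 1
  fi
  mkdir -p "$EVAL_ROOT"
  # -n: never overwrite files that already exist; -q: quiet output.
  unzip -q -n "$zip" -d "$EVAL_ROOT"
}
```

Usage: fetch_eval_base eval.zip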

Step 2: Download Benchmark-Specific Data

| Benchmark | Download Source | Target Directory (under ./playground/data/eval/) |
|---|---|---|
| VQAv2 | http://images.cocodataset.org/zips/test2015.zip | vqav2/test2015/ |
| GQA | https://cs.stanford.edu/people/dorarad/gqa/download.html | gqa/data/ |
| TextVQA | https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip | textvqa/train_val_images/ |
| POPE | https://github.com/AoiDragon/POPE/tree/main/output/coco | pope/ |
| MME | https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models | mme/ |
| MMBench | https://download.openmmlab.com/mmclassification/datasets/mmbench/ | mmbench/ |
| SEED-Bench | https://github.com/AILab-CVC/SEED-Bench | seed_bench/ |
| LLaVA-Bench | https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild | llava-bench-in-the-wild/ |
| ScienceQA | https://github.com/lupantech/ScienceQA | scienceqa/ |
| VizWiz | https://vizwiz.cs.colorado.edu/ | vizwiz/ |
| Q-Bench | https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench | qbench/ |
| MM-Vet | https://github.com/yuweihao/MM-Vet | mmvet/ |
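Only two rows in the table (VQAv2 and TextVQA) point directly at archive files; the others are project pages whose own instructions must be followed. For those two, a hypothetical helper like the one below downloads the archive into the benchmark directory and extracts it there. It assumes `wget` and `unzip` are installed, and the function name `fetch_and_extract` is illustrative only. The final folder name comes from the archive itself, so compare the result against the tree above.

```bash
# Hypothetical helper for the direct-archive rows of the table. Assumes wget
# and unzip are on PATH; curl -L -o would work in place of wget.
EVAL_ROOT="${EVAL_ROOT:-./playground/data/eval}"

fetch_and_extract() {
  local url="$1"
  local dest="$EVAL_ROOT/$2"
  local zip
  zip="$dest/$(basename "$url")"
  mkdir -p "$dest"
  if [[ ! -f "$zip" ]]; then
    # Download to a .part file first so an interrupted transfer is not
    # mistaken for a finished archive on the next run.
    wget -O "$zip.part" "$url" && mv "$zip.part" "$zip" || return 1
  fi
  # The archive's own top-level folder (e.g. test2015/) lands under $dest.
  unzip -q -n "$zip" -d "$dest"
}
```

Usage:
fetch_and_extract http://images.cocodataset.org/zips/test2015.zip vqav2
fetch_and_extract https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip textvqa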

Inputs

  • Download URLs for each benchmark (listed in docs/Evaluation.md)
  • Google Drive link for the base eval.zip archive

Outputs

  • Organized evaluation data directories under ./playground/data/eval/
  • Question files (JSONL for most benchmarks) ready for consumption by inference scripts such as model_vqa_loader.py
  • Annotation files ready for metric computation scripts
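Before running inference, it can help to confirm that every image a question file references is actually present. The sketch below assumes one JSON object per line with an `image` key holding a path relative to the image folder, as in LLaVA's question JSONL files. `count_missing_images` is a hypothetical name, and it extracts the key with `grep` and `sed` rather than a real JSON parser, so it is only a rough check.

```bash
# Hypothetical helper: count question records whose "image" file is missing
# from the image folder. Assumes one JSON object per line with an "image" key;
# a regex is used instead of a JSON parser, so escaped quotes inside the
# value are not handled. Lines without an "image" key are ignored.
count_missing_images() {
  local questions="$1" image_dir="$2"
  local total=0 missing=0 img
  while IFS= read -r img; do
    total=$((total + 1))
    [[ -e "$image_dir/$img" ]] || missing=$((missing + 1))
  done < <(grep -o '"image": *"[^"]*"' "$questions" | sed 's/^"image": *"//; s/"$//')
  echo "$missing of $total referenced images missing"
  # Exit status is 0 only when every referenced image exists.
  [[ $missing -eq 0 ]]
}
```

Usage: count_missing_images ./playground/data/eval/vqav2/llava_vqav2_mscoco_test-dev2015.jsonl ./playground/data/eval/vqav2/test2015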


Metadata

| Property | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| page_type | Implementation (Pattern Doc) |
| workflow | Benchmark_Evaluation |
| source_file | docs/Evaluation.md |
