Implementation:Haotian liu LLaVA Evaluation Data Download

From Leeroopedia
Revision as of 12:56, 16 February 2026 by Admin (auto-imported from implementations/Haotian_liu_LLaVA_Evaluation_Data_Download.md)

Overview

Setup procedure for downloading and organizing LLaVA evaluation benchmark datasets into the standardized directory structure required by the evaluation pipeline.

Description

This Pattern Doc describes how the LLaVA evaluation data is organized. It documents a data setup procedure, not a code API, and follows the directory structure given in docs/Evaluation.md: first download a base archive (eval.zip), then supplement it with benchmark-specific image datasets and annotation files.

Source

docs/Evaluation.md:L1-167

Directory Structure

The complete evaluation data layout under ./playground/data/eval/:

playground/data/eval/
├── vqav2/                        # VQAv2 benchmark
│   ├── llava_vqav2_mscoco_test-dev2015.jsonl
│   ├── llava_vqav2_mscoco_test2015.jsonl
│   ├── test2015/                 # COCO test2015 images
│   ├── answers/                  # Model answer output directory
│   └── answers_upload/           # Formatted submission files
├── gqa/                          # GQA benchmark
│   └── data/                     # GQA data + evaluation scripts
├── textvqa/                      # TextVQA benchmark
│   ├── TextVQA_0.5.1_val.json
│   ├── llava_textvqa_val_v051_ocr.jsonl
│   └── train_val_images/         # TextVQA images
├── pope/                         # POPE hallucination evaluation
│   ├── coco_pope_adversarial.json
│   ├── coco_pope_popular.json
│   ├── coco_pope_random.json
│   └── llava_pope_test.jsonl
├── mme/                          # MME benchmark
│   ├── MME_Benchmark_release_version/
│   └── eval_tool/
├── mmbench/                      # MMBench benchmark
│   ├── mmbench_dev_20230712.tsv
│   ├── mmbench_dev_cn_20231003.tsv
│   └── answers_upload/
├── seed_bench/                   # SEED-Bench benchmark
│   ├── SEED-Bench-image/
│   ├── SEED-Bench-video-image/
│   └── llava-seed-bench.jsonl
├── llava-bench-in-the-wild/      # LLaVA-Bench qualitative eval
│   ├── questions.jsonl
│   ├── context.jsonl
│   ├── answers_gpt4.jsonl        # GPT-4 reference answers
│   ├── images/
│   └── reviews/
├── scienceqa/                    # ScienceQA benchmark
│   ├── images/
│   ├── pid_splits.json
│   ├── problems.json
│   └── llava_test_CQM-A.json
├── vizwiz/                       # VizWiz benchmark
│   ├── test.json
│   ├── llava_test.jsonl
│   ├── test/                     # VizWiz test images
│   └── answers_upload/
├── qbench/                       # Q-Bench benchmark
│   ├── llvisionqa_dev.json
│   ├── llvisionqa_test.json
│   └── images_llviqionqa/
└── mmvet/                        # MM-Vet benchmark
    ├── images/
    └── results/
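Before running any evaluation script, you can confirm the layout by checking that the expected inputs exist. The sketch below is not part of the LLaVA repository: find_missing and EXPECTED_PATHS are hypothetical names, and the checklist copies the input files and folders from the tree above. Output folders such as answers/, answers_upload/, reviews/ and results/ are left out because they appear to hold evaluation outputs rather than inputs. Adjust the entries if your copy of the evaluation scripts expects different paths.

```python
from pathlib import Path

# Hypothetical checklist copied from the directory tree above (inputs only),
# keyed by benchmark folder; paths are relative to that folder.
EXPECTED_PATHS = {
    "vqav2": ["llava_vqav2_mscoco_test-dev2015.jsonl", "llava_vqav2_mscoco_test2015.jsonl", "test2015"],
    "gqa": ["data"],
    "textvqa": ["TextVQA_0.5.1_val.json", "llava_textvqa_val_v051_ocr.jsonl", "train_val_images"],
    "pope": ["coco_pope_adversarial.json", "coco_pope_popular.json",
             "coco_pope_random.json", "llava_pope_test.jsonl"],
    "mme": ["MME_Benchmark_release_version", "eval_tool"],
    "mmbench": ["mmbench_dev_20230712.tsv", "mmbench_dev_cn_20231003.tsv"],
    "seed_bench": ["SEED-Bench-image", "SEED-Bench-video-image", "llava-seed-bench.jsonl"],
    "llava-bench-in-the-wild": ["questions.jsonl", "context.jsonl", "answers_gpt4.jsonl", "images"],
    "scienceqa": ["images", "pid_splits.json", "problems.json", "llava_test_CQM-A.json"],
    "vizwiz": ["test.json", "llava_test.jsonl", "test"],
    "qbench": ["llvisionqa_dev.json", "llvisionqa_test.json", "images_llviqionqa"],
    "mmvet": ["images"],
}


def find_missing(eval_root, expected=EXPECTED_PATHS, benchmarks=None):
    """Return the sorted 'benchmark/path' entries that do not exist under eval_root."""
    root = Path(eval_root)
    # Check every benchmark unless the caller restricts the list.
    names = benchmarks if benchmarks is not None else expected.keys()
    missing = []
    for bench in names:
        for rel in expected[bench]:
            if not (root / bench / rel).exists():
                missing.append(f"{bench}/{rel}")
    return sorted(missing)
```

For example, find_missing("./playground/data/eval", benchmarks=["pope", "mmvet"]) returns an empty list once those two benchmarks are in place. Restricting benchmarks lets you check only the datasets you actually plan to evaluate.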

Setup Steps

Step 1: Download Base Archive

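Before preparing any benchmark-specific data, download eval.zip from Google Drive and extract it to ./playground/data/eval/. The archive also provides the general directory structure for all benchmarks: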

# Download eval.zip (custom annotations, scripts, and LLaVA v1.5 prediction files)
# URL: https://drive.google.com/file/d/1atZSBBrAX54yYpxtVVW33zFvcnaHeFPy/view
unzip eval.zip -d ./playground/data/eval/
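Where the unzip command is unavailable, the same extraction can be sketched with Python's standard zipfile module. extract_archive and common_top_level are hypothetical helpers, not part of the LLaVA repository. The second helper reports whether all archive members share a single top-level folder. If they do, the files land one level deeper than the tree above and need to be moved up, so check the result against that tree.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest="./playground/data/eval"):
    """Extract zip_path into dest (created if missing) and return the member names."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        names = zf.namelist()
        zf.extractall(dest_dir)  # roughly what `unzip ZIP -d DEST` does
    return names


def common_top_level(names):
    """Return the single folder that contains every member, or None if there is none."""
    tops = {name.split("/", 1)[0] for name in names}
    if len(tops) != 1:
        return None
    top = tops.pop()
    # A lone file such as "a.txt" is not a folder: every member must start with "top/".
    return top if all(name.startswith(top + "/") for name in names) else None
```

A result of None means the members already sit at several top-level paths, which is what extracting directly into ./playground/data/eval/ assumes.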

Step 2: Download Benchmark-Specific Data

Benchmark    Download source                                                      Target directory (under ./playground/data/eval/)
VQAv2        http://images.cocodataset.org/zips/test2015.zip                      vqav2/test2015/
GQA          https://cs.stanford.edu/people/dorarad/gqa/download.html             gqa/data/
TextVQA      https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip   textvqa/train_val_images/
POPE         https://github.com/AoiDragon/POPE/tree/main/output/coco              pope/
MME          https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models  mme/
MMBench      https://download.openmmlab.com/mmclassification/datasets/mmbench/    mmbench/
SEED-Bench   https://github.com/AILab-CVC/SEED-Bench                              seed_bench/
LLaVA-Bench  https://huggingface.co/datasets/liuhaotian/llava-bench-in-the-wild   llava-bench-in-the-wild/
ScienceQA    https://github.com/lupantech/ScienceQA                               scienceqa/
VizWiz       https://vizwiz.cs.colorado.edu/                                      vizwiz/
Q-Bench      https://huggingface.co/datasets/nanyangtu/LLVisionQA-QBench          qbench/
MM-Vet       https://github.com/yuweihao/MM-Vet                                   mmvet/
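Only two rows in this table (VQAv2 and TextVQA) link directly to a zip file. The others point to project pages, repositories, or dataset hubs that have their own instructions. For the two direct archives, downloading and extracting can be sketched with the Python standard library. DIRECT_ARCHIVES and fetch_and_extract are hypothetical names. The function returns the archive's top-level entries so you can confirm that the extracted folder name matches what the evaluation scripts expect before running them.

```python
import urllib.parse
import urllib.request
import zipfile
from pathlib import Path

# Hypothetical mapping of the table rows whose source is a direct .zip link.
# GQA, POPE, MME, MMBench, SEED-Bench, etc. need their own download steps.
DIRECT_ARCHIVES = {
    "vqav2": "http://images.cocodataset.org/zips/test2015.zip",
    "textvqa": "https://dl.fbaipublicfiles.com/textvqa/images/train_val_images.zip",
}


def fetch_and_extract(url, bench_dir, keep_archive=False):
    """Download a zip archive into bench_dir, extract it there, and return its top-level entries."""
    bench_dir = Path(bench_dir)
    bench_dir.mkdir(parents=True, exist_ok=True)
    # Name the local file after the last URL path component, e.g. "test2015.zip".
    archive = bench_dir / Path(urllib.parse.urlparse(url).path).name
    if not archive.exists():
        # Download to a ".part" file first so an interrupted transfer is not
        # mistaken for a complete archive on the next run.
        partial = archive.with_name(archive.name + ".part")
        urllib.request.urlretrieve(url, partial)
        partial.replace(archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(bench_dir)
        # Top-level folders/files inside the archive.
        top_level = sorted({name.split("/", 1)[0] for name in zf.namelist()})
    if not keep_archive:
        archive.unlink()
    return top_level
```

For example, fetch_and_extract(DIRECT_ARCHIVES["vqav2"], "./playground/data/eval/vqav2") downloads the COCO test2015 archive into vqav2/, extracts it there, and deletes the zip unless keep_archive=True. If a complete archive is already present in the target folder, it is reused rather than downloaded again, which matters for multi-gigabyte image sets.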

Inputs

  • Download URLs for each benchmark (listed in docs/Evaluation.md)
  • Google Drive link for the base eval.zip archive

Outputs

  • Organized evaluation data directories under ./playground/data/eval/
  • Question JSONL files ready for consumption by model_vqa_loader.py
  • Annotation files ready for metric computation scripts
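Before you launch model_vqa_loader.py on a benchmark, it can help to confirm that every line of the question JSONL file parses and that each referenced image is present. This is a hedged sketch. check_question_file is a hypothetical helper, and the required keys (question_id, image, text) are assumed from how model_vqa_loader.py reads each question record, so verify them against your checkout. The helper is meant for the JSONL question files, not for JSON annotations such as ScienceQA's llava_test_CQM-A.json.

```python
import json
from pathlib import Path

# Assumed schema: the fields model_vqa_loader.py reads from each question line.
REQUIRED_KEYS = ("question_id", "image", "text")


def check_question_file(question_file, image_folder):
    """Return (num_records, problems) for a LLaVA-style question JSONL file."""
    image_root = Path(image_folder)
    problems = []
    count = 0
    with open(question_file, encoding="utf-8") as f:
        for lineno, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                record = json.loads(raw)
            except json.JSONDecodeError as exc:
                problems.append(f"line {lineno}: invalid JSON ({exc.msg})")
                continue
            count += 1
            missing = [k for k in REQUIRED_KEYS if k not in record]
            if missing:
                problems.append(f"line {lineno}: missing keys {missing}")
                continue
            # Image paths are assumed to be relative to image_folder.
            if not (image_root / record["image"]).is_file():
                problems.append(f"line {lineno}: image not found: {record['image']}")
    return count, problems
```

For example, check_question_file("./playground/data/eval/vqav2/llava_vqav2_mscoco_test-dev2015.jsonl", "./playground/data/eval/vqav2/test2015") should return an empty problem list once the VQAv2 images are in place, assuming the image names in that file are relative to test2015/.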

Metadata

Property       Value
last_updated   2026-02-13 14:00 GMT
page_type      Implementation (Pattern Doc)
workflow       Benchmark_Evaluation
source_file    docs/Evaluation.md
