Principle:Haotian liu LLaVA Result Format Conversion
Overview
Process for converting model output files into benchmark-specific submission formats required by evaluation servers and local scoring tools.
Description
Different benchmarks require different submission formats, and LLaVA provides converter scripts that transform raw answer JSONL files into these required formats. The conversion pipeline bridges the gap between LLaVA's uniform answer JSONL output and the diverse format requirements of external evaluation platforms.
The primary converters include:
- VQAv2 converter (convert_vqav2_for_submission.py) - Reads the merged answer JSONL, applies VQA answer normalization via EvalAIAnswerProcessor, and produces a JSON array formatted for EvalAI submission. The processor normalizes answers by expanding contractions, converting number words to digits, removing articles, stripping punctuation, and collapsing whitespace.
- MMBench converter (convert_mmbench_for_submission.py) - Joins model answer predictions with the original annotation TSV file and produces an Excel spreadsheet (.xlsx) for submission to the MMBench evaluation server at OpenCompass.
- VizWiz converter (convert_vizwiz_for_submission.py) - Formats answers for VizWiz EvalAI submission.
- GQA converter (convert_gqa_for_eval.py) - Prepares answers for GQA local evaluation.
- SEED converter (convert_seed_for_submission.py) - Formats predictions for the SEED-Bench leaderboard.
- MM-Vet converter (convert_mmvet_for_eval.py) - Formats answers for the MM-Vet evaluation notebook.
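The core of a JSONL-to-JSON-array conversion, as described for the VQAv2 converter, can be sketched roughly as follows. This is a minimal illustration, not the repository's script: the function name jsonl_to_submission, the "text" field name for the raw model answer, and the placeholder normalize callable are assumptions made for the example.

```python
import json

def jsonl_to_submission(jsonl_text, normalize=str.strip):
    """Turn answer JSONL text into a list of {"question_id", "answer"} dicts.

    `normalize` stands in for an answer processor; by default it only
    strips surrounding whitespace (an assumption for this sketch).
    """
    entries = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between records
        record = json.loads(line)
        entries.append({
            "question_id": record["question_id"],
            # "text" is an assumed field name for the raw model answer
            "answer": normalize(record["text"]),
        })
    return entries

# Two hypothetical answer records, one JSON object per line
sample = '{"question_id": 1, "text": " Yes "}\n{"question_id": 2, "text": "two"}\n'
submission = jsonl_to_submission(sample)
print(json.dumps(submission))
```

Passing a real answer processor as normalize keeps the file handling separate from the normalization rules, so the same loop could serve more than one benchmark.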
Usage
Use these converters after running batch VQA inference to prepare results for:
- Online submission - EvalAI (VQAv2, VizWiz), OpenCompass (MMBench), SEED-Bench leaderboard
- Local evaluation - GQA eval scripts, MM-Vet Jupyter notebooks
The converters are typically invoked automatically at the end of evaluation shell scripts (e.g., vqav2.sh calls convert_vqav2_for_submission.py after merging chunk files).
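The merge step that precedes conversion can be pictured as a simple concatenation of per-chunk JSONL files. The sketch below is an assumption about what such a merge might look like in Python; the function name merge_chunks and the file names chunk_0.jsonl, chunk_1.jsonl, and merge.jsonl are hypothetical and are not taken from the evaluation scripts.

```python
import json
import os
import tempfile

def merge_chunks(chunk_paths, merged_path):
    """Concatenate per-chunk JSONL files into one file, preserving order.

    Returns the number of records written.
    """
    count = 0
    with open(merged_path, "w", encoding="utf-8") as out:
        for path in chunk_paths:
            with open(path, encoding="utf-8") as chunk:
                for line in chunk:
                    if line.strip():  # ignore blank lines
                        out.write(line.rstrip("\n") + "\n")
                        count += 1
    return count

# Demonstration with two hypothetical chunk files in a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for i in range(2):
        path = os.path.join(tmp, f"chunk_{i}.jsonl")
        with open(path, "w", encoding="utf-8") as f:
            f.write(json.dumps({"question_id": i, "text": f"answer {i}"}) + "\n")
        paths.append(path)

    merged = os.path.join(tmp, "merge.jsonl")
    merged_count = merge_chunks(paths, merged)
    with open(merged, encoding="utf-8") as f:
        merged_records = [json.loads(line) for line in f]
```

The merged file is then the single input a converter reads.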
Theoretical Basis
VQA Answer Normalization
The VQA answer normalization follows the standard VQA evaluation protocol implemented in the EvalAIAnswerProcessor class (from m4c_evaluator.py). The normalization pipeline applies these transformations in order:
- Word tokenization - Lowercase, remove commas and question marks, separate possessives
- Whitespace normalization - Replace newlines and tabs with spaces
- Punctuation processing - Remove or replace punctuation characters (; / [ ] " { } ( ) = + \ _ - > < @ ` , ? !)
- Digit/article processing - Convert number words to digits (e.g., "three" to "3"), remove articles ("a", "an", "the"), expand contractions
- Final cleanup - Strip leading/trailing whitespace
This normalization ensures fair comparison across models by removing superficial formatting differences in answers.
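The order of the steps above can be sketched as a single function. This is a simplified illustration under stated assumptions, not the EvalAIAnswerProcessor code: the lookup tables are tiny stand-ins, the name normalize_answer is hypothetical, and every punctuation character is replaced by a space rather than choosing between removal and replacement.

```python
# Tiny illustrative tables (assumptions); a real processor uses larger ones.
NUMBER_MAP = {"zero": "0", "one": "1", "two": "2", "three": "3"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "isnt": "isn't"}
PUNCTUATION = [";", "/", "[", "]", '"', "{", "}", "(", ")", "=", "+",
               "\\", "_", "-", ">", "<", "@", "`", ",", "?", "!"]

def normalize_answer(answer):
    # 1. Word tokenization: lowercase, drop commas and question marks,
    #    split possessives off their noun
    text = answer.lower().replace(",", "").replace("?", "").replace("'s", " 's")
    # 2. Whitespace normalization: newlines and tabs become spaces
    text = text.replace("\n", " ").replace("\t", " ").strip()
    # 3. Punctuation processing (simplified: always replace with a space)
    for mark in PUNCTUATION:
        text = text.replace(mark, " ")
    # 4. Digit/article processing: number words to digits, drop articles,
    #    map apostrophe-less contractions to their standard spelling
    words = []
    for word in text.split():
        word = NUMBER_MAP.get(word, word)
        if word not in ARTICLES:
            words.append(CONTRACTIONS.get(word, word))
    # 5. Final cleanup: join on single spaces and strip the ends
    return " ".join(words).strip()

print(normalize_answer("The Three dogs!"))  # 3 dogs
```

With this sketch, "Two\tcats?" and "2 cats" map to the same string, which is the kind of superficial difference the normalization is meant to remove.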
Format Requirements
| Benchmark | Submission Target | Required Format |
|---|---|---|
| VQAv2 | EvalAI server | JSON array of {"question_id", "answer"} |
| MMBench | OpenCompass | Excel spreadsheet with prediction column |
| VizWiz | EvalAI server | JSON with normalized answers |
| GQA | Local eval.py | Reformatted answer file |
| SEED-Bench | Leaderboard | JSONL with predictions |
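For the MMBench row, the join between predictions and the annotation table might look roughly like the sketch below. It uses only the standard library and stops at the joined rows; writing the .xlsx file would need a spreadsheet library and is not shown. The function name join_predictions, the column names (index, question, A, B), and the "text" field for the model answer are assumptions for illustration.

```python
import csv
import io
import json

def join_predictions(annotation_tsv, answers_jsonl):
    """Attach a 'prediction' value to each annotation row, matched by index."""
    # Map question id -> predicted answer ("text" is an assumed field name)
    predictions = {}
    for line in answers_jsonl.splitlines():
        if line.strip():
            record = json.loads(line)
            predictions[str(record["question_id"])] = record["text"]

    rows = []
    reader = csv.DictReader(io.StringIO(annotation_tsv), delimiter="\t")
    for row in reader:
        # Rows without a matching prediction get an empty string
        row["prediction"] = predictions.get(row["index"], "")
        rows.append(row)
    return rows

# Hypothetical annotation table and answers
annotation_tsv = (
    "index\tquestion\tA\tB\n"
    "1\tWhat color?\tred\tblue\n"
    "2\tHow many?\tone\ttwo\n"
)
answers_jsonl = '{"question_id": 1, "text": "A"}\n{"question_id": 2, "text": "B"}\n'
rows = join_predictions(annotation_tsv, answers_jsonl)
```

Each resulting row carries the original annotation columns plus the prediction column named in the table.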
Knowledge Sources
- Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains
- Evaluation
- Data_Processing
Metadata
| Property | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| page_type | Principle |
| workflow | Benchmark_Evaluation |