Implementation:Haotian liu LLaVA Convert Results For Submission

Overview

CLI scripts for converting LLaVA evaluation outputs into benchmark-specific submission formats required by online evaluation servers and local scoring tools.

Description

Two primary converter scripts handle the most common format conversions:

convert_vqav2_for_submission.py - Reads the merged answer JSONL produced by multi-GPU inference, normalizes each answer with EvalAIAnswerProcessor (VQA-standard normalization: contraction normalization, number-word-to-digit conversion, article removal, punctuation stripping), and writes a JSON array in the format required by the EvalAI evaluation server. Questions in the test split that have no model answer are filled with empty strings.

convert_mmbench_for_submission.py - Loads the original MMBench annotation TSV via pandas.read_table(), drops non-essential columns (hint, category, source, image, comment, l2-category), inserts a prediction column, populates it by matching each answer's question_id from the model's answer JSONL against the annotation index column, and exports the result as an Excel file (.xlsx) using the openpyxl engine.

Source

scripts/convert_vqav2_for_submission.py:L16-56
scripts/convert_mmbench_for_submission.py:L15-27
llava/eval/m4c_evaluator.py:L7-218 (EvalAIAnswerProcessor)

CLI Signatures

VQAv2 Format Conversion

python scripts/convert_vqav2_for_submission.py \
    --dir ./playground/data/eval/vqav2 \
    --ckpt llava-v1.5-13b \
    --split llava_vqav2_mscoco_test-dev2015

Argument	Type	Default	Description
`--dir`	str	`./playground/data/eval/vqav2`	Base directory for VQAv2 evaluation data
`--ckpt`	str	(required)	Model checkpoint name (used in path construction)
`--split`	str	(required)	Test split name (e.g., `llava_vqav2_mscoco_test-dev2015`)

File path resolution:

# Input: merged answer file from multi-GPU inference
src = os.path.join(args.dir, 'answers', args.split, args.ckpt, 'merge.jsonl')

# Reference: original question JSONL for question_id alignment
test_split = os.path.join(args.dir, 'llava_vqav2_mscoco_test2015.jsonl')

# Output: formatted JSON for EvalAI upload
dst = os.path.join(args.dir, 'answers_upload', args.split, f'{args.ckpt}.json')
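
The alignment step described above (index answers by question_id, then emit one entry per question in the test split) can be sketched roughly as follows. This is a minimal illustration, not the script itself: build_submission is a hypothetical helper, the default normalize=str.strip is only a stand-in for EvalAIAnswerProcessor, and skipping malformed lines is a choice made for this sketch.

```python
import json


def build_submission(answer_lines, question_lines, normalize=str.strip):
    """Align model answers to the canonical question IDs.

    answer_lines and question_lines are iterables of JSONL strings.
    `normalize` is a stand-in for EvalAIAnswerProcessor (assumption).
    """
    # Index the model's answers by question_id; malformed lines are skipped.
    results = {}
    for line in answer_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        results[record["question_id"]] = record["text"]

    # Emit one entry per question in the test split, in input order.
    submission = []
    for line in question_lines:
        qid = json.loads(line)["question_id"]
        # Questions without a model answer get an empty string.
        answer = normalize(results[qid]) if qid in results else ""
        submission.append({"question_id": qid, "answer": answer})
    return submission


answers = ['{"question_id": 1, "text": " yes "}', "not json"]
questions = ['{"question_id": 1}', '{"question_id": 2}']
print(json.dumps(build_submission(answers, questions)))
# [{"question_id": 1, "answer": "yes"}, {"question_id": 2, "answer": ""}]
```

Iterating over the question file rather than the answer file is what guarantees that every question ID appears in the upload, even when inference skipped or failed on some questions.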

MMBench Format Conversion

python scripts/convert_mmbench_for_submission.py \
    --annotation-file ./playground/data/eval/mmbench/mmbench_dev_20230712.tsv \
    --result-dir ./playground/data/eval/mmbench/answers \
    --upload-dir ./playground/data/eval/mmbench/answers_upload \
    --experiment llava-v1.5-13b

Argument	Type	Description
`--annotation-file`	str	Path to the MMBench annotation TSV file
`--result-dir`	str	Directory containing model answer JSONL files
`--upload-dir`	str	Output directory for the Excel submission file
`--experiment`	str	Experiment name (matches JSONL filename, used for output naming)
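
The column handling can be illustrated without third-party dependencies. The sketch below deliberately swaps in the standard csv module for pandas and returns rows instead of writing an .xlsx file, so it only mirrors the drop-columns and match-by-ID logic; attach_predictions is a hypothetical name, and leaving unmatched rows with an empty prediction is an assumption of this sketch.

```python
import csv
import io
import json

# Columns the converter drops, as listed in the description above.
DROPPED = {"hint", "category", "source", "image", "comment", "l2-category"}


def attach_predictions(tsv_text, answer_lines):
    """Return (header, rows) with dropped columns removed and predictions added."""
    # Map question_id -> predicted text from the answer JSONL.
    predictions = {}
    for line in answer_lines:
        record = json.loads(line)
        predictions[str(record["question_id"])] = record["text"]

    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    kept = [name for name in reader.fieldnames if name not in DROPPED]
    header = kept + ["prediction"]

    rows = []
    for row in reader:
        out = {name: row[name] for name in kept}
        # Rows without a matching answer keep an empty prediction.
        out["prediction"] = predictions.get(row["index"], "")
        rows.append(out)
    return header, rows


tsv = (
    "index\tquestion\tA\tB\tC\tD\thint\tcategory\n"
    "1\tWhat?\tx\ty\tz\tw\tnone\tcat\n"
    "2\tWhy?\ta\tb\tc\td\tnone\tcat\n"
)
header, rows = attach_predictions(tsv, ['{"question_id": 1, "text": "B"}'])
print(header)
print([row["prediction"] for row in rows])
```

Matching on the annotation's index column means the order of lines in the answer JSONL does not matter.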

Inputs

VQAv2 Converter

Merged answer JSONL - Output from model_vqa_loader.py after chunk merging, with fields question_id and text
Test split JSONL - Original question file providing the canonical set of question IDs

MMBench Converter

Annotation TSV - Original MMBench annotation file with columns: index, question, A, B, C, D, hint, category, source, image, comment, l2-category
Answer JSONL - Model predictions with question_id and text fields

Outputs

VQAv2

JSON array file for EvalAI submission:

[
    {"question_id": 262148000, "answer": "yes"},
    {"question_id": 262148001, "answer": "2"},
    ...
]
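
Before uploading, it can be useful to sanity-check that a generated file has this shape. The following is a hypothetical helper, not part of the LLaVA scripts, and the checks are assumptions derived only from the format shown above (a JSON array of objects with an integer question_id and a string answer).

```python
import json


def check_submission(entries):
    """Return a list of problems found in a parsed submission array."""
    if not isinstance(entries, list):
        return ["top-level value is not a JSON array"]

    problems = []
    seen = set()
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict) or set(entry) != {"question_id", "answer"}:
            problems.append(f"entry {i}: expected keys question_id and answer")
            continue
        qid = entry["question_id"]
        if not isinstance(qid, int):
            problems.append(f"entry {i}: question_id is not an integer")
            continue
        if not isinstance(entry["answer"], str):
            problems.append(f"entry {i}: answer is not a string")
        if qid in seen:
            problems.append(f"entry {i}: duplicate question_id")
        seen.add(qid)
    return problems


good = json.loads('[{"question_id": 1, "answer": "yes"}, {"question_id": 2, "answer": ""}]')
bad = json.loads('[{"question_id": 1, "answer": "yes"}, {"question_id": 1, "answer": 2}]')
print(check_submission(good))
print(check_submission(bad))
```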

MMBench

Excel spreadsheet (.xlsx) for OpenCompass submission, containing the retained annotation columns (index, question, A, B, C, D) plus a prediction column with model answers.

EvalAIAnswerProcessor Details

The EvalAIAnswerProcessor class (from m4c_evaluator.py) applies the standard VQA normalization:

from llava.eval.m4c_evaluator import EvalAIAnswerProcessor

answer_processor = EvalAIAnswerProcessor()
normalized = answer_processor("There are three cats")
# Result: "there are 3 cats" (lowercased, number word converted to a digit)

Processing steps:

word_tokenize() - Lowercase, strip commas/question marks
Replace newlines/tabs with spaces
process_punctuation() - Remove/replace 20+ punctuation characters
process_digit_article() - Map number words to digits, remove articles (a, an, the), normalize contractions (e.g., dont to don't)
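
The four steps can be approximated in a few lines. This is a simplified sketch under stated assumptions, not the EvalAIAnswerProcessor implementation: the lookup tables are tiny illustrative subsets, and the punctuation step simply replaces a fixed set of characters with spaces, whereas the real class applies more nuanced rules.

```python
import re

# Tiny illustrative tables (assumptions); the real class uses larger ones.
NUMBER_MAP = {"one": "1", "two": "2", "three": "3"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "isnt": "isn't"}
PUNCTUATION = re.compile(r'[;/\[\]"{}()=+_\-<>@`!*#%]')


def normalize(text):
    # Step 1: lowercase and strip commas and question marks.
    text = text.lower().replace(",", "").replace("?", "").strip()
    # Step 2: replace newlines and tabs with spaces.
    text = text.replace("\n", " ").replace("\t", " ").strip()
    # Step 3 (simplified): replace punctuation characters with spaces.
    text = PUNCTUATION.sub(" ", text)
    # Step 4: map number words to digits, drop articles, fix contractions.
    words = []
    for word in text.split():
        word = NUMBER_MAP.get(word, word)
        if word in ARTICLES:
            continue
        words.append(CONTRACTIONS.get(word, word))
    return " ".join(words)


print(normalize("There are three cats"))  # there are 3 cats
print(normalize("The dog, on a mat?"))    # dog on mat
```

Note that only articles are dropped; other function words such as "there" and "are" pass through unchanged.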

Related Pages

implements Principle:Haotian_liu_LLaVA_Result_Format_Conversion

Metadata

Property	Value
last_updated	2026-02-13 14:00 GMT
page_type	Implementation (API Doc)
workflow	Benchmark_Evaluation
source_files	scripts/convert_vqav2_for_submission.py, scripts/convert_mmbench_for_submission.py
