Implementation: Haotian Liu LLaVA Convert Results For Submission
Overview
CLI scripts for converting LLaVA evaluation outputs into benchmark-specific submission formats required by online evaluation servers and local scoring tools.
Description
Two primary converter scripts handle the most common format conversions:
- convert_vqav2_for_submission.py - Reads the merged answer JSONL produced by multi-GPU inference, normalizes each answer with `EvalAIAnswerProcessor` (VQA-standard normalization: contraction normalization, number-word-to-digit conversion, article removal, punctuation stripping), and writes a JSON array in the format required by the EvalAI evaluation server. Test-split questions that have no answer in the results receive an empty-string answer.
- convert_mmbench_for_submission.py - Loads the original MMBench annotation TSV via `pandas.read_table()`, drops non-essential columns (hint, category, source, image, comment, l2-category), inserts a `prediction` column, populates it by matching each `question_id` in the model's answer JSONL to its annotation row, and exports the result as an Excel file (.xlsx) using the `openpyxl` engine.
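The VQAv2 alignment step can be sketched as a pure function. This is a minimal illustration, not the script's code: `build_vqav2_submission` is a hypothetical name, and the `normalize` argument stands in for `EvalAIAnswerProcessor` so the example runs without LLaVA installed.

```python
from typing import Callable, Dict, Iterable, List


def build_vqav2_submission(
    answers: Dict[int, str],
    question_ids: Iterable[int],
    normalize: Callable[[str], str],
) -> List[dict]:
    """Emit one entry per canonical question ID, in test-split order.

    answers maps question_id -> raw model text (from merge.jsonl).
    normalize is a hypothetical hook standing in for EvalAIAnswerProcessor.
    """
    submission = []
    for qid in question_ids:
        if qid in answers:
            answer = normalize(answers[qid])
        else:
            # Question missing from the model output: submit an empty string
            answer = ""
        submission.append({"question_id": qid, "answer": answer})
    return submission


if __name__ == "__main__":
    preds = {1: "Yes", 3: "Two"}
    print(build_vqav2_submission(preds, [1, 2, 3], str.lower))
```

Driving the loop from the canonical question IDs rather than from the model output is what guarantees exactly one entry per test question, even if some inference chunks dropped questions.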
Source
- scripts/convert_vqav2_for_submission.py:L16-56
- scripts/convert_mmbench_for_submission.py:L15-27
- llava/eval/m4c_evaluator.py:L7-218 (EvalAIAnswerProcessor)
CLI Signatures
VQAv2 Format Conversion
```bash
python scripts/convert_vqav2_for_submission.py \
    --dir ./playground/data/eval/vqav2 \
    --ckpt llava-v1.5-13b \
    --split llava_vqav2_mscoco_test-dev2015
```
| Argument | Type | Default | Description |
|---|---|---|---|
| `--dir` | str | `./playground/data/eval/vqav2` | Base directory for VQAv2 evaluation data |
| `--ckpt` | str | (required) | Model checkpoint name (used in path construction) |
| `--split` | str | (required) | Test split name (e.g., `llava_vqav2_mscoco_test-dev2015`) |
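The argument table maps onto a standard `argparse` parser. The sketch below is reconstructed from the table rather than copied from the script; `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Convert LLaVA VQAv2 answers to EvalAI submission format"
    )
    # Base directory; default follows the playground layout in the table
    parser.add_argument("--dir", type=str, default="./playground/data/eval/vqav2")
    # Checkpoint name, used to locate merge.jsonl and to name the output file
    parser.add_argument("--ckpt", type=str, required=True)
    # Split name, e.g. llava_vqav2_mscoco_test-dev2015
    parser.add_argument("--split", type=str, required=True)
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--ckpt", "llava-v1.5-13b", "--split", "llava_vqav2_mscoco_test-dev2015"]
    )
    print(args.dir, args.ckpt, args.split)
```

Marking `--ckpt` and `--split` as required makes `argparse` exit with a usage message when either is omitted, matching the "(required)" entries in the table.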
File path resolution:
```python
import os

# `args` is the parsed command line (--dir, --ckpt, --split)
# Input: merged answer file from multi-GPU inference
src = os.path.join(args.dir, 'answers', args.split, args.ckpt, 'merge.jsonl')
# Reference: original question JSONL for question_id alignment
# (this file name is fixed; it does not change with --split)
test_split = os.path.join(args.dir, 'llava_vqav2_mscoco_test2015.jsonl')
# Output: formatted JSON for EvalAI upload
dst = os.path.join(args.dir, 'answers_upload', args.split, f'{args.ckpt}.json')
```
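Because `dst` sits two directories below `--dir` (`answers_upload/<split>/`), a writer has to create that parent directory before saving. A minimal sketch assuming the layout above; `resolve_paths` and `write_submission` are hypothetical helper names.

```python
import json
import os
import tempfile
from typing import List, Tuple


def resolve_paths(base_dir: str, ckpt: str, split: str) -> Tuple[str, str, str]:
    """Return (src, test_split, dst) following the layout shown above."""
    src = os.path.join(base_dir, "answers", split, ckpt, "merge.jsonl")
    # The reference question file name does not depend on the split
    test_split = os.path.join(base_dir, "llava_vqav2_mscoco_test2015.jsonl")
    dst = os.path.join(base_dir, "answers_upload", split, f"{ckpt}.json")
    return src, test_split, dst


def write_submission(dst: str, entries: List[dict]) -> None:
    # answers_upload/<split>/ may not exist yet, so create it first
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    with open(dst, "w") as f:
        json.dump(entries, f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        _, _, out = resolve_paths(tmp, "llava-v1.5-13b", "llava_vqav2_mscoco_test-dev2015")
        write_submission(out, [{"question_id": 1, "answer": "yes"}])
        with open(out) as f:
            print(f.read())
```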
MMBench Format Conversion
```bash
python scripts/convert_mmbench_for_submission.py \
    --annotation-file ./playground/data/eval/mmbench/mmbench_dev_20230712.tsv \
    --result-dir ./playground/data/eval/mmbench/answers \
    --upload-dir ./playground/data/eval/mmbench/answers_upload \
    --experiment llava-v1.5-13b
```
| Argument | Type | Description |
|---|---|---|
| `--annotation-file` | str | Path to the MMBench annotation TSV file |
| `--result-dir` | str | Directory containing model answer JSONL files |
| `--upload-dir` | str | Output directory for the Excel submission file |
| `--experiment` | str | Experiment name (matches the answer JSONL file name; also used to name the output file) |
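The `--experiment` value ties the input and output files together. Assuming the answer file is `<result-dir>/<experiment>.jsonl` and the submission is `<upload-dir>/<experiment>.xlsx` (the exact naming pattern is an assumption inferred from the table), the mapping can be sketched as follows; `mmbench_paths` is a hypothetical helper.

```python
import os
from typing import Tuple


def mmbench_paths(result_dir: str, upload_dir: str, experiment: str) -> Tuple[str, str]:
    # Assumed naming: the experiment name is the file stem on both sides
    answers_file = os.path.join(result_dir, f"{experiment}.jsonl")
    submission_file = os.path.join(upload_dir, f"{experiment}.xlsx")
    return answers_file, submission_file


if __name__ == "__main__":
    print(mmbench_paths(
        "./playground/data/eval/mmbench/answers",
        "./playground/data/eval/mmbench/answers_upload",
        "llava-v1.5-13b",
    ))
```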
Inputs
VQAv2 Converter
- Merged answer JSONL - Output from `model_vqa_loader.py` after chunk merging, with fields `question_id` and `text`
- Test split JSONL - Original question file providing the canonical set of question IDs
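Both VQAv2 inputs are JSON Lines files. The sketch below is an illustration rather than the script's code: it turns them into the two structures the converter needs, a `question_id` → `text` map from the merged answers and an ordered list of canonical IDs from the test split. The helper names are hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import json
from typing import Dict, Iterable, List


def load_answers(lines: Iterable[str]) -> Dict[int, str]:
    """Map question_id -> raw answer text from merged answer JSONL lines."""
    answers = {}
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines (assumption of this sketch)
        record = json.loads(line)
        answers[record["question_id"]] = record["text"]
    return answers


def load_question_ids(lines: Iterable[str]) -> List[int]:
    """Ordered question IDs from the test-split JSONL."""
    return [json.loads(line)["question_id"] for line in lines if line.strip()]
```

Both functions accept any iterable of lines, so an open file object can be passed directly, e.g. `load_answers(open(src))`.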
MMBench Converter
- Annotation TSV - Original MMBench annotation file with columns: index, question, A, B, C, D, hint, category, source, image, comment, l2-category
- Answer JSONL - Model predictions with `question_id` and `text` fields, read from `--result-dir` (the file name matches `--experiment`)
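The MMBench steps (drop columns, add `prediction`, fill it from the answer JSONL) can be sketched with pandas. This is a hedged reconstruction, not the script: it assumes each `question_id` equals a value in the TSV's `index` column, appends `prediction` as the last column (the position is an assumption), and `attach_predictions` is a hypothetical name. It returns the DataFrame instead of writing Excel so it runs without `openpyxl`; the export would be a `DataFrame.to_excel(..., engine="openpyxl")` call.

```python
import json
from typing import Iterable

import pandas as pd

DROP_COLUMNS = ["hint", "category", "source", "image", "comment", "l2-category"]


def attach_predictions(annotations: pd.DataFrame, answer_lines: Iterable[str]) -> pd.DataFrame:
    # drop() returns a new frame, so the caller's annotations stay untouched
    out = annotations.drop(columns=DROP_COLUMNS, errors="ignore")
    # Append an empty prediction column; unmatched rows remain missing
    out.insert(len(out.columns), "prediction", None)
    for line in answer_lines:
        if not line.strip():
            continue
        pred = json.loads(line)
        # Assumption: question_id matches the TSV "index" column
        out.loc[out["index"] == pred["question_id"], "prediction"] = pred["text"]
    return out


if __name__ == "__main__":
    ann = pd.DataFrame({
        "index": [0, 1],
        "question": ["Q0", "Q1"],
        "A": ["x", "x"], "B": ["y", "y"], "C": ["z", "z"], "D": ["w", "w"],
        "hint": [None, None], "category": ["c", "c"],
    })
    print(attach_predictions(ann, ['{"question_id": 1, "text": "B"}']))
```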
Outputs
VQAv2
JSON array file for EvalAI submission:
```json
[
  {"question_id": 262148000, "answer": "yes"},
  {"question_id": 262148001, "answer": "2"},
  ...
]
```
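Before uploading, it can help to sanity-check the file against this shape. The checker below is a hypothetical helper, not part of LLaVA or EvalAI; it only verifies the structure shown above (a JSON array of objects with an integer `question_id` and a string `answer`, covering every expected ID).

```python
import json
from typing import Iterable, List


def check_vqav2_submission(text: str, expected_ids: Iterable[int]) -> List[str]:
    """Return a list of problems; an empty list means the structure looks right."""
    data = json.loads(text)
    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]
    problems = []
    seen = set()
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            problems.append(f"entry {i} is not an object")
            continue
        qid, ans = entry.get("question_id"), entry.get("answer")
        if isinstance(qid, int):
            seen.add(qid)
        else:
            problems.append(f"entry {i}: question_id is not an integer")
        if not isinstance(ans, str):
            problems.append(f"entry {i}: answer is not a string")
    missing = set(expected_ids) - seen
    if missing:
        problems.append(f"{len(missing)} expected question_id(s) missing")
    return problems
```

Empty-string answers pass this check, which is consistent with the converter filling unanswered questions with `""`.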
MMBench
Excel spreadsheet (.xlsx) for OpenCompass submission, written to `--upload-dir` and named after `--experiment`. It contains the annotation columns that remain after the drop step, plus a `prediction` column holding the model answers.
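Since `prediction` is filled only for rows whose `question_id` appears in the answer JSONL, rows without a match stay empty. A hypothetical pre-export check (not part of the script) can list them, assuming the row identifier column is named `index` as in the annotation TSV:

```python
import pandas as pd


def unanswered_indices(submission: pd.DataFrame) -> list:
    """Return the `index` values of rows whose prediction is still missing."""
    missing = submission["prediction"].isna()
    return submission.loc[missing, "index"].tolist()


if __name__ == "__main__":
    df = pd.DataFrame({"index": [0, 1, 2], "prediction": ["A", None, "C"]})
    print(unanswered_indices(df))
```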
EvalAIAnswerProcessor Details
The EvalAIAnswerProcessor class (from m4c_evaluator.py) applies the standard VQA normalization:
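```python
from llava.eval.m4c_evaluator import EvalAIAnswerProcessor

answer_processor = EvalAIAnswerProcessor()
normalized = answer_processor("There are three cats")
# Result: "there are 3 cats" (lowercased, "three" mapped to "3"; no articles to remove)
```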
Processing steps:
- `word_tokenize()` - Lowercase, strip commas and question marks
- Replace newlines and tabs with spaces
- `process_punctuation()` - Remove or replace 20+ punctuation characters
- `process_digit_article()` - Map number words to digits, remove articles (a, an, the), normalize contractions (e.g. `dont` → `don't`)
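To make these steps concrete, here is a simplified, self-contained approximation. It is not `EvalAIAnswerProcessor` itself: the lookup tables are small hypothetical subsets, and the punctuation step always replaces a character with a space instead of reproducing the original's remove-or-replace logic.

```python
import re

# Abbreviated, hypothetical subsets of the lookup tables; the real
# EvalAIAnswerProcessor in m4c_evaluator.py uses much larger ones.
NUMBER_MAP = {"zero": "0", "one": "1", "two": "2", "three": "3"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "isnt": "isn't", "cant": "can't"}
PUNCTUATIONS = [";", "/", "[", "]", '"', "{", "}", "(", ")", "=", "+",
                "\\", "_", "-", ">", "<", "@", "`", "!"]
# Periods are dropped unless they sit next to a digit (keeps "3.5")
PERIOD_STRIP = re.compile(r"(?<!\d)\.(?!\d)")


def normalize_answer(answer: str) -> str:
    # Step 1 (word_tokenize): lowercase, drop commas and question marks
    text = answer.lower().replace(",", "").replace("?", "")
    # Step 2: newlines and tabs become spaces
    text = text.replace("\n", " ").replace("\t", " ").strip()
    # Step 3 (simplified process_punctuation): punctuation becomes a space
    for p in PUNCTUATIONS:
        text = text.replace(p, " ")
    text = PERIOD_STRIP.sub("", text)
    # Step 4 (process_digit_article): digits for number words,
    # drop articles, then normalize contractions
    words = []
    for word in text.split():
        word = NUMBER_MAP.get(word, word)
        if word not in ARTICLES:
            words.append(CONTRACTIONS.get(word, word))
    return " ".join(words)
```

For example, `normalize_answer("The two dogs.")` returns `"2 dogs"`: the article and the trailing period are dropped and the number word becomes a digit.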
Metadata
| Property | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| page_type | Implementation (API Doc) |
| workflow | Benchmark_Evaluation |
| source_files | scripts/convert_vqav2_for_submission.py, scripts/convert_mmbench_for_submission.py |