Implementation:Haotian liu LLaVA Convert Results For Submission

From Leeroopedia

Overview

CLI scripts for converting LLaVA evaluation outputs into benchmark-specific submission formats required by online evaluation servers and local scoring tools.

Description

Two primary converter scripts handle the most common format conversions:

  • convert_vqav2_for_submission.py - Reads the merged answer JSONL produced by multi-GPU inference and normalizes each answer with EvalAIAnswerProcessor using the VQA-standard rules (lowercasing, punctuation stripping, number-word-to-digit conversion, article removal, contraction normalization). It then writes a JSON array in the format required by the EvalAI evaluation server. Questions in the test split that have no model answer are filled with empty strings.
  • convert_mmbench_for_submission.py - Loads the original MMBench annotation TSV via pandas.read_table() and drops the columns not needed for submission (hint, category, source, image, comment, l2-category). It then inserts a prediction column, fills it by matching each question_id in the model's answer JSONL against the annotation's index column, and exports the result as an Excel file (.xlsx) using the openpyxl engine.

Source

  • scripts/convert_vqav2_for_submission.py:L16-56
  • scripts/convert_mmbench_for_submission.py:L15-27
  • llava/eval/m4c_evaluator.py:L7-218 (EvalAIAnswerProcessor)

CLI Signatures

VQAv2 Format Conversion

python scripts/convert_vqav2_for_submission.py \
    --dir ./playground/data/eval/vqav2 \
    --ckpt llava-v1.5-13b \
    --split llava_vqav2_mscoco_test-dev2015

Argument   Type   Default                        Description
--dir      str    ./playground/data/eval/vqav2   Base directory for VQAv2 evaluation data
--ckpt     str    (required)                     Model checkpoint name (used in path construction)
--split    str    (required)                     Test split name (e.g., llava_vqav2_mscoco_test-dev2015)
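The flags in the table map directly onto an argparse parser. The following is a minimal sketch, assuming the script declares exactly these three flags with argparse. The function name parse_vqav2_args is hypothetical.

```python
import argparse


def parse_vqav2_args(argv=None):
    """Hypothetical parser mirroring the VQAv2 converter's flags."""
    parser = argparse.ArgumentParser(
        description="Convert merged LLaVA VQAv2 answers for EvalAI upload (sketch)")
    # --dir has a default; --ckpt and --split must be supplied.
    parser.add_argument("--dir", type=str, default="./playground/data/eval/vqav2",
                        help="Base directory for VQAv2 evaluation data")
    parser.add_argument("--ckpt", type=str, required=True,
                        help="Model checkpoint name (used in path construction)")
    parser.add_argument("--split", type=str, required=True,
                        help="Test split name")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_vqav2_args(["--ckpt", "llava-v1.5-13b",
                             "--split", "llava_vqav2_mscoco_test-dev2015"])
    print(args.dir, args.ckpt, args.split)
```

Omitting --ckpt or --split makes argparse print a usage error and exit. Omitting --dir falls back to the default path.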

File path resolution:

import os

# `args` holds the parsed --dir, --ckpt and --split values
# Input: merged answer file from multi-GPU inference
src = os.path.join(args.dir, 'answers', args.split, args.ckpt, 'merge.jsonl')

# Reference: original question JSONL for question_id alignment
test_split = os.path.join(args.dir, 'llava_vqav2_mscoco_test2015.jsonl')

# Output: formatted JSON for EvalAI upload
dst = os.path.join(args.dir, 'answers_upload', args.split, f'{args.ckpt}.json')

MMBench Format Conversion

python scripts/convert_mmbench_for_submission.py \
    --annotation-file ./playground/data/eval/mmbench/mmbench_dev_20230712.tsv \
    --result-dir ./playground/data/eval/mmbench/answers \
    --upload-dir ./playground/data/eval/mmbench/answers_upload \
    --experiment llava-v1.5-13b

Argument            Type   Description
--annotation-file   str    Path to the MMBench annotation TSV file
--result-dir        str    Directory containing model answer JSONL files
--upload-dir        str    Output directory for the Excel submission file
--experiment        str    Experiment name (matches the JSONL filename; also used to name the output file)
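The --experiment value does double duty: it locates the prediction file and names the upload. The following is a minimal sketch of that path resolution, assuming the files are named <experiment>.jsonl and <experiment>.xlsx. mmbench_paths is a hypothetical helper, not part of the script.

```python
import os


def mmbench_paths(result_dir: str, upload_dir: str, experiment: str):
    """Resolve MMBench input/output paths from --experiment (hypothetical helper).

    Assumes predictions live in <result-dir>/<experiment>.jsonl and the
    submission is written to <upload-dir>/<experiment>.xlsx.
    """
    src = os.path.join(result_dir, f"{experiment}.jsonl")
    dst = os.path.join(upload_dir, f"{experiment}.xlsx")
    return src, dst


src, dst = mmbench_paths("./playground/data/eval/mmbench/answers",
                         "./playground/data/eval/mmbench/answers_upload",
                         "llava-v1.5-13b")
print(src)
print(dst)
```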

Inputs

VQAv2 Converter

  • Merged answer JSONL - Output from model_vqa_loader.py after chunk merging, with fields question_id and text
  • Test split JSONL - Original question file providing the canonical set of question IDs
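The two inputs are combined by question_id. The test split fixes the set and order of questions, and the merged answers supply the text. The following is a sketch of that alignment, assuming one JSON object per line. align_answers and its normalize parameter are hypothetical. In real use you would pass an EvalAIAnswerProcessor instance as normalize. Here it defaults to the identity function so the sketch runs without LLaVA installed.

```python
import json
from typing import Callable, Dict, Iterable, List


def align_answers(merged_lines: Iterable[str],
                  split_lines: Iterable[str],
                  normalize: Callable[[str], str] = lambda s: s) -> List[dict]:
    """Build EvalAI-style records, one per question in the test split.

    merged_lines: JSONL lines with "question_id" and "text" (merge.jsonl).
    split_lines:  JSONL lines with "question_id" (canonical question set).
    normalize:    answer normalizer applied to each model answer.
    """
    # Index model answers by question_id; skip blank lines.
    results: Dict[int, str] = {}
    for line in merged_lines:
        if line.strip():
            rec = json.loads(line)
            results[rec["question_id"]] = rec["text"]

    answers = []
    for line in split_lines:
        if not line.strip():
            continue
        qid = json.loads(line)["question_id"]
        if qid in results:
            answers.append({"question_id": qid, "answer": normalize(results[qid])})
        else:
            # Question has no model output: submit an empty answer.
            answers.append({"question_id": qid, "answer": ""})
    return answers


merged = ['{"question_id": 1, "text": "Yes"}']
split = ['{"question_id": 1}', '{"question_id": 2}']
print(align_answers(merged, split, normalize=str.lower))
# [{'question_id': 1, 'answer': 'yes'}, {'question_id': 2, 'answer': ''}]
```

Iterating over the test split rather than the results means every question gets exactly one record, even when inference skipped some questions.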

MMBench Converter

  • Annotation TSV - Original MMBench annotation file with columns: index, question, A, B, C, D, hint, category, source, image, comment, l2-category
  • Answer JSONL - Model predictions with question_id and text fields

Outputs

VQAv2

JSON array file for EvalAI submission:

[
    {"question_id": 262148000, "answer": "yes"},
    {"question_id": 262148001, "answer": "2"},
    ...
]
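The upload is a single JSON array, not JSONL. The following is a sketch of the write step, assuming the parent directory (answers_upload/<split>/ in the path layout above) may not exist yet. write_submission is a hypothetical name.

```python
import json
import os
import tempfile


def write_submission(answers, dst):
    """Write EvalAI records as one JSON array (sketch).

    Creates the parent directory if it does not already exist.
    """
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(dst, "w") as f:
        json.dump(answers, f)


# Usage: round-trip a tiny submission through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "answers_upload", "split", "ckpt.json")
    write_submission([{"question_id": 262148000, "answer": "yes"}], path)
    with open(path) as f:
        print(json.load(f))  # [{'question_id': 262148000, 'answer': 'yes'}]
```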

MMBench

Excel spreadsheet (.xlsx) for OpenCompass submission, containing the annotation columns that remain after the drop step, plus a prediction column with the model's answers.
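The following is a pandas sketch of how the prediction column can be filled. It assumes each prediction's question_id equals a value in the annotation's index column. build_mmbench_submission is a hypothetical name. The sketch passes errors="ignore" to drop() so that it tolerates a trimmed-down TSV. It also appends prediction as the last column; the real script's column position may differ.

```python
import io
import json

import pandas as pd

# Columns removed before adding predictions (per the description above).
DROP_COLUMNS = ["hint", "category", "source", "image", "comment", "l2-category"]


def build_mmbench_submission(annotation_tsv, prediction_lines):
    """Return the annotation table with a 'prediction' column filled from JSONL.

    annotation_tsv:   path or file-like object readable by pd.read_table.
    prediction_lines: JSONL lines with "question_id" and "text".
    """
    df = pd.read_table(annotation_tsv)
    out = df.drop(columns=DROP_COLUMNS, errors="ignore")
    out["prediction"] = None  # unanswered rows stay empty
    for line in prediction_lines:
        if not line.strip():
            continue
        pred = json.loads(line)
        out.loc[out["index"] == pred["question_id"], "prediction"] = pred["text"]
    return out


tsv = io.StringIO(
    "index\tquestion\tA\tB\thint\tcategory\n"
    "0\tWhat color?\tred\tblue\tx\tattr\n"
    "1\tHow many?\tone\ttwo\tx\tcount\n"
)
sub = build_mmbench_submission(tsv, ['{"question_id": 1, "text": "B"}'])
print(sub)
# Writing the upload file requires openpyxl:
# sub.to_excel("llava-v1.5-13b.xlsx", index=False, engine="openpyxl")
```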

EvalAIAnswerProcessor Details

The EvalAIAnswerProcessor class (from m4c_evaluator.py) applies the standard VQA normalization:

from llava.eval.m4c_evaluator import EvalAIAnswerProcessor

answer_processor = EvalAIAnswerProcessor()
normalized = answer_processor("The three cats")
# Result: "3 cats" (article "the" removed, number word converted to a digit)

Processing steps:

  1. word_tokenize() - Lowercase, strip commas/question marks
  2. Replace newlines/tabs with spaces
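  3. process_punctuation() - Remove about 20 punctuation characters, deleting each one or replacing it with a space depending on its context, then strip periods that are not followed by a digit
  4. process_digit_article() - Map number words ("none" and "zero" through "ten") to digits, drop the articles a/an/the, and normalize apostrophe-less contractions (e.g. dont → don't)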
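The four steps can be approximated in plain Python. This is a simplified re-implementation for illustration, not the EvalAIAnswerProcessor source. It uses a three-entry contraction table and always replaces listed punctuation with a space, whereas the real step 3 sometimes deletes it instead. It also omits some tokenization details. normalize_answer is a hypothetical name.

```python
import re

# Reduced tables for illustration only (hypothetical, not the real class's).
NUMBER_MAP = {"none": "0", "zero": "0", "one": "1", "two": "2", "three": "3",
              "four": "4", "five": "5", "six": "6", "seven": "7",
              "eight": "8", "nine": "9", "ten": "10"}
ARTICLES = {"a", "an", "the"}
CONTRACTIONS = {"dont": "don't", "cant": "can't", "isnt": "isn't"}
PUNCTUATION = ";/[]\"{}()=+\\_-><@`!"
PERIOD_STRIP = re.compile(r"\.(?!\d)")  # periods not followed by a digit


def normalize_answer(answer: str) -> str:
    # 1. Tokenize: lowercase, drop commas and question marks.
    text = answer.lower().replace(",", "").replace("?", "")
    # 2. Whitespace: newlines and tabs become spaces.
    text = text.replace("\n", " ").replace("\t", " ").strip()
    # 3. Punctuation: replace each listed character with a space (simplified),
    #    then drop periods that are not part of a decimal number.
    for p in PUNCTUATION:
        text = text.replace(p, " ")
    text = PERIOD_STRIP.sub("", text)
    # 4. Number words to digits, drop articles, normalize contractions.
    words = [NUMBER_MAP.get(w, w) for w in text.split()]
    words = [CONTRACTIONS.get(w, w) for w in words if w not in ARTICLES]
    return " ".join(words)


print(normalize_answer("The three cats."))  # 3 cats
print(normalize_answer("Two"))              # 2
print(normalize_answer("I dont know"))      # i don't know
```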

Metadata

Property       Value
last_updated   2026-02-13 14:00 GMT
page_type      Implementation (API Doc)
workflow       Benchmark_Evaluation
source_files   scripts/convert_vqav2_for_submission.py, scripts/convert_mmbench_for_submission.py
