Implementation:OpenGVLab InternVL MMBench VQA Inference

From Leeroopedia


Knowledge Sources
Domains: Inference, Benchmark, Multiple_Choice
Last Updated: 2026-02-07 14:00 GMT

Overview

This script generates model predictions for the MMBench multiple-choice benchmark. It reads MMBench's TSV question file with pandas, decodes the base64-encoded images stored in that file, and can rotate the answer options across evaluation rounds.

Description

The model_vqa_mmbench.py script implements an inference pipeline tailored to the MMBench benchmark. It handles MMBench's data format as follows:

  • TSV input: Questions are loaded with pd.read_table rather than from JSONL. The table has the columns index, question, hint, and image (base64-encoded), plus the option columns A through D
  • Base64 image decoding: Each image is decoded from its base64 string with load_image_from_base64
  • Option construction: get_options collects the option texts in order (A through D) and stops at the first None/NaN value; is_none recognizes the various null representations
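
The option and image handling described above can be sketched as follows. This is a minimal illustration, not the script's actual code: the exact set of null representations, the use of a plain dict in place of a pandas row, and the helper name decode_image_bytes are assumptions. The real script passes the decoded data to load_image_from_base64, which is not reproduced here.

```python
import base64
import math

OPTION_COLUMNS = ["A", "B", "C", "D"]


def is_none(value) -> bool:
    """Return True for the null forms a TSV cell may take (assumed set)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str) and value.strip().lower() in ("nan", "none"):
        return True
    return False


def get_options(row, options):
    """Collect option texts in order, stopping at the first null cell."""
    parsed = []
    for letter in options:
        value = row.get(letter)
        if is_none(value):
            break
        parsed.append(value)
    return parsed


def decode_image_bytes(b64_string: str) -> bytes:
    """Hypothetical helper: decode a base64 cell into raw image bytes."""
    return base64.b64decode(b64_string)


# A dict stands in for one row of the TSV table.
row = {
    "question": "Which animal is shown?",
    "A": "Cat",
    "B": "Dog",
    "C": float("nan"),  # first null cell: collection stops here
    "D": "Bird",
    "image": base64.b64encode(b"\x89PNG fake bytes").decode("ascii"),
}
print(get_options(row, OPTION_COLUMNS))
print(decode_image_bytes(row["image"])[:4])
```

Because collection stops at the first null cell, option D in this example is dropped even though it holds a value.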

The inference loop supports multi-round evaluation with option rotation: when --all-rounds is enabled, the script asks each question once per cyclic rotation of its options instead of only once, so that position bias can be assessed. Each round rotates both the option texts and the option letters.
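
A minimal sketch of the rotation, under two assumptions the page does not state: that the rotation moves the first option to the end on each round, and that the number of rounds equals the number of valid options. The function name rotation_rounds is hypothetical.

```python
def rotation_rounds(options, letters, all_rounds=True):
    """Yield (round_id, options, letters), rotating left by one per round."""
    cur_options = list(options)
    cur_letters = list(letters[:len(options)])
    # Assumption: one round per valid option when all rounds are requested.
    num_rounds = len(cur_options) if all_rounds else 1
    for round_id in range(num_rounds):
        yield round_id, list(cur_options), list(cur_letters)
        # Rotate texts and letters together so they stay paired.
        cur_options = cur_options[1:] + cur_options[:1]
        cur_letters = cur_letters[1:] + cur_letters[:1]


rounds = list(rotation_rounds(["Cat", "Dog", "Bird"], ["A", "B", "C", "D"]))
for round_id, opts, chars in rounds:
    print(round_id, opts, chars)
```

Rotating the letters alongside the texts keeps a record of which original letter each displayed option came from, which a scorer needs in order to map a rotated answer back to the original key.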

Additional features include:

  • Single prediction prompt mode (--single-pred-prompt), which appends the instruction "Answer with the option's letter from the given choices directly" to the prompt
  • Chinese language support via the --lang cn flag, which selects a Chinese version of that instruction
  • Hint integration: when a hint is available, its text is prepended to the question
  • Auto-detection of plain models, which switches the conversation mode to its mmtag variant
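
How these pieces might combine into a single prompt can be sketched as below. The function build_prompt, the newline separators, the "A. text" option layout, and the Chinese instruction wording are all assumptions made for illustration; the page only quotes the English instruction.

```python
DIRECT_ANSWER_PROMPTS = {
    "en": "Answer with the option's letter from the given choices directly.",
    # Assumed wording: the page does not quote the Chinese instruction.
    "cn": "请直接回答选项字母。",
}


def build_prompt(question, options, hint=None, single_pred_prompt=False, lang="en"):
    """Assemble hint, question, lettered options, and optional instruction."""
    letters = "ABCD"
    # Prepend the hint when one is available.
    text = question if hint is None else hint + "\n" + question
    # Append one line per option, labelled with its letter.
    for letter, option in zip(letters, options):
        text += "\n" + letter + ". " + option
    if single_pred_prompt:
        text += "\n" + DIRECT_ANSWER_PROMPTS[lang]
    return text


prompt = build_prompt(
    "Which animal is shown?",
    ["Cat", "Dog"],
    hint="Look at the ears.",
    single_pred_prompt=True,
)
print(prompt)
```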

Usage

Use this script to generate predictions for an MMBench submission. Each line of the output JSONL file includes question_id, round_id, prompt, text, options, option_char, answer_id, and model_id.

Code Reference

Source Location

Signature

def split_list(lst: list, n: int) -> list: ...

def get_chunk(lst: list, n: int, k: int) -> list: ...

def is_none(value) -> bool: ...

def get_options(row, options: list) -> list: ...

def eval_model(args: argparse.Namespace) -> None: ...
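
The two chunking helpers in the signature list can be sketched as follows, assuming they split the questions into contiguous chunks of near-equal size using a ceiling division. The bodies are an illustration consistent with the signatures, not a copy of the source.

```python
import math


def split_list(lst, n):
    """Split lst into at most n contiguous chunks of near-equal size."""
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]


def get_chunk(lst, n, k):
    """Return the k-th chunk (0-based) of lst split into n chunks."""
    return split_list(lst, n)[k]


questions = list(range(10))
print(split_list(questions, 3))
print(get_chunk(questions, 3, 2))
```

With this scheme each worker would pass the same --num-chunks value and its own --chunk-idx. Note that a ceiling-based split can yield fewer than n chunks for short lists, so a high chunk index may be out of range.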

Import

from llava.eval.model_vqa_mmbench import eval_model

I/O Contract

Inputs

Name                  Type   Required  Description
--model-path          str    Yes       Path to the pretrained LLaVA model
--model-base          str    No        Base model path for LoRA or projector-only models
--image-folder        str    No        Root directory for image files (not used; images are base64 in TSV)
--question-file       str    No        Path to TSV question file (default: tables/question.jsonl)
--answers-file        str    No        Path for output JSONL answers file (default: answer.jsonl)
--conv-mode           str    No        Conversation template name (default: llava_v1)
--num-chunks          int    No        Number of chunks for multi-GPU splitting (default: 1)
--chunk-idx           int    No        Index of the chunk to process (default: 0)
--temperature         float  No        Sampling temperature (default: 0.2)
--top_p               float  No        Top-p sampling parameter (default: None)
--num_beams           int    No        Number of beams for beam search (default: 1)
--all-rounds          flag   No        Enable multi-round evaluation with option rotation
--single-pred-prompt  flag   No        Append direct answer instruction to prompt
--lang                str    No        Language for instruction prompt (default: "en"; also supports "cn")
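
The inputs table maps naturally onto an argparse parser. The sketch below mirrors the names, types, and defaults listed above; the defaults for --model-base and --image-folder are assumptions, since the table gives none, and build_parser is a hypothetical name.

```python
import argparse


def build_parser():
    """Build a parser mirroring the inputs table (illustrative only)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, default=None)  # assumed default
    parser.add_argument("--image-folder", type=str, default="")  # assumed default
    parser.add_argument("--question-file", type=str, default="tables/question.jsonl")
    parser.add_argument("--answers-file", type=str, default="answer.jsonl")
    parser.add_argument("--conv-mode", type=str, default="llava_v1")
    parser.add_argument("--num-chunks", type=int, default=1)
    parser.add_argument("--chunk-idx", type=int, default=0)
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--top_p", type=float, default=None)
    parser.add_argument("--num_beams", type=int, default=1)
    parser.add_argument("--all-rounds", action="store_true")
    parser.add_argument("--single-pred-prompt", action="store_true")
    parser.add_argument("--lang", type=str, default="en")
    return parser


args = build_parser().parse_args([
    "--model-path", "/path/to/llava-model",
    "--question-file", "mmbench_test.tsv",
    "--single-pred-prompt",
    "--temperature", "0",
])
print(args.temperature, args.single_pred_prompt, args.all_rounds)
```

argparse converts dashes in option names to underscores, so --single-pred-prompt is read as args.single_pred_prompt.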

Outputs

Name          Type   Description
answers file  JSONL  Each line contains question_id, round_id, prompt, text, options, option_char, answer_id, model_id, and metadata
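
One output line could be produced and read back as sketched below. The field names come from the table above; the field values, the use of uuid for answer_id, the empty metadata object, and the in-memory buffer standing in for the answers file are assumptions for illustration.

```python
import io
import json
import uuid


def make_record(question_id, round_id, prompt, text, options, option_char, model_id):
    """Build one answers-file record with the documented fields."""
    return {
        "question_id": question_id,
        "round_id": round_id,
        "prompt": prompt,
        "text": text,
        "options": options,
        "option_char": option_char,
        "answer_id": uuid.uuid4().hex,  # stand-in for the script's ID scheme
        "model_id": model_id,
        "metadata": {},  # assumed to be an empty object
    }


buffer = io.StringIO()  # stands in for the opened answers file
record = make_record(
    241, 0, "Which animal is shown?\nA. Cat\nB. Dog", "A",
    ["Cat", "Dog"], ["A", "B"], "example-model",
)
# One JSON object per line; json.dumps escapes the newlines inside the prompt.
buffer.write(json.dumps(record) + "\n")

parsed = [json.loads(line) for line in buffer.getvalue().splitlines()]
print(parsed[0]["question_id"], parsed[0]["text"])
```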

Usage Examples

Basic Usage

# Command-line execution for MMBench inference
python internvl_chat_llava/llava/eval/model_vqa_mmbench.py \
    --model-path /path/to/llava-model \
    --question-file mmbench_test.tsv \
    --answers-file mmbench_answers.jsonl \
    --single-pred-prompt \
    --temperature 0
