Implementation:OpenGVLab InternVL MMBench VQA Inference
| Knowledge Sources | |
|---|---|
| Domains | Inference, Benchmark, Multiple_Choice |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This script generates model predictions for the MMBench multiple-choice benchmark, handling its TSV/pandas data format with base64-encoded images and option rotation.
Description
The model_vqa_mmbench.py script implements the inference pipeline specifically tailored for the MMBench benchmark. It handles MMBench's unique data format:
- TSV input: Questions are loaded via pd.read_table rather than JSONL, with columns for index, question, hint, image (base64-encoded), and option columns A-D
- Base64 image decoding: Images are decoded from base64 strings using load_image_from_base64
- Option construction: The get_options function extracts valid options (A-D), stopping at the first None/NaN value, and is_none handles various null representations
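The option handling described above can be sketched as follows. This is a minimal illustration rather than the repository's code: the exact set of null representations that is_none accepts is an assumption, a plain dict stands in for a pandas row, and decode_image_bytes is a hypothetical helper that stops at raw bytes instead of building an image object.

```python
import base64
import math


def is_none(value) -> bool:
    """Return True for None, float NaN, or the strings 'nan'/'none' (any case)."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    # Assumption: textual nulls are matched case-insensitively.
    if isinstance(value, str) and value.strip().lower() in {"nan", "none"}:
        return True
    return False


def get_options(row, options: list) -> list:
    """Collect option texts in order, stopping at the first null-like value."""
    parsed = []
    for key in options:
        value = row[key]
        if is_none(value):
            break
        parsed.append(value)
    return parsed


def decode_image_bytes(image_b64: str) -> bytes:
    """Hypothetical helper: decode a base64 string into raw image bytes."""
    return base64.b64decode(image_b64)


# A plain dict stands in for one row of the TSV.
row = {"A": "cat", "B": "dog", "C": float("nan"), "D": "bird"}
print(get_options(row, ["A", "B", "C", "D"]))  # ['cat', 'dog']
```

Because the scan stops at the first null, option D is dropped in this example even though it has a value; a question with three options is expected to leave only the trailing column empty.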
The inference loop supports multi-round evaluation with option rotation: when --all-rounds is enabled, the script cycles through every cyclic rotation of the option order (one round per option) to assess position bias. Each round rotates both the option text and option letters, so the two lists stay aligned.
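A rough sketch of the rotation, assuming one round per option when --all-rounds is set and a single round otherwise. The function name rotation_rounds and its return shape are illustrative only; how the script labels the rotated options in the prompt is not shown here.

```python
def rotation_rounds(options: list, option_chars: list, all_rounds: bool) -> list:
    """Return (round_id, options, option_chars) tuples, one per round.

    Each round shifts both lists left by one, so position i of the
    letter list always names the original letter of the text at position i.
    """
    cur_options = list(options)
    cur_chars = list(option_chars[: len(options)])
    num_rounds = len(cur_options) if all_rounds else 1  # assumption
    rounds = []
    for round_id in range(num_rounds):
        rounds.append((round_id, list(cur_options), list(cur_chars)))
        # Rotate text and letters together to keep them aligned.
        cur_options = cur_options[1:] + cur_options[:1]
        cur_chars = cur_chars[1:] + cur_chars[:1]
    return rounds


for round_id, opts, chars in rotation_rounds(["cat", "dog", "bird"], ["A", "B", "C", "D"], True):
    print(round_id, opts, chars)
# 0 ['cat', 'dog', 'bird'] ['A', 'B', 'C']
# 1 ['dog', 'bird', 'cat'] ['B', 'C', 'A']
# 2 ['bird', 'cat', 'dog'] ['C', 'A', 'B']
```

A model that always picks the first listed option would be correct in only one of these rounds, which is what makes the rotated runs useful for detecting position bias.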
Additional features include:
- Single prediction prompt mode (--single-pred-prompt) that appends "Answer with the option's letter from the given choices directly"
- Chinese language support via the --lang cn flag
- Hint integration by prepending hint text to the question when available
- Auto-detection of plain models for mmtag conversation mode switching
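The hint and single-prediction features can be illustrated with a small prompt builder. This is a sketch under assumptions: the newline separators, the "A. text" option layout, and the name build_prompt are not taken from the script. The Chinese instruction is not quoted in this page, so the caller passes the instruction text in rather than the sketch guessing it.

```python
from typing import Optional

# English instruction as quoted in the feature list above.
EN_INSTRUCTION = "Answer with the option's letter from the given choices directly"


def build_prompt(
    question: str,
    hint: Optional[str],
    options: list,
    single_pred_prompt: bool,
    instruction: str = EN_INSTRUCTION,
    option_chars: str = "ABCD",
) -> str:
    """Assemble hint, question, lettered options, and the optional instruction."""
    parts = []
    if hint:  # prepend the hint only when one is available
        parts.append(hint)
    parts.append(question)
    for char, text in zip(option_chars, options):
        parts.append(f"{char}. {text}")  # assumed layout: "A. option text"
    if single_pred_prompt:
        parts.append(instruction)
    return "\n".join(parts)


print(build_prompt("Which animal is shown?", "Look at the ears.", ["cat", "dog"], True))
```

For --lang cn, the same function would be called with the Chinese instruction string supplied as the instruction argument.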
Usage
Use this script to generate predictions for MMBench submission. The output JSONL includes question_id, round_id, prompt, text, options, option_char, answer_id, and model_id.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/eval/model_vqa_mmbench.py
- Lines: 1-170
Signature
def split_list(lst: list, n: int) -> list: ...
def get_chunk(lst: list, n: int, k: int) -> list: ...
def is_none(value) -> bool: ...
def get_options(row, options: list) -> list: ...
def eval_model(args: argparse.Namespace) -> None: ...
Import
from llava.eval.model_vqa_mmbench import eval_model
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model-path | str | Yes | Path to the pretrained LLaVA model |
| --model-base | str | No | Base model path for LoRA or projector-only models |
| --image-folder | str | No | Root directory for image files (not used; images are base64 in TSV) |
| --question-file | str | No | Path to TSV question file (default: tables/question.jsonl) |
| --answers-file | str | No | Path for output JSONL answers file (default: answer.jsonl) |
| --conv-mode | str | No | Conversation template name (default: llava_v1) |
| --num-chunks | int | No | Number of chunks for multi-GPU splitting (default: 1) |
| --chunk-idx | int | No | Index of the chunk to process (default: 0) |
| --temperature | float | No | Sampling temperature (default: 0.2) |
| --top_p | float | No | Top-p sampling parameter (default: None) |
| --num_beams | int | No | Number of beams for beam search (default: 1) |
| --all-rounds | flag | No | Enable multi-round evaluation with option rotation |
| --single-pred-prompt | flag | No | Append direct answer instruction to prompt |
| --lang | str | No | Language for instruction prompt (default: "en"; also supports "cn") |
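The --num-chunks and --chunk-idx arguments map onto the split_list and get_chunk helpers listed under Signature. The sketch below assumes ceiling-sized contiguous chunks, which is one common way to implement these signatures and may differ in detail from the script.

```python
import math


def split_list(lst: list, n: int) -> list:
    """Split lst into at most n contiguous chunks of near-equal size."""
    # Assumption: chunk size is the ceiling of len/n, at least 1.
    chunk_size = max(1, math.ceil(len(lst) / n))
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]


def get_chunk(lst: list, n: int, k: int) -> list:
    """Return the k-th chunk (0-based) out of n."""
    return split_list(lst, n)[k]


questions = list(range(10))
print(split_list(questions, 3))    # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(get_chunk(questions, 3, 2))  # [8, 9]
```

With this scheme each GPU process runs the script with the same --num-chunks and its own --chunk-idx, and the per-chunk answer files are concatenated afterwards.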
Outputs
| Name | Type | Description |
|---|---|---|
| answers file | JSONL | Each line contains question_id, round_id, prompt, text, options, option_char, answer_id, model_id, and metadata |
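One output line can be pictured as a JSON object with the fields listed above. The sketch below only illustrates the shape: the helper name make_answer_line, the use of uuid for answer_id, the empty metadata object, and all sample values are assumptions.

```python
import json
import uuid


def make_answer_line(question_id, round_id, prompt, text, options, option_char, model_id) -> str:
    """Serialize one prediction as a single JSONL line."""
    record = {
        "question_id": question_id,
        "round_id": round_id,
        "prompt": prompt,
        "text": text,                    # the model's raw answer
        "options": options,              # option texts in this round's order
        "option_char": option_char,      # option letters in this round's order
        "answer_id": uuid.uuid4().hex,   # assumption: any unique ID works here
        "model_id": model_id,
        "metadata": {},
    }
    return json.dumps(record)


line = make_answer_line(
    question_id=241,
    round_id=1,
    prompt="Which animal is shown?\nA. dog\nB. cat",
    text="B",
    options=["dog", "cat"],
    option_char=["B", "A"],
    model_id="example-model",  # hypothetical model name
)
print(sorted(json.loads(line).keys()))
```

Keeping round_id and option_char in every record lets a downstream scorer map each round's answer letter back to the original option.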
Usage Examples
Basic Usage
# Command-line execution for MMBench inference
python internvl_chat_llava/llava/eval/model_vqa_mmbench.py \
    --model-path /path/to/llava-model \
    --question-file mmbench_test.tsv \
    --answers-file mmbench_answers.jsonl \
    --single-pred-prompt \
    --temperature 0