Implementation:OpenGVLab InternVL ScienceQA Inference
| Knowledge Sources | |
|---|---|
| Domains | Inference, Benchmark, Science_QA |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This script generates model predictions for the ScienceQA benchmark. It handles ScienceQA's JSON question format, in which images are optional, and supports an optional two-pass answer-prompting strategy.
Description
The model_vqa_science.py script implements the inference pipeline tailored for ScienceQA. Key differences from the generic VQA script include:
- JSON input format: questions are loaded from a single JSON file (not JSONL) in a conversations-style format; the question text is taken from line['conversations'][0]['value'].
- Optional images: not all ScienceQA questions have images. The script checks for the 'image' key and passes images=None for text-only questions.
- Single prediction prompt: when --single-pred-prompt is enabled, the script appends "Answer with the option's letter from the given choices directly" to focus the model on option selection.
- Answer prompter mode: when --answer-prompter is enabled, a two-pass inference strategy is used. The first pass generates reasoning; the second pass appends that reasoning and "###\nANSWER:" to the prompt to extract just the answer letter. The final output combines both as "{reasoning}\n The answer is {letter}".
- KeywordsStoppingCriteria: applied only when a v0 conversation template is used.
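The per-question handling described in the list above can be sketched in plain Python. This is a minimal sketch, not the script's code: build_question and answer_with_prompter are hypothetical names, the "<image>" placeholder stands in for the image token the real script takes from the llava package, stripping and re-inserting that token is an assumption about how inline image tags are handled, and generate is a hypothetical callable standing in for tokenization, model.generate, and decoding.

```python
from typing import Callable, Optional, Tuple

# Assumed placeholder; the real script takes its image token from the llava package.
IMAGE_TOKEN = "<image>"
# Instruction text as described above for --single-pred-prompt.
SINGLE_PRED_SUFFIX = "Answer with the option's letter from the given choices directly"


def build_question(line: dict, single_pred_prompt: bool) -> Tuple[str, Optional[str]]:
    """Return (question_text, image_file or None) for one ScienceQA record."""
    # The question sits in the first conversation turn; drop any inline image tag.
    qs = line["conversations"][0]["value"].replace(IMAGE_TOKEN, "").strip()
    image_file = line["image"] if "image" in line else None  # None => text-only
    if image_file is not None:
        # Assumption: a single image token is re-inserted at the start of the prompt.
        qs = IMAGE_TOKEN + "\n" + qs
    if single_pred_prompt:
        qs = qs + "\n" + SINGLE_PRED_SUFFIX
    return qs, image_file


def answer_with_prompter(prompt: str, generate: Callable[[str], str]) -> str:
    """Two-pass answer prompting: free-form reasoning, then an answer-only pass."""
    reasoning = generate(prompt)                                # pass 1
    letter = generate(prompt + reasoning + " ###\nANSWER:")     # pass 2
    return reasoning + "\n The answer is " + letter
```

Passing generate as a parameter keeps the two-pass control flow separate from model loading, so the prompt logic can be checked without a GPU.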
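The script splits the question list into chunks with the split_list / get_chunk pattern shared by other LLaVA evaluation scripts, so that each chunk can run on its own GPU, and writes JSONL output with question_id, prompt, text, answer_id, model_id, and metadata fields.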
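The split_list / get_chunk sharding can be illustrated as follows. This sketch assumes ceiling-sized contiguous slices, a common way such helpers are written in LLaVA-style eval scripts; the exact slicing in this file is not reproduced here, so treat it as an approximation.

```python
import math
from typing import List, TypeVar

T = TypeVar("T")


def split_list(lst: List[T], n: int) -> List[List[T]]:
    """Split lst into at most n contiguous chunks of size ceil(len(lst) / n)."""
    # Assumes lst is non-empty; an empty list would give a zero step size.
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]


def get_chunk(lst: List[T], n: int, k: int) -> List[T]:
    """Return the k-th chunk (0-based) of lst split into n parts."""
    return split_list(lst, n)[k]
```

Because the chunk size is rounded up, this scheme can yield fewer than n chunks (4 questions with n = 3 give two chunks of 2), so a --chunk-idx near n - 1 may raise IndexError; an empty list raises ValueError from range().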
Usage
Use this script to generate predictions for ScienceQA evaluation. The output can then be processed by eval_science_qa.py to compute accuracy metrics.
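With --num-chunks greater than 1, each chunk writes its own answers file, so the pieces typically have to be combined before evaluation. The helpers below (merge_answer_files and load_predictions, both hypothetical) are a minimal sketch that assumes one JSON object per line with the question_id and text fields listed under Outputs; they do not reproduce eval_science_qa.py, whose own arguments should be checked.

```python
import json
from typing import Dict, Iterable


def merge_answer_files(chunk_files: Iterable[str], merged_file: str) -> int:
    """Concatenate per-chunk JSONL answer files into one file; return the record count."""
    count = 0
    with open(merged_file, "w", encoding="utf-8") as out:
        for path in chunk_files:
            with open(path, encoding="utf-8") as f:
                for line in f:
                    if line.strip():  # skip blank lines between records
                        out.write(line.rstrip("\n") + "\n")
                        count += 1
    return count


def load_predictions(answers_file: str) -> Dict[str, str]:
    """Map question_id -> generated text from a JSONL answers file."""
    preds: Dict[str, str] = {}
    with open(answers_file, encoding="utf-8") as f:
        for line in f:
            if line.strip():
                rec = json.loads(line)
                preds[rec["question_id"]] = rec["text"]
    return preds
```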
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/llava/eval/model_vqa_science.py
- Lines: 1-147
Signature
```python
import argparse

def split_list(lst: list, n: int) -> list: ...
def get_chunk(lst: list, n: int, k: int) -> list: ...
def eval_model(args: argparse.Namespace) -> None: ...
```
Import
```python
from llava.eval.model_vqa_science import eval_model
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| --model-path | str | Yes | Path to the pretrained LLaVA model |
| --model-base | str | No | Base model path for LoRA or projector-only models |
| --image-folder | str | No | Root directory for image files |
| --question-file | str | No | Path to JSON question file (default: tables/question.json) |
| --answers-file | str | No | Path for output JSONL answers file (default: answer.jsonl) |
| --conv-mode | str | No | Conversation template name (default: llava_v0) |
| --num-chunks | int | No | Number of chunks for multi-GPU splitting (default: 1) |
| --chunk-idx | int | No | Index of the chunk to process (default: 0) |
| --temperature | float | No | Sampling temperature (default: 0.2) |
| --answer-prompter | flag | No | Enable two-pass inference with reasoning then answer extraction |
| --single-pred-prompt | flag | No | Append direct answer instruction to prompt |
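The flags in the Inputs table map onto an argparse parser roughly as follows. This is a sketch reconstructed from the table, not the script's own parser: build_parser is a hypothetical name, the empty-string default for --image-folder and the None default for --model-base are assumptions, and required=True mirrors the table's "Yes" for --model-path.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reconstruct the CLI from the Inputs table (defaults as listed there)."""
    p = argparse.ArgumentParser(description="ScienceQA inference (sketch)")
    p.add_argument("--model-path", type=str, required=True)
    p.add_argument("--model-base", type=str, default=None)       # assumed default
    p.add_argument("--image-folder", type=str, default="")       # assumed default
    p.add_argument("--question-file", type=str, default="tables/question.json")
    p.add_argument("--answers-file", type=str, default="answer.jsonl")
    p.add_argument("--conv-mode", type=str, default="llava_v0")
    p.add_argument("--num-chunks", type=int, default=1)
    p.add_argument("--chunk-idx", type=int, default=0)
    p.add_argument("--temperature", type=float, default=0.2)
    p.add_argument("--answer-prompter", action="store_true")      # two-pass mode
    p.add_argument("--single-pred-prompt", action="store_true")   # append answer instruction
    return p
```

argparse converts dashes to underscores, so the resulting Namespace exposes args.model_path, args.chunk_idx, args.single_pred_prompt, and so on, which is the kind of object eval_model(args) takes.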
Outputs
| Name | Type | Description |
|---|---|---|
| answers file | JSONL | Each line contains question_id, prompt, text, answer_id, model_id, and metadata |
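Each line of the answers file is one JSON object. A minimal writer, assuming the field set from the Outputs table, might look like this; write_answer is a hypothetical helper, uuid4 is a stand-in for whatever answer_id scheme the script actually uses, and the empty metadata dict is an assumption.

```python
import json
import uuid
from typing import IO, Any


def write_answer(fh: IO[str], question_id: Any, prompt: str, text: str, model_id: str) -> dict:
    """Append one prediction record as a single JSON line and return it."""
    record = {
        "question_id": question_id,
        "prompt": prompt,
        "text": text,
        "answer_id": uuid.uuid4().hex,  # assumption: any unique string id
        "model_id": model_id,
        "metadata": {},                 # assumption: left empty in this sketch
    }
    # json.dumps escapes embedded newlines, so each record stays on one line.
    fh.write(json.dumps(record) + "\n")
    fh.flush()  # keep partial results on disk if a long run is interrupted
    return record
```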
Usage Examples
Basic Usage
```bash
# Command-line execution for ScienceQA inference
python internvl_chat_llava/llava/eval/model_vqa_science.py \
    --model-path /path/to/llava-model \
    --image-folder /path/to/ScienceQA/images \
    --question-file ScienceQA/test.json \
    --answers-file sqa_predictions.jsonl \
    --single-pred-prompt \
    --conv-mode llava_v0
```
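For multi-GPU evaluation, the same command can be launched once per chunk. The following sketch is hypothetical: chunk_commands and launch are illustrative names, it assumes one GPU per chunk selected via CUDA_VISIBLE_DEVICES, it assumes the llava package is importable when the script path from the example above is run from the repository root, and the answers-file naming scheme ({prefix}_{num_chunks}_{k}.jsonl) is invented for illustration.

```python
import os
import subprocess
import sys
from typing import Dict, List, Tuple

# Script path as used in the command-line example; run from the repository root.
SCRIPT = "internvl_chat_llava/llava/eval/model_vqa_science.py"


def chunk_commands(num_chunks: int, model_path: str, question_file: str,
                   image_folder: str, out_prefix: str) -> List[Tuple[List[str], Dict[str, str]]]:
    """Build one (argv, env) pair per chunk, pinning chunk k to GPU k (assumption)."""
    jobs = []
    for k in range(num_chunks):
        argv = [
            sys.executable, SCRIPT,
            "--model-path", model_path,
            "--question-file", question_file,
            "--image-folder", image_folder,
            "--answers-file", f"{out_prefix}_{num_chunks}_{k}.jsonl",  # hypothetical naming
            "--num-chunks", str(num_chunks),
            "--chunk-idx", str(k),
            "--single-pred-prompt",
            "--conv-mode", "llava_v0",
        ]
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(k))
        jobs.append((argv, env))
    return jobs


def launch(jobs: List[Tuple[List[str], Dict[str, str]]]) -> List[int]:
    """Start all chunks in parallel, wait for them, and return their exit codes."""
    procs = [subprocess.Popen(argv, env=env) for argv, env in jobs]
    return [p.wait() for p in procs]
```

For example, launch(chunk_commands(4, "/path/to/llava-model", "ScienceQA/test.json", "/path/to/ScienceQA/images", "sqa_predictions")) would start four processes and return their exit codes; the per-chunk files then still need to be concatenated into one answers file.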