Implementation:OpenGVLab InternVL ScienceQA Inference

From Leeroopedia


Knowledge Sources
Domains: Inference, Benchmark, Science_QA
Last Updated: 2026-02-07 14:00 GMT

Overview

This script generates model predictions for the ScienceQA benchmark. It handles the benchmark's JSON question format, questions with or without images, and an optional two-pass answer prompting strategy.

Description

The model_vqa_science.py script implements the inference pipeline tailored for ScienceQA. Key differences from the generic VQA script include:

  • JSON input format: Questions are loaded from a single JSON file (not JSONL) with a conversations-style format where the question is extracted from line['conversations'][0]['value']
  • Optional images: Not all ScienceQA questions have images; the script checks for the 'image' key and passes images=None for text-only questions
  • Single prediction prompt: When --single-pred-prompt is enabled, appends "Answer with the option's letter from the given choices directly" to focus the model on option selection
  • Answer prompter mode: When --answer-prompter is enabled, a two-pass inference strategy is used: the first pass generates reasoning, then a second pass with the reasoning appended and "###\nANSWER:" extracts just the answer letter. The final output combines both as "reasoning \n The answer is X"
  • KeywordsStoppingCriteria: Used conditionally for v0 conversation templates
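The prompt handling described in the list above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: build_prompt, two_pass_answer, the generate callable, and the "<image>" placeholder token are hypothetical names introduced here, and the exact whitespace around the appended strings is an assumption.

```python
from typing import Callable, Optional, Tuple

# Wording taken from the description above; exact punctuation is an assumption.
SINGLE_PRED_SUFFIX = "Answer with the option's letter from the given choices directly"
IMAGE_TOKEN = "<image>"  # assumed placeholder token, not confirmed by this page


def build_prompt(record: dict, single_pred_prompt: bool = False) -> Tuple[str, Optional[str]]:
    """Return (question_text, image_name_or_None) for one ScienceQA record."""
    # The question lives in the first turn of the conversations list.
    question = record["conversations"][0]["value"]
    question = question.replace(IMAGE_TOKEN, "").strip()
    # Text-only questions have no 'image' key, so this yields None.
    image = record.get("image")
    if single_pred_prompt:
        question = question + "\n" + SINGLE_PRED_SUFFIX
    return question, image


def two_pass_answer(prompt: str, generate: Callable[[str], str]) -> str:
    """First pass produces reasoning; second pass extracts the answer letter."""
    reasoning = generate(prompt)
    # Second pass: reasoning appended, followed by the answer cue.
    answer = generate(prompt + reasoning + "###\nANSWER:")
    return reasoning + "\n The answer is " + answer
```

Here generate stands in for whatever function wraps the model call; passing it in as a parameter keeps the two-pass logic testable without loading a model.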

The script uses the standard split_list / get_chunk pattern for multi-GPU evaluation and writes JSONL output with question_id, prompt, text, answer_id, and model_id.
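For readers unfamiliar with the split_list / get_chunk pattern, the sketch below shows one common way it is written in LLaVA-style evaluation scripts. The bodies are an assumption based on the signatures listed on this page; the actual file may differ in details such as rounding.

```python
import math
from typing import List, Sequence


def split_list(lst: Sequence, n: int) -> List[Sequence]:
    """Split lst into at most n contiguous chunks of near-equal size.

    Assumes lst is non-empty; an empty list would give a chunk size of 0.
    """
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]


def get_chunk(lst: Sequence, n: int, k: int) -> Sequence:
    """Return the k-th chunk (0-based) when lst is split into n chunks."""
    return split_list(lst, n)[k]


questions = list(range(10))
print(get_chunk(questions, 3, 0))  # [0, 1, 2, 3]
print(get_chunk(questions, 3, 2))  # [8, 9]
```

Each GPU worker would be launched with the same --num-chunks value and a distinct --chunk-idx, so the workers cover the question list without overlap.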

Usage

Use this script to generate predictions for ScienceQA evaluation. The output can then be processed by eval_science_qa.py to compute accuracy metrics.
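Before running the evaluator, it can be useful to sanity-check the predictions file. The helper below is a hypothetical post-processing sketch, not the logic of eval_science_qa.py: it assumes each JSONL line carries question_id and text, and that text is either a bare option letter or contains the phrase "The answer is X".

```python
import json
import re
from typing import Dict, Iterable, Optional

# Assumed pattern, based on the two-pass output format described on this page.
ANSWER_RE = re.compile(r"The answer is ([A-Z])")


def extract_letter(text: str) -> Optional[str]:
    """Pull the option letter out of a prediction string, if one can be found."""
    text = text.strip()
    if len(text) == 1 and text.isalpha():
        return text.upper()
    match = ANSWER_RE.search(text)
    return match.group(1) if match else None


def load_predictions(lines: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map question_id to the predicted letter for each non-empty JSONL line."""
    preds: Dict[str, Optional[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        preds[record["question_id"]] = extract_letter(record["text"])
    return preds
```

Entries that map to None flag predictions where no letter could be recovered, which is often the first thing worth inspecting.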

Code Reference

Source Location

Signature

def split_list(lst: list, n: int) -> list: ...

def get_chunk(lst: list, n: int, k: int) -> list: ...

def eval_model(args: argparse.Namespace) -> None: ...

Import

from llava.eval.model_vqa_science import eval_model

I/O Contract

Inputs

Name                  Type   Required  Description
--model-path          str    Yes       Path to the pretrained LLaVA model
--model-base          str    No        Base model path for LoRA or projector-only models
--image-folder        str    No        Root directory for image files
--question-file       str    No        Path to JSON question file (default: tables/question.json)
--answers-file        str    No        Path for output JSONL answers file (default: answer.jsonl)
--conv-mode           str    No        Conversation template name (default: llava_v0)
--num-chunks          int    No        Number of chunks for multi-GPU splitting (default: 1)
--chunk-idx           int    No        Index of the chunk to process (default: 0)
--temperature         float  No        Sampling temperature (default: 0.2)
--answer-prompter     flag   No        Enable two-pass inference with reasoning then answer extraction
--single-pred-prompt  flag   No        Append direct answer instruction to prompt
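The inputs above map naturally onto an argparse parser. The sketch below reconstructs one from the table alone; build_parser is a hypothetical name, and defaults the table does not state (for --model-base and --image-folder) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented inputs (a sketch, not the script's code)."""
    parser = argparse.ArgumentParser(description="ScienceQA inference (sketch)")
    parser.add_argument("--model-path", type=str, required=True)
    parser.add_argument("--model-base", type=str, default=None)   # assumed default
    parser.add_argument("--image-folder", type=str, default="")   # assumed default
    parser.add_argument("--question-file", type=str, default="tables/question.json")
    parser.add_argument("--answers-file", type=str, default="answer.jsonl")
    parser.add_argument("--conv-mode", type=str, default="llava_v0")
    parser.add_argument("--num-chunks", type=int, default=1)
    parser.add_argument("--chunk-idx", type=int, default=0)
    parser.add_argument("--temperature", type=float, default=0.2)
    parser.add_argument("--answer-prompter", action="store_true")
    parser.add_argument("--single-pred-prompt", action="store_true")
    return parser


# Parse an explicit argument list instead of sys.argv so the example is reproducible.
args = build_parser().parse_args(["--model-path", "/path/to/llava-model", "--single-pred-prompt"])
print(args.conv_mode, args.temperature, args.single_pred_prompt)
```

argparse converts the dashes in flag names to underscores, so --single-pred-prompt is read as args.single_pred_prompt.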

Outputs

Name Type Description
answers file JSONL Each line contains question_id, prompt, text, answer_id, model_id, and metadata
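To make the output format concrete, the following sketch writes one record with the fields listed above. write_answer is a hypothetical helper; the use of uuid4 for answer_id and an empty dict for metadata are stand-ins, since this page does not show how the script fills those fields.

```python
import io
import json
import uuid
from typing import TextIO


def write_answer(fh: TextIO, question_id: str, prompt: str, text: str, model_id: str) -> dict:
    """Append one prediction to an open JSONL file handle and return the record."""
    record = {
        "question_id": question_id,
        "prompt": prompt,
        "text": text,
        "answer_id": uuid.uuid4().hex,  # stand-in ID scheme (assumption)
        "model_id": model_id,
        "metadata": {},                 # placeholder content (assumption)
    }
    fh.write(json.dumps(record) + "\n")
    fh.flush()  # keep partial results on disk if a long run is interrupted
    return record


# An in-memory buffer stands in for the answers file.
buf = io.StringIO()
write_answer(buf, "42", "Which is a solid?", "B", "llava-model")
print(buf.getvalue())
```

Writing one JSON object per line means the output of several chunked runs can be merged by simple concatenation.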

Usage Examples

Basic Usage

# Command-line execution for ScienceQA inference
python internvl_chat_llava/llava/eval/model_vqa_science.py \
    --model-path /path/to/llava-model \
    --image-folder /path/to/ScienceQA/images \
    --question-file ScienceQA/test.json \
    --answers-file sqa_predictions.jsonl \
    --single-pred-prompt \
    --conv-mode llava_v0
