Implementation:OpenGVLab InternVL ScienceQA Prompt Conversion
| Knowledge Sources | |
|---|---|
| Domains | Data Preparation, Benchmark, ScienceQA |
| Last Updated | 2026-02-07 14:00 GMT |
Overview
This module provides prompt template construction functions for converting ScienceQA benchmark problems into LLaVA-compatible conversation formats.
Description
The convert_sqa_to_llava_base_prompt.py file is a core data preprocessing library used for ScienceQA dataset preparation. It provides helper functions to extract structured fields from ScienceQA problem dictionaries and assemble them into various prompt formats.
Field extraction functions:
- get_question_text: Extracts the question string
- get_context_text: Combines hint text and optional image caption
- get_choice_text: Formats multiple choice options as "(A) ..., (B) ..."
- get_answer: Maps answer index to letter option
- get_lecture_text / get_solution_text: Extracts lecture and solution with newline escaping for GPT-3 compatibility
Prompt assembly functions: Three variants (create_one_example, create_one_example_chatbot, create_one_example_gpt4) build prompts with configurable input-output format strings:
- Input formats: CQM (Context-Question-Multiple choice), QCM, QCML, QCME, QCMLE, and more
- Output formats: A (answer only), AL, AE, ALE, LA, EA, LEA, ELA, LEPA (with lecture, explanation, answer combinations)
Prompt builder functions: build_prompt creates n-shot text prompts, build_prompt_chatbot creates chatbot-style input/output pairs, and build_prompt_gpt4 creates OpenAI-compatible message arrays with system/user/assistant roles.
Usage
Use this module when preparing ScienceQA data for training or evaluating LLaVA models. It is imported by the main conversion script (convert_sqa_to_llava.py) that processes the full dataset.
Code Reference
Source Location
- Repository: OpenGVLab_InternVL
- File: internvl_chat_llava/scripts/convert_sqa_to_llava_base_prompt.py
- Lines: 1-334
Signature
def get_question_text(problem) -> str
def get_context_text(problem, use_caption) -> str
def get_choice_text(problem, options) -> str
def get_answer(problem, options) -> str
def get_lecture_text(problem) -> str
def get_solution_text(problem) -> str
def create_one_example_chatbot(format, question, context, choice, answer, lecture, solution, test_example=True) -> tuple
def create_one_example(format, question, context, choice, answer, lecture, solution, test_example=True) -> str
def create_one_example_gpt4(format, question, context, choice, answer, lecture, solution, test_example=True) -> tuple
def build_prompt_chatbot(problems, shot_qids, prompt_format, use_caption=False, options=["A","B","C","D","E"], is_test=False) -> dict
def build_prompt(problems, shot_qids, test_qid, args) -> str
def build_prompt_gpt4(problems, shot_qids, test_qid, args) -> list
Import
from convert_sqa_to_llava_base_prompt import (
build_prompt, build_prompt_chatbot, build_prompt_gpt4,
create_one_example, get_question_text, get_context_text
)
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| problems | dict | Yes | Dictionary mapping question IDs to ScienceQA problem dicts with keys: question, hint, caption, choices, answer, lecture, solution |
| shot_qids | list | Yes | List of question IDs to use as few-shot examples |
| prompt_format | str | Yes | Format string like "QCM-A" or "QCM-LEPA" specifying input-output structure |
| use_caption | bool | No | Whether to include image captions in context (default: False) |
| options | list | No | Answer option letters (default: ["A","B","C","D","E"]) |
Outputs
| Name | Type | Description |
|---|---|---|
| prompt | str | Formatted prompt string (from build_prompt) |
| examples | dict | Dictionary of (input, output) pairs keyed by question ID (from build_prompt_chatbot) |
| prompt_array | list | OpenAI-format message list with role/content dicts (from build_prompt_gpt4) |
Usage Examples
Basic Usage
from convert_sqa_to_llava_base_prompt import build_prompt_chatbot
# Build chatbot-style prompts for ScienceQA
problems = json.load(open("problems.json"))
examples = build_prompt_chatbot(
problems,
shot_qids=["1", "2", "3"],
prompt_format="QCM-LEPA",
use_caption=True
)
# examples["1"] -> (input_str, output_str)