Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Implementation:OpenGVLab InternVL ScienceQA Prompt Conversion

From Leeroopedia


Knowledge Sources
Domains Data Preparation, Benchmark, ScienceQA
Last Updated 2026-02-07 14:00 GMT

Overview

This module provides prompt template construction functions for converting ScienceQA benchmark problems into LLaVA-compatible conversation formats.

Description

The convert_sqa_to_llava_base_prompt.py file is a core data preprocessing library used for ScienceQA dataset preparation. It provides helper functions to extract structured fields from ScienceQA problem dictionaries and assemble them into various prompt formats.

Field extraction functions:

  • get_question_text: Extracts the question string
  • get_context_text: Combines hint text and optional image caption
  • get_choice_text: Formats multiple choice options as "(A) ..., (B) ..."
  • get_answer: Maps answer index to letter option
  • get_lecture_text / get_solution_text: Extracts lecture and solution with newline escaping for GPT-3 compatibility

Prompt assembly functions: Three variants (create_one_example, create_one_example_chatbot, create_one_example_gpt4) build prompts with configurable input-output format strings:

  • Input formats: CQM (Context-Question-Multiple choice), QCM, QCML, QCME, QCMLE, and more
  • Output formats: A (answer only), AL, AE, ALE, LA, EA, LEA, ELA, LEPA (with lecture, explanation, answer combinations)

Prompt builder functions: build_prompt creates n-shot text prompts, build_prompt_chatbot creates chatbot-style input/output pairs, and build_prompt_gpt4 creates OpenAI-compatible message arrays with system/user/assistant roles.

Usage

Use this module when preparing ScienceQA data for training or evaluating LLaVA models. It is imported by the main conversion script (convert_sqa_to_llava.py) that processes the full dataset.

Code Reference

Source Location

Signature

def get_question_text(problem) -> str
def get_context_text(problem, use_caption) -> str
def get_choice_text(problem, options) -> str
def get_answer(problem, options) -> str
def get_lecture_text(problem) -> str
def get_solution_text(problem) -> str

def create_one_example_chatbot(format, question, context, choice, answer, lecture, solution, test_example=True) -> tuple
def create_one_example(format, question, context, choice, answer, lecture, solution, test_example=True) -> str
def create_one_example_gpt4(format, question, context, choice, answer, lecture, solution, test_example=True) -> tuple

def build_prompt_chatbot(problems, shot_qids, prompt_format, use_caption=False, options=["A","B","C","D","E"], is_test=False) -> dict
def build_prompt(problems, shot_qids, test_qid, args) -> str
def build_prompt_gpt4(problems, shot_qids, test_qid, args) -> list

Import

from convert_sqa_to_llava_base_prompt import (
    build_prompt, build_prompt_chatbot, build_prompt_gpt4,
    create_one_example, get_question_text, get_context_text
)

I/O Contract

Inputs

Name Type Required Description
problems dict Yes Dictionary mapping question IDs to ScienceQA problem dicts with keys: question, hint, caption, choices, answer, lecture, solution
shot_qids list Yes List of question IDs to use as few-shot examples
prompt_format str Yes Format string like "QCM-A" or "QCM-LEPA" specifying input-output structure
use_caption bool No Whether to include image captions in context (default: False)
options list No Answer option letters (default: ["A","B","C","D","E"])

Outputs

Name Type Description
prompt str Formatted prompt string (from build_prompt)
examples dict Dictionary of (input, output) pairs keyed by question ID (from build_prompt_chatbot)
prompt_array list OpenAI-format message list with role/content dicts (from build_prompt_gpt4)

Usage Examples

Basic Usage

from convert_sqa_to_llava_base_prompt import build_prompt_chatbot

# Build chatbot-style prompts for ScienceQA
problems = json.load(open("problems.json"))
examples = build_prompt_chatbot(
    problems,
    shot_qids=["1", "2", "3"],
    prompt_format="QCM-LEPA",
    use_caption=True
)
# examples["1"] -> (input_str, output_str)

Related Pages

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment