
Implementation:OpenGVLab InternVL VQA Batch Loader Inference

From Leeroopedia


Knowledge Sources
Domains: Inference, VQA, Data_Loading
Last Updated: 2026-02-07 14:00 GMT

Overview

This script performs batch-mode VQA inference using a PyTorch DataLoader with a custom Dataset class to efficiently process large visual question answering datasets.

Description

The model_vqa_loader.py script extends the sequential VQA inference pipeline of model_vqa.py with a DataLoader-based architecture that improves throughput on large evaluation datasets. Key components include:

CustomDataset class: A PyTorch Dataset that encapsulates the full preprocessing pipeline in __getitem__:

  • Reads the question text and prepends the image token(s) the model expects (wrapped in image start/end tokens when the model config enables them)
  • Constructs the conversation prompt using the specified template
  • Loads and preprocesses images via process_images (with aspect ratio handling)
  • Tokenizes the prompt with tokenizer_image_token
  • Returns (input_ids, image_tensor) tuples
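The prompt-building steps of __getitem__ can be sketched in plain Python. This is a simplified re-creation under stated assumptions: the constant values are assumed to match LLaVA's llava/constants.py, build_prompt only approximates the shape of a two-role USER/ASSISTANT template (its system text is invented), and tokenizer_image_token_sketch is a hypothetical stand-in for tokenizer_image_token that omits details such as BOS-token handling. Image loading and process_images are left out.

```python
from typing import Callable, List

# Assumed to mirror LLaVA's llava/constants.py values.
IMAGE_TOKEN_INDEX = -200
DEFAULT_IMAGE_TOKEN = "<image>"
DEFAULT_IM_START_TOKEN = "<im_start>"
DEFAULT_IM_END_TOKEN = "<im_end>"


def add_image_tokens(question: str, mm_use_im_start_end: bool) -> str:
    """Prepend the image placeholder, optionally wrapped in start/end tokens."""
    if mm_use_im_start_end:
        image_part = DEFAULT_IM_START_TOKEN + DEFAULT_IMAGE_TOKEN + DEFAULT_IM_END_TOKEN
    else:
        image_part = DEFAULT_IMAGE_TOKEN
    return image_part + "\n" + question


def build_prompt(user_message: str,
                 system: str = "A chat between a curious user and an AI assistant.") -> str:
    """Hypothetical approximation of a two-role (USER/ASSISTANT) template."""
    # The prompt ends with the assistant role so the model generates the answer next.
    return f"{system} USER: {user_message} ASSISTANT:"


def tokenizer_image_token_sketch(prompt: str,
                                 encode: Callable[[str], List[int]]) -> List[int]:
    """Encode the text around each <image> and splice IMAGE_TOKEN_INDEX between pieces."""
    pieces = [encode(piece) for piece in prompt.split(DEFAULT_IMAGE_TOKEN)]
    input_ids: List[int] = []
    for i, piece_ids in enumerate(pieces):
        if i > 0:
            # Placeholder id that the model later replaces with image features.
            input_ids.append(IMAGE_TOKEN_INDEX)
        input_ids.extend(piece_ids)
    return input_ids


# Toy usage: a character-code "tokenizer" stands in for a real one.
question = add_image_tokens("What color is the bus?", mm_use_im_start_end=False)
ids = tokenizer_image_token_sketch(build_prompt(question), encode=lambda s: [ord(c) for c in s])
print(ids.count(IMAGE_TOKEN_INDEX))  # 1
```

Splitting on the placeholder string keeps the text pieces tokenized normally while reserving a single out-of-vocabulary id for the image position.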

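create_data_loader function: Creates a DataLoader with a configurable number of worker processes (default: 4) for parallel data loading. It asserts batch_size == 1 because prompts are not padded: token sequences of different lengths cannot be stacked into one batch tensor, so each question is generated on its own.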
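The batch_size == 1 restriction can be illustrated without PyTorch. In this sketch, the hypothetical stack_rows helper plays the role of the DataLoader's default collate step, which (like torch.stack) needs every sample in a batch to have the same shape; iterate_batches is likewise a hypothetical stand-in for iterating the loader.

```python
from typing import Iterator, List, Sequence


def stack_rows(rows: Sequence[Sequence[int]]) -> List[List[int]]:
    """Combine token-id rows into a rectangular 2-D batch, like a default collate."""
    lengths = {len(row) for row in rows}
    if len(lengths) != 1:
        raise ValueError(f"cannot stack rows of lengths {sorted(lengths)}")
    return [list(row) for row in rows]


def iterate_batches(samples: Sequence[Sequence[int]],
                    batch_size: int = 1) -> Iterator[List[List[int]]]:
    """Yield stacked batches; enforces batch_size == 1 as the loader does."""
    assert batch_size == 1, "batch_size must be 1"
    for start in range(0, len(samples), batch_size):
        yield stack_rows(samples[start:start + batch_size])


prompts = [[1, 2, 3], [4, 5], [6]]  # token ids of three prompts of different lengths
print(list(iterate_batches(prompts)))  # [[[1, 2, 3]], [[4, 5]], [[6]]]
try:
    stack_rows(prompts[:2])
except ValueError as err:
    print(err)  # cannot stack rows of lengths [2, 3]
```

Supporting larger batches would presumably require padding the prompts and passing an attention mask to generation, which the script as described does not do.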

eval_model function: Loads the model, creates the DataLoader, and iterates through batches. It detects plain (pre-training) models from the model name and switches to the mmtag variant of the conversation mode. Output decoding follows the standard pattern: decode the generated tokens, strip the stop string, and write one JSONL line per question with question_id, prompt, text, answer_id, and model_id.
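The conversation-mode switch and the output step can be sketched as follows. The exact name heuristic in resolve_conv_mode (the name contains "plain" but not "finetune") is an assumption about how plain checkpoints are detected, and uuid4 is swapped in for the script's own answer-ID generator (LLaVA eval scripts commonly use shortuuid). All function names here are hypothetical.

```python
import io
import json
import uuid
from typing import Any, Dict, TextIO


def resolve_conv_mode(model_name: str, conv_mode: str) -> str:
    """Append '_mmtag' for plain (pre-training) checkpoints (assumed heuristic)."""
    if ("plain" in model_name and "finetune" not in model_name.lower()
            and "mmtag" not in conv_mode):
        return conv_mode + "_mmtag"
    return conv_mode


def clean_output(text: str, stop_str: str) -> str:
    """Trim whitespace and one trailing stop string from decoded model output."""
    text = text.strip()
    if stop_str and text.endswith(stop_str):
        text = text[: -len(stop_str)]
    return text.strip()


def write_answer(fp: TextIO, question_id: Any, prompt: str,
                 text: str, model_id: str) -> Dict[str, Any]:
    """Append one JSONL record; uuid4().hex stands in for the real ID generator."""
    record = {
        "question_id": question_id,
        "prompt": prompt,
        "text": text,
        "answer_id": uuid.uuid4().hex,
        "model_id": model_id,
        "metadata": {},
    }
    fp.write(json.dumps(record) + "\n")
    return record


buf = io.StringIO()  # stands in for the opened answers file
write_answer(buf, 7, "What color is the bus?", clean_output(" Red</s>", "</s>"), "llava-v1.5-7b")
print(buf.getvalue(), end="")
```

Writing one self-contained JSON object per line lets long runs be inspected or resumed without parsing a single large document.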

Usage

Use this script for large-scale VQA benchmark evaluation (VQAv2, GQA, VizWiz), where parallel loader workers overlap image loading and preprocessing with generation and so reduce the I/O bottleneck of the sequential model_vqa.py script.

Code Reference

Source Location

Signature

class CustomDataset(Dataset):
    def __init__(self, questions, image_folder, tokenizer, image_processor, model_config): ...
    def __getitem__(self, index) -> tuple: ...
    def __len__(self) -> int: ...

def create_data_loader(questions, image_folder, tokenizer, image_processor, model_config,
                       batch_size=1, num_workers=4) -> DataLoader: ...

def eval_model(args: argparse.Namespace) -> None: ...

Import

from llava.eval.model_vqa_loader import CustomDataset, create_data_loader, eval_model

I/O Contract

Inputs

  • --model-path (str, required): Path to the pretrained LLaVA model
  • --model-base (str, optional): Base model path for LoRA or projector-only models
  • --image-folder (str, optional): Root directory for image files
  • --question-file (str, optional): Path to the JSONL question file (default: tables/question.jsonl)
  • --answers-file (str, optional): Path for the output JSONL answers file (default: answer.jsonl)
  • --conv-mode (str, optional): Conversation template name (default: llava_v1)
  • --num-chunks (int, optional): Number of chunks for multi-GPU splitting (default: 1)
  • --chunk-idx (int, optional): Index of the chunk to process (default: 0)
  • --temperature (float, optional): Sampling temperature (default: 0.2)
  • --top_p (float, optional): Top-p sampling parameter (default: None)
  • --num_beams (int, optional): Number of beams for beam search (default: 1)
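The flags listed above map onto an argparse.Namespace roughly as below. build_parser is a hypothetical re-creation of those flags and defaults (--model-base and --image-folder get no default here because none is listed), and get_chunk shows one simple way --num-chunks and --chunk-idx can select a contiguous slice of the questions; the script's own splitting helper may size chunks differently.

```python
import argparse
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser re-creating the documented flags and defaults."""
    p = argparse.ArgumentParser(description="Batch VQA inference (sketch)")
    p.add_argument("--model-path", type=str, required=True)
    p.add_argument("--model-base", type=str, default=None)
    p.add_argument("--image-folder", type=str, default=None)
    p.add_argument("--question-file", type=str, default="tables/question.jsonl")
    p.add_argument("--answers-file", type=str, default="answer.jsonl")
    p.add_argument("--conv-mode", type=str, default="llava_v1")
    p.add_argument("--num-chunks", type=int, default=1)
    p.add_argument("--chunk-idx", type=int, default=0)
    p.add_argument("--temperature", type=float, default=0.2)
    p.add_argument("--top_p", type=float, default=None)
    p.add_argument("--num_beams", type=int, default=1)
    return p


def get_chunk(items: Sequence[T], num_chunks: int, chunk_idx: int) -> Sequence[T]:
    """Return the chunk_idx-th of num_chunks contiguous, near-equal slices."""
    if not 0 <= chunk_idx < num_chunks:
        raise ValueError("chunk_idx must be in [0, num_chunks)")
    start = chunk_idx * len(items) // num_chunks
    end = (chunk_idx + 1) * len(items) // num_chunks
    return items[start:end]


args = build_parser().parse_args(["--model-path", "/path/to/llava-model",
                                  "--num-chunks", "4", "--chunk-idx", "1"])
questions: List[int] = list(range(10))  # stand-in for loaded question records
print(args.conv_mode, get_chunk(questions, args.num_chunks, args.chunk_idx))  # llava_v1 [2, 3, 4]
```

argparse derives attribute names from the long option, replacing hyphens with underscores, so --model-path becomes args.model_path while --top_p and --num_beams keep their names unchanged.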

Outputs

  • answers file (JSONL): Each line contains question_id, prompt, text, answer_id, model_id, and metadata
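A downstream consumer of the answers file can be sketched like this. parse_answer_lines checks each record for the fields listed above, and merge_chunks concatenates the per-chunk results of a multi-GPU run while rejecting duplicated question IDs. Writing one answers file per chunk and merging afterwards is an assumed workflow, not something this page specifies; both helpers are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

REQUIRED_KEYS = {"question_id", "prompt", "text", "answer_id", "model_id"}


def parse_answer_lines(lines: Iterable[str], source: str = "<answers>") -> List[Dict[str, Any]]:
    """Parse JSONL answer lines, skipping blank lines and checking required fields."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue
        record = json.loads(line)
        missing = REQUIRED_KEYS - set(record)
        if missing:
            raise KeyError(f"{source}:{line_no} missing {sorted(missing)}")
        records.append(record)
    return records


def merge_chunks(chunks: Iterable[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Concatenate per-chunk records in order, rejecting duplicate question_ids."""
    merged: List[Dict[str, Any]] = []
    seen = set()
    for records in chunks:
        for record in records:
            qid = record["question_id"]
            if qid in seen:
                raise ValueError(f"duplicate question_id {qid!r}")
            seen.add(qid)
            merged.append(record)
    return merged


chunk0 = ['{"question_id": 1, "prompt": "Q1", "text": "yes", "answer_id": "a", "model_id": "m", "metadata": {}}']
chunk1 = ['{"question_id": 2, "prompt": "Q2", "text": "no", "answer_id": "b", "model_id": "m", "metadata": {}}', ""]
merged = merge_chunks([parse_answer_lines(chunk0), parse_answer_lines(chunk1)])
print([r["text"] for r in merged])  # ['yes', 'no']
```

In practice an open file handle can be passed to parse_answer_lines directly, since iterating a text file yields its lines.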

Usage Examples

Basic Usage

# Command-line execution for batch VQA inference
python internvl_chat_llava/llava/eval/model_vqa_loader.py \
    --model-path /path/to/llava-model \
    --image-folder /path/to/images \
    --question-file questions.jsonl \
    --answers-file answers.jsonl \
    --conv-mode llava_v1 \
    --temperature 0
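For multi-GPU evaluation, the same command can be launched once per GPU with different --chunk-idx values. The sketch below builds those command lines and optionally starts them; the per-chunk answers-file naming scheme ({prefix}_{num_gpus}_{idx}.jsonl), pinning each job to one GPU via CUDA_VISIBLE_DEVICES, and the helper names chunk_commands and launch are all assumptions.

```python
import os
import shlex
import subprocess
import sys
from typing import Dict, List, Tuple

SCRIPT = "internvl_chat_llava/llava/eval/model_vqa_loader.py"

Job = Tuple[Dict[str, str], List[str]]


def chunk_commands(model_path: str, image_folder: str, question_file: str,
                   answers_prefix: str, num_gpus: int,
                   conv_mode: str = "llava_v1") -> List[Job]:
    """Build one (env, argv) pair per GPU; job k processes question chunk k."""
    jobs: List[Job] = []
    for idx in range(num_gpus):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(idx))  # pin job to one GPU
        argv = [sys.executable, SCRIPT,
                "--model-path", model_path,
                "--image-folder", image_folder,
                "--question-file", question_file,
                # Hypothetical naming scheme: one answers file per chunk.
                "--answers-file", f"{answers_prefix}_{num_gpus}_{idx}.jsonl",
                "--num-chunks", str(num_gpus),
                "--chunk-idx", str(idx),
                "--conv-mode", conv_mode,
                "--temperature", "0"]
        jobs.append((env, argv))
    return jobs


def launch(jobs: List[Job]) -> List[int]:
    """Start all jobs in parallel and return their exit codes."""
    procs = [subprocess.Popen(argv, env=env) for env, argv in jobs]
    return [proc.wait() for proc in procs]


for _, argv in chunk_commands("/path/to/llava-model", "/path/to/images",
                              "questions.jsonl", "answers", num_gpus=2):
    print(shlex.join(argv))  # preview only; call launch(...) to actually run
```

Because every job reads the full question file and keeps only its own chunk, the jobs need no coordination beyond distinct --chunk-idx values and distinct answers files.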
