Principle: Haotian Liu LLaVA Batch VQA Inference
Overview
Technique for running scalable visual question answering inference across large evaluation datasets using parallel GPU processing with a DataLoader-based approach.
Description
Batch VQA inference uses a DataLoader-based approach to process evaluation questions efficiently. Questions are split into chunks across multiple GPUs for parallel processing. Each GPU loads the full model and processes its assigned chunk independently, writing results to separate answer files that are later merged into a single output.
The core components of this approach are:
- `CustomDataset` class - Handles image loading from disk, CLIP vision encoder preprocessing, conversation template formatting (with image token injection), and input tokenization. Each item returns a tuple of `(input_ids, image_tensor, image_size)`.
- `create_data_loader` - Wraps the dataset in a PyTorch DataLoader with `batch_size=1` (required due to variable image sizes and input sequence lengths) and `num_workers=4` for asynchronous data loading.
- `eval_model` - The main inference loop that loads the pretrained model, creates the DataLoader, iterates through questions, generates answers via `model.generate()`, and writes results as JSONL.
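The three components above can be sketched as follows. This is a simplified illustration, not the actual LLaVA source: the real `CustomDataset` performs CLIP preprocessing and conversation-template formatting, which are abstracted here into caller-supplied `tokenizer` and `image_loader` callables.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class CustomDataset(Dataset):
    """Simplified sketch: one (input_ids, image_tensor, image_size) per question."""

    def __init__(self, questions, tokenizer, image_loader):
        self.questions = questions        # list of {"question_id", "image", "text"}
        self.tokenizer = tokenizer        # callable: str -> 1-D LongTensor of token ids
        self.image_loader = image_loader  # callable: filename -> (C, H, W) float tensor

    def __len__(self):
        return len(self.questions)

    def __getitem__(self, index):
        q = self.questions[index]
        image_tensor = self.image_loader(q["image"])
        input_ids = self.tokenizer(q["text"])
        image_size = tuple(image_tensor.shape[1:])  # (H, W)
        return input_ids, image_tensor, image_size

def create_data_loader(questions, tokenizer, image_loader,
                       batch_size=1, num_workers=4):
    # LLaVA asserts batch_size == 1: images and prompts have variable sizes,
    # so they cannot be stacked into a larger batch.
    assert batch_size == 1, "batch_size must be 1"
    dataset = CustomDataset(questions, tokenizer, image_loader)
    return DataLoader(dataset, batch_size=batch_size,
                      num_workers=num_workers, shuffle=False)
```

With `num_workers=4`, image decoding and tokenization overlap with GPU inference, keeping the generation loop fed.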
The multi-GPU parallelism is orchestrated at the shell script level, not within the Python code. Each GPU runs a separate Python process with different --chunk-idx values, and all processes run concurrently via bash background jobs.
Usage
Use this for all large-scale benchmark evaluations including VQAv2, GQA, TextVQA, POPE, MMBench, SEED-Bench, and others. The multi-GPU chunk splitting is the standard parallelism strategy used by all V1.5 evaluation scripts under scripts/v1_5/eval/.
Typical Multi-GPU Launch Pattern
```bash
#!/bin/bash
gpu_list="${CUDA_VISIBLE_DEVICES:-0}"
IFS=',' read -ra GPULIST <<< "$gpu_list"
CHUNKS=${#GPULIST[@]}

for IDX in $(seq 0 $((CHUNKS-1))); do
    CUDA_VISIBLE_DEVICES=${GPULIST[$IDX]} python -m llava.eval.model_vqa_loader \
        --model-path liuhaotian/llava-v1.5-13b \
        --question-file ./playground/data/eval/vqav2/$SPLIT.jsonl \
        --image-folder ./playground/data/eval/vqav2/test2015 \
        --answers-file ./playground/data/eval/vqav2/answers/$SPLIT/$CKPT/${CHUNKS}_${IDX}.jsonl \
        --num-chunks $CHUNKS \
        --chunk-idx $IDX \
        --temperature 0 \
        --conv-mode vicuna_v1 &
done
wait

# Merge chunk answer files
output_file=./playground/data/eval/vqav2/answers/$SPLIT/$CKPT/merge.jsonl
> "$output_file"
for IDX in $(seq 0 $((CHUNKS-1))); do
    cat ./playground/data/eval/vqav2/answers/$SPLIT/$CKPT/${CHUNKS}_${IDX}.jsonl >> "$output_file"
done
```
Theoretical Basis
Chunk Splitting Strategy
Chunk splitting divides N questions into K chunks (one per GPU). The `split_list()` function computes `chunk_size = ceil(N / K)` and creates roughly equal-sized partitions. Each chunk runs independently with no inter-process communication required.
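A minimal sketch of this chunking logic, matching the behavior described above (`get_chunk` is the companion helper that selects the chunk for a given `--chunk-idx`):

```python
import math

def split_list(lst, n):
    """Split lst into n roughly equal chunks of size ceil(len(lst) / n)."""
    chunk_size = math.ceil(len(lst) / n)
    return [lst[i:i + chunk_size] for i in range(0, len(lst), chunk_size)]

def get_chunk(lst, n, k):
    """Return chunk k of n, i.e. the questions assigned to GPU k."""
    return split_list(lst, n)[k]
```

Note that the last chunk may be smaller: 10 questions over 4 GPUs yields chunks of sizes 3, 3, 3, and 1, so the last worker finishes earlier.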
Batch Size Constraint
The batch size is fixed at 1 (enforced by an assertion in create_data_loader). This is required because:
- Images have variable dimensions, making tensor stacking across samples impractical
- Input token sequences have variable lengths due to different question texts
- The `collate_fn` uses `torch.stack`, which requires uniform tensor shapes
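The third point can be demonstrated directly: stacking two image tensors with different spatial sizes fails, which is exactly what a `collate_fn` would attempt for a batch size greater than 1.

```python
import torch

a = torch.zeros(3, 224, 224)  # one image at 224x224
b = torch.zeros(3, 336, 448)  # another image with a different spatial size

try:
    torch.stack([a, b])       # what batching with a stack-based collate would do
except RuntimeError as err:
    print("torch.stack failed:", err)
```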
Answer File Format
Each answer line is a JSON object containing:
| Field | Type | Description |
|---|---|---|
| `question_id` | int/str | Original question identifier for alignment with ground truth |
| `prompt` | str | The original question text |
| `text` | str | Model-generated answer |
| `answer_id` | str | Unique UUID for this answer (generated via shortuuid) |
| `model_id` | str | Model name derived from checkpoint path |
| `metadata` | dict | Empty metadata dict (reserved for extensions) |
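A sketch of writing one such JSONL line. The `write_answer` helper is hypothetical (the real loop writes inline), and `uuid.uuid4().hex` stands in for the `shortuuid` dependency used by LLaVA:

```python
import json
import uuid

def write_answer(fh, question_id, prompt, answer_text, model_id):
    """Append one answer record as a JSONL line (hypothetical helper)."""
    record = {
        "question_id": question_id,
        "prompt": prompt,
        "text": answer_text,
        "answer_id": uuid.uuid4().hex,  # LLaVA uses shortuuid.uuid() here
        "model_id": model_id,
        "metadata": {},
    }
    fh.write(json.dumps(record) + "\n")
```

Because each line is a self-contained JSON object, the per-chunk files can be merged with a plain `cat`, as the launch script above does.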
Greedy Decoding
LLaVA v1.5 evaluation uses greedy decoding (temperature=0) by default to ensure reproducibility. When temperature=0, do_sample is set to False, selecting the highest probability token at each step.
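A sketch of how the `--temperature` flag maps to Hugging Face `generate()` arguments, assuming the standard transformers convention (the function name is illustrative):

```python
def generation_kwargs(temperature: float, max_new_tokens: int = 128) -> dict:
    """Map a CLI temperature to generate() kwargs: 0 means greedy decoding."""
    do_sample = temperature > 0
    kwargs = {"do_sample": do_sample, "max_new_tokens": max_new_tokens}
    if do_sample:
        kwargs["temperature"] = temperature  # only meaningful when sampling
    return kwargs
```

With `do_sample=False`, generation is deterministic for a fixed model and prompt, which is why all v1.5 eval scripts pass `--temperature 0`.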
Knowledge Sources
- Repo - LLaVA - https://github.com/haotian-liu/LLaVA
Domains
- Evaluation
- Parallel_Computing
Related Pages
Metadata
| Property | Value |
|---|---|
| last_updated | 2026-02-13 14:00 GMT |
| page_type | Principle |
| workflow | Benchmark_Evaluation |