Principle: OpenGVLab InternVL Benchmark Dispatch
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Benchmarking, Vision_Language |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A unified evaluation dispatcher that routes model checkpoints to benchmark-specific evaluation scripts based on a dataset identifier.
Description
Comprehensive evaluation of vision-language models requires running many benchmarks, each with different data formats, evaluation protocols, and scoring metrics. The benchmark dispatch pattern provides a single entry point that:
- Accepts a model checkpoint path and benchmark name
- Routes to the appropriate evaluation script (VQA, multi-image, hallucination, etc.)
- Configures distributed inference via torchrun
- Passes through additional benchmark-specific arguments
This replaces the need to memorize per-benchmark commands with a single uniform interface.
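The distributed-inference step above amounts to composing a `torchrun` command line around the chosen evaluation script. A minimal sketch of that composition, assuming the script paths from the pseudocode later in this note (the real scripts' flag names beyond `--checkpoint`/`--datasets` are not shown here):

```python
# Sketch: composing a torchrun command for one benchmark run.
# Flag names beyond --checkpoint/--datasets are assumptions for illustration.
import shlex


def build_command(checkpoint: str, script: str, gpus: int = 8, extra_args=()):
    """Compose a distributed-inference command for a benchmark script."""
    cmd = [
        "torchrun", f"--nproc_per_node={gpus}",   # one worker per GPU
        script, "--checkpoint", checkpoint,
        *extra_args,                               # benchmark-specific passthrough
    ]
    return shlex.join(cmd)


print(build_command(
    "work_dirs/internvl_chat",
    "eval/vqa/evaluate_vqa.py",
    extra_args=["--datasets", "textvqa_val"],
))
# → torchrun --nproc_per_node=8 eval/vqa/evaluate_vqa.py --checkpoint work_dirs/internvl_chat --datasets textvqa_val
```

Returning the command as a string (rather than executing it) keeps the dispatcher easy to log and test; the caller can hand it to a shell or a job scheduler.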
Usage
Use this pattern when evaluating InternVL models across multiple benchmarks. Call the dispatcher script with a checkpoint path and benchmark name.
Theoretical Basis
The dispatch pattern is a simple conditional router:
# Pseudo-code: benchmark dispatch
def dispatch(checkpoint, dataset, gpus=8):
    if dataset in VQA_BENCHMARKS:
        torchrun(f'eval/vqa/evaluate_vqa.py --checkpoint {checkpoint} --datasets {dataset}', nproc=gpus)
    elif dataset == 'mantis':
        torchrun(f'eval/mantis_eval/evaluate_mantis.py --checkpoint {checkpoint}', nproc=gpus)
    elif dataset == 'mmhal':
        torchrun(f'eval/mmhal/evaluate_mmhal.py --checkpoint {checkpoint}', nproc=gpus)
    elif dataset == 'mmvet':
        # MM-Vet is scored via a single-process script, not torchrun
        python(f'eval/mmvet/evaluate_mmvet.py --checkpoint {checkpoint}')
    # ... 40+ benchmark routes
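With 40+ routes, a long if/elif chain becomes hard to maintain; a dict-based routing table is a common refactoring. A runnable sketch of that alternative, reusing the script paths from the pseudocode above (the VQA dataset identifiers in `VQA_BENCHMARKS` are illustrative, not the repo's full list):

```python
# Sketch: dict-based routing table replacing the if/elif chain.
# Dataset names in VQA_BENCHMARKS are illustrative examples.
VQA_BENCHMARKS = {"textvqa_val", "docvqa_val", "chartqa_test", "gqa_testdev"}

ROUTES = {
    "mantis": ("torchrun", "eval/mantis_eval/evaluate_mantis.py"),
    "mmhal":  ("torchrun", "eval/mmhal/evaluate_mmhal.py"),
    "mmvet":  ("python",   "eval/mmvet/evaluate_mmvet.py"),  # single-process
}


def dispatch(checkpoint: str, dataset: str, gpus: int = 8) -> list[str]:
    """Return the command to run as an argv list, without executing it."""
    if dataset in VQA_BENCHMARKS:
        launcher, script = "torchrun", "eval/vqa/evaluate_vqa.py"
        extra = ["--datasets", dataset]  # shared script, per-dataset flag
    elif dataset in ROUTES:
        launcher, script = ROUTES[dataset]
        extra = []
    else:
        raise ValueError(f"unknown benchmark: {dataset}")
    if launcher == "torchrun":
        prefix = ["torchrun", f"--nproc_per_node={gpus}"]
    else:
        prefix = [launcher]
    return prefix + [script, "--checkpoint", checkpoint] + extra
```

Returning an argv list keeps the router side-effect free, so adding a new benchmark is a one-line table entry that can be unit-tested without launching any processes.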
Supported benchmark categories:
- VQA: TextVQA, DocVQA, ChartQA, InfographicVQA, AI2D, GQA, OKVQA, VizWiz, VQAv2, OCR-VQA
- Multi-image: Mantis, MMIU, MIRB
- Hallucination: MMHal
- General: MM-Vet, POPE, ScienceQA, MathVista
- MMMU: MMMU-val, MMMU-test, MMMU-CoT
- Video: MVBench
- Others: SEED, LLaVA-Bench, TinyLVLM, MMVP
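A full evaluation sweep typically iterates a whole category through the dispatcher. A minimal sketch, assuming lower-cased benchmark identifiers (the repo's actual dataset IDs may differ):

```python
# Sketch: sweeping every benchmark in a category through the dispatcher.
# The lower-case identifiers below are assumptions, not verified dataset IDs.
CATEGORIES = {
    "multi_image":   ["mantis", "mmiu", "mirb"],
    "hallucination": ["mmhal"],
    "mmmu":          ["mmmu-val", "mmmu-test"],
}


def sweep(checkpoint, category):
    """Yield (checkpoint, dataset) pairs to feed to the dispatcher one by one."""
    for name in CATEGORIES[category]:
        yield (checkpoint, name)
```

Driving the sweep from the same category table keeps "run everything in group X" as cheap as a single dispatcher call.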