
Principle:OpenGVLab InternVL Benchmark Dispatch

From Leeroopedia


Knowledge Sources
Domains: Evaluation, Benchmarking, Vision_Language
Last Updated: 2026-02-07 00:00 GMT

Overview

A unified evaluation dispatcher that routes model checkpoints to benchmark-specific evaluation scripts based on a dataset identifier.

Description

Comprehensive evaluation of vision-language models requires running many benchmarks, each with different data formats, evaluation protocols, and scoring metrics. The benchmark dispatch pattern provides a single entry point that:

  • Accepts a model checkpoint path and benchmark name
  • Routes to the appropriate evaluation script (VQA, multi-image, hallucination, etc.)
  • Configures distributed inference via torchrun
  • Passes through additional benchmark-specific arguments

This reduces the evaluation workflow from memorizing dozens of per-benchmark commands to invoking a single uniform interface.

Usage

Use this pattern when evaluating InternVL models across multiple benchmarks. Call the dispatcher script with a checkpoint path and benchmark name.
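As a sketch of what such an invocation looks like, the snippet below builds the argv a dispatcher would hand to torchrun, including pass-through of benchmark-specific extras. The `dispatch_args` helper and the `--dynamic` flag are illustrative assumptions, not the repository's actual CLI; the script path is taken from the router shown under Theoretical Basis.

```python
import shlex

def dispatch_args(checkpoint: str, dataset: str, *extra: str, gpus: int = 8) -> list:
    """Build the argv a dispatcher would launch, forwarding extra flags.

    Hypothetical helper: the real entry point may differ.
    """
    argv = ["torchrun", f"--nproc_per_node={gpus}",
            "eval/vqa/evaluate_vqa.py",
            "--checkpoint", checkpoint, "--datasets", dataset]
    argv.extend(extra)  # pass-through of benchmark-specific arguments
    return argv

# Example: forward one extra flag alongside the standard arguments
print(shlex.join(dispatch_args("work_dirs/internvl_ckpt", "textvqa", "--dynamic")))
```

Because extras are appended verbatim, the dispatcher itself stays agnostic to each benchmark's option set.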

Theoretical Basis

The dispatch pattern is a simple conditional router:

# Pseudo-code: benchmark dispatch
# (VQA_BENCHMARKS is a set of VQA dataset names; the real router has 40+ routes)
import subprocess

def dispatch(checkpoint, dataset, gpus=8):
    torchrun = f'torchrun --nproc_per_node={gpus}'
    if dataset in VQA_BENCHMARKS:
        cmd = f'{torchrun} eval/vqa/evaluate_vqa.py --checkpoint {checkpoint} --datasets {dataset}'
    elif dataset == 'mantis':
        cmd = f'{torchrun} eval/mantis_eval/evaluate_mantis.py --checkpoint {checkpoint}'
    elif dataset == 'mmhal':
        cmd = f'{torchrun} eval/mmhal/evaluate_mmhal.py --checkpoint {checkpoint}'
    elif dataset == 'mmvet':
        # MM-Vet is scored single-process, without the distributed launcher
        cmd = f'python eval/mmvet/evaluate_mmvet.py --checkpoint {checkpoint}'
    else:
        raise ValueError(f'unknown benchmark: {dataset}')
    # ... 40+ benchmark routes
    subprocess.run(cmd, shell=True, check=True)
Supported benchmark categories:

  • VQA: TextVQA, DocVQA, ChartQA, InfographicsVQA, AI2D, GQA, OKVQA, VizWiz, VQAv2, OCR-VQA
  • Multi-image: Mantis, MMIU, MIRB
  • Hallucination: MMHal
  • General: MM-Vet, POPE, ScienceQA, MathVista
  • MMMU: MMMU-val, MMMU-test, MMMU-CoT
  • Video: MVBench
  • Others: SEED, LLaVA-Bench, TinyLVLM, MMVP

Related Pages

Implemented By
