Principle:Open compass VLMEvalKit Benchmark Dataset Construction

From Leeroopedia
Field         Value
Source        https://github.com/open-compass/VLMEvalKit
Domain        Vision, Evaluation, Data_Processing
Last Updated  2026-02-14 00:00 GMT

Overview

A factory pattern that resolves benchmark dataset names to fully initialized dataset objects with auto-downloaded data and configured evaluation methods.

Description

VLMEvalKit maintains a registry of 100+ benchmark datasets spanning several modalities and task types:

  • Image MCQ — Multiple-choice question benchmarks (MMBench, SEEDBench, ScienceQA, AI2D, etc.)
  • Image VQA — Visual question answering benchmarks (TextVQA, ChartQA, DocVQA, OCRBench, etc.)
  • Video — Video understanding benchmarks (MVBench, Video-MME, EgoSchema, etc.)
  • Text — Text-only reasoning benchmarks used as baselines

The build_dataset() factory function takes a dataset name string, looks it up across registered dataset classes, and returns a fully initialized dataset object. The construction process involves:

  1. Name resolution — The factory searches through registered dataset classes (IMAGE_DATASET, VIDEO_DATASET, TEXT_DATASET, CUSTOM_DATASET) to find which class supports the given dataset name.
  2. Automatic data downloading — Each dataset class declares DATASET_URL and DATASET_MD5 class attributes. On first use, the data is automatically downloaded and cached locally, with MD5 verification ensuring data integrity.
  3. Initialization — The dataset class constructor loads the data (typically a TSV file) into a Pandas DataFrame, configures evaluation-specific settings, and prepares the dataset for inference.
  4. Fallback logic — For unregistered or custom datasets, the factory supports loading local TSV files and wrapping them in generic CustomMCQDataset or CustomVQADataset classes based on column presence.
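
The download-and-verify step can be illustrated with a minimal sketch using only the Python standard library. This is not VLMEvalKit's own code: the helper names md5_of and ensure_cached, and the fetch callable standing in for the actual download, are hypothetical and chosen for illustration only.

```python
import hashlib
import os
import tempfile


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_cached(path, expected_md5, fetch):
    """Return `path`, calling `fetch(path)` only when the cached copy is
    missing or fails the checksum. `fetch` is a hypothetical callable that
    writes the file (for example, an HTTP download)."""
    if os.path.exists(path) and md5_of(path) == expected_md5:
        return path  # cache hit: nothing to download
    fetch(path)
    if md5_of(path) != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}")
    return path


# Demo with a fake "download" that writes fixed bytes to disk.
payload = b"index\tquestion\tanswer\n0\tWhat is shown?\tA cat\n"
calls = []


def fake_fetch(path):
    calls.append(path)
    with open(path, "wb") as f:
        f.write(payload)


cache_dir = tempfile.mkdtemp()
target = os.path.join(cache_dir, "demo.tsv")
expected = hashlib.md5(payload).hexdigest()

ensure_cached(target, expected, fake_fetch)  # first use: fetches the file
ensure_cached(target, expected, fake_fetch)  # second use: served from cache
print(len(calls))
```

In this sketch the checksum is recomputed on every call, which keeps the logic simple at the cost of re-reading the cached file each time.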

Usage

Use this pattern when selecting a benchmark for evaluation:

  • The dataset name string (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST") is passed to build_dataset().
  • The factory handles downloading, caching, integrity verification, and initialization.
  • The returned dataset object provides a uniform interface for iteration, inference, and evaluation.

This pattern ensures that users do not need to manually download data, configure paths, or know which class implements a specific benchmark — the factory handles all of this from a single name string.

Theoretical Basis

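The design belongs to the factory family of creational patterns, which let callers obtain related objects without naming their concrete classes. Strictly speaking, build_dataset() is a registry-backed factory function rather than a full Abstract Factory, since there is a single lookup function and no set of interchangeable factory objects. In VLMEvalKit, this manifests as: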

  • Registration through class attributes — Each dataset class declares the dataset names it supports via class attributes (DATASET_URL, DATASET_MD5). Once the class is included in one of the registered class lists, the factory function itself needs no changes when new datasets are added.
  • Abstract interface — All dataset classes share a common interface: .data (DataFrame), .dataset_name (str), .TYPE (str), and an .evaluate() method.
  • Lazy downloading — Data is only downloaded on first access, reducing startup time for users who only need a subset of benchmarks.
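
The shared interface and the load-on-first-access idea can be sketched together. The base class below is a hypothetical illustration, not VLMEvalKit's actual class hierarchy: BenchmarkDataset, ToyMCQDataset, the loader callable, and the evaluate(predictions) signature are all assumptions, and a list of dicts stands in for the DataFrame to keep the example dependency-free.

```python
from abc import ABC, abstractmethod


class BenchmarkDataset(ABC):
    """Hypothetical base class mirroring the shared interface."""

    TYPE = "UNKNOWN"  # subclasses override; the values used here are assumptions

    def __init__(self, dataset_name, loader):
        self.dataset_name = dataset_name
        self._loader = loader  # callable that returns the rows
        self._data = None      # nothing loaded yet

    @property
    def data(self):
        # Lazy: run the loader on first access only, then reuse the result.
        if self._data is None:
            self._data = self._loader()
        return self._data

    @abstractmethod
    def evaluate(self, predictions):
        """Score predictions against the dataset's ground truth."""


class ToyMCQDataset(BenchmarkDataset):
    TYPE = "MCQ"

    def evaluate(self, predictions):
        rows = self.data
        correct = sum(1 for row, pred in zip(rows, predictions)
                      if row["answer"] == pred)
        return correct / len(rows)


load_calls = []


def load_rows():
    load_calls.append(1)
    return [
        {"question": "2+2?", "answer": "B"},
        {"question": "3+3?", "answer": "A"},
    ]


ds = ToyMCQDataset("toy_mcq", load_rows)
loaded_at_construction = len(load_calls)  # 0: the loader has not run yet
accuracy = ds.evaluate(["B", "C"])        # first access triggers the load
print(ds.dataset_name, ds.TYPE, accuracy)
```

Because every subclass exposes the same four members, calling code can treat any benchmark uniformly regardless of which class implements it.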

The pseudocode for this pattern is:

1. Define dataset classes with class-level attributes:
   class MMBenchDataset:
       DATASET_URL = {"MMBench_DEV_EN_V11": "https://..."}
       DATASET_MD5 = {"MMBench_DEV_EN_V11": "abc123..."}

2. Register all dataset classes:
   DATASET_CLASSES = IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET

3. build_dataset(dataset_name):
   a. Check supported_video_datasets first
   b. For each cls in DATASET_CLASSES:
      - If dataset_name in cls.DATASET_URL or cls supports dataset_name:
        - Download data if not cached (verify MD5)
        - Return cls(dataset_name)
   c. Fallback: Try loading local TSV, wrap in CustomMCQ/VQADataset
   d. Return None if all resolution fails
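
As a rough illustration, steps b through d of the pseudocode can be written as a small runnable Python sketch. It is a toy under stated assumptions, not the library's implementation: the classes ToyImageMCQ, CustomMCQ, and CustomVQA, the supports() helper, the data_root argument, the placeholder URL and checksum, and the rule that option columns "A" and "B" indicate multiple choice are all hypothetical. The video lookup in step a and the download step are omitted.

```python
import csv
import os
import tempfile


class ToyImageMCQ:
    # Placeholder URL and checksum, for illustration only.
    DATASET_URL = {"Toy_MCQ_DEV": "https://example.invalid/toy_mcq_dev.tsv"}
    DATASET_MD5 = {"Toy_MCQ_DEV": "0" * 32}

    @classmethod
    def supports(cls, name):
        return name in cls.DATASET_URL

    def __init__(self, name):
        self.dataset_name = name


class CustomMCQ:
    def __init__(self, name, rows):
        self.dataset_name, self.rows = name, rows


class CustomVQA:
    def __init__(self, name, rows):
        self.dataset_name, self.rows = name, rows


DATASET_CLASSES = [ToyImageMCQ]


def build_dataset_sketch(name, data_root):
    # Step b: the first registered class that supports the name wins.
    for cls in DATASET_CLASSES:
        if cls.supports(name):
            return cls(name)
    # Step c: fall back to a local TSV named after the dataset.
    tsv_path = os.path.join(data_root, f"{name}.tsv")
    if not os.path.exists(tsv_path):
        return None  # Step d: nothing matched
    with open(tsv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        rows = list(reader)
        columns = set(reader.fieldnames or [])
    # Assumed heuristic: option columns "A" and "B" mean multiple choice.
    wrapper = CustomMCQ if {"A", "B"} <= columns else CustomVQA
    return wrapper(name, rows)


root = tempfile.mkdtemp()
with open(os.path.join(root, "my_local_set.tsv"), "w",
          newline="", encoding="utf-8") as f:
    f.write("index\tquestion\tA\tB\tanswer\n0\tWhich is larger?\t1\t2\tB\n")

registered = build_dataset_sketch("Toy_MCQ_DEV", root)
local = build_dataset_sketch("my_local_set", root)
missing = build_dataset_sketch("does_not_exist", root)
print(type(registered).__name__, type(local).__name__, missing)
```

The caller passes only a name; which class is instantiated, or whether a local file is wrapped instead, is decided entirely inside the factory.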

This design provides:

  • Extensibility — New benchmarks are added by creating a new dataset class with the appropriate class attributes.
  • Consistency — All datasets are accessed through the same factory interface.
  • Reliability — MD5 checksums detect corrupted or incomplete downloads, and caching avoids redundant downloads.
