Principle:Open compass VLMEvalKit Benchmark Dataset Construction
| Field | Value |
|---|---|
| Source | https://github.com/open-compass/VLMEvalKit |
| Domain | Vision, Evaluation, Data_Processing |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
A factory pattern that resolves benchmark dataset names to fully initialized dataset objects with auto-downloaded data and configured evaluation methods.
Description
VLMEvalKit maintains a registry of 100+ benchmark datasets across multiple modalities:
- Image MCQ — Multiple-choice question benchmarks (MMBench, SEEDBench, ScienceQA, AI2D, etc.)
- Image VQA — Visual question answering benchmarks (TextVQA, ChartQA, DocVQA, OCRBench, etc.)
- Video — Video understanding benchmarks (MVBench, Video-MME, EgoSchema, etc.)
- Text — Text-only reasoning benchmarks used as baselines
The build_dataset() factory function takes a dataset name string, looks it up across registered dataset classes, and returns a fully initialized dataset object. The construction process involves:
- Name resolution: The factory searches the registered dataset class lists (`IMAGE_DATASET`, `VIDEO_DATASET`, `TEXT_DATASET`, `CUSTOM_DATASET`) to find the class that supports the given dataset name.
- Automatic data downloading: Each dataset class declares `DATASET_URL` and `DATASET_MD5` class attributes. On first use, the data is downloaded and cached locally, and an MD5 check verifies its integrity.
- Initialization: The dataset class constructor loads the data (typically a TSV file) into a pandas DataFrame, configures evaluation-specific settings, and prepares the dataset for inference.
- Fallback logic: For unregistered or custom datasets, the factory loads a local TSV file and wraps it in a generic `CustomMCQDataset` or `CustomVQADataset` class, depending on which columns the file contains.
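The download-and-verify step can be sketched as a cache check followed by an MD5 comparison. This is a minimal illustration, not VLMEvalKit's downloader: `file_md5`, `ensure_cached`, and the caller-supplied `fetch` callback are hypothetical names. Fetching is delegated to `fetch` so the sketch runs offline.

```python
import hashlib
import os
from typing import Callable


def file_md5(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def ensure_cached(path: str, expected_md5: str, fetch: Callable[[str], None]) -> str:
    """Return `path`, calling the hypothetical `fetch(path)` only if the file is missing or corrupt."""
    if os.path.exists(path) and file_md5(path) == expected_md5:
        return path  # cache hit: no download needed
    fetch(path)  # cache miss or checksum mismatch: (re)download into `path`
    actual = file_md5(path)
    if actual != expected_md5:
        raise ValueError(f"MD5 mismatch for {path}: expected {expected_md5}, got {actual}")
    return path
```

Checking the hash on every cache hit also catches a partially written or corrupted file, at the cost of re-reading it once per load.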
Usage
Use when selecting a benchmark for evaluation:
- The dataset name string (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST") is passed to `build_dataset()`.
- The factory handles downloading, caching, integrity verification, and initialization.
- The returned dataset object provides a uniform interface for iteration, inference, and evaluation.
This pattern ensures that users do not need to manually download data, configure paths, or know which class implements a specific benchmark — the factory handles all of this from a single name string.
Theoretical Basis
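The Abstract Factory pattern is a creational design pattern that provides an interface for creating families of related objects without specifying their concrete classes. `build_dataset()` applies a simpler, registry-based form of this idea: a single factory function selects the concrete dataset class from a list of registered classes, so callers never name that class directly. In VLMEvalKit, this manifests as: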
- Self-registration: Dataset classes declare the names they support through class attributes (`DATASET_URL`, `DATASET_MD5`, and the supported dataset names). Adding a benchmark split to an existing class only requires new dictionary entries, so the factory function itself never needs to change; a brand-new class is registered by appending it to one of the dataset class lists.
- Abstract interface: All dataset classes expose the same interface: a `data` attribute (a pandas DataFrame), `dataset_name` (str), `TYPE` (str), and an `evaluate()` method.
- Lazy downloading: Data is downloaded only the first time a dataset is built, so users who need only a subset of benchmarks never download the rest.
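The shared interface can be illustrated with a minimal sketch that uses Python's `abc` module to enforce it. `BaseDataset`, `ToyMCQDataset`, `load_rows()`, and `run_benchmark()` are hypothetical names; a list of row dicts stands in for the pandas DataFrame, and the `evaluate(predictions)` signature is an assumption made for illustration, not VLMEvalKit's actual method signature.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List


class BaseDataset(ABC):
    """Hypothetical base class mirroring the shared dataset interface."""

    TYPE: str = "UNKNOWN"

    def __init__(self, dataset_name: str):
        self.dataset_name = dataset_name
        # A list of row dicts stands in for the pandas DataFrame.
        self.data: List[Dict[str, str]] = self.load_rows(dataset_name)

    @abstractmethod
    def load_rows(self, dataset_name: str) -> List[Dict[str, str]]:
        """Hypothetical loader; a real class would read its cached TSV here."""

    @abstractmethod
    def evaluate(self, predictions: Dict[str, str]) -> Dict[str, float]:
        """Score predictions keyed by row index (assumed signature)."""


class ToyMCQDataset(BaseDataset):
    TYPE = "MCQ"

    def load_rows(self, dataset_name):
        return [
            {"index": "0", "question": "2 + 2 = ?", "A": "3", "B": "4", "answer": "B"},
            {"index": "1", "question": "Capital of France?", "A": "Paris", "B": "Rome", "answer": "A"},
        ]

    def evaluate(self, predictions):
        correct = sum(predictions.get(row["index"]) == row["answer"] for row in self.data)
        return {"accuracy": correct / len(self.data)}


def run_benchmark(dataset: BaseDataset, model: Callable[[Dict[str, str]], str]) -> Dict[str, float]:
    """Generic driver that relies only on `data` and `evaluate()`."""
    predictions = {row["index"]: model(row) for row in dataset.data}
    return dataset.evaluate(predictions)
```

Because `run_benchmark()` touches only `data` and `evaluate()`, the same driver works for any class that implements the interface, and `BaseDataset` itself cannot be instantiated until both abstract methods are provided.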
The pseudocode for this pattern is:
```
1. Define dataset classes with class-level attributes:
   class MMBenchDataset:
       DATASET_URL = {"MMBench_DEV_EN_V11": "https://..."}
       DATASET_MD5 = {"MMBench_DEV_EN_V11": "abc123..."}

2. Register all dataset classes:
   DATASET_CLASSES = IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET

3. build_dataset(dataset_name):
   a. Check supported_video_datasets first
   b. For each cls in DATASET_CLASSES:
        - If dataset_name in cls.DATASET_URL or cls supports dataset_name:
            - Download data if not cached (verify MD5)
            - Return cls(dataset_name)
   c. Fallback: Try loading a local TSV, wrap it in CustomMCQDataset or CustomVQADataset
   d. Return None if all resolution fails
```
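The resolution steps can be turned into a small runnable sketch. It mirrors the order of steps a through d, but it is not VLMEvalKit's implementation: `Dataset`, `ImageMCQDataset`, `ToyVideoDataset`, the `supported_datasets()` helper, and the `data_root` parameter are hypothetical, and treating option columns `A` and `B` as the mark of an MCQ file is an assumed classification rule.

```python
import csv
import os
from typing import Dict, List, Optional


class Dataset:
    """Hypothetical base class; the names a class supports are its DATASET_URL keys."""

    DATASET_URL: Dict[str, str] = {}

    def __init__(self, dataset: str, **kwargs):
        # A real constructor would download (if not cached), verify MD5, and load the TSV.
        self.dataset_name = dataset

    @classmethod
    def supported_datasets(cls) -> List[str]:
        return list(cls.DATASET_URL)


class ImageMCQDataset(Dataset):
    DATASET_URL = {"MMBench_DEV_EN_V11": "https://...", "AI2D_TEST": "https://..."}


class ToyVideoDataset(Dataset):
    pass


class CustomMCQDataset(Dataset):
    pass


class CustomVQADataset(Dataset):
    pass


IMAGE_DATASET = [ImageMCQDataset]
VIDEO_DATASET = [ToyVideoDataset]
TEXT_DATASET: List[type] = []
CUSTOM_DATASET = [CustomMCQDataset, CustomVQADataset]
DATASET_CLASSES = IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET

# Video benchmarks use a direct name -> class mapping (hypothetical entry).
supported_video_datasets = {"Toy_Video": ToyVideoDataset}


def build_dataset(dataset_name: str, data_root: str, **kwargs) -> Optional[Dataset]:
    # a. Video benchmarks are checked first.
    if dataset_name in supported_video_datasets:
        return supported_video_datasets[dataset_name](dataset_name, **kwargs)
    # b. Ask each registered class whether it supports the name.
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset_name, **kwargs)
    # c. Fallback: a local TSV named after the dataset, classified by its header row.
    path = os.path.join(data_root, f"{dataset_name}.tsv")
    if not os.path.exists(path):
        return None  # d. all resolution failed
    with open(path, newline="", encoding="utf-8") as f:
        columns = next(csv.reader(f, delimiter="\t"), [])
    if "A" in columns and "B" in columns:  # assumed rule: option columns => MCQ
        return CustomMCQDataset(dataset_name, **kwargs)
    return CustomVQADataset(dataset_name, **kwargs)
```

For example, `build_dataset("AI2D_TEST", root)` returns an `ImageMCQDataset`, while an unregistered name with no matching TSV under `root` returns `None`.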
This design provides:
- Extensibility — New benchmarks are added by creating a new dataset class with the appropriate class attributes.
- Consistency — All datasets are accessed through the same factory interface.
- Reliability — MD5 checksums ensure data integrity, and caching avoids redundant downloads.