Principle:Open compass VLMEvalKit Dataset Base Class Hierarchy
| Field | Value |
|---|---|
| source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| domain | Vision, Evaluation, Data_Processing |
| last_updated | 2026-02-14 00:00 GMT |
Overview
A class hierarchy that provides base implementations for different benchmark modalities (image MCQ, VQA, video) with auto-downloading, prompt building, and evaluation capabilities.
Description
VLMEvalKit organizes benchmarks into a class hierarchy: ImageBaseDataset is the root for image benchmarks, with specialized subclasses ImageMCQDataset (TYPE='MCQ'), ImageVQADataset (TYPE='VQA'), and ImageYORNDataset (TYPE='Y/N'). VideoBaseDataset extends the concept to video with frame extraction. TextBaseDataset handles text-only benchmarks.
Each base class provides:
- Auto-download via DATASET_URL / DATASET_MD5 class attributes and prepare_tsv()
- Default build_prompt() for prompt construction
- Abstract evaluate() for scoring
- dump_image() for base64 image decoding
New benchmarks subclass the appropriate base and override DATASET_URL, DATASET_MD5, and optionally build_prompt() and evaluate().
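Two of the mechanisms listed above, checksum-gated downloading and base64 image decoding, can be sketched with the standard library alone. The helper names `file_md5`, `needs_download`, and `decode_image` are hypothetical and chosen for illustration; they are not VLMEvalKit functions, and the library's own prepare_tsv() and dump_image() may behave differently in detail.

```python
import base64
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """Hypothetical check: download if the file is missing or its MD5 differs."""
    if not os.path.exists(path):
        return True
    return expected_md5 is not None and file_md5(path) != expected_md5


def decode_image(b64_string, out_path):
    """Hypothetical helper: write a base64-encoded payload to disk."""
    with open(out_path, 'wb') as f:
        f.write(base64.b64decode(b64_string))
    return out_path


# Demo with placeholder bytes instead of a real image.
workdir = tempfile.mkdtemp()
payload = b'not a real image, just bytes'
encoded = base64.b64encode(payload).decode('ascii')
image_path = decode_image(encoded, os.path.join(workdir, 'sample.bin'))
```

Gating the download on a checksum means a cached file is reused only when it matches the expected digest, so a truncated or outdated copy triggers a fresh download.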
Usage
When adding a new benchmark, choose the appropriate base class based on task type: ImageMCQDataset for multiple-choice, ImageVQADataset for open-ended VQA, VideoBaseDataset for video benchmarks. Subclass it and set DATASET_URL and DATASET_MD5 class attributes.
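A minimal sketch of such a subclass follows. To keep it runnable without installing the library, `ImageBaseDataset` and `ImageMCQDataset` are redefined here as simplified stand-ins. The benchmark name `MyBench`, its URL, and its checksum are placeholders. The dict-per-dataset-name shape of the two attributes and the `supported_datasets()` helper are assumptions of this sketch and should be checked against the library source.

```python
class ImageBaseDataset:
    """Simplified stand-in for the library's root image dataset class."""
    TYPE = None
    DATASET_URL = {}
    DATASET_MD5 = {}

    @classmethod
    def supported_datasets(cls):
        # Assumed convention: a class serves every name it has a URL for.
        return list(cls.DATASET_URL)


class ImageMCQDataset(ImageBaseDataset):
    """Stand-in for the multiple-choice base class."""
    TYPE = 'MCQ'


class MyBenchMCQ(ImageMCQDataset):
    """Hypothetical new multiple-choice benchmark."""
    # Placeholder URL and checksum; replace with the real TSV location
    # and the MD5 digest of that file.
    DATASET_URL = {'MyBench': 'https://example.com/MyBench.tsv'}
    DATASET_MD5 = {'MyBench': 'd41d8cd98f00b204e9800998ecf8427e'}


print(MyBenchMCQ.TYPE, MyBenchMCQ.supported_datasets())
```

Because the task type is inherited from the chosen base, the subclass only has to declare where its data lives.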
Theoretical Basis
This is the Template Method pattern: base classes define the skeleton (download → load → build prompt → evaluate) and subclasses fill in the specifics. The class hierarchy enforces consistent data handling across 100+ benchmarks.
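The pattern itself can be illustrated in a few lines. `BenchmarkTemplate`, its `run()` method, and the in-memory data are all hypothetical; VLMEvalKit spreads these steps across its dataset classes and inference scripts rather than exposing a single method like this one.

```python
from abc import ABC, abstractmethod


class BenchmarkTemplate(ABC):
    """Hypothetical skeleton, not a VLMEvalKit class."""

    def run(self, model):
        # The template method: the order of steps is fixed by the base class.
        path = self.download()
        rows = self.load(path)
        predictions = [model(self.build_prompt(row)) for row in rows]
        return self.evaluate(rows, predictions)

    def download(self):
        # Default step; a real implementation would fetch and verify a file.
        return 'in-memory'

    def load(self, path):
        # Default step; returns toy records instead of reading a TSV.
        return [{'question': '1 + 1 = ?', 'answer': '2'}]

    def build_prompt(self, row):
        # Default prompt construction that subclasses may override.
        return row['question']

    @abstractmethod
    def evaluate(self, rows, predictions):
        """Scoring is left to each subclass."""


class ExactMatchBenchmark(BenchmarkTemplate):
    def evaluate(self, rows, predictions):
        hits = sum(p.strip() == r['answer'] for r, p in zip(rows, predictions))
        return {'accuracy': hits / len(rows)}


# A toy "model" that always answers '2'.
result = ExactMatchBenchmark().run(lambda prompt: '2')
```

Only `evaluate()` is abstract in this sketch, mirroring the description above in which prompt building has a default and scoring must be supplied per benchmark.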