Principle:Open compass VLMEvalKit Dataset Base Class Hierarchy

Field	Value
source	VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
domain	Vision, Evaluation, Data_Processing
last_updated	2026-02-14 00:00 GMT

Overview

A class hierarchy of base implementations, one per benchmark modality (image MCQ, image VQA, video), each providing automatic dataset download, prompt building, and evaluation.

Description

VLMEvalKit organizes benchmarks into a class hierarchy. ImageBaseDataset is the root for image benchmarks, with specialized subclasses ImageMCQDataset (TYPE='MCQ'), ImageVQADataset (TYPE='VQA'), and ImageYORNDataset (TYPE='Y/N'). VideoBaseDataset applies the same design to video benchmarks and adds frame extraction. TextBaseDataset handles text-only benchmarks.

Each base class provides:

Auto-download via DATASET_URL/DATASET_MD5 class attributes and prepare_tsv()
Default build_prompt() for prompt construction
Abstract evaluate() for scoring
dump_image() for base64 image decoding
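
The checksum and decoding steps listed above can be sketched with the standard library alone. This is a minimal illustration, not VLMEvalKit's code: the helper names md5_of_file, needs_download, and decode_image are hypothetical, and the real prepare_tsv() and dump_image() may differ in signature and behavior.

```python
import base64
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_download(path, expected_md5):
    """Hypothetical helper: True if the file is missing or its checksum differs.

    Passing expected_md5=None skips the checksum comparison (an assumption of
    this sketch, not a documented VLMEvalKit rule).
    """
    path = Path(path)
    if not path.exists():
        return True
    return expected_md5 is not None and md5_of_file(path) != expected_md5


def decode_image(b64_string, out_path):
    """Hypothetical helper: decode a base64 string and write the bytes to disk."""
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_bytes(base64.b64decode(b64_string))
    return out_path
```

A dataset file would be fetched again only when needs_download() returns True, and each base64-encoded image column would be passed through something like decode_image() before being handed to the model.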

New benchmarks subclass the appropriate base and override DATASET_URL, DATASET_MD5, and optionally build_prompt() and evaluate().

Usage

When adding a new benchmark, choose the appropriate base class based on task type: ImageMCQDataset for multiple-choice, ImageVQADataset for open-ended VQA, VideoBaseDataset for video benchmarks. Subclass it and set DATASET_URL and DATASET_MD5 class attributes.
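
As a rough sketch of what such a subclass looks like, the example below uses simplified stand-in base classes so that it runs without VLMEvalKit installed. The stand-ins only mimic the attributes named on this page; the benchmark name MyBench, its URL, and its checksum are placeholders, and the dictionary form of DATASET_URL/DATASET_MD5 and the string-valued build_prompt() are assumptions of this sketch. In a real project the base class would be imported from the library instead.

```python
class ImageBaseDataset:
    """Simplified stand-in for illustration; not the VLMEvalKit class."""

    TYPE = None
    DATASET_URL = {}   # assumed shape: dataset name -> download URL
    DATASET_MD5 = {}   # assumed shape: dataset name -> expected checksum

    @classmethod
    def supported_datasets(cls):
        # Names this class can serve, derived from the URL table.
        return list(cls.DATASET_URL)

    def build_prompt(self, line):
        # Default: the question text alone.
        return line["question"]


class ImageMCQDataset(ImageBaseDataset):
    """Stand-in multiple-choice base: adds lettered options to the prompt."""

    TYPE = "MCQ"

    def build_prompt(self, line):
        options = [f"{key}. {line[key]}" for key in "ABCD" if key in line]
        return "\n".join([f"Question: {line['question']}", "Options:", *options])


class MyBench(ImageMCQDataset):
    """Hypothetical new benchmark: only the class attributes are overridden."""

    DATASET_URL = {"MyBench": "https://example.com/MyBench.tsv"}  # placeholder URL
    DATASET_MD5 = {"MyBench": "0" * 32}                           # placeholder checksum


if __name__ == "__main__":
    record = {"question": "What is shown?", "A": "cat", "B": "dog"}
    print(MyBench.TYPE, MyBench.supported_datasets())
    print(MyBench().build_prompt(record))
```

Because MyBench inherits TYPE and build_prompt() from the multiple-choice base, the subclass body stays limited to the two class attributes unless the benchmark needs custom prompting or scoring.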

Theoretical Basis

The hierarchy follows the Template Method pattern: base classes define the skeleton (download → load → build prompt → evaluate) and subclasses fill in the specifics. This enforces consistent data handling across 100+ benchmarks.
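
The pattern itself can be shown in a few lines. The sketch below is generic and does not reproduce VLMEvalKit's classes: BenchmarkTemplate, run(), and ToyExactMatch are hypothetical names, and in the library the steps may be driven by separate inference and evaluation scripts rather than by a single method.

```python
from abc import ABC, abstractmethod


class BenchmarkTemplate(ABC):
    """Hypothetical base class: fixes the step order, defers the details."""

    def run(self, predict):
        # The template method: the sequence is fixed here, once.
        self.download()
        records = self.load()
        prompts = [self.build_prompt(r) for r in records]
        predictions = [predict(p) for p in prompts]
        return self.evaluate(records, predictions)

    def download(self):
        """Default hook: nothing to fetch."""

    @abstractmethod
    def load(self):
        """Return the benchmark records."""

    def build_prompt(self, record):
        # Default prompt; subclasses may override.
        return record["question"]

    @abstractmethod
    def evaluate(self, records, predictions):
        """Score predictions against records."""


class ToyExactMatch(BenchmarkTemplate):
    """Toy subclass with in-memory data and exact-match accuracy."""

    def load(self):
        return [
            {"question": "2 + 2 = ?", "answer": "4"},
            {"question": "Capital of France?", "answer": "Paris"},
        ]

    def evaluate(self, records, predictions):
        hits = sum(p.strip() == r["answer"] for r, p in zip(records, predictions))
        return {"accuracy": hits / len(records)}


if __name__ == "__main__":
    # A stub "model" that always answers "4".
    print(ToyExactMatch().run(lambda prompt: "4"))
```

Subclasses supply only load() and evaluate(), and override build_prompt() or download() when the defaults do not fit, which is the same division of labor described for the dataset base classes.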

Related Pages

Implementation:Open_compass_VLMEvalKit_ImageBaseDataset
