# Implementation: open-compass VLMEvalKit build_dataset
| Field | Value |
|---|---|
| Source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| Domain | Vision, Evaluation, Data_Processing |
| Last Updated | 2026-02-14 00:00 GMT |
## Overview
Concrete tool for constructing benchmark dataset objects by name from the VLMEvalKit dataset registry.
## Description
The `build_dataset()` function in `vlmeval/dataset/__init__.py` resolves a dataset name string to a dataset class instance. The resolution proceeds in the following order:
- **Check video datasets** — first checks whether the name appears in `supported_video_datasets`.
- **Iterate through DATASET_CLASSES** — searches the combined list `IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET` for a class that supports the given dataset name.
- **Fallback for unregistered datasets** — if no registered class matches, the function attempts to load a local TSV file and wraps it in either `CustomMCQDataset` or `CustomVQADataset`, based on column presence (e.g., whether the TSV contains answer-choice columns).
Each dataset class handles its own data downloading, MD5 verification, and initialization. The factory function orchestrates the lookup and delegates construction to the matched class.
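The three-step resolution order above can be sketched as follows. This is a hypothetical, self-contained simplification: the registry names and the stub dataset class stand in for VLMEvalKit's real module-level registries, and the TSV fallback is omitted.

```python
# Hypothetical sketch of build_dataset's resolution order; the real
# implementation lives in vlmeval/dataset/__init__.py.

# Stand-in for the supported_video_datasets registry.
SUPPORTED_VIDEO_DATASETS = {"MVBench_8frame"}


class StubImageDataset:
    """Illustrative stand-in for a registered dataset class."""

    @classmethod
    def supported_datasets(cls):
        return ["MMBench_DEV_EN_V11", "AI2D_TEST"]

    def __init__(self, dataset, **kwargs):
        self.dataset_name = dataset


# Stand-in for IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET.
DATASET_CLASSES = [StubImageDataset]


def build_dataset_sketch(dataset_name, **kwargs):
    # Step 1: video-dataset shortcut (placeholder return value here).
    if dataset_name in SUPPORTED_VIDEO_DATASETS:
        return f"<video dataset {dataset_name}>"
    # Step 2: look up a registered class that supports the name.
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset_name, **kwargs)
    # Step 3: local-TSV fallback would go here; return None on failure.
    return None
```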
## Usage
Call with a dataset name string to get a fully initialized dataset object ready for inference and evaluation:
- The dataset object provides
.data(Pandas DataFrame),.dataset_name(str),.TYPE(str), and an.evaluate()method. - Used by the main evaluation pipeline in
run.pyto construct datasets from command-line arguments. - Can be used standalone for dataset inspection, custom evaluation scripts, or data analysis.
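The custom-TSV fallback described earlier distinguishes multiple-choice from open-ended data by which columns the file contains. A rough, self-contained sketch of that heuristic (the exact column names checked by VLMEvalKit are an assumption here):

```python
import csv
import io


def guess_custom_dataset_kind(tsv_text):
    """Sketch of the MCQ-vs-VQA heuristic: if the TSV header contains
    answer-choice columns (assumed here to be "A".."D"), treat the file
    as multiple-choice; otherwise treat it as open-ended VQA."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    has_choices = any(col in header for col in ("A", "B", "C", "D"))
    return "CustomMCQDataset" if has_choices else "CustomVQADataset"


mcq_tsv = "index\timage\tquestion\tA\tB\tC\tD\tanswer\n1\t...\tQ?\ta\tb\tc\td\tA\n"
vqa_tsv = "index\timage\tquestion\tanswer\n1\t...\tQ?\tfree text\n"
```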
## Code Reference
### Source Location
| Field | Value |
|---|---|
| Repository | VLMEvalKit |
| File | `vlmeval/dataset/__init__.py` |
| Lines | L308-335 |
### Signature
```python
def build_dataset(dataset_name: str, **kwargs) -> Optional[Dataset]:
    """
    Args:
        dataset_name: Name of the benchmark (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST").
        **kwargs: Additional arguments passed to dataset constructor.

    Returns:
        Dataset instance or None if construction fails.
    """
```
### Import

```python
from vlmeval.dataset import build_dataset
```
### I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | `dataset_name` (str) | Name of the benchmark dataset to construct (e.g., `"MMBench_DEV_EN_V11"`, `"AI2D_TEST"`, `"MVBench_8frame"`) |
| Input | `**kwargs` | Additional keyword arguments forwarded to the dataset class constructor |
| Output | Dataset instance | Object with `.data` (DataFrame), `.dataset_name` (str), `.TYPE` (str), and an `.evaluate()` method |
| Output | `None` | Returned if the dataset cannot be constructed (unrecognized name, download failure, etc.) |
The returned dataset object exposes the following interface:
| Attribute/Method | Type | Description |
|---|---|---|
| `.data` | `pandas.DataFrame` | The benchmark data with columns for questions, images, answers, etc. |
| `.dataset_name` | `str` | The canonical name of the dataset |
| `.TYPE` | `str` | Dataset type identifier (e.g., `"MCQ"`, `"VQA"`, `"VIDEO"`) |
| `.evaluate(result_file)` | method | Evaluates model predictions against ground truth and returns metrics |
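Because the factory signals failure by returning `None` rather than raising, callers typically guard the result before using it. A minimal self-contained sketch of that pattern (the stub builder below is purely illustrative; in practice you would pass `build_dataset` itself):

```python
def require_dataset(builder, name, **kwargs):
    """Wrap a build_dataset-style factory so a failed lookup raises
    instead of silently returning None. `builder` is any callable with
    the same contract: it returns a dataset instance or None."""
    dataset = builder(name, **kwargs)
    if dataset is None:
        raise ValueError(f"could not construct dataset {name!r}")
    return dataset


# Stub builder used only for demonstration; replace with build_dataset.
def stub_builder(name, **kwargs):
    return {"dataset_name": name} if name == "AI2D_TEST" else None
```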
## Usage Examples
### Building an Image MCQ Dataset

```python
from vlmeval.dataset import build_dataset

# Build the MMBench development set (English, V1.1)
dataset = build_dataset("MMBench_DEV_EN_V11")

# Inspect the dataset
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
print(f"Number of samples: {len(dataset.data)}")
print(f"Columns: {list(dataset.data.columns)}")
```
### Building a Video Dataset

```python
from vlmeval.dataset import build_dataset

# Build the MVBench video benchmark (8-frame variant)
dataset = build_dataset("MVBench_8frame")

# The dataset is ready for video model inference
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
```
### Using in an Evaluation Pipeline

```python
from vlmeval.smp import load_env
from vlmeval.config import supported_VLM
from vlmeval.dataset import build_dataset

# Step 1: Load environment
load_env()

# Step 2: Build model and dataset
model = supported_VLM["InternVL2-8B"]()
dataset = build_dataset("AI2D_TEST")

# Step 3: Run inference (simplified)
for idx, row in dataset.data.iterrows():
    response = model.generate([row["question"]], [row["image"]])
    # ... collect responses ...

# Step 4: Evaluate results
metrics = dataset.evaluate("results/AI2D_TEST_InternVL2-8B.xlsx")
print(metrics)
```