# Implementation: open-compass VLMEvalKit build_dataset
| Field | Value |
|---|---|
| Source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| Domain | Vision, Evaluation, Data_Processing |
| Last Updated | 2026-02-14 00:00 GMT |
## Overview
Concrete tool for constructing benchmark dataset objects by name from the VLMEvalKit dataset registry.
## Description
The `build_dataset()` function in `vlmeval/dataset/__init__.py` resolves a dataset name string to a dataset class instance. The resolution proceeds in the following order:
- **Check video datasets** — first checks whether the name appears in `supported_video_datasets`.
- **Iterate through DATASET_CLASSES** — searches the combined list `IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET` for a class that supports the given dataset name.
- **Fallback for unregistered datasets** — if no registered class matches, the function attempts to load a local TSV file and wraps it in either `CustomMCQDataset` or `CustomVQADataset`, based on column presence (e.g., whether the TSV contains answer-choice columns).
Each dataset class handles its own data downloading, MD5 verification, and initialization. The factory function orchestrates the lookup and delegates construction to the matched class.
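The three-step resolution order above can be sketched as follows. This is a hypothetical, self-contained simplification: the registry names and the stub dataset class stand in for VLMEvalKit's real module-level registries, and the TSV fallback is omitted.

```python
# Hypothetical sketch of build_dataset's resolution order; the real
# implementation lives in vlmeval/dataset/__init__.py.

# Stand-in for the supported_video_datasets registry.
SUPPORTED_VIDEO_DATASETS = {"MVBench_8frame"}


class StubImageDataset:
    """Illustrative stand-in for a registered dataset class."""

    @classmethod
    def supported_datasets(cls):
        return ["MMBench_DEV_EN_V11", "AI2D_TEST"]

    def __init__(self, dataset, **kwargs):
        self.dataset_name = dataset


# Stand-in for IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET.
DATASET_CLASSES = [StubImageDataset]


def build_dataset_sketch(dataset_name, **kwargs):
    # Step 1: video-dataset shortcut (placeholder return value here).
    if dataset_name in SUPPORTED_VIDEO_DATASETS:
        return f"<video dataset {dataset_name}>"
    # Step 2: look up a registered class that supports the name.
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset_name, **kwargs)
    # Step 3: local-TSV fallback would go here; return None on failure.
    return None
```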
## Usage
Call with a dataset name string to get a fully initialized dataset object ready for inference and evaluation:
- The dataset object provides
.data(Pandas DataFrame),.dataset_name(str),.TYPE(str), and an.evaluate()method. - Used by the main evaluation pipeline in
run.pyto construct datasets from command-line arguments. - Can be used standalone for dataset inspection, custom evaluation scripts, or data analysis.
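The custom-TSV fallback described earlier distinguishes multiple-choice from open-ended data by which columns the file contains. A rough, self-contained sketch of that heuristic (the exact column names checked by VLMEvalKit are an assumption here):

```python
import csv
import io


def guess_custom_dataset_kind(tsv_text):
    """Sketch of the MCQ-vs-VQA heuristic: if the TSV header contains
    answer-choice columns (assumed here to be "A".."D"), treat the file
    as multiple-choice; otherwise treat it as open-ended VQA."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    has_choices = any(col in header for col in ("A", "B", "C", "D"))
    return "CustomMCQDataset" if has_choices else "CustomVQADataset"


mcq_tsv = "index\timage\tquestion\tA\tB\tC\tD\tanswer\n1\t...\tQ?\ta\tb\tc\td\tA\n"
vqa_tsv = "index\timage\tquestion\tanswer\n1\t...\tQ?\tfree text\n"
```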
## Code Reference
### Source Location
| Field | Value |
|---|---|
| Repository | VLMEvalKit |
| File | `vlmeval/dataset/__init__.py` |
| Lines | L308-335 |
### Signature
```python
def build_dataset(dataset_name: str, **kwargs) -> Optional[Dataset]:
    """
    Args:
        dataset_name: Name of the benchmark (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST").
        **kwargs: Additional arguments passed to dataset constructor.

    Returns:
        Dataset instance or None if construction fails.
    """
```
### Import

```python
from vlmeval.dataset import build_dataset
```
### I/O Contract
| Direction | Type | Description |
|---|---|---|
| Input | `dataset_name` (str) | Name of the benchmark dataset to construct (e.g., `"MMBench_DEV_EN_V11"`, `"AI2D_TEST"`, `"MVBench_8frame"`) |
| Input | `**kwargs` | Additional keyword arguments forwarded to the dataset class constructor |
| Output | Dataset instance | Object with `.data` (DataFrame), `.dataset_name` (str), `.TYPE` (str), and an `.evaluate()` method |
| Output | `None` | Returned if the dataset cannot be constructed (unrecognized name, download failure, etc.) |
The returned dataset object exposes the following interface:
| Attribute/Method | Type | Description |
|---|---|---|
| `.data` | `pandas.DataFrame` | The benchmark data with columns for questions, images, answers, etc. |
| `.dataset_name` | `str` | The canonical name of the dataset |
| `.TYPE` | `str` | Dataset type identifier (e.g., `"MCQ"`, `"VQA"`, `"VIDEO"`) |
| `.evaluate(result_file)` | method | Evaluates model predictions against ground truth and returns metrics |
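Because the factory signals failure by returning `None` rather than raising, callers typically guard the result before using it. A minimal self-contained sketch of that pattern (the stub builder below is purely illustrative; in practice you would pass `build_dataset` itself):

```python
def require_dataset(builder, name, **kwargs):
    """Wrap a build_dataset-style factory so a failed lookup raises
    instead of silently returning None. `builder` is any callable with
    the same contract: it returns a dataset instance or None."""
    dataset = builder(name, **kwargs)
    if dataset is None:
        raise ValueError(f"could not construct dataset {name!r}")
    return dataset


# Stub builder used only for demonstration; replace with build_dataset.
def stub_builder(name, **kwargs):
    return {"dataset_name": name} if name == "AI2D_TEST" else None
```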
## Usage Examples
### Building an Image MCQ Dataset

```python
from vlmeval.dataset import build_dataset

# Build the MMBench development set (English, V1.1)
dataset = build_dataset("MMBench_DEV_EN_V11")

# Inspect the dataset
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
print(f"Number of samples: {len(dataset.data)}")
print(f"Columns: {list(dataset.data.columns)}")
```
### Building a Video Dataset

```python
from vlmeval.dataset import build_dataset

# Build the MVBench video benchmark (8-frame variant)
dataset = build_dataset("MVBench_8frame")

# The dataset is ready for video model inference
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
```
### Using in an Evaluation Pipeline

```python
from vlmeval.smp import load_env
from vlmeval.config import supported_VLM
from vlmeval.dataset import build_dataset

# Step 1: Load environment
load_env()

# Step 2: Build model and dataset
model = supported_VLM["InternVL2-8B"]()
dataset = build_dataset("AI2D_TEST")

# Step 3: Run inference (simplified)
for idx, row in dataset.data.iterrows():
    response = model.generate([row["question"]], [row["image"]])
    # ... collect responses ...

# Step 4: Evaluate results
metrics = dataset.evaluate("results/AI2D_TEST_InternVL2-8B.xlsx")
print(metrics)
```