

Implementation:OpenCompass VLMEvalKit Build Dataset

From Leeroopedia
Source: VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
Domain: Vision, Evaluation, Data_Processing
Last Updated: 2026-02-14 00:00 GMT

Overview

A factory function that constructs benchmark dataset objects by name from the VLMEvalKit dataset registry.

Description

The build_dataset() function in vlmeval/dataset/__init__.py resolves a dataset name string to a dataset class instance. The resolution proceeds in the following order:

  1. Check video datasets — First checks if the name is in supported_video_datasets.
  2. Iterate through DATASET_CLASSES — Searches through the combined list of IMAGE_DATASET + VIDEO_DATASET + TEXT_DATASET + CUSTOM_DATASET to find a class that supports the given dataset name.
  3. Fallback for unregistered datasets — If no registered class matches, the function attempts to load a local TSV file and wraps it in either CustomMCQDataset or CustomVQADataset based on column presence (e.g., whether the TSV contains answer choice columns).

Each dataset class handles its own data downloading, MD5 verification, and initialization. The factory function orchestrates the lookup and delegates construction to the matched class.
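The three-step resolution order above can be sketched as a minimal mock. The class and registry names below are illustrative stand-ins, not VLMEvalKit's actual internals:

```python
# Illustrative mock of the three-step lookup; names here are assumptions,
# not VLMEvalKit's real registry contents.
supported_video_datasets = {"MVBench_8frame"}

class MockMCQDataset:
    """Stand-in for a registered image-MCQ dataset class."""
    @classmethod
    def supported_datasets(cls):
        return ["MMBench_DEV_EN_V11", "AI2D_TEST"]
    def __init__(self, dataset_name, **kwargs):
        self.dataset_name = dataset_name

# The real kit concatenates IMAGE + VIDEO + TEXT + CUSTOM class lists.
DATASET_CLASSES = [MockMCQDataset]

def build_dataset(dataset_name, **kwargs):
    # 1. Check the video-dataset registry first.
    if dataset_name in supported_video_datasets:
        return MockMCQDataset(dataset_name, **kwargs)  # real kit builds a video class
    # 2. Iterate registered classes until one claims the name.
    for cls in DATASET_CLASSES:
        if dataset_name in cls.supported_datasets():
            return cls(dataset_name, **kwargs)
    # 3. Fallback: try a local TSV as CustomMCQ/CustomVQA (omitted here).
    return None
```

The mock preserves the key property of the real factory: lookup order matters, and an unmatched name falls through to the custom-TSV path (here, None).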

Usage

Call with a dataset name string to get a fully initialized dataset object ready for inference and evaluation:

  • The dataset object provides .data (Pandas DataFrame), .dataset_name (str), .TYPE (str), and an .evaluate() method.
  • Used by the main evaluation pipeline in run.py to construct datasets from command-line arguments.
  • Can be used standalone for dataset inspection, custom evaluation scripts, or data analysis.

Code Reference

Source Location

Repository: VLMEvalKit
File: vlmeval/dataset/__init__.py
Lines: 308-335

Signature

def build_dataset(dataset_name: str, **kwargs) -> Optional[Dataset]:
    """
    Args:
        dataset_name: Name of the benchmark (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST").
        **kwargs: Additional arguments passed to dataset constructor.
    Returns:
        Dataset instance or None if construction fails.
    """

Import

from vlmeval.dataset import build_dataset

I/O Contract

  • Input dataset_name (str): Name of the benchmark dataset to construct (e.g., "MMBench_DEV_EN_V11", "AI2D_TEST", "MVBench_8frame").
  • Input **kwargs: Additional keyword arguments forwarded to the dataset class constructor.
  • Output Dataset instance: Object with .data (DataFrame), .dataset_name (str), .TYPE (str), and an .evaluate() method.
  • Output None: Returned if the dataset cannot be constructed (unrecognized name, download failure, etc.).
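Because the factory returns None on failure rather than raising, callers may want a small guard. The helper below is a hedged sketch for this contract, not part of the VLMEvalKit API:

```python
# Hypothetical helper (not a VLMEvalKit function) that fails loudly
# when a build_dataset-style factory returns None.
def build_dataset_or_raise(builder, dataset_name, **kwargs):
    """Call the given factory and raise if construction fails."""
    dataset = builder(dataset_name, **kwargs)
    if dataset is None:
        raise ValueError(
            f"Could not construct dataset {dataset_name!r}: "
            "unrecognized name, missing local TSV, or download failure"
        )
    return dataset
```

Wrapping the call this way turns a silent None into an immediate, descriptive error at the start of an evaluation run.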

The returned dataset object exposes the following interface:

  • .data (pandas.DataFrame): The benchmark data, with columns for questions, images, answers, etc.
  • .dataset_name (str): The canonical name of the dataset.
  • .TYPE (str): Dataset type identifier (e.g., "MCQ", "VQA", "VIDEO").
  • .evaluate(result_file) (method): Evaluates model predictions against ground truth and returns metrics.

Usage Examples

Building an Image MCQ Dataset

from vlmeval.dataset import build_dataset

# Build the MMBench development set (English, V1.1)
dataset = build_dataset("MMBench_DEV_EN_V11")

# Inspect the dataset
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
print(f"Number of samples: {len(dataset.data)}")
print(f"Columns: {list(dataset.data.columns)}")

Building a Video Dataset

from vlmeval.dataset import build_dataset

# Build the MVBench video benchmark (8-frame variant)
dataset = build_dataset("MVBench_8frame")

# The dataset is ready for video model inference
print(f"Dataset: {dataset.dataset_name}")
print(f"Type: {dataset.TYPE}")
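The step-3 fallback described earlier can also be exercised with a local TSV for an unregistered name. The column set below is an assumption about what the CustomMCQDataset wrapper looks for; check your VLMEvalKit version before relying on it:

```python
import csv
import os
import tempfile

# Assumed MCQ columns (index, question, A-D options, answer, image);
# the exact set the fallback expects may differ by VLMEvalKit version.
rows = [{
    "index": 0,
    "question": "What color is the sky on a clear day?",
    "A": "Blue", "B": "Green", "C": "Red", "D": "Yellow",
    "answer": "A",
    "image": "<base64-encoded image or path>",
}]

path = os.path.join(tempfile.mkdtemp(), "my_benchmark.tsv")
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(rows[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)

# Once the TSV is discoverable by the loader, an unregistered name should
# trigger the fallback and return a custom dataset wrapper:
# from vlmeval.dataset import build_dataset
# dataset = build_dataset("my_benchmark")
```

Because the TSV contains answer-choice columns, the fallback described above would wrap it as an MCQ dataset rather than a VQA one.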

Using in an Evaluation Pipeline

from vlmeval.smp import load_env
from vlmeval.config import supported_VLM
from vlmeval.dataset import build_dataset

# Step 1: Load environment
load_env()

# Step 2: Build model and dataset
model = supported_VLM["InternVL2-8B"]()
dataset = build_dataset("AI2D_TEST")

# Step 3: Run inference (simplified; models take an interleaved message list)
for idx, row in dataset.data.iterrows():
    message = [dict(type="image", value=row["image"]),
               dict(type="text", value=row["question"])]
    response = model.generate(message)
    # ... collect responses ...

# Step 4: Evaluate results
metrics = dataset.evaluate("results/AI2D_TEST_InternVL2-8B.xlsx")
print(metrics)
