Principle:Open compass VLMEvalKit Video Dataset Configuration

Field	Value
source	VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
domain	Vision, Video_Understanding, Evaluation
last_updated	2026-02-14 00:00 GMT

Overview

A configuration registry that maps names combining a video benchmark with its frame-sampling parameters to pre-configured dataset constructors.

Description

Video benchmarks in VLMEvalKit require a frame sampling configuration: either how many frames to extract (nframe) or at what rate to extract them (fps). The supported_video_datasets dictionary maps configuration-specific names (e.g., "MVBench_8frame", "Video-MME_1fps") to functools.partial objects that pre-configure the dataset class with the frame/fps parameters and, optionally, pack mode. The same underlying benchmark class can therefore be used with different temporal sampling strategies. Because the sampling strategy can significantly affect evaluation results, the configuration name records which strategy produced a given score.
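The registry pattern described above can be sketched as follows. This is a minimal illustration, not the actual contents of supported_video_datasets: the class ToyVideoBenchmark, the name toy_video_registry, the "ToyBench_*" keys, the constructor defaults, and the validation rule are all hypothetical stand-ins chosen for the example.

```python
from functools import partial


class ToyVideoBenchmark:
    """Hypothetical stand-in for a video benchmark dataset class."""

    def __init__(self, dataset="ToyBench", nframe=0, fps=-1, pack=False):
        # Assumption of this sketch: exactly one sampling mode must be set.
        if (nframe > 0) == (fps > 0):
            raise ValueError("set exactly one of nframe or fps")
        self.dataset = dataset
        self.nframe = nframe
        self.fps = fps
        self.pack = pack


# Each key encodes the benchmark plus its sampling policy; each value is a
# constructor with those parameters already bound via functools.partial.
toy_video_registry = {
    "ToyBench_8frame": partial(ToyVideoBenchmark, dataset="ToyBench", nframe=8),
    "ToyBench_64frame": partial(ToyVideoBenchmark, dataset="ToyBench", nframe=64),
    "ToyBench_1fps": partial(ToyVideoBenchmark, dataset="ToyBench", fps=1.0),
    "ToyBench_8frame_pack": partial(
        ToyVideoBenchmark, dataset="ToyBench", nframe=8, pack=True
    ),
}

# Calling the stored partial with no arguments builds the configured dataset.
ds = toy_video_registry["ToyBench_8frame"]()
print(ds.dataset, ds.nframe, ds.fps, ds.pack)
```

Because the values are functools.partial objects, a caller can still pass or override keyword arguments at construction time, for example toy_video_registry["ToyBench_8frame"](pack=True).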

Usage

When evaluating video benchmarks, select the configuration name that matches your model's capabilities (e.g., use more frames for models that support a longer context). The available configurations are listed per benchmark in vlmeval/dataset/video_dataset_config.py.
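One way to automate the selection described above is to pick the largest frame count that fits a model's frame budget. The sketch below assumes configuration names follow a "<benchmark>_<N>frame" pattern, as in the "MVBench_8frame" example. The helper pick_nframe_config and the "ToyBench_*" names are hypothetical and not part of VLMEvalKit.

```python
import re


def pick_nframe_config(config_names, benchmark, max_frames):
    """Return the '<benchmark>_<N>frame' name with the largest N <= max_frames."""
    pattern = re.compile(rf"^{re.escape(benchmark)}_(\d+)frame$")
    candidates = []
    for name in config_names:
        match = pattern.match(name)
        if match and int(match.group(1)) <= max_frames:
            candidates.append((int(match.group(1)), name))
    if not candidates:
        raise KeyError(f"no nframe config for {benchmark!r} within {max_frames} frames")
    # Tuples sort by frame count first, so max() yields the richest fitting config.
    return max(candidates)[1]


# Illustrative names only; the real list lives in video_dataset_config.py.
names = ["ToyBench_8frame", "ToyBench_16frame", "ToyBench_64frame", "ToyBench_1fps"]
chosen = pick_nframe_config(names, "ToyBench", max_frames=32)
print(chosen)
```

fps-based names are deliberately ignored here, since their frame count depends on video length rather than on a fixed budget.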

Theoretical Basis

Temporal sampling is a trade-off: more frames provide richer temporal information but increase compute and memory cost. fps-based sampling scales the number of frames with video length, whereas nframe-based sampling yields a fixed number of frames, and hence a fixed context size, regardless of length. The configuration registry separates sampling policy from benchmark logic.
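The difference between the two policies can be made concrete with a small sketch. The functions nframe_indices and fps_indices are hypothetical, and the index formulas (segment midpoints for nframe, a constant stride for fps) are one common choice, not necessarily the rule each VLMEvalKit benchmark class applies.

```python
def nframe_indices(total_frames, nframe):
    """Pick nframe indices at the midpoints of nframe equal-length segments."""
    step = total_frames / nframe
    return [min(total_frames - 1, int(step * i + step / 2)) for i in range(nframe)]


def fps_indices(total_frames, video_fps, target_fps):
    """Pick one frame every 1/target_fps seconds of video."""
    duration = total_frames / video_fps        # clip length in seconds
    count = max(1, int(duration * target_fps))  # number of sampled frames
    stride = video_fps / target_fps             # source frames between samples
    return [min(total_frames - 1, int(i * stride)) for i in range(count)]


# A 10 s clip and a 60 s clip, both recorded at 30 frames per second.
short_clip, long_clip, native_fps = 300, 1800, 30

# nframe-based: the frame count is fixed, so coverage gets sparser as length grows.
print(len(nframe_indices(short_clip, 8)), len(nframe_indices(long_clip, 8)))

# fps-based: the frame count grows linearly with video length.
print(len(fps_indices(short_clip, native_fps, 1.0)),
      len(fps_indices(long_clip, native_fps, 1.0)))
```

Under these assumptions, the fixed-nframe policy keeps the model's input size constant across clips, while the fps policy keeps the temporal resolution constant at the cost of a longer input for longer videos.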

Related Pages

Implementation:Open_compass_VLMEvalKit_Supported_Video_Datasets
