Principle:Open compass VLMEvalKit Video Dataset Configuration
| Field | Value |
|---|---|
| source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| domain | Vision, Video_Understanding, Evaluation |
| last_updated | 2026-02-14 00:00 GMT |
Overview
A configuration registry that maps video benchmark names, each of which encodes a frame sampling setting, to pre-configured dataset constructors.
Description
Video benchmarks in VLMEvalKit require a frame sampling configuration: either how many frames to extract (nframe) or at what rate to sample them (fps). The supported_video_datasets dictionary maps configuration-specific names (e.g., "MVBench_8frame", "Video-MME_1fps") to functools.partial objects that pre-configure the dataset class with the nframe or fps parameter and, optionally, pack mode. The same underlying benchmark class can therefore be run under different temporal sampling strategies. The choice matters because the sampling strategy significantly affects evaluation results.
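The registry pattern described above can be sketched in a few lines of Python. This is a minimal illustration, not VLMEvalKit's code: ToyVideoBenchmark, toy_video_datasets, build_dataset, the "ToyBench_*" names, and the constructor defaults are all hypothetical stand-ins chosen for this example. Only the use of functools.partial to bind nframe, fps, and pack ahead of time reflects the description.

```python
from functools import partial


class ToyVideoBenchmark:
    """Hypothetical stand-in for a video benchmark class (not VLMEvalKit code)."""

    def __init__(self, dataset, nframe=0, fps=-1.0, pack=False):
        # Assumption for this sketch: exactly one of nframe / fps must be set.
        if (nframe > 0) == (fps > 0):
            raise ValueError("set exactly one of nframe or fps")
        self.dataset = dataset
        self.nframe = nframe
        self.fps = fps
        self.pack = pack


# Registry: configuration-specific name -> pre-configured constructor.
# Each partial binds the sampling policy; nothing is instantiated yet.
toy_video_datasets = {
    "ToyBench_8frame": partial(ToyVideoBenchmark, dataset="ToyBench", nframe=8),
    "ToyBench_64frame": partial(ToyVideoBenchmark, dataset="ToyBench", nframe=64),
    "ToyBench_1fps": partial(ToyVideoBenchmark, dataset="ToyBench", fps=1.0),
    "ToyBench_8frame_pack": partial(
        ToyVideoBenchmark, dataset="ToyBench", nframe=8, pack=True
    ),
}


def build_dataset(config_name):
    """Look up a configuration name and call its pre-configured constructor."""
    try:
        constructor = toy_video_datasets[config_name]
    except KeyError:
        known = ", ".join(sorted(toy_video_datasets))
        raise KeyError(f"unknown config {config_name!r}; known: {known}") from None
    return constructor()


# The same class is instantiated under two different sampling policies.
sparse = build_dataset("ToyBench_8frame")
by_rate = build_dataset("ToyBench_1fps")
```

Because each entry is a partial rather than an instance, registering a new sampling variant costs one dictionary line and constructs nothing until the name is actually requested.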
Usage
When evaluating video benchmarks, select the configuration name that matches your model's capabilities (e.g., use more frames for models that support a longer context). Available configurations are listed per benchmark in vlmeval/dataset/video_dataset_config.py.
Theoretical Basis
Temporal sampling is a trade-off: more frames provide richer temporal information but increase compute and memory cost. fps-based sampling scales the number of frames with video length, while nframe-based sampling yields a fixed number of frames, and hence a fixed context size, regardless of length. The configuration registry separates sampling policy from benchmark logic.
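To make the difference between the two policies concrete, the sketch below computes which frame indices each one would select. It is an illustration under stated assumptions, not VLMEvalKit's implementation: nframe_indices and fps_indices are hypothetical helpers, the midpoint-of-segment rule and the rounding are choices made for this example, and the library's own index selection may differ.

```python
def nframe_indices(total_frames, nframe):
    """Pick nframe indices spread evenly over the video.

    Assumption: the video is split into nframe equal segments and the
    frame nearest each segment's midpoint is taken.
    """
    step = total_frames / nframe
    return [min(total_frames - 1, int(step * i + step / 2)) for i in range(nframe)]


def fps_indices(total_frames, video_fps, sample_fps):
    """Pick one frame every 1 / sample_fps seconds of video."""
    duration_s = total_frames / video_fps
    count = max(1, int(duration_s * sample_fps))
    step = video_fps / sample_fps  # native frames between two samples
    return [min(total_frames - 1, int(i * step)) for i in range(count)]


# Two videos recorded at 30 frames per second: 10 seconds and 100 seconds.
short_video, long_video, native_fps = 300, 3000, 30.0

# nframe-based: the frame count is fixed regardless of video length.
fixed_short = nframe_indices(short_video, 8)
fixed_long = nframe_indices(long_video, 8)

# fps-based: the frame count grows with video length.
rate_short = fps_indices(short_video, native_fps, 1.0)
rate_long = fps_indices(long_video, native_fps, 1.0)
```

In this example the nframe policy returns 8 indices for both videos, whereas the 1 fps policy returns 10 indices for the short video and 100 for the long one, which is where the extra compute and memory cost comes from.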