Principle:Open compass VLMEvalKit Video Dataset Configuration

From Leeroopedia
source: VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
domain: Vision, Video_Understanding, Evaluation
last_updated: 2026-02-14 00:00 GMT

Overview

A configuration registry that maps configuration-specific video benchmark names, each encoding a frame sampling setting, to pre-configured dataset constructors.

Description

Video benchmarks in VLMEvalKit require a frame sampling configuration: either how many frames to extract (nframe) or the rate at which to extract them (fps). The supported_video_datasets dictionary maps configuration-specific names (e.g., "MVBench_8frame", "Video-MME_1fps") to functools.partial objects that pre-configure the dataset class with its nframe or fps parameter and, optionally, pack mode. This lets the same underlying benchmark class be evaluated under different temporal sampling strategies, a choice that can significantly affect evaluation results.
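The registry pattern can be sketched as follows. This is a minimal, self-contained illustration, not VLMEvalKit's actual code: VideoBenchmark is a hypothetical stand-in for the real dataset classes, its defaults (nframe=0, fps=-1 meaning "unset") are assumptions, and only "MVBench_8frame" and "Video-MME_1fps" come from this page; "Video-MME_8frame_pack" is an illustrative name.

```python
from functools import partial


class VideoBenchmark:
    """Hypothetical stand-in for a benchmark dataset class (not VLMEvalKit's real class)."""

    def __init__(self, dataset="MVBench", nframe=0, fps=-1, pack=False):
        # Assumption: exactly one sampling policy is active, a fixed frame count or a frame rate.
        if (nframe > 0) == (fps > 0):
            raise ValueError("set exactly one of nframe > 0 or fps > 0")
        self.dataset = dataset
        self.nframe = nframe
        self.fps = fps
        self.pack = pack


# Registry: configuration-specific name -> partially applied constructor.
supported_video_datasets = {
    "MVBench_8frame": partial(VideoBenchmark, dataset="MVBench", nframe=8),
    "Video-MME_1fps": partial(VideoBenchmark, dataset="Video-MME", fps=1.0),
    # Illustrative entry showing the optional pack flag.
    "Video-MME_8frame_pack": partial(VideoBenchmark, dataset="Video-MME", nframe=8, pack=True),
}

# Building a dataset only requires the configuration name.
ds = supported_video_datasets["MVBench_8frame"]()
print(ds.dataset, ds.nframe, ds.fps, ds.pack)  # MVBench 8 -1 False
```

Because each value is a partial object, the sampling policy is fixed at registration time and the caller never has to pass nframe or fps by hand.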

Usage

When evaluating a video benchmark, select the configuration name that matches your model's capability (e.g., use more frames for models that support a longer context). The available configurations for each benchmark are listed in vlmeval/dataset/video_dataset_config.py.
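Choosing a configuration amounts to a lookup by name. The sketch below assumes names follow a "<benchmark>_<suffix>" convention; list_configs and build_dataset are hypothetical helpers, not VLMEvalKit functions, and dict stands in for a real dataset class so the example runs on its own. Entries such as "MVBench_64frame" are illustrative only.

```python
from functools import partial

# Illustrative registry; `dict` stands in for a real dataset class so the sketch runs alone.
supported_video_datasets = {
    "MVBench_8frame": partial(dict, dataset="MVBench", nframe=8),
    "MVBench_64frame": partial(dict, dataset="MVBench", nframe=64),
    "Video-MME_1fps": partial(dict, dataset="Video-MME", fps=1.0),
    "Video-MME_8frame": partial(dict, dataset="Video-MME", nframe=8),
}


def list_configs(benchmark):
    """Return configuration names for one benchmark (assumes '<benchmark>_<suffix>' naming)."""
    return sorted(name for name in supported_video_datasets if name.startswith(benchmark + "_"))


def build_dataset(name):
    """Instantiate a dataset from its configuration name, failing loudly on typos."""
    try:
        constructor = supported_video_datasets[name]
    except KeyError:
        raise KeyError(
            f"unknown video config {name!r}; choose from {sorted(supported_video_datasets)}"
        ) from None
    return constructor()


print(list_configs("MVBench"))  # ['MVBench_64frame', 'MVBench_8frame']
print(build_dataset("MVBench_64frame"))  # {'dataset': 'MVBench', 'nframe': 64}
```

A model with a long context window would pick the 64-frame entry; a shorter-context model would pick the 8-frame one. Note that the sort is lexicographic, not numeric.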

Theoretical Basis

Temporal sampling trade-off: more frames provide richer temporal information but increase compute and memory cost. fps-based sampling scales the number of frames with video length, whereas nframe-based sampling gives a fixed context size regardless of duration. The configuration registry separates this sampling policy from benchmark logic.
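The difference between the two policies is easiest to see in frame counts. The sketch below uses two common index-selection schemes (segment-center sampling for nframe, a fixed stride for fps); these are assumptions for illustration and may differ from VLMEvalKit's exact formulas. sample_by_nframe and sample_by_fps are hypothetical names.

```python
def sample_by_nframe(total_frames, nframe):
    """Pick `nframe` indices at the centers of equal-length segments (fixed context size)."""
    step = total_frames / nframe
    return [int(step * i + step / 2) for i in range(nframe)]


def sample_by_fps(total_frames, video_fps, target_fps):
    """Pick one frame every video_fps / target_fps frames (frame count grows with duration)."""
    duration = total_frames / video_fps          # video length in seconds
    count = max(1, int(duration * target_fps))   # number of frames to keep
    step = video_fps / target_fps                # source frames per kept frame
    # Clamp so the last index never runs past the end of the video.
    return [min(int(step * i + step / 2), total_frames - 1) for i in range(count)]


for seconds in (10, 60):
    total = 30 * seconds  # assume a 30 fps source video
    print(seconds, len(sample_by_nframe(total, 8)), len(sample_by_fps(total, 30, 1.0)))
# 10 8 10
# 60 8 60
```

With nframe=8 both videos cost the same 8 frames, while 1 fps sampling costs 10 frames for the short clip and 60 for the long one: better temporal coverage of long videos at six times the input size.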
