Principle: OpenCompass VLMEvalKit Video Inference Orchestration
| Field | Value |
|---|---|
| source | Repo |
| domain | Vision, Video_Understanding, Distributed_Computing |
Overview
An orchestration pattern that manages VLM inference on video benchmarks with configurable frame sampling and distributed processing.
Description
Video inference extends the image-inference pattern with video-specific handling: frame extraction from video files at a configurable frame rate (fps) or a fixed frame count (nframe); a pack mode that batches multiple questions per video (used by MMBench-Video); and video-specific prompt construction. The infer_data_job_video() function follows the same distributed rank-splitting pattern as image inference, but adds video dataset-specific pre- and post-processing.
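The pack mode and rank-splitting described above can be sketched as follows. This is a minimal illustration, not VLMEvalKit's actual code: the function names (pack_questions, split_for_rank, infer_on_rank) and the record keys ("video", "question") are hypothetical; only the strided split and per-video question grouping mirror the pattern described.

```python
from collections import defaultdict

def pack_questions(records):
    """Pack mode: group question records by source video so each video is
    decoded once and all of its questions share a single prompt."""
    packed = defaultdict(list)
    for rec in records:
        packed[rec["video"]].append(rec["question"])
    return dict(packed)

def split_for_rank(items, rank, world_size):
    """Strided rank split: rank r handles items r, r + world_size, ..."""
    return items[rank::world_size]

def infer_on_rank(rank, world_size, videos, model):
    """Each rank runs the model on its shard and returns a partial result map."""
    return {v: model(v) for v in split_for_rank(videos, rank, world_size)}

if __name__ == "__main__":
    records = [
        {"video": "a.mp4", "question": "What happens first?"},
        {"video": "a.mp4", "question": "Who appears?"},
        {"video": "b.mp4", "question": "Where is this filmed?"},
    ]
    videos = sorted(pack_questions(records))  # one entry per video
    fake_model = lambda v: f"answer-for-{v}"  # stand-in for VLM inference
    merged = {}
    for r in range(2):                        # simulate 2 ranks serially
        merged.update(infer_on_rank(r, 2, videos, fake_model))
    assert set(merged) == set(videos)         # shards merge back to full set
```

In the real distributed setting each rank writes its partial results to disk and rank 0 merges them; the simulation above runs the ranks serially in one process.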
Usage
Use for video benchmark evaluation (MVBench, Video-MME, MMBench-Video, MLVU, etc.). Requires a video-capable model (VIDEO_LLM=True) and a video dataset configured with frame sampling parameters.
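The video-capability requirement can be expressed as a guard before dispatching a video dataset to a model. Only the VIDEO_LLM flag name comes from the source; the helper and class below are illustrative, not the actual VLMEvalKit API.

```python
def assert_video_capable(model):
    """Refuse to run a video benchmark with an image-only model
    (hypothetical guard; checks the VIDEO_LLM flag named in the text)."""
    if not getattr(model, "VIDEO_LLM", False):
        raise TypeError(
            f"{type(model).__name__} is not video-capable (VIDEO_LLM != True)"
        )

class DemoVideoModel:
    VIDEO_LLM = True  # illustrative: video-capable wrappers set this flag

assert_video_capable(DemoVideoModel())  # passes; image-only models raise
```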
Theoretical Basis
Same data-parallel pattern as image inference, extended with temporal sampling. Video frames are extracted at the configured rate and passed as image sequences to the VLM.
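The two temporal-sampling modes (fixed nframe vs. target fps) can be sketched as an index-selection helper. This is an illustrative sketch, not VLMEvalKit's implementation; the function name and the midpoint-of-segment placement in nframe mode are assumptions.

```python
def sample_frame_indices(total_frames, video_fps, nframe=None, fps=None):
    """Return indices of frames to extract from a video with total_frames
    frames at video_fps. Exactly one of nframe (fixed, evenly spaced count)
    or fps (target sampling rate) must be given."""
    if (nframe is None) == (fps is None):
        raise ValueError("specify exactly one of nframe or fps")
    if nframe is not None:
        # Fixed count: take the midpoint of each of nframe equal segments.
        step = total_frames / nframe
        return [min(total_frames - 1, int(step * i + step / 2))
                for i in range(nframe)]
    # Rate mode: keep one source frame every video_fps / fps frames.
    stride = max(1, round(video_fps / fps))
    return list(range(0, total_frames, stride))
```

The frames at the returned indices are then decoded and passed to the VLM as an ordered image sequence.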