Principle: OpenCompass VLMEvalKit Video Inference Orchestration
| Field | Value |
|---|---|
| source | Repo |
| domain | Vision, Video_Understanding, Distributed_Computing |
Overview
An orchestration pattern that manages VLM inference on video benchmarks with configurable frame sampling and distributed processing.
Description
Video inference extends the image-inference pattern with video-specific handling: frame extraction from video files at a configurable frame rate (fps) or a fixed frame count (nframe); a pack mode that batches multiple questions per video (used by MMBench-Video); and video-specific prompt construction. The infer_data_job_video() function follows the same distributed rank-splitting pattern as image inference, but adds video dataset-specific pre- and post-processing.
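The pack mode and rank-splitting described above can be sketched as follows. This is a minimal illustration, not VLMEvalKit's actual code: the function names (pack_questions, split_for_rank, infer_on_rank) and the record keys ("video", "question") are hypothetical; only the strided split and per-video question grouping mirror the pattern described.

```python
from collections import defaultdict

def pack_questions(records):
    """Pack mode: group question records by source video so each video is
    decoded once and all of its questions share a single prompt."""
    packed = defaultdict(list)
    for rec in records:
        packed[rec["video"]].append(rec["question"])
    return dict(packed)

def split_for_rank(items, rank, world_size):
    """Strided rank split: rank r handles items r, r + world_size, ..."""
    return items[rank::world_size]

def infer_on_rank(rank, world_size, videos, model):
    """Each rank runs the model on its shard and returns a partial result map."""
    return {v: model(v) for v in split_for_rank(videos, rank, world_size)}

if __name__ == "__main__":
    records = [
        {"video": "a.mp4", "question": "What happens first?"},
        {"video": "a.mp4", "question": "Who appears?"},
        {"video": "b.mp4", "question": "Where is this filmed?"},
    ]
    videos = sorted(pack_questions(records))  # one entry per video
    fake_model = lambda v: f"answer-for-{v}"  # stand-in for VLM inference
    merged = {}
    for r in range(2):                        # simulate 2 ranks serially
        merged.update(infer_on_rank(r, 2, videos, fake_model))
    assert set(merged) == set(videos)         # shards merge back to full set
```

In the real distributed setting each rank writes its partial results to disk and rank 0 merges them; the simulation above runs the ranks serially in one process.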
Usage
Use for video benchmark evaluation (MVBench, Video-MME, MMBench-Video, MLVU, etc.). Requires a video-capable model (VIDEO_LLM=True) and a video dataset configured with frame sampling parameters.
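The video-capability requirement can be expressed as a guard before dispatching a video dataset to a model. Only the VIDEO_LLM flag name comes from the source; the helper and class below are illustrative, not the actual VLMEvalKit API.

```python
def assert_video_capable(model):
    """Refuse to run a video benchmark with an image-only model
    (hypothetical guard; checks the VIDEO_LLM flag named in the text)."""
    if not getattr(model, "VIDEO_LLM", False):
        raise TypeError(
            f"{type(model).__name__} is not video-capable (VIDEO_LLM != True)"
        )

class DemoVideoModel:
    VIDEO_LLM = True  # illustrative: video-capable wrappers set this flag

assert_video_capable(DemoVideoModel())  # passes; image-only models raise
```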
Theoretical Basis
Same data-parallel pattern as image inference, extended with temporal sampling. Video frames are extracted at the configured rate and passed as image sequences to the VLM.
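The two temporal-sampling modes (fixed nframe vs. target fps) can be sketched as an index-selection helper. This is an illustrative sketch, not VLMEvalKit's implementation; the function name and the midpoint-of-segment placement in nframe mode are assumptions.

```python
def sample_frame_indices(total_frames, video_fps, nframe=None, fps=None):
    """Return indices of frames to extract from a video with total_frames
    frames at video_fps. Exactly one of nframe (fixed, evenly spaced count)
    or fps (target sampling rate) must be given."""
    if (nframe is None) == (fps is None):
        raise ValueError("specify exactly one of nframe or fps")
    if nframe is not None:
        # Fixed count: take the midpoint of each of nframe equal segments.
        step = total_frames / nframe
        return [min(total_frames - 1, int(step * i + step / 2))
                for i in range(nframe)]
    # Rate mode: keep one source frame every video_fps / fps frames.
    stride = max(1, round(video_fps / fps))
    return list(range(0, total_frames, stride))
```

The frames at the returned indices are then decoded and passed to the VLM as an ordered image sequence.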