Principle: OpenCompass VLMEvalKit Image Inference Orchestration
| Field | Value |
|---|---|
| source | [VLMEvalKit](https://github.com/open-compass/VLMEvalKit) |
| domain | Vision, Evaluation, Distributed_Computing |
| last_updated | 2026-02-14 00:00 GMT |
Overview
An orchestration pattern that manages the lifecycle of VLM inference across distributed GPU ranks with checkpoint-based fault tolerance.
Description
Image inference in VLMEvalKit follows a distributed data-parallel pattern. The infer_data_job() function splits data across GPU ranks, delegates per-rank inference to infer_data(), merges results on rank 0, and handles thinking/reasoning content separation (SPLIT_THINK mode). Each rank processes its shard independently, saves intermediate results to pickle files, and supports resume from checkpoints. The pipeline also routes API models to infer_data_api() transparently.
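The per-rank checkpointing described above can be sketched as follows. This is a minimal, hypothetical simplification of what `infer_data()` does, not the real implementation: the function name `infer_shard`, the `(index, item)` shard layout, and the per-item callback are all illustrative.

```python
import os
import pickle

def infer_shard(shard, out_file, run_model):
    """Run inference over one rank's shard with pickle checkpointing.

    shard:     list of (index, item) pairs assigned to this rank
    out_file:  per-rank pickle file holding intermediate results
    run_model: callable mapping one item to one prediction
    """
    # Resume: load any predictions saved by a previous (interrupted) run.
    results = {}
    if os.path.exists(out_file):
        with open(out_file, "rb") as f:
            results = pickle.load(f)

    for idx, item in shard:
        if idx in results:  # already answered before the crash/restart
            continue
        results[idx] = run_model(item)
        # Checkpoint after each item so a failure loses at most one answer.
        with open(out_file, "wb") as f:
            pickle.dump(results, f)
    return results
```

Because completed indices are skipped on reload, re-running the same job after a failure only pays for the unfinished items.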
Usage
Use for any image-based benchmark evaluation. This is the primary inference entry point called by run.py for each model × dataset combination. Supports both local VLMs (run via torchrun for multi-GPU) and API models (queried via parallel HTTP requests).
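The transparent API-model routing mentioned above amounts to a small dispatch step. A minimal sketch, assuming the two inference paths are passed in as callables; the `is_api` attribute mirrors VLMEvalKit's model convention, but the function name and signature here are illustrative:

```python
def route_inference(model, api_path, local_path):
    """Dispatch a model to the right inference path.

    api_path:   callable for API models (parallel HTTP requests)
    local_path: callable for local VLMs (per-rank sharded inference)
    """
    # API model wrappers flag themselves; anything else is a local VLM.
    if getattr(model, "is_api", False):
        return api_path(model)
    return local_path(model)
```

Keeping the dispatch at the entry point is what lets run.py treat every model uniformly, regardless of backend.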
Theoretical Basis
Data parallelism pattern: divide the dataset across N workers, let each process its shard independently, then merge the results. Checkpoint/resume provides fault tolerance, so a failed worker can restart without redoing completed work.
Pseudocode:
- Split data by rank
- Infer per shard
- Barrier sync
- Rank 0 merges
- Save final result file
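The steps above can be sketched as a single-process simulation of the split/merge cycle. This is illustrative only: the real pipeline launches ranks via torchrun and synchronizes with a distributed barrier, while here all "ranks" run in one loop and the barrier is a no-op comment.

```python
import os
import pickle

def infer_data_job_sim(data, world_size, tmp_dir, run_model):
    """Simulate rank-sharded inference and the rank-0 merge.

    data: list of (index, item) pairs; tmp_dir: dir for per-rank pickles.
    """
    # 1. Split data by rank: interleaved slicing keeps shards balanced.
    for rank in range(world_size):
        shard = data[rank::world_size]
        preds = {idx: run_model(item) for idx, item in shard}
        with open(os.path.join(tmp_dir, f"{rank}.pkl"), "wb") as f:
            pickle.dump(preds, f)

    # 2. Barrier sync would go here in the real multi-process setting.

    # 3. Rank 0 merges every per-rank pickle into the final result.
    merged = {}
    for rank in range(world_size):
        with open(os.path.join(tmp_dir, f"{rank}.pkl"), "rb") as f:
            merged.update(pickle.load(f))

    # 4. Return in original dataset order (the final file is saved from this).
    return dict(sorted(merged.items()))
```

The interleaved `data[rank::world_size]` split is one common sharding choice; contiguous chunking works equally well as long as the merge re-sorts by index.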