
Principle:Open compass VLMEvalKit Image Inference Orchestration

From Leeroopedia
source: VLMEvalKit (https://github.com/open-compass/VLMEvalKit)
domain: Vision, Evaluation, Distributed_Computing
last_updated: 2026-02-14 00:00 GMT

Overview

An orchestration pattern that manages the lifecycle of VLM inference across distributed GPU ranks with checkpoint-based fault tolerance.

Description

Image inference in VLMEvalKit follows a distributed data-parallel pattern. The infer_data_job() function splits data across GPU ranks, delegates per-rank inference to infer_data(), merges results on rank 0, and handles thinking/reasoning content separation (SPLIT_THINK mode). Each rank processes its shard independently, saves intermediate results to pickle files, and supports resume from checkpoints. The pipeline also routes API models to infer_data_api() transparently.
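The checkpoint/resume behavior described above can be sketched in a few lines. This is an illustrative simplification, not the library's code: `infer_shard`, its signature, and the callable `model` are assumptions; only the pattern (pickle intermediate results per rank, skip finished samples on restart) comes from the source.

```python
import os
import pickle

def infer_shard(model, shard, out_pkl):
    """Per-rank loop sketch: results are pickled after each sample,
    so a restarted job skips work that already finished.
    `model` is any callable; names here are not the library's API."""
    results = {}
    if os.path.exists(out_pkl):            # resume from a prior run's checkpoint
        with open(out_pkl, "rb") as f:
            results = pickle.load(f)
    for idx, sample in shard:
        if idx in results:                 # finished before the interruption
            continue
        results[idx] = model(sample)
        with open(out_pkl, "wb") as f:     # checkpoint incrementally
            pickle.dump(results, f)
    return results
```

Re-running the same call after a crash re-loads the pickle and only processes the remaining samples, which is what makes long multi-rank evaluations restartable.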

Usage

Use this pattern for any image-based benchmark evaluation. It is the primary inference entry point called by run.py for each model × dataset combination, and it supports both local VLMs (multi-GPU via torchrun) and API models (via parallel HTTP requests).
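The transparent routing between local and API models can be sketched as below. The names `infer_data_api` and the `is_api` flag mirror the source's description; the stub bodies, `FakeAPIModel`, and `infer_data_job_local` are hypothetical stand-ins for illustration only.

```python
class FakeAPIModel:
    """Hypothetical stand-in for an API-backed model."""
    is_api = True
    def __call__(self, prompt):
        return f"api:{prompt}"

def infer_data_api(model, data):
    # Stand-in for the parallel-HTTP path (a real run fans out requests).
    return [model(x) for x in data]

def infer_data_job_local(model, data):
    # Stand-in for the torchrun multi-GPU path.
    return [model(x) for x in data]

def infer(model, data):
    """Route API models away from the GPU-sharded path, transparently."""
    if getattr(model, "is_api", False):
        return infer_data_api(model, data)
    return infer_data_job_local(model, data)
```

The caller never has to know which backend it got: `getattr(..., False)` makes plain callables default to the local path.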

Theoretical Basis

Data parallelism pattern — divide dataset across N workers, each processes independently, merge results. Checkpoint/resume for fault tolerance.

Pseudocode:

  1. Split data by rank
  2. Infer per shard
  3. Barrier sync
  4. Rank 0 merges
  5. Save final result file
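The five steps above can be exercised in a single process, with threads standing in for GPU ranks. This is a sketch of the pattern under that assumption; the real pipeline uses torch.distributed ranks, a barrier collective, and per-rank pickle files rather than a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor

def run_job(process_fn, data, world_size):
    """Data-parallel skeleton: threads play the GPU ranks, pool.map
    returning acts as the barrier, and the caller plays rank 0."""
    indexed = list(enumerate(data))
    shards = [indexed[r::world_size] for r in range(world_size)]  # 1. split by rank
    def work(shard):                                              # 2. infer per shard
        return [(i, process_fn(x)) for i, x in shard]
    with ThreadPoolExecutor(max_workers=world_size) as pool:
        done = list(pool.map(work, shards))                       # 3. barrier sync
    merged = {i: y for shard in done for i, y in shard}           # 4. rank 0 merges
    return [merged[i] for i in range(len(data))]                  # 5. final ordered result
```

Carrying the original index through each shard is what lets the merge restore dataset order regardless of how the round-robin split interleaved the samples.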
