Principle: OpenCompass VLMEvalKit API Model Inference
| Field | Value |
|---|---|
| source | Repo |
| domain | Vision, Evaluation, API_Integration |
Overview
A parallel execution pattern for running inference against commercial VLM APIs, with progress tracking, retries, and checkpoint-based fault tolerance.
Description
API model inference in VLMEvalKit uses thread-based parallelism to make concurrent HTTP requests to commercial VLM endpoints (GPT-4o, Claude, Gemini, etc.). The infer_data_api() function builds prompts for all dataset samples, then dispatches them to track_progress_rich(), which manages a ThreadPoolExecutor. Results are saved incrementally to a pickle file for fault tolerance, so the system can resume from partial results and filter out failed API responses.
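The resume-and-filter behavior can be sketched as follows. This is an illustration, not VLMEvalKit's actual code: the function names and the `FAIL_MARKER` sentinel are assumptions; only the general pattern (a pickle keyed by sample index, with missing or failed entries re-submitted) comes from the description above.

```python
import os
import pickle

# Assumed sentinel marking a failed API response in the saved results.
FAIL_MARKER = "Failed to obtain answer via API."

def load_partial(pkl_path):
    """Load previously saved results, or an empty dict on a first run."""
    if os.path.exists(pkl_path):
        with open(pkl_path, "rb") as f:
            return pickle.load(f)
    return {}

def pending_indices(all_indices, results, rerun_failed=True):
    """Indices still needing an API call: missing or (optionally) failed."""
    pending = []
    for idx in all_indices:
        ans = results.get(idx)
        if ans is None or (rerun_failed and ans == FAIL_MARKER):
            pending.append(idx)
    return pending
```

On restart, only `pending_indices(...)` is dispatched to the thread pool, so completed work is never repeated.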
Usage
Use when evaluating API-based models. Parallelism is controlled by api_nproc (default: 4 threads). Higher values increase throughput but may trigger provider rate limits.
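When raising api_nproc, rate-limit errors become more likely, so each worker typically wraps its request in a retry loop. A minimal sketch of that idea, assuming a generic zero-argument callable (the exception type and backoff schedule are illustrative, not VLMEvalKit's exact policy):

```python
import random
import time

def call_with_retry(api_call, max_retries=3, base_wait=1.0):
    """Retry a flaky API call with jittered exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return api_call()
        except Exception:
            if attempt == max_retries:
                raise
            # Exponential backoff with jitter eases pressure on rate limits.
            time.sleep(base_wait * (2 ** attempt) * random.uniform(0.5, 1.5))
```

Backoff matters because with N threads a single 429 response usually means N requests are about to fail; spreading the retries avoids a synchronized thundering herd.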
Theoretical Basis
Thread-pool parallelism suits I/O-bound tasks: each API call is independent, making the workload embarrassingly parallel. Progress tracking with incremental saves provides fault tolerance.
Pseudocode:
- Build all prompts
- Filter already-completed samples
- Submit to thread pool
- Save results incrementally
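The four steps above can be sketched end to end with Python's standard ThreadPoolExecutor. The function and parameter names here are illustrative stand-ins for infer_data_api() and track_progress_rich(), not the library's actual signatures:

```python
import pickle
from concurrent.futures import ThreadPoolExecutor, as_completed

def infer_data_parallel(samples, generate, out_pkl, nproc=4):
    """Sketch: build prompts, skip completed samples, fan out to a
    thread pool, and checkpoint each result as it arrives."""
    # 1. Build all prompts (here, trivially from each sample dict).
    prompts = {idx: s["prompt"] for idx, s in samples.items()}

    # 2. Filter already-completed samples from a previous partial run.
    try:
        with open(out_pkl, "rb") as f:
            results = pickle.load(f)
    except FileNotFoundError:
        results = {}
    todo = {idx: p for idx, p in prompts.items() if idx not in results}

    # 3. Submit remaining prompts to the thread pool; the work is
    # I/O-bound, so threads give real concurrency despite the GIL.
    with ThreadPoolExecutor(max_workers=nproc) as pool:
        futures = {pool.submit(generate, p): idx for idx, p in todo.items()}
        # 4. Save results incrementally so a crash loses little work.
        for fut in as_completed(futures):
            results[futures[fut]] = fut.result()
            with open(out_pkl, "wb") as f:
                pickle.dump(results, f)
    return results
```

Rerunning with the same out_pkl path reuses every saved answer and only dispatches the missing ones, which is the resume behavior described above.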