Principle:OpenGVLab InternVL Distributed Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Distributed_Computing, Vision_Language |
| Last Updated | 2026-02-07 00:00 GMT |
Overview
A distributed inference pattern that partitions evaluation samples across multiple GPU processes and gathers results for unified scoring.
Description
Evaluating vision-language models on large benchmarks (thousands of samples) can be slow on a single GPU. Distributed evaluation parallelizes this by:
- Partitioning samples across processes using an InferenceSampler that assigns non-overlapping index ranges to each rank
- Running inference independently on each rank's partition
- Gathering predictions via torch.distributed.all_gather_object, after which every rank (including rank 0) holds the complete prediction list
- Computing metrics on the complete prediction set (rank 0 only)
This pattern supports both multi-GPU evaluation (one model copy per GPU, launched via torchrun) and single-process evaluation with the model sharded across GPUs via model parallelism (the --auto flag).
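The partitioning step above can be sketched in plain Python. This mirrors the role of InternVL's InferenceSampler but is an illustrative stand-in, not the actual class: each rank gets a contiguous, non-overlapping slice of the dataset indices, with early ranks absorbing any remainder.

```python
def partition_indices(total_size, world_size, rank):
    """Assign a contiguous, non-overlapping slice of [0, total_size) to `rank`.

    Illustrative sketch of InferenceSampler-style partitioning; the real
    implementation may differ in detail.
    """
    per_rank = total_size // world_size
    remainder = total_size % world_size
    # Ranks below `remainder` take one extra sample so every index is covered.
    start = rank * per_rank + min(rank, remainder)
    end = start + per_rank + (1 if rank < remainder else 0)
    return list(range(start, end))

# Example: 10 samples over 4 ranks -> sizes [3, 3, 2, 2], no overlap, no gaps.
parts = [partition_indices(10, 4, r) for r in range(4)]
```

Because the slices are disjoint and exhaustive, no deduplication is needed when results are merged.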
Usage
Use distributed evaluation when benchmark size makes single-GPU evaluation impractical. The evaluation scripts handle distribution automatically via torchrun.
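A typical launch looks like the following. The script name, checkpoint path, and dataset flag are placeholders, not the actual InternVL entry points:

```shell
# Multi-GPU: torchrun spawns one process (one model copy) per GPU.
torchrun --nproc_per_node=8 evaluate.py --checkpoint ./model-checkpoint --datasets benchmark-name

# Single process, model sharded across GPUs via model parallelism.
python evaluate.py --checkpoint ./model-checkpoint --datasets benchmark-name --auto
```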
Theoretical Basis
```python
# Pseudo-code: distributed evaluation (illustrative, not the exact InternVL code)
def distributed_evaluate(model, tokenizer, dataset, world_size, rank):
    # 1. Partition samples: each rank gets a non-overlapping index range.
    sampler = InferenceSampler(len(dataset), world_size, rank)

    # 2. Run inference independently on this rank's partition.
    predictions = []
    for idx in sampler:
        sample = dataset[idx]
        response = model.chat(tokenizer, sample.pixel_values, sample.question)
        predictions.append({'question_id': sample.id, 'answer': response})

    # 3. Gather: after this collective call, every rank holds all partitions.
    all_predictions = [None] * world_size
    torch.distributed.all_gather_object(all_predictions, predictions)

    # 4. Score once, on rank 0 only.
    if rank == 0:
        merged = [p for part in all_predictions for p in part]
        score = compute_metric(merged)
        print(f'Score: {score}')
```
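The merge in step 4 is a simple flatten, since all_gather_object leaves rank i's list at index i of the output. A runnable sketch of that step, using hypothetical prediction data in place of a real process group:

```python
def merge_gathered(gathered):
    """Flatten per-rank prediction lists into one list, ordered by rank.

    `gathered[i]` is rank i's prediction list, as populated on every rank
    by torch.distributed.all_gather_object.
    """
    return [pred for rank_preds in gathered for pred in rank_preds]

# Simulated output of all_gather_object from a 2-rank run (hypothetical data):
gathered = [
    [{'question_id': 0, 'answer': 'cat'}, {'question_id': 1, 'answer': 'dog'}],
    [{'question_id': 2, 'answer': 'bird'}],
]
merged = merge_gathered(gathered)
```

No deduplication is required because InferenceSampler assigns disjoint index ranges.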
Related Pages
Implemented By