Principle:Mlfoundations Open flamingo Distributed Result Aggregation
Overview
Communication pattern for gathering evaluation predictions from all distributed processes and aggregating them into a unified result set for metric computation.
Description
In distributed evaluation, each GPU processes a subset of the test data. Before metrics can be computed, the predictions from every rank must be combined into one result set. PyTorch's all_gather_object collects picklable Python objects from all ranks and delivers the full list to every rank; rank 0 then computes the metrics and writes the results. After gathering, duplicate predictions (which arise when samples are repeated to pad the last batch evenly across ranks) are removed, metrics are computed, and results are saved as a JSON file with per-benchmark scores, including the mean and standard deviation across trials.
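The gather, de-duplicate, and save steps described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the helper names (gather_predictions, merge_and_deduplicate, dump_results), the "id" key used to identify a sample, and the output file name are all hypothetical. The torch.distributed import is deferred so that the sketch also runs in a single process, where it behaves like a world of size 1.

```python
import json
import os
import tempfile


def gather_predictions(local_predictions):
    """Return one list of predictions per rank (a list of lists)."""
    try:
        import torch.distributed as dist
    except ImportError:
        dist = None
    if dist is None or not dist.is_available() or not dist.is_initialized():
        # Single-process fallback: behave like a world of size 1.
        return [local_predictions]
    # all_gather_object fills a pre-sized list with each rank's object.
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_predictions)
    return gathered


def merge_and_deduplicate(per_rank_predictions, key="id"):
    """Flatten per-rank lists, keeping the first prediction seen per sample.

    Assumes each prediction is a dict carrying a unique sample identifier
    under `key` (a hypothetical field name).
    """
    merged = {}
    for rank_predictions in per_rank_predictions:
        for prediction in rank_predictions:
            merged.setdefault(prediction[key], prediction)
    return list(merged.values())


def dump_results(results, path):
    """Write results as JSON; in a real run only rank 0 would call this."""
    with open(path, "w") as f:
        json.dump(results, f, indent=2)


# Simulated gather output for two ranks; sample 0 was repeated as padding.
per_rank = [
    [{"id": 0, "prediction": "a cat"}, {"id": 1, "prediction": "a dog"}],
    [{"id": 2, "prediction": "a bird"}, {"id": 0, "prediction": "a cat"}],
]
unique_predictions = merge_and_deduplicate(per_rank)

# Without an initialized process group, the local list is wrapped as-is.
single_process = gather_predictions(per_rank[0])

output_path = os.path.join(tempfile.mkdtemp(), "results.json")
dump_results({"num_predictions": len(unique_predictions)}, output_path)
```

Keying the merge on a sample identifier makes the result independent of which rank received the padded copy, at the cost of requiring every prediction to carry such an identifier.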
Usage
Apply this step after predictions have been generated on the distributed evaluation workers and before final metrics are computed.
Theoretical Basis
Distributed evaluation splits the test set across N GPUs, reducing wall-clock time by a factor of roughly N under ideal scaling. The all_gather_object collective gathers variable-size Python objects to all ranks, unlike all_gather, which operates on tensors that must have the same size on every rank. De-duplication handles the case where samples are repeated so that every rank receives the same number of items, which pads the last batch to equal size across ranks. Multiple trials with different random seeds provide statistical robustness, reported as mean +/- stddev.
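To make the padding and trial statistics concrete, the sketch below shards sample indices across ranks by wrapping around to the start of the dataset, which is similar in spirit to how a distributed sampler pads, and then summarizes per-trial scores. The function names and the example scores are hypothetical, and the use of the population standard deviation (rather than the sample standard deviation) is an assumption made for illustration.

```python
import math
import statistics


def shard_with_padding(num_samples, world_size):
    """Give every rank the same number of indices, repeating early samples."""
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Wrap around so the index list is evenly divisible by world_size.
    indices = [i % num_samples for i in range(total)]
    # Each rank takes a strided slice of the padded index list.
    return [indices[rank::world_size] for rank in range(world_size)]


def summarize_trials(scores):
    """Mean and population standard deviation of per-trial scores."""
    return {
        "mean": statistics.mean(scores),
        "stddev": statistics.pstdev(scores),
    }


# 10 samples on 4 ranks: each rank gets 3 indices, so 2 are duplicates.
shards = shard_with_padding(num_samples=10, world_size=4)
gathered = [index for shard in shards for index in shard]
unique = sorted(set(gathered))

# Hypothetical scores from three trials with different seeds.
summary = summarize_trials([80.0, 82.0, 84.0])
```

Here 12 indices are gathered but only 10 are unique, which is why metrics computed without de-duplication would count samples 0 and 1 twice.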
Related Pages
Implementation:Mlfoundations_Open_flamingo_All_gather_json_dump