Principle:Mlfoundations Open flamingo Distributed Result Aggregation

Overview

Communication pattern for gathering evaluation predictions from all distributed processes and aggregating them into a unified result set for metric computation.

Description

In distributed evaluation, each GPU processes a subset of the test data. Before metrics can be computed, the predictions from every process must be brought together. PyTorch's all_gather_object collects Python objects from all ranks and delivers the complete list to every rank; rank 0 then performs the remaining steps. After gathering, duplicate predictions (from overlapping samples in the last batch) are removed, metrics are computed, and results are saved as a JSON file with per-benchmark scores, including the mean and standard deviation across trials.
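The gather and de-duplication steps described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the helper names gather_predictions and merge_unique and the sample_id key are hypothetical, and the only PyTorch call relied on is torch.distributed.all_gather_object. When no process group is initialized (or PyTorch is not installed), the sketch falls back to treating the local predictions as the only rank.

```python
def gather_predictions(local_predictions):
    """Return one list of predictions per rank.

    Falls back to a single-entry list when no process group is running,
    so the same code path also works in a single-process run.
    """
    try:
        import torch.distributed as dist
    except ImportError:
        return [local_predictions]
    if not (dist.is_available() and dist.is_initialized()):
        return [local_predictions]

    # all_gather_object fills the pre-sized list on *every* rank.
    gathered = [None] * dist.get_world_size()
    dist.all_gather_object(gathered, local_predictions)
    return gathered


def merge_unique(gathered, key="sample_id"):
    """Flatten per-rank lists, keeping the first prediction seen per key.

    `key` is a hypothetical field name; use whatever uniquely identifies
    a test sample in the benchmark at hand.
    """
    merged = {}
    for rank_predictions in gathered:
        for prediction in rank_predictions:
            merged.setdefault(prediction[key], prediction)
    return list(merged.values())
```

In a multi-process run, every rank would call gather_predictions, and only rank 0 would go on to call merge_unique and compute metrics. Keeping the first occurrence per key is one possible policy; since duplicated samples should yield identical predictions, the choice rarely matters.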

Usage

Apply this pattern after the distributed evaluation workers have generated their predictions and before the final metrics are computed.

Theoretical Basis

Distributed evaluation splits the test set across N GPUs, reducing wall-clock time by a factor of roughly N in the ideal case. The all_gather_object collective gathers variable-size Python objects to all ranks, unlike all_gather, which operates on tensors whose sizes must be known in advance. De-duplication handles the case where the last batch is padded so that every rank processes the same number of samples, which means some samples are evaluated more than once. Running multiple trials with different random seeds provides statistical robustness; results are reported as mean +/- standard deviation.
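Two of the points above lend themselves to a small sketch: why padding produces duplicates, and how per-trial scores become a mean and standard deviation. The code below uses only the standard library. The function names shard_indices and summarize_trials are hypothetical, the wrap-around padding only imitates the general idea behind a padded distributed sampler with shuffling turned off, and the use of the population standard deviation is an assumption (a sample standard deviation would be an equally reasonable choice).

```python
import json
import math
import statistics


def shard_indices(num_samples, world_size, rank):
    """Indices handled by `rank` when the index list is padded by wrapping
    around, so that every rank receives the same number of samples."""
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    padded = [i % num_samples for i in range(total)]
    return padded[rank:total:world_size]


def summarize_trials(scores_by_benchmark):
    """Build one JSON-serializable record per benchmark.

    `scores_by_benchmark` maps a benchmark name to its list of per-trial
    scores (one score per random seed).
    """
    records = []
    for name, scores in scores_by_benchmark.items():
        records.append(
            {
                "benchmark": name,
                "trials": list(scores),
                "mean": statistics.mean(scores),
                # Assumption: population standard deviation.
                "stddev": statistics.pstdev(scores),
            }
        )
    return records


if __name__ == "__main__":
    # 10 samples over 4 ranks: 12 slots in total, so 2 samples repeat.
    shards = [shard_indices(10, 4, rank) for rank in range(4)]
    print(shards)
    print(json.dumps(summarize_trials({"example": [50.0, 52.0, 54.0]}), indent=2))
```

With 10 samples and 4 ranks, each rank receives 3 indices, so the gathered predictions contain 12 entries for 10 distinct samples. This is the surplus that de-duplication removes before metrics are computed.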
