Jump to content

Connect SuperML | Leeroopedia MCP: Equip your AI agents with best practices, code verification, and debugging knowledge. Powered by Leeroo — building Organizational Superintelligence. Contact us at founders@leeroo.com.

Principle:EvolvingLMMs Lab Lmms eval Data Sharding

From Leeroopedia
Knowledge Sources
Domains Distributed_Computing, Data_Processing
Last Updated 2026-02-14 00:00 GMT

Overview

Partitioning evaluation data across distributed processes ensures that each process evaluates a disjoint subset of the dataset, maximizing throughput while guaranteeing full dataset coverage.

Description

In distributed evaluation, the complete dataset must be divided among all participating processes so that:

  1. Every document is evaluated exactly once -- No document is missed and no document is processed by multiple ranks.
  2. The load is approximately balanced -- Each rank receives roughly the same number of documents to minimize idle time.
  3. The assignment is deterministic -- Given the same rank, world_size, and dataset ordering, the same partition is produced every time.

Interleaved round-robin sharding is a simple and effective strategy that satisfies all three requirements. Rather than assigning contiguous blocks of documents to each rank (which can cause load imbalance if document difficulty varies systematically), round-robin distributes documents cyclically:

  • Rank 0 gets documents at indices 0, world_size, 2*world_size, ...
  • Rank 1 gets documents at indices 1, world_size+1, 2*world_size+1, ...
  • Rank k gets documents at indices k, world_size+k, 2*world_size+k, ...

This interleaving ensures that any systematic variation in document characteristics (such as image resolution or text length varying with position) is spread evenly across ranks.

Usage

Use data sharding whenever running evaluation across multiple GPUs or nodes. The sharding is applied during the request-building phase -- before any model inference occurs -- so each rank builds evaluation instances only for its assigned documents. This approach also supports:

  • Limiting evaluation -- A limit parameter controls how many total documents to evaluate (useful for debugging or quick checks).
  • Offsetting -- An offset parameter skips the first N documents before sharding begins, enabling evaluation of specific dataset segments.

Theoretical Basis

Given a dataset of D documents indexed 0 through D-1, with W processes (world_size) and rank r:

Shard(r, W, D) = { i : i in [0, D), i mod W == r }

Equivalently, using Python's itertools.islice:
  start = r
  stop  = D  (or limit if specified)
  step  = W
  Shard = islice(range(D), start, stop, step)

The number of documents assigned to rank r is:

|Shard(r, W, D)| = floor(D / W) + (1 if r < D mod W else 0)

This means the maximum imbalance between any two ranks is exactly 1 document. When D mod W != 0, the lower-numbered ranks each receive one extra document.

When an offset O is applied, the sharding formula becomes:

start = r + O
stop  = O + limit  (or None if no limit)
step  = W

The offset shifts the starting position uniformly for all ranks, preserving the round-robin interleaving pattern.

Related Pages

Implemented By

Page Connections

Double-click a node to navigate. Hold to expand connections.
Principle
Implementation
Heuristic
Environment