Implementation:EvolvingLMMs Lab Lmms eval Create Iterator

From Leeroopedia
Knowledge Sources
Domains Distributed_Computing, Data_Processing
Last Updated 2026-02-14 00:00 GMT

Overview

Concrete tool, provided by the lmms-eval framework, for partitioning evaluation data across processes using interleaved round-robin sharding.

Description

The create_iterator function in lmms_eval/utils.py implements round-robin data sharding using Python's itertools.islice. It takes a raw iterator over documents (typically from enumerate(task.eval_docs_no_media)) and returns a sliced iterator that yields only the documents assigned to the given rank.

The function is called from Task.build_all_requests() during the request construction phase. Each rank calls build_all_requests() with its own rank and the shared world_size, so each rank receives a different slice of the document iterator. As a result, each rank constructs evaluation instances only for its assigned documents, which saves both memory and compute.
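The per-rank partitioning described above can be checked with a short sketch that uses itertools.islice directly rather than importing lmms-eval. The shard helper and the document names are hypothetical; they are assumed to mirror the round-robin behaviour described on this page.

```python
from itertools import islice


def shard(docs, rank, world_size):
    """Hypothetical helper: round-robin slice of docs for one rank."""
    return list(islice(iter(docs), rank, None, world_size))


docs = [f"doc{i}" for i in range(7)]
world_size = 3

# One shard per rank, as if each process had made its own call.
shards = [shard(docs, rank, world_size) for rank in range(world_size)]

# Every document lands in exactly one shard.
all_assigned = sorted(d for s in shards for d in s)
covers_everything = all_assigned == sorted(docs)

# Shard sizes differ by at most one document.
sizes = [len(s) for s in shards]

print(shards[0])          # ['doc0', 'doc3', 'doc6']
print(covers_everything)  # True
print(sizes)              # [3, 2, 2]
```

Because the slices are interleaved rather than contiguous, the lower ranks pick up any remainder when the document count is not a multiple of world_size.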

The implementation handles edge cases including:

  • Null offset -- An offset of None is treated as 0
  • Negative offset -- A negative offset is invalid and raises a ValueError
  • No limit -- When limit is None, the stop parameter of islice is also None, so slicing continues to the end of the iterator
  • Single process -- When world_size=1, the step is 1, so every document is returned (no sharding)
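The behaviour listed above can be summarised in a minimal sketch. This is an illustrative reconstruction based only on the description on this page, not the lmms-eval source; the name create_iterator_sketch is hypothetical, and the stop position of offset + limit is taken from the limit and offset example on this page.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def create_iterator_sketch(
    raw_iterator: Iterable,
    rank: int,
    world_size: int,
    limit: Optional[int] = None,
    offset: Optional[int] = 0,
) -> Iterator:
    """Illustrative reconstruction, not the lmms-eval implementation."""
    if offset is None:
        offset = 0  # a null offset behaves like 0
    if offset < 0:
        raise ValueError(f"offset must be >= 0, got {offset}")
    start = rank + offset  # first position owned by this rank
    stop = None if limit is None else offset + limit  # None means "to the end"
    return islice(raw_iterator, start, stop, world_size)


print(list(create_iterator_sketch(range(10), rank=1, world_size=4)))  # [1, 5, 9]
print(list(create_iterator_sketch(range(5), rank=0, world_size=1)))   # [0, 1, 2, 3, 4]
```

Note that islice is lazy: nothing is read from raw_iterator until the returned iterator is consumed.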

Usage

This function is used internally by Task.build_all_requests() whenever evaluation is distributed. It is not typically called directly by end users but is invoked automatically when launching with multiple processes via accelerate launch or torchrun.
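For intuition about where rank and world_size come from in a multi-process launch, the sketch below reads the RANK and WORLD_SIZE environment variables that torchrun sets for each worker. How lmms-eval itself obtains these values is not shown on this page, so the rank_and_world_size helper is hypothetical and the environment is simulated with a plain dictionary.

```python
import os
from itertools import islice
from typing import Mapping, Tuple


def rank_and_world_size(env: Mapping[str, str] = os.environ) -> Tuple[int, int]:
    """Hypothetical helper: read the process rank and process count.

    Falls back to a single-process setup when the variables are absent.
    """
    return int(env.get("RANK", "0")), int(env.get("WORLD_SIZE", "1"))


# Simulate the environment of process 2 in a 4-process launch.
rank, world_size = rank_and_world_size({"RANK": "2", "WORLD_SIZE": "4"})

# Round-robin slice for this process over 10 documents.
my_docs = list(islice(range(10), rank, None, world_size))
print(rank, world_size, my_docs)  # 2 4 [2, 6]
```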

Code Reference

Source Location

  • Repository: lmms-eval
  • File: lmms_eval/utils.py
  • Lines: L857-870

Called from:

  • File: lmms_eval/api/task.py
  • Lines: L382-442

Signature

def create_iterator(
    raw_iterator,
    rank: int,
    world_size: int,
    limit: Optional[int] = None,
    offset: int = 0,
) -> itertools.islice:

Import

from lmms_eval.utils import create_iterator

I/O Contract

Inputs

Name | Type | Required | Description
raw_iterator | Iterator | Yes | The raw document iterator, typically enumerate(task.eval_docs_no_media) yielding (doc_id, doc) tuples
rank | int | Yes | The global rank of the current process (0 to world_size-1)
world_size | int | Yes | The total number of distributed processes
limit | Optional[int] | No (default: None) | Maximum total number of documents to evaluate across all ranks; None means no limit
offset | int | No (default: 0) | Number of documents to skip before sharding begins; must be >= 0

Outputs

Name | Type | Description
sliced_iterator | itertools.islice | An iterator yielding only the elements assigned to the specified rank via round-robin selection: elements at positions rank+offset, rank+offset+world_size, rank+offset+2*world_size, ...
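Since the selected positions form an arithmetic progression, the indices and the shard size for a rank can be computed without consuming any iterator. The sketch below assumes the total number of documents is known and that the stop position is offset + limit, as in the limit and offset example on this page; shard_indices is a hypothetical helper, not part of lmms-eval.

```python
def shard_indices(num_docs, rank, world_size, limit=None, offset=0):
    """Hypothetical helper: positions a rank would receive, as a range."""
    # The slice can never extend past the end of the data.
    stop = num_docs if limit is None else min(num_docs, offset + limit)
    return range(rank + offset, stop, world_size)


idx = shard_indices(20, rank=1, world_size=2, limit=8, offset=4)
print(list(idx))  # [5, 7, 9, 11]
print(len(idx))   # 4

# A rank whose first position is past the end receives nothing.
print(len(shard_indices(2, rank=3, world_size=4)))  # 0
```

A check like this makes it easy to spot ranks that end up with an empty shard when there are fewer documents than processes.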

Usage Examples

Basic Example

from lmms_eval.utils import create_iterator

# Suppose we have 10 documents and 4 GPUs
documents = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Rank 0 gets: [0, 4, 8]
rank0_docs = list(create_iterator(iter(documents), rank=0, world_size=4))
# rank0_docs == [0, 4, 8]

# Rank 1 gets: [1, 5, 9]
rank1_docs = list(create_iterator(iter(documents), rank=1, world_size=4))
# rank1_docs == [1, 5, 9]

# Rank 2 gets: [2, 6]
rank2_docs = list(create_iterator(iter(documents), rank=2, world_size=4))
# rank2_docs == [2, 6]

# Rank 3 gets: [3, 7]
rank3_docs = list(create_iterator(iter(documents), rank=3, world_size=4))
# rank3_docs == [3, 7]

With Limit and Offset

from lmms_eval.utils import create_iterator

documents = list(range(20))

# Evaluate 8 documents in total (shared across both ranks), starting at offset 4
# Effective range: documents[4:12] = [4, 5, 6, 7, 8, 9, 10, 11]
# Rank 0 (start=0+4=4, stop=4+8=12, step=2): [4, 6, 8, 10]
rank0 = list(create_iterator(iter(documents), rank=0, world_size=2, limit=8, offset=4))

# Rank 1 (start=1+4=5, stop=4+8=12, step=2): [5, 7, 9, 11]
rank1 = list(create_iterator(iter(documents), rank=1, world_size=2, limit=8, offset=4))

Internal Usage in Task

# From lmms_eval/api/task.py build_all_requests():
doc_id_docs = utils.create_iterator(
    enumerate(self.eval_docs_no_media),
    rank=rank,
    limit=int(limit) if limit else None,
    world_size=world_size,
    offset=offset,
)
# Each rank iterates only over its assigned (doc_id, doc) pairs
for doc_id, doc in doc_id_docs:
    # Build evaluation instances for this document
    ...
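One consequence of the call shown above is that enumerate is applied before slicing, so each doc_id stays the document's position in the full dataset rather than its position within the shard. The sketch below illustrates this with itertools.islice directly; the document values are made up for the example.

```python
from itertools import islice

docs = ["a", "b", "c", "d", "e"]
rank, world_size = 1, 2

# Enumerate first, then slice: ids are global positions.
global_ids = list(islice(enumerate(docs), rank, None, world_size))
print(global_ids)  # [(1, 'b'), (3, 'd')]

# Slice first, then enumerate: ids restart at 0 within the shard.
local_ids = list(enumerate(islice(iter(docs), rank, None, world_size)))
print(local_ids)   # [(0, 'b'), (1, 'd')]
```

Keeping global ids means results produced by different ranks can be matched back to the original documents without collisions.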

Related Pages

Implements Principle
