Implementation:EvolvingLMMs Lab Lmms eval Create Iterator

Knowledge Sources	lmms-eval
Domains	Distributed_Computing, Data_Processing
Last Updated	2026-02-14 00:00 GMT

Overview

A concrete tool, provided by the lmms-eval framework, for partitioning evaluation data across processes using interleaved round-robin sharding.

Description

The create_iterator function in lmms_eval/utils.py implements round-robin data sharding using Python's itertools.islice. It takes a raw iterator over documents (typically from enumerate(task.eval_docs_no_media)) and returns a sliced iterator that yields only the documents assigned to the given rank.

The function is called from Task.build_all_requests() during the request construction phase. Each rank calls build_all_requests() with its own rank and world_size, receiving a different slice of the document iterator. This means each rank only constructs evaluation instances for its assigned documents, saving both memory and compute.

The implementation handles edge cases including:

Null offset -- Treats None offset as 0
Negative offset -- Raises a ValueError for invalid negative offsets
No limit -- When limit is None, the stop parameter of islice is also None, consuming the entire iterator
Single process -- When world_size=1, the step is 1, so every document is returned (no sharding)
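
The behaviour described above can be sketched with itertools.islice as follows. This is a minimal reconstruction from the description on this page, not the code in lmms_eval/utils.py: the name create_iterator_sketch and the error message are hypothetical, and the stop value offset + limit is taken from the worked example later on this page.

```python
import itertools
from typing import Iterable, Iterator, Optional


def create_iterator_sketch(
    raw_iterator: Iterable,
    rank: int,
    world_size: int,
    limit: Optional[int] = None,
    offset: Optional[int] = 0,
) -> Iterator:
    """Hypothetical stand-in that mirrors the behaviour described above."""
    if offset is None:
        offset = 0  # null offset is treated as 0
    if offset < 0:
        raise ValueError("offset must be >= 0")  # message text is an assumption
    start = rank + offset  # first element owned by this rank
    stop = None if limit is None else offset + limit  # None consumes everything
    return itertools.islice(raw_iterator, start, stop, world_size)


docs = list(range(10))
shards = [
    list(create_iterator_sketch(iter(docs), rank=r, world_size=4))
    for r in range(4)
]
print(shards)  # [[0, 4, 8], [1, 5, 9], [2, 6], [3, 7]]
```

Because islice is lazy, a rank only advances through the underlying iterator and never materialises the documents it skips.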

Usage

This function is used internally by Task.build_all_requests(). End users do not typically call it directly; sharding takes effect automatically when evaluation is launched with multiple processes via accelerate launch or torchrun.

Code Reference

Source Location

Repository: lmms-eval
File: lmms_eval/utils.py
Lines: L857-870

Called from:

File: lmms_eval/api/task.py
Lines: L382-442

Signature

def create_iterator(
    raw_iterator,
    rank: int,
    world_size: int,
    limit: Optional[int] = None,
    offset: int = 0,
) -> itertools.islice:

Import

from lmms_eval.utils import create_iterator

I/O Contract

Inputs

Name	Type	Required	Description
raw_iterator	`Iterator`	Yes	The raw document iterator, typically `enumerate(task.eval_docs_no_media)` yielding `(doc_id, doc)` tuples
rank	`int`	Yes	The global rank of the current process (0 to world_size-1)
world_size	`int`	Yes	The total number of distributed processes
limit	`Optional[int]`	No (default: None)	Maximum total number of documents to evaluate across all ranks; `None` means no limit
offset	`int`	No (default: 0)	Number of documents to skip before sharding begins; must be >= 0

Outputs

Name	Type	Description
sliced_iterator	`itertools.islice`	An iterator yielding only the elements assigned to the specified rank via round-robin selection: elements at positions rank+offset, rank+offset+world_size, rank+offset+2*world_size, ...
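
The position arithmetic in the output description implies that the per-rank slices are disjoint and together cover exactly the selected window of documents. The sketch below checks this with itertools.islice directly rather than with the library function; the values of world_size, limit and offset are arbitrary choices for illustration.

```python
import itertools

documents = list(range(20))
world_size, limit, offset = 3, 8, 4

# One shard per rank: start = rank + offset, stop = offset + limit, step = world_size.
shards = [
    list(itertools.islice(iter(documents), rank + offset, offset + limit, world_size))
    for rank in range(world_size)
]

# Together the shards cover documents[offset:offset + limit] with no duplicates.
merged = sorted(doc for shard in shards for doc in shard)
assert merged == documents[offset:offset + limit]
assert len(merged) == len(set(merged))

# Round-robin assignment keeps shard sizes within one document of each other.
sizes = [len(shard) for shard in shards]
print(shards)  # [[4, 7, 10], [5, 8, 11], [6, 9]]
print(sizes)   # [3, 3, 2]
```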

Usage Examples

Basic Example

from lmms_eval.utils import create_iterator

# Suppose we have 10 documents and 4 processes (one per GPU)
documents = list(range(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

rank0_docs = list(create_iterator(iter(documents), rank=0, world_size=4))
# rank0_docs == [0, 4, 8]

rank1_docs = list(create_iterator(iter(documents), rank=1, world_size=4))
# rank1_docs == [1, 5, 9]

rank2_docs = list(create_iterator(iter(documents), rank=2, world_size=4))
# rank2_docs == [2, 6]

rank3_docs = list(create_iterator(iter(documents), rank=3, world_size=4))
# rank3_docs == [3, 7]

With Limit and Offset

from lmms_eval.utils import create_iterator

documents = list(range(20))

# Skip the first 4 documents (offset=4), then evaluate the next 8 (limit=8)
# Effective range: documents[4:12] = [4, 5, 6, 7, 8, 9, 10, 11]
# Rank 0 (start=0+4=4, stop=4+8=12, step=2): [4, 6, 8, 10]
rank0 = list(create_iterator(iter(documents), rank=0, world_size=2, limit=8, offset=4))

# Rank 1 (start=1+4=5, stop=4+8=12, step=2): [5, 7, 9, 11]
rank1 = list(create_iterator(iter(documents), rank=1, world_size=2, limit=8, offset=4))

Internal Usage in Task

# From lmms_eval/api/task.py build_all_requests():
doc_id_docs = utils.create_iterator(
    enumerate(self.eval_docs_no_media),
    rank=rank,
    limit=int(limit) if limit else None,
    world_size=world_size,
    offset=offset,
)
# Each rank iterates only over its assigned (doc_id, doc) pairs
for doc_id, doc in doc_id_docs:
    # Build evaluation instances for this document
    ...
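
Because enumerate() is applied before the slice, each doc_id in the call above is the document's index in the full dataset rather than its position within the rank's shard. A minimal sketch of that effect, using itertools.islice directly and a hypothetical list eval_docs in place of task.eval_docs_no_media:

```python
import itertools

# Hypothetical stand-in for task.eval_docs_no_media.
eval_docs = ["doc-a", "doc-b", "doc-c", "doc-d", "doc-e"]

rank, world_size = 1, 2

# enumerate() runs first, so the slice selects (doc_id, doc) pairs whose
# doc_id still refers to the position in the full dataset.
doc_id_docs = list(itertools.islice(enumerate(eval_docs), rank, None, world_size))
print(doc_id_docs)  # [(1, 'doc-b'), (3, 'doc-d')]
```

Keeping the ids global means a record produced on any rank can be traced back to its source document without knowing how the data was sharded.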

Related Pages

Implements Principle

Principle:EvolvingLMMs_Lab_Lmms_eval_Data_Sharding
