Principle: EvolvingLMMs-Lab lmms-eval Request Construction
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Data_Processing |
| Last Updated | 2026-02-14 00:00 GMT |
Overview
Request construction is the process of transforming evaluation dataset documents into structured inference request objects that a model can consume, including support for few-shot context, distributed sharding, and request caching.
Description
Before a model can be evaluated, each document in the test or validation split must be converted into one or more Instance objects. Each Instance bundles together the textual prompt, generation or loglikelihood parameters, a reference to the visual input loader, the document ID, and metadata about the task and split. This conversion is task-type-aware: generation tasks produce a single Instance per document, while multiple-choice tasks produce one Instance per answer choice.
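As a sketch, the Instance bundle described above might look like the following dataclass. The field names here are illustrative assumptions, not the library's exact API:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Instance:
    """Illustrative sketch of an inference request (field names assumed)."""
    request_type: str        # e.g. "generate_until" or "loglikelihood"
    arguments: tuple         # prompt plus generation/loglikelihood parameters
    doc_to_visual: Callable  # lazy loader for the document's visual input
    doc_id: int
    task: str
    split: str
    metadata: dict = field(default_factory=dict)
```

Keeping `doc_to_visual` as a callable rather than loaded pixels lets construction stay cheap; images are only decoded when the model actually consumes the request.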
The request construction process handles several concerns simultaneously:
- Split selection -- Preferring the test split, falling back to validation.
- Few-shot context -- Building a prompt that includes example demonstrations drawn from the training or fewshot split, with configurable shot count, delimiter, and chat template formatting.
- Distributed sharding -- Dividing documents across ranks using `rank`, `world_size`, and `offset` parameters via `utils.create_iterator()`, which applies interleaved slicing.
- Limiting -- Restricting the number of documents via `limit` for debugging or quick sanity checks.
- Caching -- Saving constructed instances to disk so that subsequent runs with the same configuration can skip the construction step entirely.
- Repetition -- Supporting `repeats > 1` for techniques like majority voting, where inference runs multiple times on each document.
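The interleaved slicing behind `utils.create_iterator()` can be sketched with `itertools.islice`; this is a minimal version consistent with the description above, and the real function's signature may differ:

```python
import itertools

def create_iterator(raw_iterator, rank, world_size, limit=None):
    """Interleaved slicing: rank r receives items r, r + W, r + 2W, ...,
    optionally truncated to `limit` items for that rank.
    Sketch only -- the actual utils.create_iterator may differ."""
    sharded = itertools.islice(raw_iterator, rank, None, world_size)
    return sharded if limit is None else itertools.islice(sharded, limit)
```

For example, with 10 documents and `world_size=3`, rank 1 receives documents 1, 4, and 7.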
Usage
Use request construction whenever:
- You are preparing an evaluation run and need to convert raw dataset documents into model-consumable requests.
- You need to debug prompt formatting by inspecting the constructed Instance objects.
- You are running distributed evaluation and need to ensure each rank processes a disjoint shard.
- You want to cache requests to speed up repeated runs with the same task configuration.
Theoretical Basis
The request construction algorithm proceeds as follows:
Input: Task T with dataset D, configuration parameters (limit, offset, rank, world_size).
Step 1 -- Split Resolution:
```
if T.has_test_docs():
    docs = T.test_docs()
    split = config.test_split
elif T.has_validation_docs():
    docs = T.validation_docs()
    split = config.validation_split
```
Step 2 -- Iterator Construction (Distributed Sharding):
For a dataset of N documents, rank r in a world of size W with offset o:
```
indices_for_rank_r = [o + r, o + r + W, o + r + 2W, ...] intersected with [0, N)
if limit is not None:
    indices_for_rank_r = indices_for_rank_r[:limit]
```
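The formula can be checked directly. `shard_indices` is a hypothetical helper, not library code:

```python
def shard_indices(n_docs, rank, world_size, offset=0, limit=None):
    """Indices assigned to one rank under interleaved sharding,
    per the formula above: o + r, o + r + W, ... within [0, N)."""
    indices = list(range(offset + rank, n_docs, world_size))
    return indices if limit is None else indices[:limit]
```

Because each rank starts at a distinct residue modulo `world_size`, the shards are disjoint and (with `offset=0` and no limit) jointly cover every document exactly once.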
Step 3 -- Per-Document Instance Creation:
For each document at index doc_id:
- Build few-shot context string from the fewshot split.
- Call `construct_requests(doc_id, ctx, metadata)`, which creates Instance objects based on `OUTPUT_TYPE`.
- Append the result to the instances list.
Step 4 -- Flatten and Cache:
Flatten nested instance lists, slice to the original limit, and optionally save to cache.
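A minimal sketch of this step, using pickle as an assumed cache format (the actual on-disk format used by lmms-eval may differ):

```python
import pickle

def flatten_and_cache(nested_instances, limit=None, cache_file=None):
    """Flatten per-document instance lists, apply the original limit,
    and optionally persist the result to disk."""
    flat = [inst for group in nested_instances for inst in group]
    if limit is not None:
        flat = flat[:limit]
    if cache_file is not None:
        with open(cache_file, "wb") as f:
            pickle.dump(flat, f)
    return flat
```

Flattening is needed because multiple-choice documents each contribute a list of Instances (one per choice), so the per-document results are nested.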
The output type determines the Instance structure:
- generate_until -- One Instance with `(context, generation_kwargs, doc_to_visual, doc_id, task, split)`.
- loglikelihood / multiple_choice -- One Instance per choice with `(context, target_string, doc_to_visual, doc_id, task, split)`.
- generate_until_multi_round -- One Instance with an additional `doc_to_text` callable for multi-round interactions.
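The mapping from output type to Instance structure can be sketched as a dispatcher. `Instance` here is a minimal stand-in with illustrative field names, and the argument tuples are simplified relative to the full signatures above:

```python
from collections import namedtuple

# Minimal stand-in for Instance (field names are illustrative).
Instance = namedtuple("Instance", "args doc_id task split")

def construct_requests(doc_id, ctx, output_type, task, split,
                       gen_kwargs=None, choices=None):
    """Sketch of the output-type-to-Instance mapping described above."""
    if output_type in ("generate_until", "generate_until_multi_round"):
        # one Instance carrying the context and generation parameters
        return [Instance((ctx, gen_kwargs or {}), doc_id, task, split)]
    if output_type in ("loglikelihood", "multiple_choice"):
        # one Instance per candidate answer string
        return [Instance((ctx, choice), doc_id, task, split)
                for choice in (choices or [])]
    raise ValueError(f"unsupported OUTPUT_TYPE: {output_type}")
```

Under this scheme, a four-option multiple-choice document yields four loglikelihood requests, while a generation document yields exactly one.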