Implementation:FMInference FlexLLMGen Get Batches

Knowledge Sources	FlexLLMGen
Domains	Benchmark_Integration, Batch_Processing
Last Updated	2026-02-09 00:00 GMT

Overview

Concrete tool, provided by the FlexLLMGen HELM integration, for constructing padded evaluation batches from HELM scenario requests.

Description

get_batches() takes a ScenarioState (whose request_states hold the prompts and generation parameters), a tokenizer, a batch_size, and an optional pad_to_seq_len. It groups the requests into batches of batch_size, tokenizes all prompts with left-padding, and returns a list of batch dictionaries. Each dictionary contains the input_ids, the generation parameters (temperature, max_tokens, stop_sequences), and the original request_state references used to map results back to their requests.
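The batching and left-padding steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the code in helm_run.py: toy_encode, left_pad, and sketch_batches are hypothetical names, the whitespace tokenizer stands in for the real one, the pad id of 0 is arbitrary, and falling back to the longest prompt when pad_to_seq_len is None is only one plausible rule.

```python
import numpy as np

PAD_ID = 0  # hypothetical pad token id, chosen for this sketch only


def toy_encode(prompt, vocab):
    """Hypothetical whitespace tokenizer: each new word gets the next free id."""
    return [vocab.setdefault(word, len(vocab) + 1) for word in prompt.split()]


def left_pad(token_lists, pad_to_seq_len=None):
    """Left-pad variable-length token lists into one 2-D integer array."""
    longest = max(len(tokens) for tokens in token_lists)
    # Assumption: fall back to the longest prompt when no length is given.
    seq_len = longest if pad_to_seq_len is None else pad_to_seq_len
    if longest > seq_len:
        raise ValueError(
            f"prompt of {longest} tokens exceeds pad_to_seq_len={seq_len}"
        )
    out = np.full((len(token_lists), seq_len), PAD_ID, dtype=np.int64)
    for row, tokens in enumerate(token_lists):
        if tokens:
            # Real tokens occupy the rightmost columns; padding stays on the left.
            out[row, seq_len - len(tokens):] = tokens
    return out


def sketch_batches(prompts, batch_size, pad_to_seq_len=None):
    """Tokenize all prompts, then slice them into chunks of batch_size."""
    vocab = {}
    input_ids = left_pad([toy_encode(p, vocab) for p in prompts], pad_to_seq_len)
    return [
        {
            "input_ids": input_ids[i:i + batch_size],
            "prompts": prompts[i:i + batch_size],
        }
        for i in range(0, len(prompts), batch_size)
    ]


batches = sketch_batches(["a b c", "a", "b c"], batch_size=2)
```

Left-padding keeps the last prompt token in the final column, so generated tokens follow every prompt directly regardless of its length. In this sketch the final batch may hold fewer than batch_size rows; how the real function treats a partial batch is not stated here.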

Usage

get_batches() is called inside the HELM execute() function, before the generation loop. The loop then iterates over the returned batches and passes each one to OptLM.generate().

Code Reference

Source: flexllmgen/apps/helm_run.py, Lines: 131-163
Signature:

def get_batches(scenario_state, tokenizer, batch_size, pad_to_seq_len=None):
    """Group HELM requests into batches for FlexLLMGen inference.

    Args:
        scenario_state: HELM ScenarioState with request_states
        tokenizer: OptTokenizer or compatible tokenizer
        batch_size: Number of requests per batch (gpu_batch_size * num_gpu_batches)
        pad_to_seq_len: Optional fixed padding length (auto-computed if None)
    Returns:
        List of batch dicts with input_ids, generation params, request_states
    """

Import:

from flexllmgen.apps.helm_run import get_batches

I/O Contract

Inputs

Parameter	Type	Required	Description
scenario_state	ScenarioState	Yes	HELM scenario with request_states
tokenizer	OptTokenizer	Yes	Tokenizer for encoding prompts
batch_size	int	Yes	Requests per batch
pad_to_seq_len	int	No	Fixed padding length; auto-computed if None
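As a quick worked example of the batch_size arithmetic, the sketch below computes the effective batch size from the two policy values named in the docstring and estimates how many batches a run would produce. The helper name plan_batches is hypothetical, and the ceiling rule assumes plain chunking with a possibly smaller final batch, which the source does not confirm.

```python
import math


def plan_batches(num_requests, gpu_batch_size, num_gpu_batches):
    """Return (batch_size, number_of_batches) for a given request count."""
    # batch_size follows the docstring: gpu_batch_size * num_gpu_batches.
    batch_size = gpu_batch_size * num_gpu_batches
    # Assumption: requests are chunked in order, so the count rounds up.
    return batch_size, math.ceil(num_requests / batch_size)


batch_size, n_batches = plan_batches(1000, gpu_batch_size=24, num_gpu_batches=4)
```

With these illustrative numbers, 1000 requests at 96 per batch give 11 batches, the last one holding 40 requests.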

Outputs

List[dict]: a list of batch dictionaries, each containing an input_ids NumPy array, generation parameters, and request_state references.

Usage Examples

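from flexllmgen.apps.helm_run import get_batches, OptTokenizer

# Assumes `policy`, `scenario_state`, and `model` (an OptLM instance)
# have already been created earlier in the script.
tokenizer = OptTokenizer("facebook/opt-30b")
# Effective batch size: requests per GPU batch times the number of GPU batches.
batch_size = policy.gpu_batch_size * policy.num_gpu_batches

# Pad every prompt to a fixed length of 512 tokens.
batches = get_batches(scenario_state, tokenizer, batch_size, pad_to_seq_len=512)

for batch in batches:
    output_ids = model.generate(
        batch["input_ids"],
        do_sample=batch["do_sample"],
        temperature=batch["temperature"],
        max_new_tokens=batch["max_new_tokens"],
        stop=batch.get("stop"),
    )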
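The loop above stops at output_ids. A minimal sketch of the result-mapping step follows, assuming (the source does not confirm this) that each output row holds the prompt columns first with the new tokens after them, and that each batch carries its request states under a "request_states" key. split_generated and pair_with_requests are illustrative helpers, not part of FlexLLMGen, and the strings "req-0" and "req-1" stand in for real request_state objects.

```python
import numpy as np


def split_generated(input_ids, output_ids):
    """Return only the newly generated columns of output_ids.

    Assumes each output row is [prompt tokens | generated tokens], which
    lines up across rows when prompts are left-padded to a common length.
    """
    prompt_len = input_ids.shape[1]
    if output_ids.shape[1] < prompt_len:
        raise ValueError("output_ids is shorter than the prompt")
    return output_ids[:, prompt_len:]


def pair_with_requests(batch, output_ids):
    """Pair each request state with its generated token ids, in row order."""
    new_tokens = split_generated(batch["input_ids"], output_ids)
    # "request_states" is an assumed key name used for this sketch.
    return list(zip(batch["request_states"], new_tokens.tolist()))


# Toy data: two left-padded prompts (pad id 0) and three generated tokens each.
batch = {
    "input_ids": np.array([[0, 5, 6], [7, 8, 9]]),
    "request_states": ["req-0", "req-1"],
}
output_ids = np.array([[0, 5, 6, 11, 12, 13], [7, 8, 9, 21, 22, 23]])
pairs = pair_with_requests(batch, output_ids)
```

Because rows keep their order through generation, a positional zip is enough to reattach each completion to its originating request.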
