Implementation:FMInference FlexLLMGen Get Batches


Knowledge Sources
Domains Benchmark_Integration, Batch_Processing
Last Updated 2026-02-09 00:00 GMT

Overview

A concrete tool, provided by the FlexLLMGen HELM integration, for constructing padded evaluation batches from HELM scenario requests.

Description

get_batches() takes a ScenarioState (whose request_states hold the prompts and generation parameters), a tokenizer, a batch_size, and an optional pad_to_seq_len. It groups the requests into batches of batch_size, tokenizes all prompts with left-padding, and returns a list of batch dictionaries. Each dictionary contains input_ids, the generation parameters taken from the requests (temperature, max_tokens, stop_sequences), and references to the original request_state objects so that results can be mapped back to their requests.
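The grouping and left-padding described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in helm_run.py: the helper name left_pad_and_batch, the pad_id argument, and the use of nested lists instead of NumPy arrays are all hypothetical, and the real function may treat a short final batch differently.

```python
def left_pad_and_batch(token_lists, batch_size, pad_id, pad_to_seq_len=None):
    """Left-pad token id lists to one length, then split them into batches.

    Hypothetical helper for illustration; not part of FlexLLMGen.
    """
    if pad_to_seq_len is None:
        # Auto-compute the padding length from the longest prompt.
        pad_to_seq_len = max(len(tokens) for tokens in token_lists)

    padded = []
    for tokens in token_lists:
        if len(tokens) > pad_to_seq_len:
            raise ValueError("prompt is longer than pad_to_seq_len")
        # Left-padding: pad ids go in front, so the prompt ends at the last position.
        padded.append([pad_id] * (pad_to_seq_len - len(tokens)) + list(tokens))

    # Consecutive slices of batch_size rows; the final slice may be shorter.
    return [padded[i:i + batch_size] for i in range(0, len(padded), batch_size)]


batches = left_pad_and_batch([[5, 6, 7], [8], [9, 10]], batch_size=2, pad_id=1)
```

Left-padding keeps the last prompt token of every row in the same column, which is why it is the usual choice for batched autoregressive generation: new tokens can be appended to all rows at the same position.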

Usage

get_batches() is called within the HELM execute() function before the generation loop. The returned batches are then iterated, and each one is fed to OptLM.generate().

Code Reference

  • Source: flexllmgen/apps/helm_run.py, Lines: 131-163
  • Signature:
def get_batches(scenario_state, tokenizer, batch_size, pad_to_seq_len=None):
    """Group HELM requests into batches for FlexLLMGen inference.

    Args:
        scenario_state: HELM ScenarioState with request_states
        tokenizer: OptTokenizer or compatible tokenizer
        batch_size: Number of requests per batch (gpu_batch_size * num_gpu_batches)
        pad_to_seq_len: Optional fixed padding length (auto-computed if None)
    Returns:
        List of batch dicts with input_ids, generation params, request_states
    """
  • Import:
from flexllmgen.apps.helm_run import get_batches

I/O Contract

Inputs

Parameter       Type           Required  Description
scenario_state  ScenarioState  Yes       HELM scenario with request_states
tokenizer       OptTokenizer   Yes       Tokenizer for encoding prompts
batch_size      int            Yes       Requests per batch
pad_to_seq_len  int            No        Fixed padding length; auto-computed if None

Outputs

  • List[dict]: a list of batch dictionaries, each containing an input_ids NumPy array, the generation parameters, and the request_state references
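As a rough illustration of why each batch keeps its request_state references, the sketch below pairs every generated row with the request in the same position. It rests on assumptions not confirmed by this page: the dictionary key "request_states", the helper name pair_outputs_with_requests, and the premise that each output row begins with the padded prompt followed by the new tokens.

```python
def pair_outputs_with_requests(batch, output_rows, prompt_len):
    """Pair each output row with the request state at the same index.

    Hypothetical helper; the "request_states" key is an assumed name.
    """
    pairs = []
    for request_state, row in zip(batch["request_states"], output_rows):
        # Assumption: the first prompt_len ids are the padded prompt,
        # and everything after them is newly generated.
        pairs.append((request_state, list(row[prompt_len:])))
    return pairs


# Stand-in data: strings take the place of real HELM request states.
batch = {"input_ids": [[1, 1, 8], [5, 6, 7]], "request_states": ["req-a", "req-b"]}
output_rows = [[1, 1, 8, 20, 21], [5, 6, 7, 30, 31]]
pairs = pair_outputs_with_requests(batch, output_rows, prompt_len=3)
```

Because row order is preserved from batching through generation, positional pairing is enough here and no separate request identifier is needed.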

Usage Examples

from flexllmgen.apps.helm_run import get_batches, OptTokenizer

# Assumes `policy`, `scenario_state`, and `model` (an OptLM instance) already exist.
tokenizer = OptTokenizer("facebook/opt-30b")
# Requests per batch: GPU batch size times the number of GPU batches.
batch_size = policy.gpu_batch_size * policy.num_gpu_batches

batches = get_batches(scenario_state, tokenizer, batch_size, pad_to_seq_len=512)

for batch in batches:
    # Each batch dictionary carries its own generation parameters.
    output_ids = model.generate(
        batch["input_ids"],
        do_sample=batch["do_sample"],
        temperature=batch["temperature"],
        max_new_tokens=batch["max_new_tokens"],
        stop=batch.get("stop"),
    )
