Implementation:FMInference FlexLLMGen HELM Scenario Pipeline

Knowledge Sources	FlexLLMGen HELM Benchmark
Domains	Benchmark_Integration, Evaluation
Last Updated	2026-02-09 00:00 GMT

Overview

Wrapper documentation for HELM's scenario instantiation pipeline as used by FlexLLMGen's HELM integration.

Description

This page documents the external HELM library APIs (crfm-helm==0.2.1) that FlexLLMGen's helm_run.py wraps in its run_entry() function. Within run_entry(), run_entries_to_run_specs() parses description strings into run specs, create_scenario() loads the benchmark dataset, AdapterFactory.get_adapter() creates the prompt formatter, and create_metric() instantiates the metrics.

Usage

Used internally by flexllmgen.apps.helm_run. Users invoke it through the CLI:

# python -m flexllmgen.apps.helm_run --description "mmlu:subject=philosophy,model=text" --model facebook/opt-30b
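When the CLI has to be launched from a script, the command line above can be assembled as an argument list. The following is a minimal sketch, not part of FlexLLMGen: build_helm_run_command is a hypothetical helper, and it uses only the flags that appear on this page (--description, --model, --percent).

```python
import shlex
import sys
from typing import List, Sequence


def build_helm_run_command(
    description: str, model: str, percent: Sequence[int] = ()
) -> List[str]:
    """Assemble an argv list for flexllmgen.apps.helm_run (hypothetical helper)."""
    cmd = [
        sys.executable, "-m", "flexllmgen.apps.helm_run",
        "--description", description,
        "--model", model,
    ]
    if percent:
        # --percent takes several space-separated integers on the command line
        cmd += ["--percent", *map(str, percent)]
    return cmd


cmd = build_helm_run_command(
    "mmlu:subject=philosophy,model=text", "facebook/opt-30b"
)
# Show the command without the interpreter path, for readability
print("python " + shlex.join(cmd[1:]))
```

To actually start the run, pass the list to subprocess.run(cmd, check=True). Passing a list rather than a single shell string avoids quoting problems with the description string.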

External Reference

HELM Documentation

Code Reference

Source: flexllmgen/apps/helm_run.py, Lines: 292-381 (run_entry function wrapping HELM APIs)
Key HELM APIs used:

# From helm.benchmark.run
run_entries_to_run_specs(
    run_entries: List[RunEntry],
    max_eval_instances: int,
    num_train_trials: int
) -> List[RunSpec]

# From helm.benchmark.runner
create_scenario(scenario_spec: ScenarioSpec) -> Scenario
AdapterFactory.get_adapter(adapter_spec: AdapterSpec, tokenizer_service) -> Adapter
create_metric(metric_spec: MetricSpec) -> Metric

Import:

from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory, create_metric

I/O Contract

Inputs

Parameter	Type	Required	Description
description	str	Yes	HELM description string e.g. "mmlu:subject=philosophy,model=text"
model_name	str	Yes	OPT model name, e.g. facebook/opt-30b
max_eval_instances	int	No	Maximum number of evaluation instances to run
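The description string has the shape name:key=value,key=value, as in "mmlu:subject=philosophy,model=text". The sketch below illustrates that shape with a hand-written splitter. It is a simplified illustration under that assumption, not HELM's own parser, and split_description is a hypothetical name.

```python
from typing import Dict, Tuple


def split_description(description: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into (name, {key: value}).

    Simplified illustration only; HELM performs its own parsing inside
    run_entries_to_run_specs().
    """
    name, _, arg_text = description.partition(":")
    args: Dict[str, str] = {}
    # filter(None, ...) drops the empty piece produced when there are no args
    for pair in filter(None, arg_text.split(",")):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {pair!r}")
        args[key.strip()] = value.strip()
    return name.strip(), args


name, args = split_description("mmlu:subject=philosophy,model=text")
print(name, args)  # mmlu {'subject': 'philosophy', 'model': 'text'}
```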

Outputs

RunSpec — with scenario_spec, adapter_spec, metric_specs
Scenario — with loaded instances
ScenarioState — with request_states ready for generation

Usage Examples

# CLI usage (primary interface)
# python -m flexllmgen.apps.helm_run \
#   --description "mmlu:subject=philosophy,model=text,data_augmentation=canonical" \
#   --model facebook/opt-iml-30b \
#   --percent 0 100 0 100 100 0

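# Internal API usage in run_entry():
from helm.benchmark.presentation.run_entry import RunEntry
from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory

description = "mmlu:subject=philosophy,model=text"

# Parse the description string into RunSpecs and take the first one
run_entries = [RunEntry(description=description, priority=1)]
run_specs = run_entries_to_run_specs(run_entries, max_eval_instances=100, num_train_trials=3)
run_spec = run_specs[0]

# Load the benchmark dataset and build the prompt formatter.
# opt_tokenizer is assumed to be a tokenizer service set up by the caller (not shown here).
scenario = create_scenario(run_spec.scenario_spec)
adapter = AdapterFactory.get_adapter(run_spec.adapter_spec, tokenizer_service=opt_tokenizer)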

Related Pages

Principle:FMInference_FlexLLMGen_HELM_Scenario_Configuration
