Implementation:FMInference FlexLLMGen HELM Scenario Pipeline
| Knowledge Sources | |
|---|---|
| Domains | Benchmark_Integration, Evaluation |
| Last Updated | 2026-02-09 00:00 GMT |
Overview
Wrapper documentation for HELM's scenario instantiation pipeline as used by FlexLLMGen's HELM integration.
Description
This is a wrapper doc for the external HELM library APIs (crfm-helm==0.2.1) that FlexLLMGen calls. In helm_run.py, run_entries_to_run_specs() parses description strings into run specs, create_scenario() loads the benchmark dataset, and AdapterFactory.get_adapter() creates the prompt formatter. FlexLLMGen wraps these external HELM APIs in its run_entry() function.
Usage
Used internally by flexllmgen.apps.helm_run. Users invoke via CLI:
# python -m flexllmgen.apps.helm_run --description "mmlu:subject=philosophy,model=text" --model facebook/opt-30b
External Reference
Code Reference
- Source: flexllmgen/apps/helm_run.py, Lines: 292-381 (run_entry function wrapping HELM APIs)
- Key HELM APIs used:
# From helm.benchmark.run
run_entries_to_run_specs(
run_entries: List[RunEntry],
max_eval_instances: int,
num_train_trials: int
) -> List[RunSpec]
# From helm.benchmark.runner
create_scenario(scenario_spec: ScenarioSpec) -> Scenario
AdapterFactory.get_adapter(adapter_spec: AdapterSpec, tokenizer_service) -> Adapter
create_metric(metric_spec: MetricSpec) -> Metric
- Import:
from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory, create_metric
I/O Contract
Inputs
| Parameter | Type | Required | Description |
|---|---|---|---|
| description | str | Yes | HELM description string e.g. "mmlu:subject=philosophy,model=text" |
| model_name | str | Yes | OPT model name, e.g. "facebook/opt-30b" |
| max_eval_instances | int | No | Maximum number of evaluation instances |
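The description string in the table above follows a `name:key=value,key=value` shape. The sketch below shows one way to split such a string into a scenario name and an argument dictionary. It is an illustration only: `parse_description` is a hypothetical helper, not HELM's parser, and in the real pipeline run_entries_to_run_specs() does this work.

```python
from typing import Dict, Tuple


def parse_description(description: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into (name, args).

    Hypothetical helper for illustration; not part of HELM or FlexLLMGen.
    """
    # Everything before the first ':' is the scenario name.
    name, _, arg_str = description.partition(":")
    args: Dict[str, str] = {}
    if arg_str:
        for pair in arg_str.split(","):
            key, sep, value = pair.partition("=")
            if not sep:
                raise ValueError(f"Expected key=value, got {pair!r}")
            args[key] = value
    return name, args


name, args = parse_description("mmlu:subject=philosophy,model=text")
print(name, args)
```

Keeping the arguments as plain strings mirrors how they appear on the command line; any type conversion is left to the consumer.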
Outputs
- RunSpec — with scenario_spec, adapter_spec, metric_specs
- Scenario — with loaded instances
- ScenarioState — with request_states ready for generation
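The flow between the three outputs above can be pictured with a toy model. Every class and function in this sketch (`ToyRunSpec`, `ToyScenario`, `ToyScenarioState`, `load_scenario`, `adapt`) is a hypothetical stand-in, not a HELM type, and the point at which the instance cap is applied here is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyRunSpec:
    # Stand-in for RunSpec: only the fields this sketch needs.
    scenario_name: str
    max_eval_instances: int


@dataclass
class ToyScenario:
    # Stand-in for Scenario: holds the loaded instances.
    instances: List[str]


@dataclass
class ToyScenarioState:
    # Stand-in for ScenarioState: prompts ready for generation.
    request_states: List[str] = field(default_factory=list)


def load_scenario(spec: ToyRunSpec) -> ToyScenario:
    """Plays the role of create_scenario(): build instances for the spec."""
    return ToyScenario([f"{spec.scenario_name} question {i}" for i in range(5)])


def adapt(spec: ToyRunSpec, scenario: ToyScenario) -> ToyScenarioState:
    """Plays the role of the adapter: cap instances, then format prompts."""
    capped = scenario.instances[: spec.max_eval_instances]
    return ToyScenarioState([f"Prompt: {inst}" for inst in capped])


spec = ToyRunSpec(scenario_name="mmlu", max_eval_instances=3)
scenario = load_scenario(spec)
state = adapt(spec, scenario)
print(len(scenario.instances), len(state.request_states))
```

Each stage consumes only the output of the previous one, which is what lets FlexLLMGen substitute its own generation step once the request states exist.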
Usage Examples
# CLI usage (primary interface)
# python -m flexllmgen.apps.helm_run \
# --description "mmlu:subject=philosophy,model=text,data_augmentation=canonical" \
# --model facebook/opt-iml-30b \
# --percent 0 100 0 100 100 0
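# Internal API usage in run_entry():
from helm.benchmark.presentation.run_entry import RunEntry
from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory
# `description` is the HELM description string passed via --description.
run_entries = [RunEntry(description=description, priority=1)]
# Expand the run entries into RunSpecs and take the first one.
run_specs = run_entries_to_run_specs(run_entries, max_eval_instances=100, num_train_trials=3)
run_spec = run_specs[0]
# Load the benchmark dataset for the selected scenario.
scenario = create_scenario(run_spec.scenario_spec)
# `opt_tokenizer` is assumed to be a tokenizer service object created earlier.
adapter = AdapterFactory.get_adapter(run_spec.adapter_spec, tokenizer_service=opt_tokenizer)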