Implementation:FMInference FlexLLMGen HELM Scenario Pipeline

From Leeroopedia


Knowledge Sources
Domains Benchmark_Integration, Evaluation
Last Updated 2026-02-09 00:00 GMT

Overview

Wrapper documentation for HELM's scenario instantiation pipeline as used by FlexLLMGen's HELM integration.

Description

This is a Wrapper Doc for the external HELM library APIs (crfm-helm==0.2.1). FlexLLMGen's helm_run.py uses run_entries_to_run_specs() to parse description strings, create_scenario() to load benchmark datasets, and AdapterFactory.get_adapter() to create prompt formatters. These are external HELM APIs wrapped in FlexLLMGen's run_entry() function.
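Because the wrapper targets a pinned HELM release, it can help to confirm which version is installed before calling into it. The following is a minimal sketch using only the standard library; the helper name `helm_version_status` and its return strings are illustrative assumptions and are not part of FlexLLMGen or HELM.

```python
from importlib import metadata

# Version pinned by this page (crfm-helm==0.2.1).
PINNED_HELM_VERSION = "0.2.1"


def helm_version_status(dist_name: str = "crfm-helm",
                        pinned: str = PINNED_HELM_VERSION) -> str:
    """Report whether the installed distribution matches the pinned version.

    Hypothetical helper: returns "missing", "ok", or "mismatch (<version>)".
    """
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return "missing"
    if installed == pinned:
        return "ok"
    return f"mismatch ({installed})"


if __name__ == "__main__":
    print(f"crfm-helm: {helm_version_status()}")
```

A mismatch does not necessarily mean the wrapper will fail, but the signatures documented here are only stated for the pinned release.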

Usage

Used internally by flexllmgen.apps.helm_run. Users invoke it through the CLI:

# python -m flexllmgen.apps.helm_run --description "mmlu:subject=philosophy,model=text" --model facebook/opt-30b
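When the CLI has to be launched from a script, the same invocation can be assembled as an argument list. This is a sketch only: the helper `build_helm_run_command` is hypothetical, and it uses just the flags that appear on this page (`--description`, `--model`, `--percent`). It builds the command without executing it.

```python
import shlex
import sys
from typing import List, Optional, Sequence


def build_helm_run_command(description: str,
                           model: str,
                           percent: Optional[Sequence[int]] = None) -> List[str]:
    """Assemble the argv list for `python -m flexllmgen.apps.helm_run`.

    Hypothetical helper; only flags shown on this page are emitted.
    """
    cmd = [
        sys.executable, "-m", "flexllmgen.apps.helm_run",
        "--description", description,
        "--model", model,
    ]
    if percent is not None:
        # Passed through unchanged; values are not validated here.
        cmd += ["--percent", *[str(p) for p in percent]]
    return cmd


if __name__ == "__main__":
    cmd = build_helm_run_command(
        "mmlu:subject=philosophy,model=text", "facebook/opt-30b"
    )
    # Print a shell-safe rendering; pass `cmd` to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing the list to `subprocess.run` avoids shell quoting problems with the description string, which contains `:`, `=`, and `,`.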

External Reference

Code Reference

  • Source: flexllmgen/apps/helm_run.py, Lines: 292-381 (run_entry function wrapping HELM APIs)
  • Key HELM APIs used:
# From helm.benchmark.run
run_entries_to_run_specs(
    run_entries: List[RunEntry],
    max_eval_instances: int,
    num_train_trials: int
) -> List[RunSpec]

# From helm.benchmark.runner
create_scenario(scenario_spec: ScenarioSpec) -> Scenario
AdapterFactory.get_adapter(adapter_spec: AdapterSpec, tokenizer_service) -> Adapter
create_metric(metric_spec: MetricSpec) -> Metric
  • Import:
from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory, create_metric

I/O Contract

Inputs

| Parameter          | Type | Required | Description                                                       |
| description        | str  | Yes      | HELM description string, e.g. "mmlu:subject=philosophy,model=text" |
| model_name         | str  | Yes      | OPT model name                                                    |
| max_eval_instances | int  | No       | Cap on the number of evaluation instances                         |
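The description string in the example follows a `name:key=value,key=value` shape. The sketch below splits such a string to show that structure. It is an illustration written for this page, not HELM's own parser, and the function name `parse_description` is an assumption.

```python
from typing import Dict, Tuple


def parse_description(description: str) -> Tuple[str, Dict[str, str]]:
    """Split 'name:key=value,key=value' into (name, {key: value}).

    Illustrative only; HELM performs its own parsing internally.
    """
    name, sep, arg_str = description.partition(":")
    if not name:
        raise ValueError(f"Missing scenario name in {description!r}")
    args: Dict[str, str] = {}
    if sep and arg_str:
        for pair in arg_str.split(","):
            key, eq, value = pair.partition("=")
            if not key or not eq:
                raise ValueError(f"Expected key=value, got {pair!r}")
            args[key] = value
    return name, args


if __name__ == "__main__":
    name, args = parse_description("mmlu:subject=philosophy,model=text")
    print(name, args)
```

Validating the string this way before handing it to the pipeline can surface typos early, although the set of keys a given scenario accepts is defined by HELM and is not checked here.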

Outputs

  • RunSpec: with scenario_spec, adapter_spec, metric_specs
  • Scenario: with loaded instances
  • ScenarioState: with request_states ready for generation

Usage Examples

# CLI usage (primary interface)
# python -m flexllmgen.apps.helm_run \
#   --description "mmlu:subject=philosophy,model=text,data_augmentation=canonical" \
#   --model facebook/opt-iml-30b \
#   --percent 0 100 0 100 100 0

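# Internal API usage in run_entry():
from helm.benchmark.presentation.run_entry import RunEntry
from helm.benchmark.run import run_entries_to_run_specs
from helm.benchmark.runner import create_scenario, AdapterFactory

# Example value; in helm_run.py this comes from the --description flag.
description = "mmlu:subject=philosophy,model=text"

# Parse the description string into RunSpecs and take the first one.
# groups=None is passed explicitly on the assumption that RunEntry declares no default for it.
run_entries = [RunEntry(description=description, priority=1, groups=None)]
run_specs = run_entries_to_run_specs(run_entries, max_eval_instances=100, num_train_trials=3)
run_spec = run_specs[0]

# Load the benchmark dataset, then build the prompt formatter.
# opt_tokenizer is a placeholder for the tokenizer service of the chosen OPT model.
scenario = create_scenario(run_spec.scenario_spec)
adapter = AdapterFactory.get_adapter(run_spec.adapter_spec, tokenizer_service=opt_tokenizer)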
