
Implementation:Microsoft Autogen Agbench Run Cmd

From Leeroopedia
Knowledge Sources
Domains Benchmarking, Code_Execution, Docker, Testing
Last Updated 2026-02-11 17:00 GMT

Overview

A comprehensive benchmark execution module for running AutoGen scenarios in isolated Docker or native environments with support for parallel execution, environment management, and results tracking.

Description

⚠️ Deprecation Notice: JSON environment files (ENV.json) and implicit OPENAI_API_KEY inclusion are deprecated. Migrate to YAML format (ENV.yaml) with explicit environment variable definitions. See Heuristic:Microsoft_Autogen_Warning_Deprecated_JSON_Env_Files.

The `run_cmd.py` module is the core execution engine for agbench, AutoGen's benchmarking framework. It orchestrates the execution of benchmark scenarios by:

  • Expanding scenario templates with substitutions
  • Managing Docker containers or native execution environments
  • Handling environment variables and configuration files
  • Running scenarios with timeout controls
  • Supporting parallel execution across multiple processes
  • Managing Azure authentication tokens for cloud-based models
  • Tracking execution results in organized directory structures

The module supports both Docker-based isolation (recommended for security) and native execution. It includes comprehensive error handling, logging, and retry mechanisms for robust benchmark execution.

Usage

This module is typically invoked via the agbench CLI tool to run benchmark scenarios. It processes JSONL files containing scenario definitions and executes them with specified repetitions and configurations.

Code Reference

Source Location

  • Repository: Microsoft_Autogen
  • File: python/packages/agbench/src/agbench/run_cmd.py
  • Lines: 1-1011

Signature

def run_scenarios(
    scenario: str,
    n_repeats: int,
    is_native: bool,
    config_file: Union[None, str],
    token_provider: Optional[Callable[[], str]],
    docker_image: Optional[str] = None,
    results_dir: str = "Results",
    subsample: Union[None, int, float] = None,
    env_file: Union[None, str] = None,
) -> None

def expand_scenario(
    scenario_dir: str,
    scenario: ScenarioInstance,
    output_dir: str,
    config_file: Union[str, None]
) -> None

def run_scenario_in_docker(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT,
    docker_image: Optional[str] = None
) -> None

def run_scenario_natively(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT
) -> None

class ScenarioInstance(TypedDict):
    id: str
    template: Union[str, List[Union[str, List[str]]]]
    substitutions: Dict[str, Dict[str, str]]
    values: Dict[str, Dict[str, str]]

Import

from agbench.run_cmd import run_scenarios, expand_scenario, run_cli

I/O Contract

Inputs

Name            Type        Required  Description
scenario        str         Yes       Path to a JSONL file, a directory of JSONL files, or "-" for stdin
n_repeats       int         Yes       Number of times to repeat each scenario instance
is_native       bool        Yes       Run natively (True) or in Docker (False)
config_file     str         No        Path to configuration YAML file (default: config.yaml)
token_provider  Callable    No        Function that returns Azure authentication tokens
docker_image    str         No        Docker image to use (default: builds the "agbench" image)
results_dir     str         No        Directory for storing results (default: "Results")
subsample       int/float   No        Proportion (0.0-1.0) or count of scenarios to run
env_file        str         No        Path to environment variables file (default: ENV.yaml)

Outputs

Name               Type       Description
Results directory  Directory  Organized hierarchy: Results/{scenario}/{instance_id}/{repetition}/
console_log.txt    File       Complete console output from scenario execution
timestamp.txt      File       Execution metadata including agbench version
scenario.py        File       Expanded scenario script ready for execution
Exit status        None       Function returns None; errors raise exceptions
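The results hierarchy above can be assembled with a one-line path helper. This is a sketch; the helper name `result_dir` is illustrative, not part of agbench's API:

```python
from pathlib import Path

def result_dir(results_dir: str, scenario_name: str,
               instance_id: str, repetition: int) -> Path:
    # Mirror the Results/{scenario}/{instance_id}/{repetition}/ layout
    # described in the Outputs table above.
    return Path(results_dir) / scenario_name / instance_id / str(repetition)
```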

Core Functions

run_scenarios

Main entry point that orchestrates scenario execution. Loads JSONL files, applies subsampling if specified, creates result directories, expands scenarios, and executes them either natively or in Docker.
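The load-and-subsample step can be sketched as below. The semantics assumed here (a float is a proportion, an int is an absolute count, None runs everything) follow the I/O contract above, though agbench's exact sampling behavior may differ:

```python
import json
import math
import random

def load_instances(jsonl_text: str) -> list:
    # One ScenarioInstance dict per non-empty JSONL line.
    return [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

def subsample_instances(instances, subsample, seed=None):
    # Assumed semantics: float = proportion of instances (rounded up),
    # int = absolute count, None = run all instances.
    if subsample is None:
        return list(instances)
    if isinstance(subsample, float):
        k = max(1, math.ceil(len(instances) * subsample))
    else:
        k = min(int(subsample), len(instances))
    return random.Random(seed).sample(list(instances), k)
```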

expand_scenario

Expands scenario templates by:

  • Copying template files/directories to the output location
  • Performing string substitutions in templated files
  • Including global template resources
  • Copying configuration files

Supports multiple template formats including single files, directories, and complex mappings.

run_scenario_in_docker

Executes a scenario in a Docker container with:

  • Automatic image building if needed
  • Volume mounting for workspace and AutoGen repository
  • Docker socket mounting for Docker-in-Docker support
  • Streaming log capture with timeout handling
  • Environment variable injection
  • Graceful container cleanup
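A `docker run` invocation resembling the one described above could be assembled like this. The mount points, entrypoint script name (`run.sh`), and exact flags are assumptions for illustration; agbench's real command line differs in its details:

```python
def build_docker_command(work_dir: str, env: dict,
                         docker_image: str = "agbench",
                         timeout: int = 7200) -> list:
    # Sketch of a docker run command: mount the workspace, inject
    # environment variables, and run the scenario under `timeout`.
    cmd = ["docker", "run", "--rm",
           "-v", f"{work_dir}:/workspace", "-w", "/workspace"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(docker_image)
    cmd += ["timeout", str(timeout), "sh", "run.sh"]
    return cmd
```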

run_scenario_natively

Executes a scenario in a native Python virtual environment with:

  • Virtual environment isolation (.agbench_venv)
  • Requirement installation from requirements.txt
  • Timeout enforcement using shell timeout command
  • Environment variable propagation
  • Initialization and finalization script hooks
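The native-execution steps above can be written out as a sequence of argv lists. This is a sketch under stated assumptions: the venv layout, file names, and use of the shell `timeout` command mirror the bullets, but agbench's actual invocations may differ:

```python
import sys
from pathlib import Path

def native_run_plan(work_dir: str, timeout: int = 7200) -> list:
    # Three steps: create the .agbench_venv, install requirements,
    # then run scenario.py under the shell `timeout` command.
    venv = Path(work_dir) / ".agbench_venv"
    py = venv / "bin" / "python"
    return [
        [sys.executable, "-m", "venv", str(venv)],
        [str(py), "-m", "pip", "install", "-r", "requirements.txt"],
        ["timeout", str(timeout), str(py), "scenario.py"],
    ]
```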

get_scenario_env

Constructs environment variable dictionary from:

  • System environment (OPENAI_API_KEY)
  • Azure authentication tokens
  • ENV.yaml or ENV.json files
  • Variable substitution using ${VAR_NAME} syntax
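The ${VAR_NAME} substitution can be sketched with `string.Template`; the function name is illustrative and agbench's lookup order between system environment, tokens, and the env file may differ:

```python
import os
from string import Template

def resolve_env_values(raw_env: dict, system_env: dict = None) -> dict:
    # Expand ${VAR_NAME} references in ENV.yaml-style values against
    # the surrounding environment; unknown variables are left intact.
    source = system_env if system_env is not None else dict(os.environ)
    return {key: Template(str(value)).safe_substitute(source)
            for key, value in raw_env.items()}
```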

run_parallel

Splits JSONL scenarios and executes them across multiple worker processes using multiprocessing.Pool.
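The split-and-fan-out structure can be sketched as below. The round-robin split and the stand-in worker are assumptions; in agbench the workers run the actual scenarios rather than counting lines:

```python
from multiprocessing import Pool

def split_round_robin(lines: list, n_workers: int) -> list:
    # Deal JSONL lines out round-robin, one sublist per worker.
    return [lines[i::n_workers] for i in range(n_workers)]

def count_lines(chunk: list) -> int:
    # Stand-in worker; real workers would execute their chunk's scenarios.
    return len(chunk)

def run_parallel_sketch(lines: list, n_workers: int) -> list:
    # Fan the chunks out across worker processes with multiprocessing.Pool.
    with Pool(n_workers) as pool:
        return pool.map(count_lines, split_round_robin(lines, n_workers))
```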

get_azure_token_provider

Attempts to create an Azure DefaultAzureCredential bearer token provider if ~/.azure directory exists and authentication succeeds.

Usage Examples

Basic Usage

from agbench.run_cmd import run_scenarios

# Run scenarios in Docker with 3 repetitions
run_scenarios(
    scenario="scenarios/test.jsonl",
    n_repeats=3,
    is_native=False,
    config_file="config.yaml",
    token_provider=None,
    docker_image=None,
    results_dir="Results"
)

CLI Usage

# Run all JSONL files in a directory with 5 repetitions
agbench run scenarios/ -r 5

# Run with custom Docker image and environment file
agbench run scenarios/test.jsonl -d custom-image:latest -e custom_env.yaml

# Run natively (use with caution)
agbench run scenarios/test.jsonl --native

# Run with parallel execution (4 workers)
agbench run scenarios/test.jsonl -p 4 -r 2

# Subsample 50% of scenarios
agbench run scenarios/test.jsonl -s 0.5

# Use Azure authentication
agbench run scenarios/test.jsonl -a

Scenario JSONL Format

{
  "id": "task_001",
  "template": "templates/basic_task",
  "substitutions": {
    "scenario.py": {
      "__TASK__": "Calculate fibonacci numbers",
      "__MODEL__": "gpt-4"
    }
  }
}

Environment File Format

# ENV.yaml
OPENAI_API_KEY: ${OPENAI_API_KEY}
AZURE_OPENAI_ENDPOINT: https://my-resource.openai.azure.com
MODEL_NAME: gpt-4

Constants

  • TASK_TIMEOUT: 7200 seconds (120 minutes)
  • DEFAULT_DOCKER_IMAGE_TAG: "agbench"
  • DEFAULT_ENV_FILE_YAML: "ENV.yaml"
  • DEFAULT_ENV_FILE_JSON: "ENV.json" (deprecated)
  • DEFAULT_CONFIG_YAML: "config.yaml"
