Implementation:Microsoft Autogen Agbench Run Cmd
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Code_Execution, Docker, Testing |
| Last Updated | 2026-02-11 17:00 GMT |
Overview
A comprehensive benchmark execution module for running AutoGen scenarios in isolated Docker or native environments with support for parallel execution, environment management, and results tracking.
Description
⚠️ Deprecation Notice: JSON environment files (ENV.json) and implicit OPENAI_API_KEY inclusion are deprecated. Migrate to YAML format (ENV.yaml) with explicit environment variable definitions. See Heuristic:Microsoft_Autogen_Warning_Deprecated_JSON_Env_Files.
The `run_cmd.py` module is the core execution engine for agbench, AutoGen's benchmarking framework. It orchestrates the execution of benchmark scenarios by:
- Expanding scenario templates with substitutions
- Managing Docker containers or native execution environments
- Handling environment variables and configuration files
- Running scenarios with timeout controls
- Supporting parallel execution across multiple processes
- Managing Azure authentication tokens for cloud-based models
- Tracking execution results in organized directory structures
The module supports both Docker-based isolation (recommended for security) and native execution. It includes comprehensive error handling, logging, and retry mechanisms for robust benchmark execution.
Usage
This module is typically invoked via the agbench CLI tool to run benchmark scenarios. It processes JSONL files containing scenario definitions and executes them with specified repetitions and configurations.
Code Reference
Source Location
- Repository: Microsoft_Autogen
- File: python/packages/agbench/src/agbench/run_cmd.py
- Lines: 1-1011
Signature
```python
def run_scenarios(
    scenario: str,
    n_repeats: int,
    is_native: bool,
    config_file: Union[None, str],
    token_provider: Optional[Callable[[], str]],
    docker_image: Optional[str] = None,
    results_dir: str = "Results",
    subsample: Union[None, int, float] = None,
    env_file: Union[None, str] = None,
) -> None

def expand_scenario(
    scenario_dir: str,
    scenario: ScenarioInstance,
    output_dir: str,
    config_file: Union[str, None],
) -> None

def run_scenario_in_docker(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT,
    docker_image: Optional[str] = None,
) -> None

def run_scenario_natively(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT,
) -> None

class ScenarioInstance(TypedDict):
    id: str
    template: Union[str, List[Union[str, List[str]]]]
    substitutions: Dict[str, Dict[str, str]]
    values: Dict[str, Dict[str, str]]
```
Import
```python
from agbench.run_cmd import run_scenarios, expand_scenario, run_cli
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| scenario | str | Yes | Path to JSONL file, directory of JSONL files, or "-" for stdin |
| n_repeats | int | Yes | Number of times to repeat each scenario instance |
| is_native | bool | Yes | Whether to run natively (True) or in Docker (False) |
| config_file | str | No | Path to configuration YAML file (default: config.yaml) |
| token_provider | Callable | No | Function that returns Azure authentication tokens |
| docker_image | str | No | Docker image to use (default: builds "agbench" image) |
| results_dir | str | No | Directory for storing results (default: "Results") |
| subsample | int/float | No | Float in (0.0, 1.0] as a proportion, or int as an absolute count of scenarios to run |
| env_file | str | No | Path to environment variables file (default: ENV.yaml) |
Outputs
| Name | Type | Description |
|---|---|---|
| Results directory | Directory | Organized hierarchy: Results/{scenario}/{instance_id}/{repetition}/ |
| console_log.txt | File | Complete console output from scenario execution |
| timestamp.txt | File | Execution metadata including agbench version |
| scenario.py | File | Expanded scenario script ready for execution |
| Exit status | None | Function returns None; errors raise exceptions |
Core Functions
run_scenarios
Main entry point that orchestrates scenario execution. Loads JSONL files, applies subsampling if specified, creates result directories, expands scenarios, and executes them either natively or in Docker.
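The loading-and-subsampling step can be sketched as follows. This is an illustrative helper, not the actual agbench API; `load_and_subsample` is a hypothetical name, and the real selection logic may differ in detail:

```python
import io
import json
import random
from typing import Dict, List, Union


def load_and_subsample(
    jsonl_text: str, subsample: Union[None, int, float]
) -> List[Dict]:
    """Parse scenario instances from JSONL text, then optionally subsample.

    Mirrors the documented behavior: a float is treated as a proportion
    of scenarios, an int as an absolute count.
    """
    instances = [json.loads(line) for line in io.StringIO(jsonl_text) if line.strip()]
    if subsample is None:
        return instances
    if isinstance(subsample, float):
        k = max(1, round(len(instances) * subsample))
    else:
        k = subsample
    return random.sample(instances, k)
```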
expand_scenario
Expands scenario templates by:
- Copying template files/directories to the output location
- Performing string substitutions in templated files
- Including global template resources
- Copying configuration files
Supports multiple template formats including single files, directories, and complex mappings.
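The copy-and-substitute step for the directory-template case can be sketched like this (a simplified illustration; `apply_substitutions` is a hypothetical name, and the real function also handles single-file templates, global resources, and config copying):

```python
import shutil
from pathlib import Path
from typing import Dict


def apply_substitutions(
    template_dir: Path,
    output_dir: Path,
    substitutions: Dict[str, Dict[str, str]],
) -> None:
    """Copy a template directory to the output location, then replace
    placeholder strings (e.g. __TASK__) in each named file."""
    shutil.copytree(template_dir, output_dir, dirs_exist_ok=True)
    for filename, mapping in substitutions.items():
        target = output_dir / filename
        text = target.read_text()
        for placeholder, value in mapping.items():
            text = text.replace(placeholder, value)
        target.write_text(text)
```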
run_scenario_in_docker
Executes a scenario in a Docker container with:
- Automatic image building if needed
- Volume mounting for workspace and AutoGen repository
- Docker socket mounting for Docker-in-Docker support
- Streaming log capture with timeout handling
- Environment variable injection
- Graceful container cleanup
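As a rough sketch of the moving parts (the real module drives the Docker SDK with streaming log capture; this version shells out to the docker CLI, and the `run.sh` entrypoint name is an assumption for illustration):

```python
import subprocess
from typing import Dict, List

TASK_TIMEOUT = 7200  # matches the documented default


def build_docker_cmd(
    work_dir: str, env: Dict[str, str], image: str = "agbench"
) -> List[str]:
    """Assemble a `docker run` command with the workspace volume-mounted
    and environment variables injected."""
    cmd = ["docker", "run", "--rm",
           "-v", f"{work_dir}:/workspace", "-w", "/workspace"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    return cmd + [image, "sh", "run.sh"]


def run_in_docker(
    work_dir: str, env: Dict[str, str], timeout: int = TASK_TIMEOUT
) -> int:
    """Run the scenario container, enforcing the task timeout."""
    try:
        return subprocess.run(build_docker_cmd(work_dir, env),
                              timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return -1  # the real module also stops and removes the container
```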
run_scenario_natively
Executes a scenario in a native Python virtual environment with:
- Virtual environment isolation (.agbench_venv)
- Requirement installation from requirements.txt
- Timeout enforcement using shell timeout command
- Environment variable propagation
- Initialization and finalization script hooks
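A minimal sketch of the native path, assuming a `scenario.py` entry point in the working directory (simplified: the real function also installs requirements.txt into the venv and runs init/finalize hooks):

```python
import os
import subprocess
import sys
import venv
from pathlib import Path
from typing import Dict


def run_natively(work_dir: str, env: Dict[str, str],
                 timeout: int = 7200) -> int:
    """Create (or reuse) a .agbench_venv virtual environment in the
    working directory and run scenario.py there with a hard timeout."""
    venv_dir = Path(work_dir) / ".agbench_venv"
    if not venv_dir.exists():
        venv.create(venv_dir, with_pip=False)  # with_pip=True in practice
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = venv_dir / bin_dir / "python"
    try:
        proc = subprocess.run(
            [str(python), "scenario.py"],
            cwd=work_dir,
            env={**os.environ, **env},  # propagate scenario variables
            timeout=timeout,
        )
        return proc.returncode
    except subprocess.TimeoutExpired:
        return -1
```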
get_scenario_env
Constructs environment variable dictionary from:
- System environment (OPENAI_API_KEY)
- Azure authentication tokens
- ENV.yaml or ENV.json files
- Variable substitution using ${VAR_NAME} syntax
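The `${VAR_NAME}` substitution can be sketched as below (a hypothetical helper; the real resolution order and handling of unresolved variables may differ):

```python
import re
from typing import Dict

# Matches ${VAR_NAME} references in ENV-file values
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env_vars(
    raw: Dict[str, str], system_env: Dict[str, str]
) -> Dict[str, str]:
    """Expand ${VAR_NAME} references in ENV-file values against the
    system environment; unresolved variables expand to '' here."""
    return {
        key: _VAR.sub(lambda m: system_env.get(m.group(1), ""), value)
        for key, value in raw.items()
    }
```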
run_parallel
Splits JSONL scenarios and executes them across multiple worker processes using multiprocessing.Pool.
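The fan-out idea can be sketched with a round-robin split; `split_jsonl` is a hypothetical helper, and the real chunking strategy may differ:

```python
from typing import List


def split_jsonl(lines: List[str], n_workers: int) -> List[List[str]]:
    """Divide scenario lines into round-robin chunks, one per worker.
    Each chunk would then be handed to a worker process (e.g. via
    multiprocessing.Pool.map) that runs its scenarios independently."""
    return [lines[i::n_workers] for i in range(n_workers)]
```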
get_azure_token_provider
Attempts to create an Azure DefaultAzureCredential bearer token provider if ~/.azure directory exists and authentication succeeds.
Usage Examples
Basic Usage
```python
from agbench.run_cmd import run_scenarios

# Run scenarios in Docker with 3 repetitions
run_scenarios(
    scenario="scenarios/test.jsonl",
    n_repeats=3,
    is_native=False,
    config_file="config.yaml",
    token_provider=None,
    docker_image=None,
    results_dir="Results",
)
```
CLI Usage
```shell
# Run all JSONL files in a directory with 5 repetitions
agbench run scenarios/ -r 5

# Run with custom Docker image and environment file
agbench run scenarios/test.jsonl -d custom-image:latest -e custom_env.yaml

# Run natively (use with caution)
agbench run scenarios/test.jsonl --native

# Run with parallel execution (4 workers)
agbench run scenarios/test.jsonl -p 4 -r 2

# Subsample 50% of scenarios
agbench run scenarios/test.jsonl -s 0.5

# Use Azure authentication
agbench run scenarios/test.jsonl -a
```
Scenario JSONL Format
```json
{
  "id": "task_001",
  "template": "templates/basic_task",
  "substitutions": {
    "scenario.py": {
      "__TASK__": "Calculate fibonacci numbers",
      "__MODEL__": "gpt-4"
    }
  }
}
```
Environment File Format
```yaml
# ENV.yaml
OPENAI_API_KEY: ${OPENAI_API_KEY}
AZURE_OPENAI_ENDPOINT: https://my-resource.openai.azure.com
MODEL_NAME: gpt-4
```
Constants
- TASK_TIMEOUT: 7200 seconds (120 minutes)
- DEFAULT_DOCKER_IMAGE_TAG: "agbench"
- DEFAULT_ENV_FILE_YAML: "ENV.yaml"
- DEFAULT_ENV_FILE_JSON: "ENV.json" (deprecated)
- DEFAULT_CONFIG_YAML: "config.yaml"