Implementation:Microsoft Autogen Agbench Run Cmd
| Knowledge Sources | |
|---|---|
| Domains | Benchmarking, Code_Execution, Docker, Testing |
| Last Updated | 2026-02-11 17:00 GMT |
Overview
A comprehensive benchmark execution module for running AutoGen scenarios in isolated Docker or native environments with support for parallel execution, environment management, and results tracking.
Description
⚠️ Deprecation Notice: JSON environment files (ENV.json) and implicit OPENAI_API_KEY inclusion are deprecated. Migrate to YAML format (ENV.yaml) with explicit environment variable definitions. See Heuristic:Microsoft_Autogen_Warning_Deprecated_JSON_Env_Files.
The `run_cmd.py` module is the core execution engine for agbench, AutoGen's benchmarking framework. It orchestrates the execution of benchmark scenarios by:
- Expanding scenario templates with substitutions
- Managing Docker containers or native execution environments
- Handling environment variables and configuration files
- Running scenarios with timeout controls
- Supporting parallel execution across multiple processes
- Managing Azure authentication tokens for cloud-based models
- Tracking execution results in organized directory structures
The module supports both Docker-based isolation (recommended for security) and native execution. It includes comprehensive error handling, logging, and retry mechanisms for robust benchmark execution.
Usage
This module is typically invoked via the agbench CLI tool to run benchmark scenarios. It processes JSONL files containing scenario definitions and executes them with specified repetitions and configurations.
Code Reference
Source Location
- Repository: Microsoft_Autogen
- File: python/packages/agbench/src/agbench/run_cmd.py
- Lines: 1-1011
Signature
```python
def run_scenarios(
    scenario: str,
    n_repeats: int,
    is_native: bool,
    config_file: Union[None, str],
    token_provider: Optional[Callable[[], str]],
    docker_image: Optional[str] = None,
    results_dir: str = "Results",
    subsample: Union[None, int, float] = None,
    env_file: Union[None, str] = None,
) -> None

def expand_scenario(
    scenario_dir: str,
    scenario: ScenarioInstance,
    output_dir: str,
    config_file: Union[str, None],
) -> None

def run_scenario_in_docker(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT,
    docker_image: Optional[str] = None,
) -> None

def run_scenario_natively(
    work_dir: str,
    env: Dict[str, str],
    timeout: int = TASK_TIMEOUT,
) -> None

class ScenarioInstance(TypedDict):
    id: str
    template: Union[str, List[Union[str, List[str]]]]
    substitutions: Dict[str, Dict[str, str]]
    values: Dict[str, Dict[str, str]]
```
Import
```python
from agbench.run_cmd import run_scenarios, expand_scenario, run_cli
```
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| scenario | str | Yes | Path to JSONL file, directory of JSONL files, or "-" for stdin |
| n_repeats | int | Yes | Number of times to repeat each scenario instance |
| is_native | bool | Yes | Whether to run natively (True) or in Docker (False) |
| config_file | str | No | Path to configuration YAML file (default: config.yaml) |
| token_provider | Callable | No | Function that returns Azure authentication tokens |
| docker_image | str | No | Docker image to use (default: builds "agbench" image) |
| results_dir | str | No | Directory for storing results (default: "Results") |
| subsample | int/float | No | Float in (0.0, 1.0] as a proportion, or int as an absolute count of scenarios to run |
| env_file | str | No | Path to environment variables file (default: ENV.yaml) |
Outputs
| Name | Type | Description |
|---|---|---|
| Results directory | Directory | Organized hierarchy: Results/{scenario}/{instance_id}/{repetition}/ |
| console_log.txt | File | Complete console output from scenario execution |
| timestamp.txt | File | Execution metadata including agbench version |
| scenario.py | File | Expanded scenario script ready for execution |
| Exit status | None | Function returns None; errors raise exceptions |
Core Functions
run_scenarios
Main entry point that orchestrates scenario execution. Loads JSONL files, applies subsampling if specified, creates result directories, expands scenarios, and executes them either natively or in Docker.
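The loading-and-subsampling step can be sketched as follows. This is an illustrative helper, not the actual agbench API; `load_and_subsample` is a hypothetical name, and the real selection logic may differ in detail:

```python
import io
import json
import random
from typing import Dict, List, Union


def load_and_subsample(
    jsonl_text: str, subsample: Union[None, int, float]
) -> List[Dict]:
    """Parse scenario instances from JSONL text, then optionally subsample.

    Mirrors the documented behavior: a float is treated as a proportion
    of scenarios, an int as an absolute count.
    """
    instances = [json.loads(line) for line in io.StringIO(jsonl_text) if line.strip()]
    if subsample is None:
        return instances
    if isinstance(subsample, float):
        k = max(1, round(len(instances) * subsample))
    else:
        k = subsample
    return random.sample(instances, k)
```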
expand_scenario
Expands scenario templates by:
- Copying template files/directories to the output location
- Performing string substitutions in templated files
- Including global template resources
- Copying configuration files
Supports multiple template formats including single files, directories, and complex mappings.
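The copy-and-substitute step for the directory-template case can be sketched like this (a simplified illustration; `apply_substitutions` is a hypothetical name, and the real function also handles single-file templates, global resources, and config copying):

```python
import shutil
from pathlib import Path
from typing import Dict


def apply_substitutions(
    template_dir: Path,
    output_dir: Path,
    substitutions: Dict[str, Dict[str, str]],
) -> None:
    """Copy a template directory to the output location, then replace
    placeholder strings (e.g. __TASK__) in each named file."""
    shutil.copytree(template_dir, output_dir, dirs_exist_ok=True)
    for filename, mapping in substitutions.items():
        target = output_dir / filename
        text = target.read_text()
        for placeholder, value in mapping.items():
            text = text.replace(placeholder, value)
        target.write_text(text)
```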
run_scenario_in_docker
Executes a scenario in a Docker container with:
- Automatic image building if needed
- Volume mounting for workspace and AutoGen repository
- Docker socket mounting for Docker-in-Docker support
- Streaming log capture with timeout handling
- Environment variable injection
- Graceful container cleanup
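As a rough sketch of the moving parts (the real module drives the Docker SDK with streaming log capture; this version shells out to the docker CLI, and the `run.sh` entrypoint name is an assumption for illustration):

```python
import subprocess
from typing import Dict, List

TASK_TIMEOUT = 7200  # matches the documented default


def build_docker_cmd(
    work_dir: str, env: Dict[str, str], image: str = "agbench"
) -> List[str]:
    """Assemble a `docker run` command with the workspace volume-mounted
    and environment variables injected."""
    cmd = ["docker", "run", "--rm",
           "-v", f"{work_dir}:/workspace", "-w", "/workspace"]
    for key, value in env.items():
        cmd += ["-e", f"{key}={value}"]
    return cmd + [image, "sh", "run.sh"]


def run_in_docker(
    work_dir: str, env: Dict[str, str], timeout: int = TASK_TIMEOUT
) -> int:
    """Run the scenario container, enforcing the task timeout."""
    try:
        return subprocess.run(build_docker_cmd(work_dir, env),
                              timeout=timeout).returncode
    except subprocess.TimeoutExpired:
        return -1  # the real module also stops and removes the container
```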
run_scenario_natively
Executes a scenario in a native Python virtual environment with:
- Virtual environment isolation (.agbench_venv)
- Requirement installation from requirements.txt
- Timeout enforcement using shell timeout command
- Environment variable propagation
- Initialization and finalization script hooks
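A minimal sketch of the native path, assuming a `scenario.py` entry point in the working directory (simplified: the real function also installs requirements.txt into the venv and runs init/finalize hooks):

```python
import os
import subprocess
import sys
import venv
from pathlib import Path
from typing import Dict


def run_natively(work_dir: str, env: Dict[str, str],
                 timeout: int = 7200) -> int:
    """Create (or reuse) a .agbench_venv virtual environment in the
    working directory and run scenario.py there with a hard timeout."""
    venv_dir = Path(work_dir) / ".agbench_venv"
    if not venv_dir.exists():
        venv.create(venv_dir, with_pip=False)  # with_pip=True in practice
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    python = venv_dir / bin_dir / "python"
    try:
        proc = subprocess.run(
            [str(python), "scenario.py"],
            cwd=work_dir,
            env={**os.environ, **env},  # propagate scenario variables
            timeout=timeout,
        )
        return proc.returncode
    except subprocess.TimeoutExpired:
        return -1
```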
get_scenario_env
Constructs environment variable dictionary from:
- System environment (OPENAI_API_KEY)
- Azure authentication tokens
- ENV.yaml or ENV.json files
- Variable substitution using ${VAR_NAME} syntax
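The `${VAR_NAME}` substitution can be sketched as below (a hypothetical helper; the real resolution order and handling of unresolved variables may differ):

```python
import re
from typing import Dict

# Matches ${VAR_NAME} references in ENV-file values
_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env_vars(
    raw: Dict[str, str], system_env: Dict[str, str]
) -> Dict[str, str]:
    """Expand ${VAR_NAME} references in ENV-file values against the
    system environment; unresolved variables expand to '' here."""
    return {
        key: _VAR.sub(lambda m: system_env.get(m.group(1), ""), value)
        for key, value in raw.items()
    }
```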
run_parallel
Splits JSONL scenarios and executes them across multiple worker processes using multiprocessing.Pool.
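The fan-out idea can be sketched with a round-robin split; `split_jsonl` is a hypothetical helper, and the real chunking strategy may differ:

```python
from typing import List


def split_jsonl(lines: List[str], n_workers: int) -> List[List[str]]:
    """Divide scenario lines into round-robin chunks, one per worker.
    Each chunk would then be handed to a worker process (e.g. via
    multiprocessing.Pool.map) that runs its scenarios independently."""
    return [lines[i::n_workers] for i in range(n_workers)]
```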
get_azure_token_provider
Attempts to create an Azure DefaultAzureCredential bearer token provider if ~/.azure directory exists and authentication succeeds.
Usage Examples
Basic Usage
```python
from agbench.run_cmd import run_scenarios

# Run scenarios in Docker with 3 repetitions
run_scenarios(
    scenario="scenarios/test.jsonl",
    n_repeats=3,
    is_native=False,
    config_file="config.yaml",
    token_provider=None,
    docker_image=None,
    results_dir="Results",
)
```
CLI Usage
```shell
# Run all JSONL files in a directory with 5 repetitions
agbench run scenarios/ -r 5

# Run with custom Docker image and environment file
agbench run scenarios/test.jsonl -d custom-image:latest -e custom_env.yaml

# Run natively (use with caution)
agbench run scenarios/test.jsonl --native

# Run with parallel execution (4 workers)
agbench run scenarios/test.jsonl -p 4 -r 2

# Subsample 50% of scenarios
agbench run scenarios/test.jsonl -s 0.5

# Use Azure authentication
agbench run scenarios/test.jsonl -a
```
Scenario JSONL Format
```json
{
  "id": "task_001",
  "template": "templates/basic_task",
  "substitutions": {
    "scenario.py": {
      "__TASK__": "Calculate fibonacci numbers",
      "__MODEL__": "gpt-4"
    }
  }
}
```
Environment File Format
```yaml
# ENV.yaml
OPENAI_API_KEY: ${OPENAI_API_KEY}
AZURE_OPENAI_ENDPOINT: https://my-resource.openai.azure.com
MODEL_NAME: gpt-4
```
Constants
- TASK_TIMEOUT: 7200 seconds (120 minutes)
- DEFAULT_DOCKER_IMAGE_TAG: "agbench"
- DEFAULT_ENV_FILE_YAML: "ENV.yaml"
- DEFAULT_ENV_FILE_JSON: "ENV.json" (deprecated)
- DEFAULT_CONFIG_YAML: "config.yaml"