Workflow:HKUDS AI Trader Multi Agent Comparison

Knowledge Sources	AI-Trader LangChain Agents
Domains	LLM_Ops, Model_Evaluation, Algorithmic_Trading
Last Updated	2026-02-09 14:00 GMT

Overview

Process for running multiple LLM trading agents in parallel across the same date range and market, then comparing their performance using standardized financial metrics and visualization tools.

Description

This workflow enables benchmarking different LLM models as trading agents. Using the parallel runner, multiple models (e.g., GPT, Claude, DeepSeek, Qwen, Gemini) can be configured in a single JSON config file and executed simultaneously. Each model runs its own independent agent instance with isolated position tracking, logging, and runtime configuration. After all agents complete, performance metrics are calculated for each and compared using the built-in metrics calculator and plotting tools.

Usage

Execute this workflow when you want to evaluate and compare how different LLMs perform as trading agents under identical market conditions. It requires a configuration file with multiple models enabled, sufficient API credits for every model, and a completed run of the data pipeline so that the merged.jsonl price data is populated.

Execution Steps

Step 1: Configure Multiple Models

Create or edit a JSON configuration file that defines multiple LLM models to compare. Each model entry specifies its name, base model identifier, a unique signature (used for log isolation), API endpoint, and an enabled flag. All models share the same date range, initial cash, and agent parameters to ensure a fair comparison.

Key considerations:

Each model must have a unique signature to isolate its position files and logs
API endpoints and keys can be specified per-model or inherited from environment variables
Only models with "enabled": true are included in the run
The agent_type field determines which agent class all models use (e.g., BaseAgent for US stocks)
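
The selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the repository's loader: only the "enabled", "signature", and agent_type fields are named in the text, so every other key (models, name, basemodel, date_range, agent_config) and all values are assumptions made for the example.

```python
import json
from collections import Counter

# Hypothetical config. Only "enabled", "signature" and "agent_type" come from
# the text above; every other key name and all values are assumptions.
CONFIG_TEXT = """
{
  "agent_type": "BaseAgent",
  "date_range": {"init_date": "2025-10-01", "end_date": "2025-10-31"},
  "agent_config": {"initial_cash": 10000.0},
  "models": [
    {"name": "model-a", "basemodel": "provider/model-a", "signature": "model-a", "enabled": true},
    {"name": "model-b", "basemodel": "provider/model-b", "signature": "model-b", "enabled": true},
    {"name": "model-c", "basemodel": "provider/model-c", "signature": "model-c", "enabled": false}
  ]
}
"""


def enabled_models(config: dict) -> list:
    """Return the enabled model entries, rejecting duplicate signatures."""
    selected = [m for m in config["models"] if m.get("enabled", False)]
    counts = Counter(m["signature"] for m in selected)
    duplicates = sorted(sig for sig, n in counts.items() if n > 1)
    if duplicates:
        # Shared signatures would make agents overwrite each other's files.
        raise ValueError(f"duplicate signatures: {duplicates}")
    return selected


config = json.loads(CONFIG_TEXT)
models = enabled_models(config)
print([m["signature"] for m in models])
```

Checking for duplicate signatures before launch is cheap insurance: two agents sharing a signature would write to the same position file and logs.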

Step 2: Launch Parallel Agent Runner

Execute the parallel runner script, which reads the configuration and spawns one independent Python subprocess per enabled model, each with its own runtime environment path and position files. When only one model is enabled, it runs in the current process instead.

Key considerations:

Each subprocess gets its own RUNTIME_ENV_PATH to prevent configuration collisions
Subprocesses run fully independently and can utilize separate API rate limits
All subprocesses are awaited concurrently using asyncio.gather
The parallel runner can be invoked with a --signature flag to run a single model (used internally by subprocesses)
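
The launch pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the runner's actual code: the function names, the per-signature file name, and the stand-in child command are all hypothetical. A real runner would re-invoke its own script with the --signature flag rather than the one-line child used here.

```python
import asyncio
import os
import sys


async def run_model(signature: str, command: list, runtime_dir: str) -> int:
    """Run one model in its own subprocess with a private RUNTIME_ENV_PATH."""
    env = dict(os.environ)
    # The per-signature file name is an assumption made for this sketch.
    env["RUNTIME_ENV_PATH"] = os.path.join(runtime_dir, f".runtime_env_{signature}.json")
    proc = await asyncio.create_subprocess_exec(*command, env=env)
    return await proc.wait()


async def run_all(signatures: list, runtime_dir: str) -> dict:
    """Launch every model concurrently and collect exit codes by signature."""
    tasks = []
    for sig in signatures:
        # Stand-in child command that only echoes its isolated path.
        command = [sys.executable, "-c", "import os; print(os.environ['RUNTIME_ENV_PATH'])"]
        tasks.append(run_model(sig, command, runtime_dir))
    # gather() returns results in the same order as the awaitables passed in.
    codes = await asyncio.gather(*tasks)
    return dict(zip(signatures, codes))


if __name__ == "__main__":
    print(asyncio.run(run_all(["model-a", "model-b"], ".")))
```

Returning exit codes keyed by signature makes it easy to see which model failed without aborting the others.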

Step 3: Monitor Execution

Track the progress of each agent subprocess as it processes trading days. Each agent logs its activity to its own signature-specific directory structure, including daily conversation logs and position updates. The parallel runner reports when all subprocesses have completed.

What happens:

Each agent independently iterates through the date range, processing one trading day at a time
Position files (position.jsonl) are updated after each trading day per agent
Conversation logs are stored in per-day directories under each agent's signature path
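
One lightweight way to check progress is to read the most recent record from an agent's position.jsonl. The helper below is a hypothetical sketch: the file's location under the signature directory and the fields inside each record are not specified here, so the function takes the path as an argument and returns whatever JSON object it finds on the last line.

```python
import json
from pathlib import Path
from typing import Optional


def latest_record(path: Path) -> Optional[dict]:
    """Return the last non-empty JSON line of a JSONL file, or None."""
    if not path.exists():
        return None  # the agent has not written anything yet
    last = None
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                last = line
    return json.loads(last) if last else None
```

Calling latest_record once per signature gives a quick view of how far each agent has advanced through the date range.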

Step 4: Calculate Performance Metrics

After all agents complete, run the metrics calculator on each agent's position file. The calculator loads position history and market price data, computes portfolio values at each timestamp, and derives standardized financial metrics including Cumulative Return, Sortino Ratio, Sharpe Ratio, Volatility, Maximum Drawdown, Calmar Ratio, and Win Rate.

Key considerations:

The calculator auto-detects market type (stock, A-share, crypto) from position symbols
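Annualization factors depend on market and data frequency: 252 trading days per year for stocks, 365 days for crypto, and roughly 1638 periods per year for hourly data (consistent with 252 trading days × 6.5 trading hours)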
Results are saved as JSON (performance_metrics.json) and CSV (portfolio_values.csv) per agent
Missing price data generates warnings but does not halt calculation
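
To make the metrics concrete, the sketch below computes them from a plain list of portfolio values using common textbook definitions. It is not the repository's calculator, whose exact formulas (for example the risk-free rate or the downside-deviation convention) may differ. The function name and dictionary keys are assumptions, and the annualization factors are the ones listed above.

```python
import math
import statistics

# Periods per year, taken from the factors listed above.
ANNUALIZATION = {"stock": 252, "crypto": 365, "hourly": 1638}


def performance_metrics(values: list, periods_per_year: int, risk_free: float = 0.0) -> dict:
    """Compute common performance metrics from a series of portfolio values."""
    if len(values) < 3:
        raise ValueError("need at least three portfolio values")
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    excess = statistics.fmean(returns) - risk_free / periods_per_year
    vol = statistics.stdev(returns) * math.sqrt(periods_per_year)
    # Downside deviation: root mean square of the negative returns only.
    downside = math.sqrt(sum(min(r, 0.0) ** 2 for r in returns) / len(returns))
    # Maximum drawdown: largest fractional drop from a running peak.
    peak, max_dd = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        max_dd = min(max_dd, v / peak - 1.0)
    cumulative = values[-1] / values[0] - 1.0
    annual_return = (1.0 + cumulative) ** (periods_per_year / len(returns)) - 1.0
    return {
        "cumulative_return": cumulative,
        "volatility": vol,
        "sharpe": excess * periods_per_year / vol if vol else 0.0,
        "sortino": excess * math.sqrt(periods_per_year) / downside if downside else 0.0,
        "max_drawdown": max_dd,
        "calmar": annual_return / abs(max_dd) if max_dd else 0.0,
        "win_rate": sum(r > 0 for r in returns) / len(returns),
    }
```

For example, performance_metrics([100.0, 110.0, 99.0, 108.9], ANNUALIZATION["stock"]) yields a cumulative return of about 8.9%, a maximum drawdown of about -10%, and a win rate of two thirds.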

Step 5: Compare and Visualize Results

Use the plotting tools to generate comparative visualizations across all agents. The multi-agent metrics plotter reads position data from multiple agent directories and produces charts comparing portfolio value curves, drawdown profiles, and metric summaries. Results can also be viewed through the web dashboard.

Key considerations:

The plot tool accepts multiple agent signatures for side-by-side comparison
The web frontend loads pre-computed cache data for interactive exploration
Cache regeneration may be needed after new runs to update the dashboard
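
Before opening the plots or the dashboard, a plain-text ranking can serve as a quick sanity check. The sketch below is a hypothetical helper, not part of the plotting tools: it assumes each agent's metrics have already been loaded into a dictionary keyed by signature, and the numbers in the usage example are placeholders rather than real results.

```python
def comparison_table(metrics_by_signature: dict, sort_key: str = "cumulative_return") -> str:
    """Render agents as a text table, best value of sort_key first."""
    ranked = sorted(
        metrics_by_signature.items(),
        key=lambda item: item[1][sort_key],
        reverse=True,
    )
    columns = sorted({name for _, metrics in ranked for name in metrics})
    header = "signature".ljust(12) + "".join(c.rjust(20) for c in columns)
    rows = [
        sig.ljust(12) + "".join(f"{metrics.get(c, float('nan')):20.4f}" for c in columns)
        for sig, metrics in ranked
    ]
    return "\n".join([header, *rows])


# Placeholder numbers for illustration only.
example = {
    "model-a": {"cumulative_return": 0.05, "max_drawdown": -0.08},
    "model-b": {"cumulative_return": 0.12, "max_drawdown": -0.15},
}
print(comparison_table(example))
```

Sorting by a single metric is a simplification: a model with the highest return may also have the deepest drawdown, so the full table is worth reading.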
