Implementation: CarperAI Trlx Sweep
| Knowledge Sources | Details |
|---|---|
| Domains | Hyperparameter_Optimization, Distributed_Training |
| Last Updated | 2026-02-07 16:00 GMT |
Overview
Concrete tool for orchestrating hyperparameter sweeps using Ray Tune with W&B logging and automatic report generation.
Description
The sweep module provides a CLI-driven hyperparameter sweep orchestrator. It parses a YAML config file defining search spaces and Ray Tune settings, translates that config into Ray Tune parameter distributions (uniform, loguniform, choice, grid, etc.), and configures search algorithms (BayesOpt, BOHB, random) and schedulers (HyperBand, FIFO). Trials are launched via AccelerateTrainer, and on completion the module generates a W&B report with parallel-coordinates, parameter-importance, scatter, and metric line plots.
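To make the config-to-distribution translation concrete, here is a minimal sketch of the kind of mapping `get_param_space` performs. The real function dispatches to `ray.tune` constructors (`tune.loguniform`, `tune.choice`, etc.); this stand-in (`build_param_space` is a hypothetical name) returns `(method, args)` tuples instead so it runs without Ray installed.

```python
# Hypothetical sketch of translating a YAML-style search-space dict into
# Ray Tune distribution calls. The real get_param_space in trlx/sweep.py
# calls ray.tune constructors directly; here each entry is mapped to a
# (constructor_name, args) tuple so the sketch needs no Ray install.

def build_param_space(config: dict) -> dict:
    """Map {'method': ..., 'bounds'/'values': ...} entries to call specs."""
    param_space = {}
    for name, spec in config.items():
        method = spec["method"]
        if method in ("uniform", "loguniform"):
            low, high = spec["bounds"]
            param_space[name] = (method, (low, high))
        elif method in ("choice", "grid_search"):
            param_space[name] = (method, tuple(spec["values"]))
        else:
            raise ValueError(f"unknown search method: {method!r}")
    return param_space

space = build_param_space({
    "train.learning_rate_init": {"method": "loguniform", "bounds": [1e-6, 1e-4]},
    "method.target": {"method": "choice", "values": [3.0, 6.0, 12.0]},
})
print(space["train.learning_rate_init"])  # ('loguniform', (1e-06, 0.0001))
```

In the real module, each tuple would instead be the corresponding `ray.tune` distribution object, keyed by the dotted parameter name.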
Usage
Use this module as a CLI entry point to run hyperparameter sweeps over any trlx training script. Requires a YAML sweep configuration file specifying the parameter search space and a Ray Tune configuration block.
Code Reference
Source Location
- Repository: CarperAI_Trlx
- File: trlx/sweep.py
- Lines: 1-348
Signature
def get_param_space(config: dict) -> dict:
"""Convert YAML search space config to Ray Tune parameter distributions."""
def get_search_alg(tune_config: dict):
"""Create a Ray Tune search algorithm from config (bayesopt, bohb, random)."""
def get_scheduler(tune_config: dict):
"""Create a Ray Tune scheduler from config (hyperband, fifo, etc.)."""
def get_tune_config(tune_config: dict) -> ray.tune.TuneConfig:
"""Build Ray TuneConfig with search algorithm, scheduler, and trial count."""
def create_report(
target_metric: str,
column_names: list,
entity_name: str,
project_name: str,
group_name: str,
best_config: dict,
) -> None:
"""Generate a W&B report with parallel coordinates, scatter plots, and line charts."""
Import
# CLI usage:
# python -m trlx.sweep examples/ppo_sentiments.py --config configs/sweeps/ppo_sweep.yml
# Programmatic usage:
from trlx.sweep import get_param_space, get_tune_config, create_report
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| script | str (CLI) | Yes | Path to the training script to sweep |
| --config | str (CLI) | Yes | Path to YAML sweep configuration file |
| --num_gpus | int (CLI) | No | GPUs per trial (default 1) |
| --num_cpus | int (CLI) | No | CPUs per trial (default 8) |
| --accelerate_config | str (CLI) | No | Path to Accelerate config |
| --server_address | str (CLI) | No | Ray cluster address |
Outputs
| Name | Type | Description |
|---|---|---|
| Ray Tune results | ResultGrid | Sweep trial results with metrics |
| W&B report | URL | Generated report with visualization panels |
| Best config | dict | Best hyperparameter configuration found |
Usage Examples
Run a PPO Sweep from CLI
# Run a hyperparameter sweep over PPO sentiments example
# python -m trlx.sweep examples/ppo_sentiments.py \
# --config configs/sweeps/ppo_sweep.yml \
# --num_gpus 1 \
# --num_cpus 8
YAML Sweep Config Format
# configs/sweeps/ppo_sweep.yml
tune_config:
search_alg: bayesopt
scheduler: hyperband
num_samples: 20
metric: reward/mean
mode: max
param_space:
train.learning_rate_init:
method: loguniform
bounds: [1.0e-6, 1.0e-4]
method.init_kl_coef:
method: uniform
bounds: [0.01, 0.5]
method.target:
method: choice
values: [3.0, 6.0, 12.0]
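Note that the `param_space` keys use dotted paths (`train.learning_rate_init`, `method.init_kl_coef`) addressing fields inside the nested trlx training config. A hedged sketch of how such sampled values could be folded back into a nested config before each trial (the helper name `apply_overrides` is hypothetical; trlx's own merging logic may differ):

```python
import copy

# Hypothetical helper: fold dotted sweep keys like
# "train.learning_rate_init" back into a nested config dict.

def apply_overrides(base: dict, overrides: dict) -> dict:
    cfg = copy.deepcopy(base)  # never mutate the shared base config
    for dotted, value in overrides.items():
        node = cfg
        *parents, leaf = dotted.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # create nesting as needed
        node[leaf] = value
    return cfg

base = {"train": {"learning_rate_init": 3e-4}, "method": {"init_kl_coef": 0.2}}
trial = apply_overrides(base, {"train.learning_rate_init": 1e-5, "method.target": 6.0})
print(trial["train"]["learning_rate_init"])  # 1e-05
print(trial["method"]["target"])             # 6.0
```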