
Implementation:CarperAI Trlx Sweep

From Leeroopedia


Knowledge Sources
Domains: Hyperparameter_Optimization, Distributed_Training
Last Updated: 2026-02-07 16:00 GMT

Overview

A concrete tool for orchestrating hyperparameter sweeps with Ray Tune, logging results to Weights & Biases (W&B) and generating a report automatically.

Description

The sweep module provides a CLI-driven hyperparameter sweep orchestrator. It:

- parses a YAML config file defining search spaces and Ray Tune settings,
- translates the config into Ray Tune parameter distributions (uniform, loguniform, choice, grid, etc.),
- configures search algorithms (BayesOpt, BOHB, random) and schedulers (HyperBand, FIFO),
- launches experiments via AccelerateTrainer, and
- generates a W&B report with parallel coordinates, parameter importance, scatter plots, and metric line plots.
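The translation step can be sketched in pure Python. This is a hypothetical stand-in, not the actual trlx implementation: real `trlx.sweep.get_param_space` builds Ray Tune distribution objects (`tune.loguniform`, `tune.uniform`, `tune.choice`, ...), while this sketch emits plain descriptor tuples so the dispatch logic is visible without a Ray installation.

```python
# Hypothetical sketch of the YAML -> search-space translation performed by
# trlx.sweep.get_param_space. Ray Tune distribution constructors are
# replaced here by plain tuples for illustration.

def translate_param(spec: dict):
    """Turn one YAML parameter spec into a distribution descriptor."""
    method = spec["method"]
    if method in ("uniform", "loguniform"):
        low, high = spec["bounds"]
        return (method, low, high)             # tune.loguniform(low, high) in Ray
    if method in ("choice", "grid"):
        return (method, list(spec["values"]))  # tune.choice(...) / tune.grid_search(...)
    raise ValueError(f"unknown search method: {method!r}")

def get_param_space(config: dict) -> dict:
    """Convert a {param_name: spec} mapping into a search space."""
    return {name: translate_param(spec) for name, spec in config.items()}

space = get_param_space({
    "train.learning_rate_init": {"method": "loguniform", "bounds": [1e-6, 1e-4]},
    "method.target": {"method": "choice", "values": [3.0, 6.0, 12.0]},
})
print(space["train.learning_rate_init"])  # ('loguniform', 1e-06, 0.0001)
```

In the real module, each descriptor is a Ray Tune sampling object, so Ray can draw trial configurations from the space directly.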

Usage

Use this module as a CLI entry point to run hyperparameter sweeps over any trlx training script. It requires a YAML sweep configuration file that specifies the parameter search space and a Ray Tune configuration block.

Code Reference

Source Location

Signature

def get_param_space(config: dict) -> dict:
    """Convert YAML search space config to Ray Tune parameter distributions."""

def get_search_alg(tune_config: dict):
    """Create a Ray Tune search algorithm from config (bayesopt, bohb, random)."""

def get_scheduler(tune_config: dict):
    """Create a Ray Tune scheduler from config (hyperband, fifo, etc.)."""

def get_tune_config(tune_config: dict) -> ray.tune.TuneConfig:
    """Build Ray TuneConfig with search algorithm, scheduler, and trial count."""

def create_report(
    target_metric: str,
    column_names: list,
    entity_name: str,
    project_name: str,
    group_name: str,
    best_config: dict,
) -> None:
    """Generate a W&B report with parallel coordinates, scatter plots, and line charts."""

Import

# CLI usage:
# python -m trlx.sweep examples/ppo_sentiments.py --config configs/sweeps/ppo_sweep.yml

# Programmatic usage:
from trlx.sweep import get_param_space, get_tune_config, create_report

I/O Contract

Inputs

| Name | Type | Required | Description |
|------|------|----------|-------------|
| script | str (CLI) | Yes | Path to the training script to sweep |
| --config | str (CLI) | Yes | Path to YAML sweep configuration file |
| --num_gpus | int (CLI) | No | GPUs per trial (default 1) |
| --num_cpus | int (CLI) | No | CPUs per trial (default 8) |
| --accelerate_config | str (CLI) | No | Path to Accelerate config |
| --server_address | str (CLI) | No | Ray cluster address |

Outputs

| Name | Type | Description |
|------|------|-------------|
| Ray Tune results | ResultGrid | Sweep trial results with metrics |
| W&B report | URL | Generated report with visualization panels |
| Best config | dict | Best hyperparameter configuration found |
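Extracting the best configuration from the sweep outputs amounts to a max/min selection over trial metrics. With Ray Tune, `result_grid.get_best_result(metric, mode).config` does this directly; the sketch below (an illustrative stand-in, not trlx code) shows the same logic over plain trial records so it runs without Ray.

```python
# Illustrative stand-in for the "Best config" output: pick the config of
# the trial whose final metric value is best under the given mode.

def best_config(trials: list, metric: str, mode: str = "max") -> dict:
    """Return the config of the best trial by a final metric value."""
    pick = max if mode == "max" else min
    best = pick(trials, key=lambda t: t["metrics"][metric])
    return best["config"]

trials = [
    {"config": {"lr": 1e-5}, "metrics": {"reward/mean": 0.42}},
    {"config": {"lr": 3e-5}, "metrics": {"reward/mean": 0.57}},
    {"config": {"lr": 1e-4}, "metrics": {"reward/mean": 0.31}},
]
print(best_config(trials, "reward/mean"))  # {'lr': 3e-05}
```

The `metric` and `mode` values correspond to the `metric:` and `mode:` fields of the `tune_config` block shown in the sweep YAML.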

Usage Examples

Run a PPO Sweep from CLI

# Run a hyperparameter sweep over the PPO sentiments example
python -m trlx.sweep examples/ppo_sentiments.py \
    --config configs/sweeps/ppo_sweep.yml \
    --num_gpus 1 \
    --num_cpus 8

YAML Sweep Config Format

# configs/sweeps/ppo_sweep.yml
tune_config:
  search_alg: bayesopt
  scheduler: hyperband
  num_samples: 20
  metric: reward/mean
  mode: max

param_space:
  train.learning_rate_init:
    method: loguniform
    bounds: [1.0e-6, 1.0e-4]
  method.init_kl_coef:
    method: uniform
    bounds: [0.01, 0.5]
  method.target:
    method: choice
    values: [3.0, 6.0, 12.0]
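Parameter names in `param_space` are dotted paths into the nested trlx config (e.g. `train.learning_rate_init` targets the `learning_rate_init` field of the `train` section). A minimal sketch of how sampled dotted keys can be expanded back into a nested config dict (an assumed helper for illustration, not the actual trlx internals):

```python
# Hypothetical helper: expand dotted keys like 'train.learning_rate_init'
# from a sampled trial config into a nested dict matching the TRLConfig
# section layout.

def unflatten(sampled: dict) -> dict:
    """Expand {'a.b': v} into {'a': {'b': v}}, preserving non-dotted keys."""
    nested: dict = {}
    for dotted, value in sampled.items():
        node = nested
        *parents, leaf = dotted.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

print(unflatten({"train.learning_rate_init": 3e-5, "method.target": 6.0}))
# {'train': {'learning_rate_init': 3e-05}, 'method': {'target': 6.0}}
```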
