Implementation:Isaac sim IsaacGymEnvs PBT Engine
| Knowledge Sources | |
|---|---|
| Domains | Hyperparameter_Optimization, Evolutionary_Computing |
| Last Updated | 2026-02-15 11:00 GMT |
Overview
The PBT (Population-Based Training) engine implements evolutionary hyperparameter optimization by running a population of RL policies in parallel, periodically replacing the worst performers with mutated copies of the best.
Description
The PBT system in isaacgymenvs/pbt/pbt.py consists of three main components: PbtParams (a configuration container), PbtAlgoObserver (the core PBT logic as an rl_games AlgoObserver), and the initial_pbt_check() entry point function. The system operates by running multiple independent training processes (one per policy), each with its own hyperparameters. At regular intervals defined by interval_steps, each process checkpoints its current state and evaluates the population.
PbtAlgoObserver hooks into the rl_games training loop via the AlgoObserver interface. Its process_infos() method tracks per-environment objective values (either a custom "true_objective" from the environment or the default mean reward). The after_steps() method implements the core PBT logic: it saves a checkpoint of the current policy, loads checkpoint data from all other policies in the population, ranks them by objective value, and decides whether the current policy should be replaced. If the current policy ranks in the worst fraction of the population (replace_fraction_worst), it is replaced by a mutated copy of one of the best policies (the top replace_fraction_best), and the process restarts itself via os.execv() with the new hyperparameters.
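The rank-and-replace decision described above can be sketched in a few lines. This is a minimal illustration, not the code in pbt.py: the function name choose_replacement, the rounding with ceil, and the uniform choice among the best policies are assumptions made for the example, and the real after_steps() may apply additional checks that are omitted here.

```python
import math
import random


def choose_replacement(objectives, policy_idx, frac_best=0.3, frac_worst=0.2, rng=random):
    """Return the index of a policy to copy from, or None to keep training.

    objectives: one objective value per policy (higher is better).
    Hypothetical helper; names and rounding rules are illustrative only.
    """
    n = len(objectives)
    # Rank policy indices from best to worst objective.
    ranked = sorted(range(n), key=lambda i: objectives[i], reverse=True)

    n_best = max(1, math.ceil(n * frac_best))
    n_worst = max(1, math.ceil(n * frac_worst))
    best, worst = ranked[:n_best], ranked[-n_worst:]

    # Only an underperformer that is not also among the best gets replaced.
    if policy_idx not in worst or policy_idx in best:
        return None
    return rng.choice(best)
```

With eight policies and the fractions above, the three highest-ranked policies are candidates to copy from and the two lowest-ranked policies are candidates for replacement.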
The mutation logic (in a separate mutation.py module) applies random perturbations to mutable hyperparameters within configured bounds. The helper function _restart_process_with_new_params() reconstructs the command-line arguments with updated parameters and re-executes the Python process. This design allows PBT to work with any IsaacGymEnvs task without modifying the underlying RL algorithm.
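As a rough sketch of bounded mutation, the example below perturbs each mutable hyperparameter with probability mutation_rate by a random multiplicative factor. Treating change_min and change_max as bounds on that factor is an assumption taken from the example configuration on this page; the function names mutate_float and mutate_params are used here for illustration and need not match what mutation.py actually defines.

```python
import random


def mutate_float(value, change_min=0.8, change_max=1.2, rng=random):
    """Scale value by a random factor drawn from [change_min, change_max]."""
    return value * rng.uniform(change_min, change_max)


def mutate_params(params, mutable_keys, mutation_rate=0.8,
                  change_min=0.8, change_max=1.2, rng=random):
    """Return a copy of params with some mutable keys perturbed.

    Each key in mutable_keys is mutated with probability mutation_rate;
    all other entries are copied unchanged. Illustrative sketch only.
    """
    mutated = dict(params)
    for key in mutable_keys:
        if rng.random() < mutation_rate:
            mutated[key] = mutate_float(params[key], change_min, change_max, rng)
    return mutated
```

Returning a new dict rather than mutating in place keeps the parent policy's parameters intact, which is convenient when the same parent is copied by several underperforming policies.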
Usage
Use the PBT engine when you want to automatically tune hyperparameters (e.g., learning rate, entropy coefficient, reward scales) during training. It requires launching multiple training processes, typically one per GPU or one per policy index, each configured with pbt.enabled=True and a unique pbt.policy_idx.
Code Reference
Source Location
- Repository: IsaacGymEnvs
- File: isaacgymenvs/pbt/pbt.py
- Lines: 1-692
Signature
class PbtParams:
    def __init__(self, cfg: DictConfig):
        """Parse PBT configuration: replace_fraction_best/worst, mutation_rate,
        change_min/max, interval_steps, start_after, workspace, params_to_mutate."""

class PbtAlgoObserver(AlgoObserver):
    def __init__(self, cfg: DictConfig):
        """Initialize PBT observer with policy index, workspace, and tracking state."""

    def after_init(self, algo):
        """Store reference to RL algorithm and create workspace directories."""

    def process_infos(self, infos, done_indices):
        """Track per-environment objective values from info dict."""

    def after_steps(self):
        """Main PBT logic: checkpoint, load population, rank, replace worst with mutated best."""

    def _save_pbt_checkpoint(self):
        """Save current policy checkpoint and objective value to workspace."""

    def _load_population_checkpoints(self):
        """Load checkpoint metadata from all policies in the population."""

    def _cleanup(self):
        """Remove old checkpoint files from workspace."""

def initial_pbt_check(cfg: DictConfig):
    """Entry point: on first run, mutate initial hyperparameters and restart."""

def _restart_process_with_new_params(policy_idx, new_params, restart_from_checkpoint,
                                     experiment_name, algo, with_wandb):
    """Restart the current process via os.execv with updated hyperparameters."""
Import
from isaacgymenvs.pbt.pbt import PbtAlgoObserver, PbtParams, initial_pbt_check
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| cfg | DictConfig | Yes | Hydra configuration containing the pbt section with enabled, policy_idx, num_policies, workspace, interval_steps, start_after, initial_delay, mutation, replace_fraction_best, replace_fraction_worst, mutation_rate, change_min, change_max |
| infos | dict | Yes | Environment info dict, optionally containing "true_objective" tensor |
| done_indices | Tensor | Yes | Indices of environments that completed an episode |
| algo | RLAlgo | Yes | Reference to the rl_games algorithm instance (passed via after_init) |
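To illustrate how infos and done_indices might feed the objective, here is a simplified stand-in that uses plain Python sequences instead of tensors. The class ObjectiveTracker, the extra episode_rewards argument, and the sliding window size are hypothetical; the real process_infos() works on the observer's own state and may aggregate values differently.

```python
from collections import deque


class ObjectiveTracker:
    """Keep a sliding window of objective values from finished episodes."""

    def __init__(self, window=100):
        self.values = deque(maxlen=window)

    def process_infos(self, infos, done_indices, episode_rewards):
        # Prefer the environment's own objective when it provides one,
        # otherwise fall back to the episode reward.
        source = infos.get("true_objective", episode_rewards)
        for idx in done_indices:
            self.values.append(float(source[int(idx)]))

    def mean(self):
        """Mean objective over the window, or None before any episode finishes."""
        if not self.values:
            return None
        return sum(self.values) / len(self.values)
```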
Outputs
| Name | Type | Description |
|---|---|---|
| checkpoint files | YAML + PTH | Per-iteration checkpoint files written to the PBT workspace directory containing objective values and model weights |
| process restart | side effect | When a policy is replaced, the process restarts itself with mutated hyperparameters via os.execv |
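The restart side effect can be sketched as rebuilding the command line with Hydra-style key=value overrides and then calling os.execv. The helper names build_restart_argv and restart below are hypothetical and do not reproduce _restart_process_with_new_params(), which also handles the checkpoint, experiment name, and wandb arguments shown in its signature.

```python
import os
import sys


def build_restart_argv(argv, new_params):
    """Return argv with key=value overrides replaced or appended.

    Arguments whose key appears in new_params are dropped and re-added
    with the new value; everything else is kept in its original order.
    """
    kept = [arg for arg in argv if arg.split("=", 1)[0] not in new_params]
    overrides = [f"{key}={value}" for key, value in new_params.items()]
    return kept + overrides


def restart(new_params):
    """Re-execute the current script with updated overrides (sketch only)."""
    argv = build_restart_argv(sys.argv, new_params)
    # os.execv replaces the current process image and does not return on success.
    os.execv(sys.executable, [sys.executable] + argv)
```

Because os.execv replaces the process in place, any state that must survive the restart has to be written to the workspace beforehand, which is why checkpointing precedes replacement.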
Usage Examples
# Enable PBT in your Hydra config:
# pbt:
#   enabled: True
#   policy_idx: ${pbt_policy_idx:0}
#   num_policies: 8
#   workspace: "pbt_workspace"
#   interval_steps: 10000000
#   start_after: 10000000
#   initial_delay: 0
#   replace_fraction_best: 0.3
#   replace_fraction_worst: 0.2
#   mutation_rate: 0.8
#   change_min: 0.8
#   change_max: 1.2
#   mutation:
#     train.params.config.learning_rate: float
#     train.params.config.entropy_coef: float
# In train.py, the observer is created and attached:
from isaacgymenvs.pbt.pbt import PbtAlgoObserver, initial_pbt_check
if cfg.pbt.enabled:
    initial_pbt_check(cfg)
    pbt_observer = PbtAlgoObserver(cfg)
    runner.algo_observer = pbt_observer
# Launch multiple policies:
# python train.py pbt.policy_idx=0 &
# python train.py pbt.policy_idx=1 &
# ...
# python train.py pbt.policy_idx=7 &
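The shell commands above can also be generated from Python. The following launcher is a sketch under the assumption that each policy is started with the overrides pbt.enabled, pbt.policy_idx, and pbt.num_policies; the helpers build_commands and launch are hypothetical and are not part of IsaacGymEnvs.

```python
import subprocess
import sys


def build_commands(num_policies, script="train.py", extra_args=()):
    """Build one command line per policy index in the population."""
    return [
        [sys.executable, script, "pbt.enabled=True",
         f"pbt.policy_idx={idx}", f"pbt.num_policies={num_policies}", *extra_args]
        for idx in range(num_policies)
    ]


def launch(num_policies, script="train.py", extra_args=()):
    """Start all policies concurrently and wait for them to exit."""
    procs = [subprocess.Popen(cmd)
             for cmd in build_commands(num_policies, script, extra_args)]
    return [proc.wait() for proc in procs]
```

In practice each process would also need its own device assignment, which is left to extra_args here.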