Implementation:Isaac sim IsaacGymEnvs CommonAgent Train
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Training |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
API documentation for the CommonAgent class, which implements the PPO training loop for GPU-accelerated Isaac Gym environments.
Description
CommonAgent extends rl_games' a2c_continuous.A2CAgent to handle GPU-resident tensor observations directly, bypassing the default CPU-based data pipeline. It implements the full PPO training cycle: rollout collection via play_steps(), advantage estimation, and multi-epoch mini-batch gradient updates via calc_gradients(). The train() method orchestrates the outer loop, calling train_epoch() repeatedly until the target number of frames or epochs is reached.
Type
API Doc
Usage
Use this implementation when training RL policies on any IsaacGymEnvs task. CommonAgent is instantiated automatically by the rl_games Runner when the algorithm name configured for the run matches the agent type registered with the Runner.
Code Reference
Source Location
- Repository: IsaacGymEnvs
- File: isaacgymenvs/learning/common_agent.py (Lines 54-527)
Signature
class CommonAgent(a2c_continuous.A2CAgent):
    """PPO agent customized for GPU-accelerated Isaac Gym environments."""

    def __init__(self, base_name, params):
        """Initialize agent with environment, network, optimizer, and rollout buffers."""
        ...

    def train(self):  # L111-181
        """Main training loop. Runs train_epoch() until convergence criteria are met.
        Handles checkpointing, learning rate scheduling, and early stopping."""
        ...

    def train_epoch(self):  # L183-248
        """Single training epoch: collect rollout, compute GAE, perform PPO updates.
        Returns a dictionary of training statistics (losses, rewards, FPS)."""
        ...

    def play_steps(self):  # L250-310
        """Collect horizon_length steps of experience from the vectorized environment.
        Stores observations, actions, rewards, dones, values, and log_probs
        in GPU-resident tensor buffers."""
        ...

    def calc_gradients(self, input_dict):  # L312-404
        """Compute PPO loss and perform gradient update for one mini-batch.
        Includes clipped surrogate loss, value loss, and entropy bonus."""
        ...
Import
from isaacgymenvs.learning.common_agent import CommonAgent
I/O Contract
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| horizon_length | int | 16 | Number of environment steps per rollout before a policy update |
| mini_epochs_num | int | 4 | Number of passes over the rollout data during each PPO update |
| gamma | float | 0.99 | Discount factor for future rewards |
| tau | float | 0.95 | GAE lambda parameter controlling bias-variance tradeoff |
| e_clip | float | 0.2 | PPO clipping parameter for the surrogate objective |
| learning_rate | float | 3e-4 | Initial learning rate for the Adam optimizer |
| entropy_coef | float | 0.0 | Coefficient for the entropy bonus in the total loss |
| value_loss_coef | float | 1.0 | Coefficient for the value function loss in the total loss |
| max_epochs | int | - | Maximum number of training epochs before stopping |
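The interplay between gamma and tau (the GAE lambda) can be illustrated with a minimal, framework-free sketch of generalized advantage estimation. Note that compute_gae_sketch is a hypothetical helper written for this page, not part of CommonAgent; the real agent performs the same recursion over GPU-resident tensor buffers.

```python
def compute_gae_sketch(rewards, values, dones, next_value, gamma=0.99, tau=0.95):
    """Generalized Advantage Estimation over one rollout (plain-Python sketch).

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * tau * (1 - done_t) * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap from next_value at the end of the horizon
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        last_gae = delta + gamma * tau * not_done * last_gae
        advantages[t] = last_gae
    return advantages

# tau=0 reduces to the one-step TD error; tau=1 recovers full Monte Carlo returns.
adv = compute_gae_sketch(rewards=[1.0, 0.0], values=[0.0, 0.0],
                         dones=[0.0, 1.0], next_value=0.0, gamma=1.0, tau=1.0)
# → [1.0, 0.0]
```

Episode boundaries (dones) zero out both the bootstrap value and the recursive advantage term, so credit never leaks across resets of the vectorized environments.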
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | RLGPUEnv | Yes | Vectorized Isaac Gym environment providing GPU-resident observations |
| model | nn.Module | Yes | Policy and value network (actor-critic architecture) |
| optimizer | torch.optim.Optimizer | Yes | Optimizer for gradient updates (typically Adam) |
| base_name | str | Yes | Experiment name used for checkpoint and log directory naming |
| params | dict | Yes | Full rl_games agent configuration dictionary |
Outputs
| Name | Type | Description |
|---|---|---|
| Checkpoints | .pth files | Saved model state_dict files in runs/<experiment>/nn/ |
| TensorBoard logs | event files | Training metrics (rewards, losses, FPS) logged to runs/<experiment>/summaries/ |
| Training stats | dict | Per-epoch dictionary containing mean_rewards, policy_loss, value_loss, entropy, fps |
Usage Examples
How CommonAgent is Instantiated and Run by the Runner
# The Runner instantiates CommonAgent automatically based on config.
# This example shows the equivalent manual setup for clarity.
from isaacgymenvs.learning.common_agent import CommonAgent
# params dict is built by the Runner from the rl_games YAML config
params = {
    'config': {
        'horizon_length': 16,
        'mini_epochs_num': 4,
        'gamma': 0.99,
        'tau': 0.95,
        'e_clip': 0.2,
        'learning_rate': 3e-4,
        'entropy_coef': 0.0,
        'max_epochs': 1000,
        'num_actors': 4096,
        'save_frequency': 100,
        ...
    },
    'network': { ... },  # Network architecture config
}
agent = CommonAgent(base_name='Ant', params=params)
agent.init_tensors()  # Allocate GPU rollout buffers
agent.train()         # Run the full training loop
Training Loop Flow
# Pseudocode showing the internal flow of train() -> train_epoch()
def train(self):
    while self.epoch_num < self.max_epochs:
        train_info = self.train_epoch()
        # Log metrics
        self.writer.add_scalar('rewards/mean', train_info['mean_rewards'], self.epoch_num)
        self.writer.add_scalar('losses/policy', train_info['policy_loss'], self.epoch_num)
        # Checkpoint
        if self.epoch_num % self.save_freq == 0:
            self.save_checkpoint(os.path.join(self.nn_dir, f'ep_{self.epoch_num}.pth'))
        self.epoch_num += 1

def train_epoch(self):
    # 1. Collect rollout
    rollout = self.play_steps()
    # 2. Compute GAE advantages
    advantages = self.compute_gae(rollout)
    # 3. PPO mini-batch updates
    for _ in range(self.mini_epochs_num):
        for batch in self.get_mini_batches(rollout, advantages):
            self.calc_gradients(batch)
    return {'mean_rewards': ..., 'policy_loss': ..., 'value_loss': ..., 'fps': ...}
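The per-sample objective inside calc_gradients can be sketched in isolation. The following is a scalar illustration of the standard PPO clipped loss using e_clip, value_loss_coef, and entropy_coef from the parameter table; ppo_loss_sketch is an illustrative name invented for this page, and the real method operates on batched GPU tensors with additional bounds-loss and clipped-value terms.

```python
import math

def ppo_loss_sketch(old_log_prob, new_log_prob, advantage,
                    value_pred, value_target, entropy,
                    e_clip=0.2, value_loss_coef=1.0, entropy_coef=0.0):
    """Scalar PPO loss: clipped surrogate + weighted value loss - entropy bonus."""
    ratio = math.exp(new_log_prob - old_log_prob)          # pi_new / pi_old
    clipped = max(1.0 - e_clip, min(ratio, 1.0 + e_clip))  # clamp to [1-e, 1+e]
    # Negated because optimizers minimize; PPO maximizes the surrogate objective
    policy_loss = -min(ratio * advantage, clipped * advantage)
    value_loss = (value_pred - value_target) ** 2
    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy

# Unchanged policy (ratio == 1): the loss is just -advantage plus the value term.
loss = ppo_loss_sketch(old_log_prob=0.0, new_log_prob=0.0, advantage=2.0,
                       value_pred=1.0, value_target=1.0, entropy=0.0)
# → -2.0
```

The clipping means that once the probability ratio leaves [1 - e_clip, 1 + e_clip], further movement in that direction yields no additional gradient, which keeps each multi-epoch update close to the rollout policy.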
Related Pages
- Isaac_sim_IsaacGymEnvs_Policy_Training_Loop - implements - Principle describing the PPO training loop and GAE.
- Isaac_sim_IsaacGymEnvs_Rl_Games_Runner_Integration - prerequisite - Runner initialization that creates the CommonAgent.
- Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging - related - Observer that logs metrics emitted during training.