Implementation:Isaac sim IsaacGymEnvs CommonAgent Train
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Training |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
API documentation for the CommonAgent class, which implements the PPO training loop for GPU-accelerated Isaac Gym environments.
Description
CommonAgent extends rl_games' a2c_continuous.A2CAgent to handle GPU-resident tensor observations directly, bypassing the default CPU-based data pipeline. It implements the full PPO training cycle: rollout collection via play_steps(), advantage estimation, and multi-epoch mini-batch gradient updates via calc_gradients(). The train() method orchestrates the outer loop, calling train_epoch() repeatedly until the target number of frames or epochs is reached.
Type
API Doc
Usage
Use this implementation when training RL policies on any IsaacGymEnvs task. CommonAgent is instantiated automatically by the rl_games Runner when the algorithm name configured for the run matches the agent type registered with the Runner.
Code Reference
Source Location
- Repository: IsaacGymEnvs
- File: isaacgymenvs/learning/common_agent.py (Lines 54-527)
Signature
class CommonAgent(a2c_continuous.A2CAgent):
    """PPO agent customized for GPU-accelerated Isaac Gym environments."""

    def __init__(self, base_name, params):
        """Initialize agent with environment, network, optimizer, and rollout buffers."""
        ...

    def train(self):  # L111-181
        """Main training loop. Runs train_epoch() until convergence criteria are met.
        Handles checkpointing, learning rate scheduling, and early stopping."""
        ...

    def train_epoch(self):  # L183-248
        """Single training epoch: collect rollout, compute GAE, perform PPO updates.
        Returns a dictionary of training statistics (losses, rewards, FPS)."""
        ...

    def play_steps(self):  # L250-310
        """Collect horizon_length steps of experience from the vectorized environment.
        Stores observations, actions, rewards, dones, values, and log_probs
        in GPU-resident tensor buffers."""
        ...

    def calc_gradients(self, input_dict):  # L312-404
        """Compute PPO loss and perform gradient update for one mini-batch.
        Includes clipped surrogate loss, value loss, and entropy bonus."""
        ...
Import
from isaacgymenvs.learning.common_agent import CommonAgent
I/O Contract
Key Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| horizon_length | int | 16 | Number of environment steps per rollout before a policy update |
| mini_epochs_num | int | 4 | Number of passes over the rollout data during each PPO update |
| gamma | float | 0.99 | Discount factor for future rewards |
| tau | float | 0.95 | GAE lambda parameter controlling bias-variance tradeoff |
| e_clip | float | 0.2 | PPO clipping parameter for the surrogate objective |
| learning_rate | float | 3e-4 | Initial learning rate for the Adam optimizer |
| entropy_coef | float | 0.0 | Coefficient for the entropy bonus in the total loss |
| value_loss_coef | float | 1.0 | Coefficient for the value function loss in the total loss |
| max_epochs | int | - | Maximum number of training epochs before stopping |
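The interplay between gamma and tau (the GAE lambda) can be illustrated with a minimal, framework-free sketch of generalized advantage estimation. Note that compute_gae_sketch is a hypothetical helper written for this page, not part of CommonAgent; the real agent performs the same recursion over GPU-resident tensor buffers.

```python
def compute_gae_sketch(rewards, values, dones, next_value, gamma=0.99, tau=0.95):
    """Generalized Advantage Estimation over one rollout (plain-Python sketch).

    delta_t = r_t + gamma * V(s_{t+1}) * (1 - done_t) - V(s_t)
    A_t     = delta_t + gamma * tau * (1 - done_t) * A_{t+1}
    """
    advantages = [0.0] * len(rewards)
    last_gae = 0.0
    for t in reversed(range(len(rewards))):
        # Bootstrap from next_value at the end of the horizon
        v_next = next_value if t == len(rewards) - 1 else values[t + 1]
        not_done = 1.0 - dones[t]
        delta = rewards[t] + gamma * v_next * not_done - values[t]
        last_gae = delta + gamma * tau * not_done * last_gae
        advantages[t] = last_gae
    return advantages

# tau=0 reduces to the one-step TD error; tau=1 recovers full Monte Carlo returns.
adv = compute_gae_sketch(rewards=[1.0, 0.0], values=[0.0, 0.0],
                         dones=[0.0, 1.0], next_value=0.0, gamma=1.0, tau=1.0)
# → [1.0, 0.0]
```

Episode boundaries (dones) zero out both the bootstrap value and the recursive advantage term, so credit never leaks across resets of the vectorized environments.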
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| env | RLGPUEnv | Yes | Vectorized Isaac Gym environment providing GPU-resident observations |
| model | nn.Module | Yes | Policy and value network (actor-critic architecture) |
| optimizer | torch.optim.Optimizer | Yes | Optimizer for gradient updates (typically Adam) |
| base_name | str | Yes | Experiment name used for checkpoint and log directory naming |
| params | dict | Yes | Full rl_games agent configuration dictionary |
Outputs
| Name | Type | Description |
|---|---|---|
| Checkpoints | .pth files | Saved model state_dict files in runs/<experiment>/nn/ |
| TensorBoard logs | event files | Training metrics (rewards, losses, FPS) logged to runs/<experiment>/summaries/ |
| Training stats | dict | Per-epoch dictionary containing mean_rewards, policy_loss, value_loss, entropy, fps |
Usage Examples
How CommonAgent is Instantiated and Run by the Runner
# The Runner instantiates CommonAgent automatically based on config.
# This example shows the equivalent manual setup for clarity.
from isaacgymenvs.learning.common_agent import CommonAgent
# params dict is built by the Runner from the rl_games YAML config
params = {
    'config': {
        'horizon_length': 16,
        'mini_epochs_num': 4,
        'gamma': 0.99,
        'tau': 0.95,
        'e_clip': 0.2,
        'learning_rate': 3e-4,
        'entropy_coef': 0.0,
        'max_epochs': 1000,
        'num_actors': 4096,
        'save_frequency': 100,
        ...
    },
    'network': { ... },  # Network architecture config
}
agent = CommonAgent(base_name='Ant', params=params)
agent.init_tensors()  # Allocate GPU rollout buffers
agent.train()         # Run the full training loop
Training Loop Flow
# Pseudocode showing the internal flow of train() -> train_epoch()
def train(self):
    while self.epoch_num < self.max_epochs:
        train_info = self.train_epoch()
        # Log metrics
        self.writer.add_scalar('rewards/mean', train_info['mean_rewards'], self.epoch_num)
        self.writer.add_scalar('losses/policy', train_info['policy_loss'], self.epoch_num)
        # Checkpoint
        if self.epoch_num % self.save_freq == 0:
            self.save_checkpoint(os.path.join(self.nn_dir, f'ep_{self.epoch_num}.pth'))
        self.epoch_num += 1

def train_epoch(self):
    # 1. Collect rollout
    rollout = self.play_steps()
    # 2. Compute GAE advantages
    advantages = self.compute_gae(rollout)
    # 3. PPO mini-batch updates
    for _ in range(self.mini_epochs_num):
        for batch in self.get_mini_batches(rollout, advantages):
            self.calc_gradients(batch)
    return {'mean_rewards': ..., 'policy_loss': ..., 'value_loss': ..., 'fps': ...}
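The per-sample objective inside calc_gradients can be sketched in isolation. The following is a scalar illustration of the standard PPO clipped loss using e_clip, value_loss_coef, and entropy_coef from the parameter table; ppo_loss_sketch is an illustrative name invented for this page, and the real method operates on batched GPU tensors with additional bounds-loss and clipped-value terms.

```python
import math

def ppo_loss_sketch(old_log_prob, new_log_prob, advantage,
                    value_pred, value_target, entropy,
                    e_clip=0.2, value_loss_coef=1.0, entropy_coef=0.0):
    """Scalar PPO loss: clipped surrogate + weighted value loss - entropy bonus."""
    ratio = math.exp(new_log_prob - old_log_prob)          # pi_new / pi_old
    clipped = max(1.0 - e_clip, min(ratio, 1.0 + e_clip))  # clamp to [1-e, 1+e]
    # Negated because optimizers minimize; PPO maximizes the surrogate objective
    policy_loss = -min(ratio * advantage, clipped * advantage)
    value_loss = (value_pred - value_target) ** 2
    return policy_loss + value_loss_coef * value_loss - entropy_coef * entropy

# Unchanged policy (ratio == 1): the loss is just -advantage plus the value term.
loss = ppo_loss_sketch(old_log_prob=0.0, new_log_prob=0.0, advantage=2.0,
                       value_pred=1.0, value_target=1.0, entropy=0.0)
# → -2.0
```

The clipping means that once the probability ratio leaves [1 - e_clip, 1 + e_clip], further movement in that direction yields no additional gradient, which keeps each multi-epoch update close to the rollout policy.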
Related Pages
- Isaac_sim_IsaacGymEnvs_Policy_Training_Loop - implements - Principle describing the PPO training loop and GAE.
- Isaac_sim_IsaacGymEnvs_Rl_Games_Runner_Integration - prerequisite - Runner initialization that creates the CommonAgent.
- Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging - related - Observer that logs metrics emitted during training.