
Implementation:Isaac sim IsaacGymEnvs CommonAgent Train

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Training
Last Updated 2026-02-15 00:00 GMT

Overview

API documentation for the CommonAgent class, which implements the PPO training loop for GPU-accelerated Isaac Gym environments.

Description

CommonAgent extends rl_games' a2c_continuous.A2CAgent to handle GPU-resident tensor observations directly, bypassing the default CPU-based data pipeline. It implements the full PPO training cycle: rollout collection via play_steps(), advantage estimation, and multi-epoch mini-batch gradient updates via calc_gradients(). The train() method orchestrates the outer loop, calling train_epoch() repeatedly until the target number of frames or epochs is reached.
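The advantage-estimation step mentioned above can be sketched in plain Python. This is a minimal stand-in for the GPU tensor version: the function and argument names are hypothetical, not the agent's actual method names, and it operates on lists rather than GPU-resident tensors.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over a rollout of length T.

    rewards, values, dones are per-step lists of length T; last_value
    bootstraps the value of the state after the final step. dones[t] is
    1.0 if the episode terminated at step t, else 0.0.
    """
    T = len(rewards)
    advantages = [0.0] * T
    next_adv = 0.0
    next_value = last_value
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # TD error: one-step bootstrap minus current value estimate
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        # GAE recursion: exponentially weighted sum of future TD errors
        next_adv = delta + gamma * lam * not_done * next_adv
        advantages[t] = next_adv
        next_value = values[t]
    # Value-function regression targets
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

The `tau` parameter listed under Key Parameters plays the role of `lam` here: at `lam=0` the estimate collapses to the one-step TD error (low variance, high bias); at `lam=1` it becomes the full Monte Carlo advantage.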

Type

API Doc

Usage

Use this implementation when training RL policies on any IsaacGymEnvs task. The rl_games Runner instantiates CommonAgent automatically when the algorithm name in the training config matches the name under which the agent class was registered.
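For context, the rl_games Runner selects the agent class by the algorithm name in the YAML config. A minimal fragment is sketched below; the `common` name is an illustrative placeholder for whatever name CommonAgent was registered under in your setup, not a confirmed rl_games key.

```yaml
params:
  algo:
    name: common   # placeholder: the name CommonAgent was registered under
  config:
    name: Ant
    horizon_length: 16
    mini_epochs_num: 4
```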

Code Reference

Source Location

Signature

class CommonAgent(a2c_continuous.A2CAgent):
    """PPO agent customized for GPU-accelerated Isaac Gym environments."""

    def __init__(self, base_name, params):
        """Initialize agent with environment, network, optimizer, and rollout buffers."""
        ...

    def train(self):  # L111-181
        """Main training loop. Runs train_epoch() until convergence criteria are met.
        Handles checkpointing, learning rate scheduling, and early stopping."""
        ...

    def train_epoch(self):  # L183-248
        """Single training epoch: collect rollout, compute GAE, perform PPO updates.
        Returns a dictionary of training statistics (losses, rewards, FPS)."""
        ...

    def play_steps(self):  # L250-310
        """Collect horizon_length steps of experience from the vectorized environment.
        Stores observations, actions, rewards, dones, values, and log_probs
        in GPU-resident tensor buffers."""
        ...

    def calc_gradients(self, input_dict):  # L312-404
        """Compute PPO loss and perform gradient update for one mini-batch.
        Includes clipped surrogate loss, value loss, and entropy bonus."""
        ...
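The loss described in calc_gradients can be sketched per-sample in plain Python. This is the standard clipped-surrogate PPO objective as a scalar stand-in for the batched tensor version, not the exact rl_games implementation; all names here are illustrative.

```python
import math

def ppo_loss(old_logp, new_logp, advantage, value_pred, value_target,
             entropy, e_clip=0.2, value_coef=1.0, entropy_coef=0.0):
    """Per-sample PPO objective: clipped surrogate + value loss - entropy bonus."""
    # Importance ratio between the new and rollout-time policies
    ratio = math.exp(new_logp - old_logp)
    surr1 = ratio * advantage
    # Clip the ratio to [1 - e_clip, 1 + e_clip] before weighting the advantage
    surr2 = max(min(ratio, 1.0 + e_clip), 1.0 - e_clip) * advantage
    # Pessimistic (min) surrogate, negated because we minimize
    policy_loss = -min(surr1, surr2)
    value_loss = (value_pred - value_target) ** 2
    return policy_loss + value_coef * value_loss - entropy_coef * entropy
```

With unchanged log-probabilities the ratio is 1 and the clip is inactive; the clip only bites once the policy moves more than `e_clip` away from the rollout policy on a given sample.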

Import

from isaacgymenvs.learning.common_agent import CommonAgent

I/O Contract

Key Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| horizon_length | int | 16 | Number of environment steps per rollout before a policy update |
| mini_epochs_num | int | 4 | Number of passes over the rollout data during each PPO update |
| gamma | float | 0.99 | Discount factor for future rewards |
| tau | float | 0.95 | GAE lambda parameter controlling the bias-variance tradeoff |
| e_clip | float | 0.2 | PPO clipping parameter for the surrogate objective |
| learning_rate | float | 3e-4 | Initial learning rate for the Adam optimizer |
| entropy_coef | float | 0.0 | Coefficient for the entropy bonus in the total loss |
| value_loss_coef | float | 1.0 | Coefficient for the value function loss in the total loss |
| max_epochs | int | - | Maximum number of training epochs before stopping |
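The rollout parameters combine into the PPO batch as a simple product. The arithmetic below uses the defaults from the table plus the `num_actors` value from the usage example further down; `minibatch_size` is a hypothetical config value chosen for illustration, not a documented default.

```python
horizon_length = 16      # steps per rollout (default above)
num_actors = 4096        # parallel environments (from the usage example)
minibatch_size = 32768   # assumption: illustrative rl_games config value
mini_epochs_num = 4      # passes over the rollout data (default above)

batch_size = horizon_length * num_actors          # samples collected per rollout
num_minibatches = batch_size // minibatch_size    # gradient steps per pass
gradient_steps_per_epoch = num_minibatches * mini_epochs_num
```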

Inputs

| Name | Type | Required | Description |
|---|---|---|---|
| env | RLGPUEnv | Yes | Vectorized Isaac Gym environment providing GPU-resident observations |
| model | nn.Module | Yes | Policy and value network (actor-critic architecture) |
| optimizer | torch.optim.Optimizer | Yes | Optimizer for gradient updates (typically Adam) |
| base_name | str | Yes | Experiment name used for checkpoint and log directory naming |
| params | dict | Yes | Full rl_games agent configuration dictionary |

Outputs

| Name | Type | Description |
|---|---|---|
| Checkpoints | .pth files | Saved model state_dict files in the runs/<experiment>/nn/ directory |
| TensorBoard logs | event files | Training metrics (rewards, losses, FPS) logged to runs/<experiment>/summaries/ |
| Training stats | dict | Per-epoch dictionary containing mean_rewards, policy_loss, value_loss, entropy, fps |

Usage Examples

How CommonAgent is Instantiated and Run by the Runner

# The Runner instantiates CommonAgent automatically based on config.
# This example shows the equivalent manual setup for clarity.

from isaacgymenvs.learning.common_agent import CommonAgent

# params dict is built by the Runner from the rl_games YAML config
params = {
    'config': {
        'horizon_length': 16,
        'mini_epochs_num': 4,
        'gamma': 0.99,
        'tau': 0.95,
        'e_clip': 0.2,
        'learning_rate': 3e-4,
        'entropy_coef': 0.0,
        'max_epochs': 1000,
        'num_actors': 4096,
        'save_frequency': 100,
        # ... (remaining rl_games config keys elided)
    },
    'network': { ... },  # Network architecture config
}

agent = CommonAgent(base_name='Ant', params=params)
agent.init_tensors()  # Allocate GPU rollout buffers
agent.train()  # Run the full training loop

Training Loop Flow

# Pseudocode showing the internal flow of train() -> train_epoch()
import os

def train(self):
    while self.epoch_num < self.max_epochs:
        train_info = self.train_epoch()

        # Log metrics
        self.writer.add_scalar('rewards/mean', train_info['mean_rewards'], self.epoch_num)
        self.writer.add_scalar('losses/policy', train_info['policy_loss'], self.epoch_num)

        # Checkpoint
        if self.epoch_num % self.save_freq == 0:
            self.save_checkpoint(os.path.join(self.nn_dir, f'ep_{self.epoch_num}.pth'))

        self.epoch_num += 1

def train_epoch(self):
    # 1. Collect rollout
    rollout = self.play_steps()

    # 2. Compute GAE advantages
    advantages = self.compute_gae(rollout)

    # 3. PPO mini-batch updates
    for _ in range(self.mini_epochs_num):
        for batch in self.get_mini_batches(rollout, advantages):
            self.calc_gradients(batch)

    return {'mean_rewards': ..., 'policy_loss': ..., 'value_loss': ..., 'fps': ...}
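The get_mini_batches step in the pseudocode above can be sketched as an index shuffler. This is a hypothetical helper for illustration: the real agent slices GPU-resident tensors rather than Python lists.

```python
import random

def get_mini_batches(batch_size, minibatch_size, seed=None):
    """Shuffle flattened rollout indices and yield mini-batch index lists.

    Each PPO mini-epoch draws a fresh shuffle so that every sample is
    visited exactly once per pass, in random order.
    """
    indices = list(range(batch_size))
    random.Random(seed).shuffle(indices)
    for start in range(0, batch_size, minibatch_size):
        yield indices[start:start + minibatch_size]
```

Shuffling matters because consecutive rollout samples are strongly correlated (same environment, adjacent timesteps); random mini-batches decorrelate the gradient estimates across the PPO update passes.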

Related Pages

Principle:Isaac_sim_IsaacGymEnvs_Policy_Training_Loop
