Implementation: Isaac Sim IsaacGymEnvs AMPContinuous Train
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Motion_Imitation |
| Last Updated | 2026-02-15 11:00 GMT |
Overview
AMPAgent is the core training agent for Adversarial Motion Priors (AMP), extending CommonAgent to combine task rewards with style rewards derived from a learned discriminator that distinguishes agent behavior from reference motion capture data.
Description
The AMPAgent class in isaacgymenvs/learning/amp_continuous.py implements the full AMP training pipeline for continuous action spaces. It extends the CommonAgent (which itself wraps rl_games A2C) with a discriminator network that learns to classify motion observations as either coming from the agent's rollouts or from demonstration (motion capture) data. The discriminator's output is transformed into a style reward that encourages the agent to produce naturalistic motion.
The training loop in train_epoch() first collects rollout experience via play_steps(), which stores AMP observations alongside standard RL data, computes discriminator (style) rewards via _calc_amp_rewards(), and combines them with task rewards using configurable weights (_task_reward_w and _disc_reward_w). train_epoch() then samples demonstration observations from the demo buffer and past agent observations from a replay buffer to build the training minibatches. The gradient computation in calc_gradients() jointly optimizes the policy/value network and the discriminator in a single backward pass.
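The weighted reward combination performed by _combine_rewards() reduces to a simple weighted sum. A minimal sketch in plain Python (the source operates on torch tensors; the weight values below are illustrative stand-ins for _task_reward_w and _disc_reward_w, not shipped defaults):

```python
# Sketch of the reward mixing in _combine_rewards():
# combined = task_reward_w * task_reward + disc_reward_w * style_reward.
def combine_rewards(task_r, style_r, task_reward_w=0.5, disc_reward_w=0.5):
    """Weighted sum of per-step task rewards and discriminator (style) rewards."""
    return [task_reward_w * t + disc_reward_w * s for t, s in zip(task_r, style_r)]

combined = combine_rewards([1.0, 0.0], [0.5, 2.0])  # -> [0.75, 1.0]
```

With equal weights of 0.5, a step with task reward 1.0 and style reward 0.5 yields a combined reward of 0.75.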
The discriminator loss function _disc_loss() applies a standard binary classification objective augmented with gradient penalty regularization (to enforce the Lipschitz constraint) and logit regularization (to prevent overconfident predictions). The discriminator reward is computed as clamp(1 - 0.25 * (disc_logit - 1)^2, 0), scaled by _disc_reward_scale, ensuring a bounded and smooth reward signal for the policy.
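The reward transform described above can be checked numerically. A standalone sketch in plain Python (the source uses torch.clamp on tensors):

```python
# Style reward from a discriminator logit, following the formula above:
# r = disc_reward_scale * clamp(1 - 0.25 * (disc_logit - 1)^2, min=0).
def amp_style_reward(disc_logit, disc_reward_scale=2.0):
    """Bounded, smooth style reward; peaks when the logit equals 1."""
    return disc_reward_scale * max(0.0, 1.0 - 0.25 * (disc_logit - 1.0) ** 2)

amp_style_reward(1.0)   # -> 2.0 (maximum: the discriminator rates the motion demo-like)
amp_style_reward(-1.0)  # -> 0.0 (clamped: the motion is rated agent-like)
```

The clamp bounds the reward to [0, disc_reward_scale], so a confidently "fake" classification cannot drive the style reward negative.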
Usage
Use AMPAgent when training humanoid or character animation policies that should exhibit natural motion styles learned from motion capture demonstrations. It is the standard agent for all AMP-based tasks in IsaacGymEnvs, including HumanoidAMP and its variants.
Code Reference
Source Location
- Repository: IsaacGymEnvs
- File: isaacgymenvs/learning/amp_continuous.py
- Lines: 1-555
Signature
class AMPAgent(common_agent.CommonAgent):
    def __init__(self, base_name, params):
        """Initialize AMP agent with discriminator, replay buffers, and AMP config."""
    def init_tensors(self):
        """Initialize experience tensors and build AMP observation buffers."""
    def train_epoch(self):
        """Run one epoch: play_steps -> sample demos -> calc_gradients -> update."""
    def play_steps(self):
        """Collect rollout data including AMP observations and compute combined rewards."""
    def prepare_dataset(self, batch_dict):
        """Add amp_obs, amp_obs_demo, amp_obs_replay to the training dataset."""
    def calc_gradients(self, input_dict):
        """Compute policy, value, and discriminator gradients in a single pass."""
    def _disc_loss(self, disc_agent_logit, disc_demo_logit, obs_demo):
        """Discriminator loss with gradient penalty and logit regularization."""
    def _calc_amp_rewards(self, amp_obs):
        """Compute style rewards from discriminator output."""
    def _combine_rewards(self, task_rewards, amp_rewards):
        """Weighted sum: task_reward_w * task + disc_reward_w * style."""
    def env_step(self, actions):
        """Step environment and retrieve AMP observations from info dict."""
Import
from isaacgymenvs.learning.amp_continuous import AMPAgent
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| base_name | str | Yes | Base name for the experiment and logging |
| params | dict | Yes | Configuration dictionary containing AMP-specific keys: task_reward_w, disc_reward_w, amp_batch_size, amp_minibatch_size, disc_coef, disc_logit_reg, disc_grad_penalty, disc_weight_decay, disc_reward_scale |
| env_info | dict | Yes | Environment info including amp_observation_space defining the discriminator input shape |
| config | dict | Yes | rl_games algorithm config with AMP parameters merged in |
Outputs
| Name | Type | Description |
|---|---|---|
| train_result | dict | Training metrics including actor_loss, critic_loss, entropy, kl, disc_loss, disc_agent_acc, disc_demo_acc, disc_agent_logit, disc_demo_logit |
| batch_dict | dict | Experience batch containing obs, actions, rewards, amp_obs, amp_obs_demo, amp_obs_replay, returns |
| combined_rewards | Tensor | Weighted combination of task rewards and AMP style rewards |
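For reference, the AMP-specific keys from the inputs table can be collected into a single config fragment. The values below are illustrative placeholders, not defaults from any shipped YAML:

```python
# Hypothetical AMP-specific entries of the params dict (keys per the
# inputs table above; values are placeholders for illustration only).
amp_params = {
    'task_reward_w': 0.5,        # weight on the environment's task reward
    'disc_reward_w': 0.5,        # weight on the discriminator style reward
    'amp_batch_size': 512,       # demo observations sampled per epoch
    'amp_minibatch_size': 256,   # discriminator minibatch size
    'disc_coef': 5.0,            # discriminator loss coefficient
    'disc_logit_reg': 0.05,      # logit regularization strength
    'disc_grad_penalty': 5.0,    # gradient penalty coefficient
    'disc_weight_decay': 0.0001, # discriminator weight decay
    'disc_reward_scale': 2.0,    # scale applied to the style reward
}
```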
Usage Examples
# AMPAgent is typically instantiated by the rl_games runner via config registration.
# In the train config YAML (e.g. HumanoidAMPPPO.yaml):
# params:
#   algo:
#     name: amp_continuous
#   config:
#     task_reward_w: 0.5
#     disc_reward_w: 0.5
#     amp_batch_size: 512
#     amp_minibatch_size: 256
#     disc_coef: 5.0
#     disc_logit_reg: 0.05
#     disc_grad_penalty: 5.0
#     disc_reward_scale: 2.0
# Registration in the runner (as done in isaacgymenvs/train.py):
from rl_games.torch_runner import Runner
from isaacgymenvs.learning.amp_continuous import AMPAgent

runner = Runner()
runner.algo_factory.register_builder('amp_continuous', lambda **kwargs: AMPAgent(**kwargs))
# The agent is then created and trained via:
runner.run({'train': True, 'play': False})