
Implementation:Isaac sim IsaacGymEnvs AMPContinuous Train

From Leeroopedia
Knowledge Sources
Domains: Reinforcement_Learning, Motion_Imitation
Last Updated: 2026-02-15 11:00 GMT

Overview

AMPAgent is the core training agent for Adversarial Motion Priors (AMP), extending CommonAgent to combine task rewards with style rewards derived from a learned discriminator that distinguishes agent behavior from reference motion capture data.

Description

The AMPAgent class in isaacgymenvs/learning/amp_continuous.py implements the full AMP training pipeline for continuous action spaces. It extends the CommonAgent (which itself wraps rl_games A2C) with a discriminator network that learns to classify motion observations as either coming from the agent's rollouts or from demonstration (motion capture) data. The discriminator's output is transformed into a style reward that encourages the agent to produce naturalistic motion.
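
To make the data flow concrete, below is a minimal PyTorch sketch of such a discriminator. The class name, hidden sizes, and activation are illustrative assumptions; in practice the network is constructed through the rl_games model/network builders rather than defined this way.

import torch
import torch.nn as nn

class AMPDiscriminatorSketch(nn.Module):
    """Hypothetical stand-in for the AMP discriminator: AMP observation -> single logit."""
    def __init__(self, amp_obs_dim: int, hidden=(1024, 512)):
        super().__init__()
        layers, last = [], amp_obs_dim
        for h in hidden:
            layers += [nn.Linear(last, h), nn.ReLU()]
            last = h
        self.trunk = nn.Sequential(*layers)
        # High logit leans "demonstration", low logit leans "agent rollout".
        self.logit = nn.Linear(last, 1)

    def forward(self, amp_obs: torch.Tensor) -> torch.Tensor:
        return self.logit(self.trunk(amp_obs))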

The training loop in train_epoch() first collects rollout experience via play_steps(), which stores AMP observations alongside standard RL data. It then samples demonstration observations from a replay buffer, computes discriminator rewards via _calc_amp_rewards(), and combines them with task rewards using configurable weights (_task_reward_w and _disc_reward_w). The gradient computation in calc_gradients() jointly optimizes the policy/value network and the discriminator in a single backward pass.
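
As a sketch of the reward combination step, the following free-standing helper mirrors what _combine_rewards() does with the configured weights; the function name and defaults are illustrative, not the class's actual method:

import torch

def combine_rewards(task_rewards: torch.Tensor, disc_rewards: torch.Tensor,
                    task_reward_w: float = 0.5, disc_reward_w: float = 0.5) -> torch.Tensor:
    # Weighted sum described above: task_reward_w * task + disc_reward_w * style
    return task_reward_w * task_rewards + disc_reward_w * disc_rewards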

The discriminator loss function _disc_loss() applies a standard binary classification objective augmented with gradient penalty regularization (to enforce the Lipschitz constraint) and logit regularization (to prevent overconfident predictions). The discriminator reward is computed as clamp(1 - 0.25 * (disc_logit - 1)^2, 0), scaled by _disc_reward_scale, ensuring a bounded and smooth reward signal for the policy.
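
A minimal sketch of both computations, assuming a disc module that maps AMP observations to logits (as in the sketch above). One simplification: the logit regularization here penalizes logit magnitudes directly, which matches the intent described above but may differ in detail from the actual implementation.

import torch
import torch.nn.functional as F

def disc_loss_sketch(disc, agent_obs, demo_obs, grad_penalty_w=5.0, logit_reg_w=0.05):
    """Illustrative discriminator objective: BCE + gradient penalty + logit regularization."""
    demo_obs = demo_obs.clone().requires_grad_(True)  # input gradients needed for the penalty
    agent_logit = disc(agent_obs)
    demo_logit = disc(demo_obs)
    # Binary classification: demonstrations are the positive class, agent rollouts the negative.
    bce = 0.5 * (F.binary_cross_entropy_with_logits(demo_logit, torch.ones_like(demo_logit))
                 + F.binary_cross_entropy_with_logits(agent_logit, torch.zeros_like(agent_logit)))
    # Gradient penalty on demonstration inputs enforces a Lipschitz-style constraint.
    grad = torch.autograd.grad(demo_logit.sum(), demo_obs, create_graph=True)[0]
    grad_pen = grad.square().sum(dim=-1).mean()
    # Keep logits small so the discriminator does not become overconfident.
    logit_reg = demo_logit.square().mean()
    return bce + grad_penalty_w * grad_pen + logit_reg_w * logit_reg

def disc_reward_sketch(disc_logit, disc_reward_scale=2.0):
    """Style reward from the formula above: scale * clamp(1 - 0.25 * (logit - 1)^2, min=0)."""
    with torch.no_grad():
        return disc_reward_scale * torch.clamp(1.0 - 0.25 * (disc_logit - 1.0) ** 2, min=0.0)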

Usage

Use AMPAgent when training humanoid or character animation policies that should exhibit natural motion styles learned from motion capture demonstrations. It is the standard agent for all AMP-based tasks in IsaacGymEnvs, including HumanoidAMP and its variants.

Code Reference

Source Location

isaacgymenvs/learning/amp_continuous.py

Signature

class AMPAgent(common_agent.CommonAgent):
    def __init__(self, base_name, params):
        """Initialize AMP agent with discriminator, replay buffers, and AMP config."""

    def init_tensors(self):
        """Initialize experience tensors and build AMP observation buffers."""

    def train_epoch(self):
        """Run one epoch: play_steps -> sample demos -> calc_gradients -> update."""

    def play_steps(self):
        """Collect rollout data including AMP observations and compute combined rewards."""

    def prepare_dataset(self, batch_dict):
        """Add amp_obs, amp_obs_demo, amp_obs_replay to the training dataset."""

    def calc_gradients(self, input_dict):
        """Compute policy, value, and discriminator gradients in a single pass."""

    def _disc_loss(self, disc_agent_logit, disc_demo_logit, obs_demo):
        """Discriminator loss with gradient penalty and logit regularization."""

    def _calc_amp_rewards(self, amp_obs):
        """Compute style rewards from discriminator output."""

    def _combine_rewards(self, task_rewards, amp_rewards):
        """Weighted sum: task_reward_w * task + disc_reward_w * style."""

    def env_step(self, actions):
        """Step environment and retrieve AMP observations from info dict."""

Import

from isaacgymenvs.learning.amp_continuous import AMPAgent

I/O Contract

Inputs

Name Type Required Description
base_name str Yes Base name for the experiment and logging
params dict Yes Configuration dictionary containing AMP-specific keys: task_reward_w, disc_reward_w, amp_batch_size, amp_minibatch_size, disc_coef, disc_logit_reg, disc_grad_penalty, disc_weight_decay, disc_reward_scale
env_info dict Yes Environment info including amp_observation_space defining the discriminator input shape
config dict Yes rl_games algorithm config with AMP parameters merged in

Outputs

Name Type Description
train_result dict Training metrics including actor_loss, critic_loss, entropy, kl, disc_loss, disc_agent_acc, disc_demo_acc, disc_agent_logit, disc_demo_logit
batch_dict dict Experience batch containing obs, actions, rewards, amp_obs, amp_obs_demo, amp_obs_replay, returns
combined_rewards Tensor Weighted combination of task rewards and AMP style rewards

Usage Examples

# AMPAgent is typically instantiated by the rl_games runner via config registration.
# In the train config YAML (e.g. HumanoidAMPPPO.yaml):

# params:
#   algo:
#     name: amp_continuous
#   config:
#     task_reward_w: 0.5
#     disc_reward_w: 0.5
#     amp_batch_size: 512
#     amp_minibatch_size: 256
#     disc_coef: 5.0
#     disc_logit_reg: 0.05
#     disc_grad_penalty: 5.0
#     disc_reward_scale: 2.0

# Registration in the runner (mirroring isaacgymenvs/train.py):
from isaacgymenvs.learning.amp_continuous import AMPAgent
from rl_games.torch_runner import Runner

runner = Runner()
runner.algo_factory.register_builder('amp_continuous', lambda **kwargs: AMPAgent(**kwargs))

# Load the parsed train config, then launch training:
runner.load(rlg_config_dict)  # rlg_config_dict: the train YAML parsed into a dict
runner.run({'train': True, 'play': False})
