Implementation:Isaac sim IsaacGymEnvs ReplayBuffer

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Data_Management
Last Updated 2026-02-15 11:00 GMT

Overview

ReplayBuffer is a GPU-resident circular replay buffer designed for efficient storage and sampling of observation data during AMP (Adversarial Motion Priors) training.

Description

The ReplayBuffer class provides a fixed-size circular buffer that stores tensors directly on the GPU device, avoiding costly CPU-GPU data transfers during training. It is primarily used in AMP training to maintain a history of agent-generated AMP observations, which are later replayed through the discriminator alongside current observations and demonstration data.

The buffer uses a head pointer (_head) that advances as new data is stored and wraps back to the start when it reaches the end of the buffer. The store(data_dict) method accepts a dictionary of tensors and writes them into the buffer; when a batch spans the end of the buffer, the part that does not fit is written at the beginning, overwriting the oldest entries. The internal storage tensors are initialized lazily on the first store() call: they are sized to match the per-entry shape of the incoming data and allocated on the device given at construction.
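The wrap-around write described above can be sketched in plain Python. This is a minimal illustration, not the IsaacGymEnvs code: the class name RingStore and its attributes are hypothetical, and a Python list stands in for a GPU tensor.

```python
class RingStore:
    """Toy circular store; a plain list stands in for a GPU tensor (assumption)."""

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.data = [None] * buffer_size  # fixed-size backing storage
        self.head = 0                     # next write position
        self.total_count = 0              # entries ever written, including overwritten ones

    def store(self, batch):
        n = len(batch)
        assert n <= self.buffer_size, "batch larger than the buffer"

        # First chunk: from the head up to the end of the storage.
        first = min(n, self.buffer_size - self.head)
        self.data[self.head:self.head + first] = batch[:first]

        # Whatever did not fit wraps to the start and overwrites old entries.
        remainder = n - first
        if remainder > 0:
            self.data[0:remainder] = batch[first:]

        self.head = (self.head + n) % self.buffer_size
        self.total_count += n


ring = RingStore(buffer_size=5)
ring.store([0, 1, 2])
ring.store([3, 4, 5, 6])  # 3 and 4 fill the tail; 5 and 6 overwrite 0 and 1
print(ring.data, ring.head, ring.total_count)  # [5, 6, 2, 3, 4] 2 7
```

The second call shows the split: two entries land at the end of the storage and the remaining two replace the oldest entries at the front.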

Sampling is performed via sample(n), which reads the next n entries of a pre-shuffled index permutation (_sample_idx), so samples are drawn without replacement within each full pass through the buffer. Once the permutation has been exhausted, it is re-shuffled. If the buffer is not yet full (i.e., _total_count < _buffer_size), the sampled indices are mapped into the portion of the buffer that has already been written. The reset() method clears the buffer state and re-shuffles the sampling permutation.
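The permutation-based sampling scheme can be sketched as follows. The names PermutationSampler, sample_indices, and valid_count are hypothetical, and the modulo used to keep indices inside a partially filled buffer is an assumption chosen for illustration; any mapping that keeps indices in the written range would serve the same purpose.

```python
import random


class PermutationSampler:
    """Toy index sampler that walks a shuffled permutation (illustrative only)."""

    def __init__(self, buffer_size, seed=0):
        self.buffer_size = buffer_size
        self.rng = random.Random(seed)
        self.sample_idx = list(range(buffer_size))
        self.sample_head = 0
        self.reshuffle()

    def reshuffle(self):
        # Start a fresh pass with a new random ordering of all slots.
        self.rng.shuffle(self.sample_idx)
        self.sample_head = 0

    def sample_indices(self, n, valid_count):
        assert valid_count > 0, "nothing has been stored yet"

        # Read the next n positions of the permutation, wrapping if needed.
        positions = [(self.sample_head + i) % self.buffer_size for i in range(n)]
        idx = [self.sample_idx[p] for p in positions]

        # Assumption: fold indices into the written prefix while the buffer fills.
        if valid_count < self.buffer_size:
            idx = [i % valid_count for i in idx]

        self.sample_head += n
        if self.sample_head >= self.buffer_size:
            self.reshuffle()  # permutation exhausted, begin a new pass
        return idx


sampler = PermutationSampler(buffer_size=8)
a = sampler.sample_indices(4, valid_count=8)
b = sampler.sample_indices(4, valid_count=8)
print(sorted(a + b) == list(range(8)))  # True: one pass visits every slot once

c = sampler.sample_indices(4, valid_count=3)
print(all(0 <= i < 3 for i in c))       # True: only written slots are returned
```

Note that while the buffer is only partially filled, folding indices into the written range can return the same slot more than once, so the without-replacement property holds only for a full buffer.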

Usage

Use ReplayBuffer when implementing AMP or similar algorithms that require replaying past agent experiences through a discriminator or other evaluation network. It is instantiated by the AMP agent during initialization with a specified buffer size and GPU device, and data is stored at each training step.

Code Reference

Source Location

Signature

class ReplayBuffer:
    def __init__(self, buffer_size, device):
    def reset(self):
    def get_buffer_size(self):
    def get_total_count(self):
    def store(self, data_dict):
    def sample(self, n):
    def _reset_sample_idx(self):
    def _init_data_buf(self, data_dict):

Import

from isaacgymenvs.learning.replay_buffer import ReplayBuffer

I/O Contract

Inputs

| Name | Type | Required | Description |
| --- | --- | --- | --- |
| buffer_size | int | Yes | Maximum number of entries the buffer can hold |
| device | torch.device | Yes | The GPU device on which buffer tensors are allocated |
| data_dict | dict[str, torch.Tensor] | Yes | Dictionary of named tensors to store (passed to store()); all tensors must have the same batch dimension |
| n | int | Yes | Number of samples to draw (passed to sample()) |

Outputs

| Name | Type | Description |
| --- | --- | --- |
| samples | dict[str, torch.Tensor] | Dictionary of sampled tensors, matching the keys of the stored data, returned by sample() |
| buffer_size | int | The maximum capacity of the buffer, returned by get_buffer_size() |
| total_count | int | The total number of entries that have been stored (may exceed buffer_size due to overwrites), returned by get_total_count() |
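Because total_count keeps growing after the buffer wraps, the number of entries actually available for sampling has to be derived from it. A small sketch of that arithmetic, using hypothetical helper names that are not part of the ReplayBuffer API:

```python
def num_valid_entries(total_count, buffer_size):
    """Entries currently held: everything stored so far, capped at capacity."""
    return min(total_count, buffer_size)


def num_overwritten(total_count, buffer_size):
    """Entries that have been replaced by newer data since the buffer filled."""
    return max(0, total_count - buffer_size)


# Example: 7 entries stored into a buffer with capacity 5.
print(num_valid_entries(7, 5))  # 5
print(num_overwritten(7, 5))    # 2
```

With a real buffer, the two arguments would come from get_total_count() and get_buffer_size().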

Usage Examples

import torch
from isaacgymenvs.learning.replay_buffer import ReplayBuffer

# Create a replay buffer on GPU with capacity for 100,000 entries
device = torch.device('cuda:0')
buffer = ReplayBuffer(buffer_size=100000, device=device)

# Store AMP observations collected during rollout
amp_obs = torch.randn(256, 64, device=device)  # batch of 256, obs dim 64
buffer.store({'amp_obs': amp_obs})

# Sample a mini-batch of 128 observations for discriminator training
samples = buffer.sample(128)
amp_obs_replay = samples['amp_obs']  # shape: (128, 64)

# Check buffer state
print(f"Buffer size: {buffer.get_buffer_size()}")
print(f"Total stored: {buffer.get_total_count()}")

# Reset the buffer (e.g., at the start of a new training run)
buffer.reset()
