Implementation:Facebookresearch Habitat lab VER RolloutStorage

From Leeroopedia
Knowledge Sources
Domains Embodied_AI, Reinforcement_Learning, Distributed_Training
Last Updated 2026-02-15 00:00 GMT

Overview

VERRolloutStorage extends the standard RolloutStorage to support Variable Experience Rollout (VER), providing a shared-memory experience buffer with support for variable-length rollouts, importance sampling coefficients, policy versioning, and episode-aware return computation.

Description

The VERRolloutStorage class extends RolloutStorage to handle the unique requirements of VER training. Key differences from standard rollout storage include:

Variable experience mode: When enabled, the buffer is treated as a flat linear array rather than a structured (steps x envs) grid. A shared pointer (ptr) tracks the next write position, and prev_inds tracks the previous buffer index for each environment. This allows different environments to contribute different numbers of steps per rollout, which is essential for VER's throughput optimization.
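
The flat-buffer bookkeeping described above can be illustrated with a toy sketch. It is not the class's actual implementation: `FlatBuffer` and its `insert` method are hypothetical, plain Python lists stand in for the shared-memory tensors, and only the names ptr and prev_inds are taken from the description.

```python
from typing import List


class FlatBuffer:
    """Toy flat experience buffer (hypothetical, for illustration only)."""

    def __init__(self, capacity: int, num_envs: int) -> None:
        self.steps: List[object] = [None] * capacity
        self.env_ids: List[int] = [-1] * capacity
        self.ptr = 0  # next free slot, shared by all environments
        self.prev_inds = [-1] * num_envs  # last slot written by each environment

    def insert(self, env_id: int, step: object) -> int:
        """Write one step at the shared pointer and return the slot used."""
        if self.ptr >= len(self.steps):
            raise IndexError("buffer is full; the rollout should end")
        idx = self.ptr
        self.steps[idx] = step
        self.env_ids[idx] = env_id
        self.prev_inds[env_id] = idx
        self.ptr += 1
        return idx


buf = FlatBuffer(capacity=6, num_envs=2)
# Environment 0 is faster, so it contributes more steps than environment 1.
for env_id in [0, 0, 1, 0, 0, 1]:
    buf.insert(env_id, step={"env": env_id})
```

Because every write goes through the single pointer, a fast environment simply occupies more slots; no per-environment row has to be filled before the rollout can end.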

Auxiliary buffers: The class maintains shared-memory auxiliary buffers (PyTorch shared-memory tensors, accessed as NumPy arrays for CPU-side performance) for:

  • next_hidden_states / next_prev_actions: RNN state carried between rollouts
  • current_policy_version / cpu_current_policy_version: Policy version tracking for stale experience detection
  • num_steps_collected / rollout_done: Rollout completion tracking
  • current_steps / actor_steps_collected: Per-environment step counters
  • ptr / prev_inds / will_replay_step: Buffer management for VER mode

Importance sampling: When variable experience is enabled, the storage computes importance sampling (IS) coefficients to correct for the biased sampling that occurs when environments contribute different numbers of steps. The IS weight for each step is (num_steps + 1) / samples_from_that_env, so steps from environments that contributed many samples are weighted down and steps from environments that contributed few are weighted up.
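
As a minimal sketch of that formula, the function below computes one coefficient per stored step from a flat list of environment IDs. The function name and the list-based representation are assumptions made for illustration; the real storage works on tensors.

```python
from collections import Counter
from typing import List


def importance_sampling_coeffs(
    environment_ids: List[int], num_steps: int
) -> List[float]:
    """Return (num_steps + 1) / samples_from_that_env for every stored step.

    Hypothetical helper: `environment_ids` holds the environment that
    produced each step in the flat buffer.
    """
    samples_per_env = Counter(environment_ids)
    return [(num_steps + 1) / samples_per_env[e] for e in environment_ids]


# Environment 0 contributed 6 steps, environment 1 only 2.
env_ids = [0, 0, 1, 0, 0, 1, 0, 0]
coeffs = importance_sampling_coeffs(env_ids, num_steps=3)
```

In this example the coefficients of each environment sum to num_steps + 1, so both environments carry equal total weight even though one supplied three times as many steps.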

Return computation: Overrides the standard GAE return computation with an episode-aware version. It uses build_pack_info_from_episode_ids to construct sequence information from episode IDs, environment IDs, and step IDs, then computes GAE returns respecting episode boundaries and handling stale experience (steps collected under an older policy version).
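
The idea of respecting episode boundaries can be sketched in a simplified form. The function below is an illustration only: it groups flat buffer indices by episode ID and runs the usual GAE recursion backwards within each group. It does not use build_pack_info_from_episode_ids and ignores stale-experience handling; the function name and the `bootstrap_values` argument are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def episode_aware_gae(
    rewards: List[float],
    values: List[float],
    episode_ids: List[int],
    bootstrap_values: Dict[int, float],
    gamma: float = 0.99,
    tau: float = 0.95,
) -> List[float]:
    """GAE returns for a flat buffer whose steps are interleaved across episodes.

    Assumes steps of the same episode appear in time order in the buffer.
    `bootstrap_values` maps an episode ID to the value estimate after its last
    stored step; episodes missing from it are treated as finished (value 0).
    """
    steps_by_episode: Dict[int, List[int]] = defaultdict(list)
    for idx, ep in enumerate(episode_ids):
        steps_by_episode[ep].append(idx)

    returns = [0.0] * len(rewards)
    for ep, inds in steps_by_episode.items():
        next_value = bootstrap_values.get(ep, 0.0)
        gae = 0.0
        # Walk each episode backwards so advantages never leak across episodes.
        for idx in reversed(inds):
            delta = rewards[idx] + gamma * next_value - values[idx]
            gae = delta + gamma * tau * gae
            returns[idx] = gae + values[idx]
            next_value = values[idx]
    return returns
```

With interleaved episodes such as episode_ids = [0, 1, 0, 1], each step only accumulates reward from later steps of its own episode, which a plain backward pass over the flat buffer would not guarantee.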

Mini-batch generation: The data_generator method yields mini-batches for VER. It uses generate_ver_mini_batches, which:

  1. Takes all sequences of experience and puts them in random order
  2. Slices their steps into the requested number of mini-batches
  3. Builds RNN sequence packing info per mini-batch for efficient RNN processing
  4. Selects the first hidden state for each sequence from the recurrent hidden states
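
Steps 1 and 2 above can be sketched as follows. This is a simplified illustration, not generate_ver_mini_batches itself: the function name, the list-of-lists input, and the seeded random generator are assumptions, and the RNN packing info and first-hidden-state selection (steps 3 and 4) are left out.

```python
import random
from typing import Iterator, List


def sketch_ver_mini_batches(
    sequences: List[List[int]], num_mini_batch: int, seed: int = 0
) -> Iterator[List[int]]:
    """Yield step indices for each mini-batch (hypothetical helper).

    `sequences` holds, per sequence of experience, the buffer indices of its
    steps in time order.
    """
    rng = random.Random(seed)

    # Step 1: put the sequences in random order (steps within one stay ordered).
    order = list(range(len(sequences)))
    rng.shuffle(order)
    flat_steps = [step for seq_idx in order for step in sequences[seq_idx]]

    # Step 2: slice the concatenated steps into near-equal mini-batches.
    base, extra = divmod(len(flat_steps), num_mini_batch)
    start = 0
    for i in range(num_mini_batch):
        size = base + (1 if i < extra else 0)
        yield flat_steps[start : start + size]
        start += size


sequences = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
batches = list(sketch_ver_mini_batches(sequences, num_mini_batch=2))
```

Slicing by step count rather than by sequence count keeps mini-batches the same size, at the cost that a sequence may be split across two mini-batches.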

After-update cleanup: The after_update method handles the complex state transition between rollouts:

  • In fixed-experience mode: copies the last step's hidden states and actions to the "next" buffers
  • In variable-experience mode: preserves data for environments with actions in flight, reorders remaining buffer entries by staleness (oldest first for overwriting), and resets pointers

The module also includes utility functions:

  • compute_movements_for_aliased_swaps -- Computes correct swap operations when source and destination indices may overlap
  • partition_n_into_p -- Partitions n elements into p bins
  • generate_ver_mini_batches -- Iterator yielding mini-batch indices for VER training
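
To make the partitioning utility concrete, here is a minimal sketch of one way to split n elements into p bins. The source only states that partition_n_into_p partitions n elements into p bins; the near-equal split below and the name `partition_sketch` are assumptions for illustration.

```python
from typing import List


def partition_sketch(n: int, p: int) -> List[int]:
    """Return p bin sizes that sum to n and differ by at most one."""
    base, extra = divmod(n, p)
    # The first `extra` bins absorb the remainder, one element each.
    return [base + (1 if i < extra else 0) for i in range(p)]


sizes = partition_sketch(10, 3)
```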

Usage

This storage is used by the VER trainer as a drop-in replacement for the standard RolloutStorage. It is shared across processes (environment workers, inference workers, and the learner) via PyTorch's shared memory mechanism. The inference workers write experience data, and the learner reads it for gradient computation.

Code Reference

Source Location

Signature

class VERRolloutStorage(RolloutStorage):
    def __init__(
        self,
        numsteps,
        num_envs,
        observation_space,
        action_space,
        actor_critic,
        variable_experience: bool,
        is_double_buffered: bool = False,
    ): ...
    def after_update(self): ...
    def increment_policy_version(self): ...
    def after_rollout(self): ...
    def compute_returns(self, use_gae, gamma, tau): ...
    def data_generator(
        self,
        advantages: Optional[torch.Tensor],
        num_mini_batch: int,
    ) -> Iterator[DictTree]: ...
    def share_memory_(self): ...
    def to(self, device): ...
    def copy(self, other: "VERRolloutStorage"): ...

def compute_movements_for_aliased_swaps(
    dst_locations: np.ndarray, src_locations: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]: ...

def generate_ver_mini_batches(
    num_mini_batch: int,
    sequence_lengths: np.ndarray,
    num_seqs_at_step: np.ndarray,
    select_inds: np.ndarray,
    last_sequence_in_batch_mask: np.ndarray,
    episode_ids: np.ndarray,
) -> Iterator[np.ndarray]: ...

def partition_n_into_p(n: int, p: int) -> List[int]: ...

Import

from habitat_baselines.rl.ver.ver_rollout_storage import (
    VERRolloutStorage,
    compute_movements_for_aliased_swaps,
    generate_ver_mini_batches,
    partition_n_into_p,
)

I/O Contract

Inputs (Constructor)

Name | Type | Required | Description
numsteps | int | Yes | Number of rollout steps per environment
num_envs | int | Yes | Number of parallel environments
observation_space | spaces.Dict | Yes | Observation space defining buffer shapes
action_space | spaces.Space | Yes | Action space defining action buffer shape
actor_critic | nn.Module | Yes | The actor-critic policy (used to determine hidden state dimensions)
variable_experience | bool | Yes | Whether to use VER mode (flat buffer) or fixed mode (structured buffer)
is_double_buffered | bool | No | Whether to use double-buffered sampling (default False)

Outputs (data_generator)

Name | Type | Description
batch | DictTree | A mini-batch dictionary containing: observations, actions, action_log_probs, value_preds, returns, masks, recurrent_hidden_states, advantages (if provided), is_coeffs (if variable experience), and rnn_build_seq_info for packed RNN processing

Usage Examples

Basic Usage

from habitat_baselines.rl.ver.ver_rollout_storage import VERRolloutStorage

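# obs_space, action_space, actor_critic, and device are assumed to be
# defined elsewhere (for example, by the trainer's setup code).

# Create VER rollout storage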
rollouts = VERRolloutStorage(
    numsteps=128,
    num_envs=4,
    observation_space=obs_space,
    action_space=action_space,
    actor_critic=actor_critic,
    variable_experience=True,
)

# Share memory for multiprocessing
rollouts.share_memory_()
rollouts.to(device)

# After collecting a rollout:
rollouts.after_rollout()  # Compute IS coefficients and mark stale steps
rollouts.compute_returns(use_gae=True, gamma=0.99, tau=0.95)

# Generate mini-batches for PPO update
advantages = rollouts.buffers["returns"] - rollouts.buffers["value_preds"]
for batch in rollouts.data_generator(advantages, num_mini_batch=2):
    # batch contains all data needed for a PPO update step
    observations = batch["observations"]
    actions = batch["actions"]
    returns = batch["returns"]
    rnn_info = batch["rnn_build_seq_info"]
    # ... perform PPO update ...

# After PPO update:
rollouts.increment_policy_version()
rollouts.after_update()
