Principle:Haosulab ManiSkill Episode Initialization

From Leeroopedia
Page Type: Principle
Title: ManiSkill Episode Initialization
Domain: Simulation, Robotics, Environment_Design, Reinforcement_Learning
Related Implementation: Implementation:Haosulab_ManiSkill_Initialize_Episode_Pattern
Date: 2026-02-15
Repository: Haosulab/ManiSkill

Overview

Description

Episode initialization in ManiSkill is the process of randomizing object poses, goal configurations, and other task-relevant state at the start of each episode. This is performed by the _initialize_episode() method, which is called automatically by BaseEnv.reset() after the scene has been set up and simulation state has been cleared.

The initialization system is designed to support GPU-batched parallel environments. Rather than initializing one environment at a time, _initialize_episode() receives an env_idx tensor specifying which environments are being reset. This enables partial reset: when some environments in a GPU-parallel batch finish their episode while others are still running, only the finished environments are re-initialized. The env_idx tensor tells the method exactly which environments need new initial configurations.

The initialization process typically involves:

  • Scene builder initialization: Calling self.table_scene.initialize(env_idx) to set table poses and robot initial joint configurations.
  • Object pose randomization: Sampling random positions and orientations for task objects (cubes, bottles, tools, etc.) and applying them via actor.set_pose().
  • Goal configuration: Setting target positions, target orientations, or other task-specific goal parameters.
  • Collision avoidance: Using samplers like UniformPlacementSampler to ensure objects do not overlap when placed randomly.

All pose-setting operations during initialization are automatically masked to only affect the environments specified by env_idx. This means the developer can write code that generates data for a batch of size len(env_idx) and set poses without worrying about corrupting the state of environments that are not being reset.
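The partial-reset masking described above can be illustrated outside the simulator. The following sketch uses plain PyTorch with no ManiSkill dependency: fresh positions are generated only for the environments listed in env_idx, while the other environments keep their existing state (the bounds and heights are arbitrary illustration values, not ManiSkill defaults).

```python
import torch

torch.manual_seed(0)

# Hypothetical illustration: 4 parallel environments, each with an (x, y, z)
# object position. Only the environments listed in env_idx are re-initialized.
num_envs = 4
positions = torch.zeros((num_envs, 3))  # state of all environments

env_idx = torch.tensor([1, 3])  # only envs 1 and 3 finished their episode
b = len(env_idx)                # batch size for generating fresh random data

new_xyz = torch.zeros((b, 3))
new_xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1  # x, y sampled in [-0.1, 0.1)
new_xyz[:, 2] = 0.02                             # fixed height above the table

positions[env_idx] = new_xyz  # envs 0 and 2 are left untouched
```

Inside a ManiSkill task this scatter step is handled for you: pose setters are masked to env_idx automatically, so task code only produces the batch of size b.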

Usage

Episode initialization is implemented by overriding _initialize_episode() in a custom task. This method is called:

  • During the first reset() call after environment creation.
  • On every subsequent reset() call (unless reset_to_env_states is provided in options).
  • With a subset of environment indices during partial reset.

The developer should:

  1. Accept env_idx: torch.Tensor and options: dict parameters.
  2. Use b = len(env_idx) as the batch size for generating random data.
  3. Use the torch.device(self.device) context manager so that new tensors are created on the correct device.
  4. Call self.table_scene.initialize(env_idx) if using a scene builder.
  5. Sample random poses/configurations and apply them to actors and articulations.
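Putting these steps together, here is a sketch of an _initialize_episode() override in the style of ManiSkill's simple cube tasks. The _StubSceneBuilder and _StubActor classes are stand-ins added here so the sketch runs without ManiSkill installed; in a real task, self.table_scene would be a scene builder and self.cube a ManiSkill actor whose set_pose() accepts a batched pose object rather than a raw tensor.

```python
import torch

class _StubSceneBuilder:
    """Stand-in for a ManiSkill scene builder (hypothetical, for this sketch)."""
    def initialize(self, env_idx: torch.Tensor):
        self.last_initialized = env_idx  # a real builder sets table/robot poses here

class _StubActor:
    """Stand-in for a ManiSkill actor; records the batched positions it is given."""
    def set_pose(self, p: torch.Tensor):
        self.p = p

class MyTask:
    device = "cpu"
    cube_half_size = 0.02

    def __init__(self):
        self.table_scene = _StubSceneBuilder()
        self.cube = _StubActor()

    def _initialize_episode(self, env_idx: torch.Tensor, options: dict):
        with torch.device(self.device):           # step 3: correct device
            b = len(env_idx)                      # step 2: batch size
            self.table_scene.initialize(env_idx)  # step 4: table + robot config
            xyz = torch.zeros((b, 3))             # step 5: sample and apply poses
            xyz[:, :2] = torch.rand((b, 2)) * 0.2 - 0.1  # x, y in [-0.1, 0.1)
            xyz[:, 2] = self.cube_half_size              # rest cube on the table
            # In ManiSkill this tensor would be wrapped in a batched Pose
            # before being passed to set_pose().
            self.cube.set_pose(xyz)

task = MyTask()
task._initialize_episode(torch.tensor([0, 2, 5]), options={})
```

Note that the method only ever reasons about the b environments being reset; the env_idx masking described earlier keeps the other environments' state intact.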

Theoretical Basis

Episode initialization in ManiSkill draws on several concepts from reinforcement learning and simulation:

  • Domain randomization: Randomizing initial object configurations across episodes is a form of domain randomization (Tobin et al., 2017) that improves the generalization of learned policies. By presenting the agent with diverse starting configurations, the policy learns to handle a wider range of scenarios rather than overfitting to a single setup.
  • Partial reset for throughput: In GPU-parallel reinforcement learning, different environments in a batch may terminate at different time steps. Partial reset avoids the need to synchronize all environments -- environments that are still running continue uninterrupted while terminated environments are re-initialized. This maintains GPU utilization and improves training throughput.
  • Reproducible RNG for simulation: ManiSkill maintains a two-level RNG hierarchy: a main RNG that generates episode seeds, and an episode RNG that generates per-episode random data. This design ensures that given the same seed sequence, the same episode configurations are produced regardless of when partial resets occur. The torch.random.fork_rng() context manager is used to isolate PyTorch random state during initialization, and BatchedRNG provides per-environment numpy random state for consistent randomization in CPU and GPU modes.
  • Curriculum learning compatibility: The options dict parameter allows external code (such as a curriculum learning controller) to pass information that influences initialization -- for example, selecting difficulty levels or specific object configurations.
  • Collision-free placement: The UniformPlacementSampler implements rejection sampling with distance constraints. For each new object, it samples candidate positions uniformly within specified bounds and rejects any sample that is too close to previously placed objects. This prevents interpenetrating objects, which can cause physics instabilities.
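The rejection-sampling idea behind collision-free placement can be sketched as follows. This is a simplified stand-in, not the UniformPlacementSampler API: it samples 2-D positions uniformly within square bounds and rejects any candidate closer than min_dist to an already-placed object.

```python
import torch

torch.manual_seed(0)

def sample_nonoverlapping(n: int, bounds_min: float, bounds_max: float,
                          min_dist: float, max_tries: int = 100) -> torch.Tensor:
    """Rejection sampling with a pairwise distance constraint (simplified sketch).

    Samples n 2-D positions uniformly in [bounds_min, bounds_max)^2, rejecting
    any candidate that lands within min_dist of a previously accepted position.
    """
    placed = []
    for _ in range(n):
        for _ in range(max_tries):
            cand = torch.rand(2) * (bounds_max - bounds_min) + bounds_min
            if all(torch.linalg.norm(cand - p).item() >= min_dist for p in placed):
                placed.append(cand)
                break
        else:
            raise RuntimeError("could not place object without overlap")
    return torch.stack(placed)

# Place 4 objects in a 0.4 m x 0.4 m region, at least 8 cm apart.
positions = sample_nonoverlapping(4, -0.2, 0.2, min_dist=0.08)
```

A bounded max_tries matters in practice: if the region is too small for the requested number of objects, the sampler fails loudly instead of looping forever.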
