
Principle:Haosulab ManiSkill Trajectory Replay Conversion

From Leeroopedia
Source Repository: haosulab/ManiSkill
Domains: Imitation_Learning, Robotics, Data_Processing, Simulation
Last Updated: 2026-02-15

Overview

Description

Trajectory Replay and Conversion is the process of re-executing recorded expert demonstrations through the ManiSkill simulator while capturing observations under a different observation mode, control mode, or simulation backend than the one used during original data collection. This is a critical data preprocessing step in the imitation learning pipeline that bridges the gap between raw demonstration data (which typically contains only environment states and actions) and the specific input representation required by a downstream learning algorithm.

The key insight enabling this approach is that ManiSkill demonstrations are stored as sequences of actions and environment states rather than observations. By replaying these actions (or restoring environment states) step-by-step through the simulator, the system can regenerate the trajectory under any supported observation mode -- including state (compact numerical vectors), rgbd (RGB + depth images from cameras), and pointcloud (3D point cloud representations) -- and any supported control mode -- including pd_joint_pos (absolute joint position), pd_joint_delta_pos (relative joint position changes), pd_ee_delta_pos (end-effector delta position), and others.

This decoupling of data collection from observation representation means that a single set of expert demonstrations can be converted into multiple training datasets without re-collecting data. For example, demonstrations collected under pd_joint_pos control can be replayed and converted to pd_joint_delta_pos actions through action space conversion algorithms. Similarly, state-only demonstrations can be replayed to generate RGBD image observations for training vision-based policies.
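This decoupling can be sketched with a toy deterministic simulator. The `ToySim` class below is a hypothetical stand-in for a ManiSkill environment (its names and dynamics are illustrative, not ManiSkill's API): a single recorded demo, stored only as a seed and an action sequence, is replayed twice to regenerate observations under two different observation modes.

```python
import numpy as np

class ToySim:
    """Minimal deterministic simulator standing in for ManiSkill (illustrative only)."""
    def __init__(self, seed=0):
        self.reset(seed)

    def reset(self, seed=0):
        rng = np.random.default_rng(seed)
        self.qpos = rng.uniform(-1, 1, size=3)  # joint positions
        return self.qpos.copy()

    def step(self, action):
        self.qpos = self.qpos + 0.1 * np.asarray(action)  # deterministic dynamics
        return self.qpos.copy()

    def observe(self, mode):
        if mode == "state":
            return self.qpos.copy()                  # compact numeric vector
        if mode == "rgbd":
            return np.tile(self.qpos, (4, 4, 1))     # stand-in "render" of the state
        raise ValueError(mode)

# One recorded demo: initial seed + action sequence, no observations stored.
actions = [np.ones(3), -np.ones(3), np.ones(3)]

def replay(mode, seed=0):
    sim = ToySim(seed)
    obs = [sim.observe(mode)]
    for a in actions:
        sim.step(a)
        obs.append(sim.observe(mode))
    return obs

state_traj = replay("state")  # dataset 1: state observations
rgbd_traj = replay("rgbd")    # dataset 2: image-like observations, same demo
```

Because the toy dynamics are deterministic, replaying with the same seed and actions reproduces the same state sequence, so both "datasets" come from one demonstration.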

Usage

Trajectory replay and conversion is used after downloading demonstrations and before training an imitation learning policy. Typical use cases include:

  • Observation mode adaptation: Converting raw (state-only) demonstrations to include RGBD or pointcloud observations for vision-based policy training.
  • Control mode conversion: Translating actions from the original control mode (e.g. pd_joint_pos) to a target control mode (e.g. pd_joint_delta_pos or pd_ee_delta_pos) that is more suitable for the learning algorithm or deployment scenario.
  • Simulation backend migration: Replaying CPU-collected demonstrations in GPU-parallelized simulation for faster batch processing.
  • Video generation: Producing visualization videos of the expert trajectories for debugging and inspection.
  • Data filtering: Discarding unsuccessful or timed-out episodes during replay, producing a clean dataset of only successful demonstrations.
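The data-filtering use case reduces to a simple post-replay filter. The sketch below assumes hypothetical per-episode replay results (the tuple layout and `MAX_STEPS` cutoff are illustrative, not ManiSkill's format): only episodes that succeeded before timing out are kept.

```python
# Hypothetical per-episode replay results: (episode_id, success, elapsed_steps).
replayed = [(0, True, 120), (1, False, 200), (2, True, 95), (3, True, 200)]
MAX_STEPS = 200  # assumed episode step limit

# Keep only episodes that succeeded before hitting the step limit.
clean = [ep for ep, ok, steps in replayed if ok and steps < MAX_STEPS]
```

Episode 3 is dropped even though it is marked successful, because it ran to the step limit and is treated as timed out.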

Theoretical Basis

Action Space Conversion is the process of mapping actions from one control mode to another. In ManiSkill, this is implemented through forward simulation: the original actions are executed in an environment configured with the original control mode, and the resulting joint states are used to compute equivalent actions in the target control mode. Currently, conversions from pd_joint_pos and pd_joint_delta_pos to other modes are supported, with the Panda robot arm having the most complete support.
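The forward-simulation idea can be illustrated for the pd_joint_pos to pd_joint_delta_pos case. The function and `step_fn` hook below are a hypothetical sketch, not ManiSkill's converter: each absolute target is re-expressed relative to the joint state the simulated controller actually reached.

```python
import numpy as np

def convert_to_delta_pos(initial_qpos, abs_actions, step_fn):
    """Convert absolute joint-position actions to delta actions by forward
    simulation: execute each absolute target, record the joint state reached,
    and express the next target relative to it. Illustrative sketch only."""
    qpos = np.asarray(initial_qpos, dtype=float)
    delta_actions = []
    for target in abs_actions:
        target = np.asarray(target, dtype=float)
        delta_actions.append(target - qpos)  # target relative to current state
        qpos = step_fn(qpos, target)         # forward-simulate to the reached state
    return delta_actions

# Toy dynamics: assume the controller reaches the commanded target exactly.
perfect_step = lambda qpos, target: target

deltas = convert_to_delta_pos([0.0, 0.0], [[0.5, 0.0], [0.5, 0.3]], perfect_step)
# With perfect tracking: deltas are [0.5, 0.0] then [0.0, 0.3]
```

In a real simulator the controller does not track targets perfectly, which is exactly why the conversion must consult the simulated joint state rather than just differencing consecutive targets.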

Observation Mode Adaptation leverages the determinism of the physics simulation. Given the same initial state and the same sequence of actions, the simulator produces the same sequence of states. By wrapping the environment with a different observation mode, the replay process captures the corresponding observations (images, point clouds, etc.) at each step without needing to re-collect data from scratch.

Simulation State Replay is an alternative to action replay. Instead of re-executing actions and relying on simulation determinism, the environment state is explicitly set at each timestep from the recorded state data. This guarantees visual fidelity but prevents control mode conversion (since the state trajectory is fixed regardless of the actions that produced it).
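A minimal sketch of state replay, using a hypothetical `ToyEnv` whose `set_state`/`observe` methods stand in for a simulator's state-restoration and observation hooks (names assumed, not ManiSkill's API): each recorded state is restored directly and an observation is captured, with no actions executed.

```python
import numpy as np

# Recorded environment states, e.g. loaded from the original demo file.
recorded_states = [np.array([0.0, 0.0]),
                   np.array([0.1, 0.0]),
                   np.array([0.1, 0.2])]

class ToyEnv:
    """Stand-in for a simulator exposing state-restoration and observation hooks."""
    def __init__(self):
        self.state = np.zeros(2)

    def set_state(self, s):
        self.state = np.asarray(s, dtype=float)

    def observe(self):
        # "Render" under the new observation mode: any function of the state.
        return np.concatenate([self.state, np.sin(self.state)])

env = ToyEnv()
obs_traj = []
for s in recorded_states:
    env.set_state(s)           # restore the recorded state; no action is stepped
    obs_traj.append(env.observe())
```

Because the state trajectory is pinned to the recording, the result is visually faithful, but there are no executed actions from which a new control mode could be derived.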

Key considerations:

  • Determinism: CPU simulation is generally deterministic for action replay. GPU simulation may introduce small non-deterministic variations (on the order of 1e-4) due to floating-point parallelism.
  • Retry mechanism: When action replay fails to reproduce a successful trajectory (due to numerical differences), the system supports multiple retries with slight variations.
  • Parallelization: Replay can be parallelized across multiple CPU processes or across GPU-parallel environments for efficiency.
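The retry mechanism amounts to a bounded retry loop around a single-episode replay. The sketch below is illustrative (the `replay_once` callback is a hypothetical hook, not ManiSkill's interface): an episode is retried until it reproduces success or the retry budget is exhausted.

```python
def replay_with_retries(replay_once, max_retries=3):
    """Retry action replay until the episode reproduces success; numerical
    noise (e.g. ~1e-4 GPU non-determinism) can make individual replays fail.
    Returns the successful attempt index, or None to discard the episode."""
    for attempt in range(max_retries):
        if replay_once(attempt):
            return attempt
    return None

# Toy replay that happens to succeed only on the second attempt.
succeeds_second = lambda attempt: attempt == 1
```

Episodes that never succeed within the budget are dropped, which combines naturally with the data-filtering use case above.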
