Principle:Haosulab ManiSkill Trajectory Recording
| Field | Value |
|---|---|
| Principle Name | Trajectory Recording |
| Domain | Motion_Planning |
| Overview | Recording environment trajectories and videos during task execution |
| Date | 2026-02-15 |
| Repository | Haosulab/ManiSkill |
Overview
The Trajectory Recording principle describes how ManiSkill captures the complete history of environment interactions (observations, actions, states, rewards, and termination signals) into a persistent, replayable format. This capability is essential for generating demonstration datasets that can be used for imitation learning, offline reinforcement learning, and evaluation benchmarking.
Description
Trajectory recording is implemented as a Gymnasium wrapper that intercepts calls to reset() and step(), accumulating data into an in-memory buffer and periodically flushing it to disk. The key design decisions are:
- Dual output format: Each recording session produces an HDF5 file (.h5) for array data and a companion JSON file (.json) for metadata. The HDF5 file stores per-episode groups (e.g., traj_0, traj_1) containing actions, observations, environment states, rewards, and termination flags. The JSON file stores environment configuration, episode metadata (seed, control mode, elapsed steps, success/fail status), and provenance information.
- Wrapper pattern: By using the Gymnasium gym.Wrapper interface, trajectory recording is transparent to the underlying environment and any code driving it. The wrapper can be applied to any ManiSkill environment regardless of whether it is driven by a motion planning solver, an RL policy, or a human teleoperator.
- Flush control: Trajectories can be flushed (saved to disk) automatically on reset() or manually via flush_trajectory(). This allows callers to selectively discard failed trajectories (e.g., when only counting successes) by calling flush_trajectory(save=False).
- Video capture: Optionally, the wrapper captures rendered frames and assembles them into MP4 video files, supporting configurable FPS, info-on-video overlays, and substep rendering for smoother videos.
- GPU parallel support: When the environment runs on GPU with multiple parallel sub-environments, the wrapper manages per-environment episode pointers and supports partial resets.
- Provenance tracking: The source_type and source_desc fields record how demonstrations were generated (e.g., "motionplanning", "rl", "human"), enabling downstream consumers to filter or weight data by source.
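The buffer-and-flush behaviour described above can be illustrated with a toy example. This is a minimal sketch in plain Python, not ManiSkill's implementation: CountingEnv and ToyRecorder are hypothetical classes invented for illustration, the recorder writes a single JSON file rather than HDF5, and only the method name flush_trajectory(save=...) and the source_type field are taken from the text.

```python
import json
import tempfile
from pathlib import Path


class CountingEnv:
    """Toy stand-in for an environment: the state is a running sum of actions."""

    def reset(self, seed=None):
        self.total = 0
        return self.total

    def step(self, action):
        self.total += action
        success = self.total >= 3
        return self.total, float(success), success  # obs, reward, terminated


class ToyRecorder:
    """Hypothetical wrapper: buffers one episode in memory, writes it on flush."""

    def __init__(self, env, output_path, source_type):
        self.env = env
        self.output_path = Path(output_path)
        self.source_type = source_type
        self.buffer = None   # data for the episode in progress
        self.episodes = []   # episodes that were kept

    def reset(self, seed=None):
        # An episode that was never flushed manually is saved on the next reset.
        if self.buffer:
            self.flush_trajectory(save=True)
        obs = self.env.reset(seed=seed)
        self.buffer = {"seed": seed, "actions": [], "rewards": [], "terminated": []}
        return obs

    def step(self, action):
        obs, reward, terminated = self.env.step(action)
        self.buffer["actions"].append(action)
        self.buffer["rewards"].append(reward)
        self.buffer["terminated"].append(terminated)
        return obs, reward, terminated

    def flush_trajectory(self, save=True):
        # save=False drops the buffered episode without writing anything.
        if save and self.buffer is not None and self.buffer["actions"]:
            self.episodes.append(self.buffer)
            payload = {"source_type": self.source_type, "episodes": self.episodes}
            self.output_path.write_text(json.dumps(payload))
        self.buffer = None


# Keep only the episodes that ended in success.
out = Path(tempfile.mkdtemp()) / "toy_demo.json"
env = ToyRecorder(CountingEnv(), out, source_type="scripted")
for seed, actions in enumerate([[1, 1, 1], [1, 0, 0], [2, 1]]):
    env.reset(seed=seed)
    for action in actions:
        _, _, terminated = env.step(action)
    env.flush_trajectory(save=terminated)

saved = json.loads(out.read_text())
kept_seeds = [episode["seed"] for episode in saved["episodes"]]
```

The driving loop never touches the recorder's internals; it only decides, per episode, whether the buffered data is worth keeping. That is the same division of labour the wrapper design gives a motion planning script that counts only successful demonstrations.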
Usage
The RecordEpisode wrapper is typically applied immediately after environment creation and before any other wrappers that might transform observations:
import gymnasium as gym
import mani_skill.envs  # registers ManiSkill environments (e.g., PickCube-v1) with Gymnasium
from mani_skill.utils.wrappers.record import RecordEpisode

# Create the environment, then wrap it so every episode is recorded.
env = gym.make("PickCube-v1", obs_mode="none", control_mode="pd_joint_pos")
env = RecordEpisode(
    env,
    output_dir="demos/PickCube-v1/motionplanning",
    trajectory_name="demo_batch",
    save_video=True,
    source_type="motionplanning",
    source_desc="official motion planning solution",
    video_fps=30,
)
After running episodes through the environment, the output directory will contain:
- demo_batch.h5 -- the HDF5 trajectory file
- demo_batch.json -- the metadata JSON file
- 0.mp4, 1.mp4, ... -- optional video files
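Once the files exist, the JSON metadata can be used to decide which HDF5 groups to load. The sketch below uses only the standard library and runs on a small made-up metadata dictionary. The key names episodes, episode_id, and success are assumptions about the JSON layout and should be checked against a real file; the traj_<id> group naming follows the description above.

```python
import json


def successful_traj_keys(metadata):
    """Return HDF5 group names (traj_<id>) for episodes marked successful.

    Assumes metadata["episodes"] is a list of dicts with "episode_id" and
    "success" keys; these names are assumptions, not a documented schema.
    """
    keys = []
    for episode in metadata.get("episodes", []):
        if episode.get("success"):
            keys.append(f"traj_{episode['episode_id']}")
    return keys


# Illustrative metadata only; the values are invented for this example.
sample = json.loads("""
{
  "source_type": "motionplanning",
  "episodes": [
    {"episode_id": 0, "elapsed_steps": 80, "success": true},
    {"episode_id": 1, "elapsed_steps": 200, "success": false},
    {"episode_id": 2, "elapsed_steps": 95, "success": true}
  ]
}
""")

keys = successful_traj_keys(sample)
print(keys)
```

With real output, the same function could be given the result of json.load() on demo_batch.json, and the returned names used to index the groups in demo_batch.h5.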
Theoretical Basis
- Experience replay: Storing complete interaction histories is foundational to offline RL (Levine et al., 2020) and behavioral cloning. The trajectory format must preserve enough information to reconstruct the environment state at any timestep.
- Reproducibility: By recording environment seeds, keyword arguments, and state dictionaries, the recording enables exact trajectory replay and state restoration, which is critical for scientific reproducibility.
- Data format considerations: HDF5 is chosen for its ability to store heterogeneous, hierarchical data with optional compression (gzip for image data) and random access to individual episodes. JSON is used for human-readable metadata that does not require array storage.
- Decorator/Wrapper pattern: The Gymnasium wrapper pattern allows recording to be composed with any environment without modifying the environment code, following the single-responsibility and open-closed principles.
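The reproducibility point can be made concrete with a toy example: if the seed and the action sequence are stored, a deterministic environment can be driven through exactly the same states again. The sketch below assumes a hypothetical NoisyWalk environment whose randomness comes entirely from a seeded generator. It illustrates the idea only and does not reflect how ManiSkill replays trajectories.

```python
import random


class NoisyWalk:
    """Toy 1-D environment whose noise is fully determined by the reset seed."""

    def reset(self, seed):
        self.rng = random.Random(seed)
        self.position = self.rng.uniform(-1.0, 1.0)
        return self.position

    def step(self, action):
        self.position += action + self.rng.gauss(0.0, 0.1)
        return self.position


def record(env, seed, actions):
    """Run the actions and keep what is needed to reproduce the episode."""
    states = [env.reset(seed)]
    for action in actions:
        states.append(env.step(action))
    return {"seed": seed, "actions": list(actions), "states": states}


def replay(env, trajectory):
    """Re-run a stored trajectory from its seed and actions alone."""
    states = [env.reset(trajectory["seed"])]
    for action in trajectory["actions"]:
        states.append(env.step(action))
    return states


trajectory = record(NoisyWalk(), seed=7, actions=[0.5, -0.2, 0.1])
replayed = replay(NoisyWalk(), trajectory)
matches = replayed == trajectory["states"]
```

Storing the states alongside the seed and actions is what lets a replay be verified: a mismatch signals that the environment or its configuration has changed since the recording was made.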