
Principle:Haosulab ManiSkill Demonstration Data Acquisition

From Leeroopedia
Source Repository: haosulab/ManiSkill
Domains: Imitation_Learning, Robotics, Data_Processing
Last Updated: 2026-02-15

Overview

Description

Demonstration Data Acquisition is the foundational step in the imitation learning pipeline for robot manipulation. It involves acquiring expert demonstration datasets -- pre-collected trajectories of successful task executions -- that serve as the training signal for learning manipulation policies. In the ManiSkill ecosystem, expert demonstrations are collected across a variety of rigid-body manipulation tasks (such as PickCube-v1, StackCube-v1, PegInsertionSide-v1, PushCube-v1, and others) and are hosted on HuggingFace repositories for convenient download.

Each demonstration dataset consists of paired HDF5 (.h5) and JSON (.json) files. The HDF5 file stores the raw trajectory data -- environment states, actions, and per-step metadata organized by episode -- while the JSON file contains episode-level metadata including reset kwargs, episode seeds, control modes, and environment configuration. These raw demonstrations record actions and environment states without observations, allowing downstream replay tools to regenerate observations under any desired observation mode (state, rgbd, pointcloud) without re-collecting data.
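The paired-file layout described above can be sketched in a few lines. The JSON field names below (`env_info`, `episodes`, `episode_id`, `reset_kwargs`, `control_mode`) follow the general shape of the metadata described here but are illustrative assumptions, not the exact ManiSkill schema:

```python
import json
import os

# Illustrative episode-level metadata mirroring the JSON file described above.
# Field names are an assumption for illustration, not the exact ManiSkill schema.
metadata = {
    "env_info": {"env_id": "PickCube-v1", "max_episode_steps": 50},
    "episodes": [
        {"episode_id": 0, "episode_seed": 0,
         "control_mode": "pd_joint_delta_pos", "reset_kwargs": {"seed": 0}},
        {"episode_id": 1, "episode_seed": 1,
         "control_mode": "pd_joint_delta_pos", "reset_kwargs": {"seed": 1}},
    ],
}

def sibling_h5_path(json_path: str) -> str:
    """Each .json metadata file is paired with a .h5 trajectory file of the same stem."""
    stem, _ = os.path.splitext(json_path)
    return stem + ".h5"

parsed = json.loads(json.dumps(metadata))    # round-trip as if read from disk
print(len(parsed["episodes"]))               # number of recorded episodes
print(sibling_h5_path("trajectory.json"))    # -> trajectory.h5
```

Because each episode carries its own seed and reset kwargs, any single trajectory can later be re-simulated deterministically from the raw states and actions in the paired HDF5 file.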

The principle of Learning from Demonstrations (LfD) underpins this step: rather than exploring the environment from scratch via trial and error as in reinforcement learning, the agent is provided with examples of expert behavior. This dramatically reduces the sample complexity for tasks where reward shaping is difficult or where the task horizon is long. The quality and diversity of acquired demonstrations directly impact the ceiling of downstream policy performance, making dataset curation a critical concern.

Usage

Demonstration data acquisition is the first step in the imitation learning pipeline. It is used when:

  • Expert demonstrations are available for a target manipulation task and can be downloaded from the official ManiSkill demonstration repository on HuggingFace.
  • The practitioner wants to train a behavioral cloning or diffusion policy without first designing a reward function or running reinforcement learning.
  • A standardized, reproducible starting point is needed for benchmarking imitation learning algorithms across ManiSkill tasks.

After acquiring demonstrations, the typical workflow proceeds to trajectory replay and conversion (to adapt observations and control modes), then to dataset loading, training, and evaluation.
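As a sketch of the hand-off to the next step, the helper below assembles the kind of command-line invocation used to replay raw trajectories into a chosen observation and control mode. The module path `mani_skill.trajectory.replay_trajectory` and the exact flag names are assumptions based on the tooling described in this article; consult the ManiSkill repository documentation for the precise interface:

```python
# Sketch only: build (but do not run) a replay/conversion command for an
# acquired trajectory. Module path and flag names are assumptions, not a
# verified API surface.
def build_replay_command(traj_path: str, obs_mode: str = "state",
                         control_mode: str = "pd_joint_delta_pos") -> list:
    return [
        "python", "-m", "mani_skill.trajectory.replay_trajectory",
        "--traj-path", traj_path,
        "--obs-mode", obs_mode,                 # regenerate observations in this mode
        "--target-control-mode", control_mode,  # convert actions to this controller
        "--save-traj",                          # write the converted dataset to disk
    ]

cmd = build_replay_command("demos/PickCube-v1/trajectory.h5", obs_mode="rgbd")
print(" ".join(cmd))
```

Because the raw files store states and actions without observations, the same acquired dataset can be replayed once per observation mode (`state`, `rgbd`, `pointcloud`) as needed downstream.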

Theoretical Basis

Learning from Demonstrations (LfD) is a paradigm in which an agent learns a policy by observing expert behavior rather than by exploring the environment with a reward signal. The core assumption is that the expert demonstrations are drawn from a near-optimal policy, and the learner seeks to recover a policy that matches the expert's state-action distribution.
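A minimal instance of matching the expert's state-action mapping is behavioral cloning with a squared-error loss. The toy below fits a one-parameter linear policy to synthetic expert pairs by gradient descent; the expert rule (a = 2s), the states, and the learning rate are all invented for illustration:

```python
# Toy behavioral cloning: recover a linear expert policy a = w* · s (w* = 2.0)
# from demonstrated (state, action) pairs via gradient descent on MSE.
states = [0.5, 1.0, 1.5, 2.0, 2.5]
actions = [2.0 * s for s in states]   # synthetic "expert" demonstrations

w = 0.0          # policy parameter, initialized away from the expert
lr = 0.05
for _ in range(200):
    # gradient of the mean squared error  L(w) = mean((w*s - a)^2)
    grad = sum(2 * (w * s - a) * s for s, a in zip(states, actions)) / len(states)
    w -= lr * grad

print(round(w, 3))  # converges toward the expert's 2.0
```

No reward signal appears anywhere in the loop: the supervision comes entirely from the demonstrated actions, which is exactly the trade LfD makes against reinforcement learning.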

Key theoretical considerations for demonstration data acquisition include:

  • Dataset Curation: The quality, diversity, and coverage of the demonstration dataset determine the upper bound on policy performance. Demonstrations should cover the range of initial conditions and task variations the policy will encounter at test time.
  • State-Action Representation Independence: By storing demonstrations as raw environment states and actions (without observations), ManiSkill decouples data collection from the choice of observation representation. This allows a single dataset to be replayed into multiple observation modes, maximizing data utility.
  • Episode-Level Organization: Demonstrations are organized as discrete episodes, each with its own seed and reset kwargs. This preserves the ability to reproduce the exact initial conditions for each trajectory, which is essential for deterministic replay and debugging.
  • Distribution Shift: A well-known challenge in LfD is that the learned policy may encounter states not represented in the demonstration data (compounding errors). Acquiring diverse and high-quality demonstrations mitigates this risk.
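The compounding-error point above can be made concrete with a toy rollout: a policy with a small constant per-step error drifts away from the demonstrated trajectory, and the drift grows with the horizon. The integrator dynamics below are invented purely to illustrate the accumulation:

```python
# Toy illustration of compounding error (distribution shift) in imitation.
# The "expert" trajectory stays at state 0; the learned policy deviates by a
# small constant eps each step, and the dynamics accumulate that deviation.
def rollout_drift(horizon: int, eps: float = 0.01) -> float:
    state = 0.0
    for _ in range(horizon):
        action = eps      # learned policy's small deviation from the expert's 0
        state += action   # simple integrator dynamics: errors accumulate
    return abs(state)     # distance from the demonstrated trajectory

drift_short = rollout_drift(10)
drift_long = rollout_drift(100)
print(drift_short, drift_long)  # drift grows with the horizon
```

Broader state coverage in the acquired demonstrations shrinks the set of states where the policy is "off-distribution" in the first place, which is why curation directly targets this failure mode.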

The available demonstration environments in ManiSkill span single-arm tasks (PickCube-v1, PushCube-v1, StackCube-v1, PegInsertionSide-v1, LiftPegUpright-v1, PlugCharger-v1), locomotion tasks (AnymalC-Reach-v1), drawing tasks (DrawTriangle-v1), tool-use tasks (PullCubeTool-v1), and multi-robot tasks (TwoRobotPickCube-v1, TwoRobotStackCube-v1).
