Implementation:Haosulab_ManiSkill_BC_Diffusion_Training
| Field | Value |
|---|---|
| Source Repository | haosulab/ManiSkill |
| Type | Pattern Doc |
| Domains | Imitation_Learning, Robotics, Machine_Learning, Deep_Learning |
| Last Updated | 2026-02-15 |
Overview
Description
ManiSkill provides two reference training scripts for imitation learning: a Behavioral Cloning (BC) script and a Diffusion Policy script. Both are state-based (operating on compact numerical observation vectors) and follow a similar structure: load demonstration data, construct a policy network, train with mini-batch gradient descent, and periodically evaluate on the simulation environment.
The BC script (examples/baselines/bc/bc.py) trains a 3-layer MLP policy using MSE loss with an Adam optimizer. It uses a custom ManiSkillDataset class that loads HDF5 trajectory data and a custom IterationBasedBatchSampler for fixed-iteration training.
The Diffusion Policy script (examples/baselines/diffusion_policy/train.py) trains a ConditionalUnet1D noise prediction network using the DDPM framework from the HuggingFace diffusers library. It uses a specialized SmallDemoDataset_DiffusionPolicy class that pre-computes observation/action sequence slices according to the observation, action, and prediction horizons. Training uses AdamW with cosine LR scheduling and EMA for stable evaluation.
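To illustrate what pre-computing sequence slices means, the sketch below enumerates padded index windows over a single trajectory given the three horizons. The helper name and exact padding scheme are illustrative, not the repository's implementation; negative starts and ends past the trajectory length stand for frames the dataset class would pad.

```python
def make_slices(traj_len: int, obs_horizon: int, pred_horizon: int):
    """Enumerate (start, end) index windows over one trajectory.

    Each window spans pred_horizon steps; the first obs_horizon steps
    of a window supply the observation context. Out-of-range indices
    (start < 0 or end > traj_len) represent padded frames.
    """
    pad_before = obs_horizon - 1
    pad_after = pred_horizon - obs_horizon
    return [
        (start, start + pred_horizon)
        for start in range(-pad_before, traj_len - pred_horizon + pad_after + 1)
    ]

# Default horizons from the Args dataclass: obs_horizon=2, pred_horizon=16
slices = make_slices(traj_len=20, obs_horizon=2, pred_horizon=16)
```

Pre-computing these windows once at load time keeps the per-batch work to simple array indexing.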
Usage
These scripts are used after preparing the demonstration dataset (downloading, replaying/converting, and having the .h5 files ready). They are the core training step of the imitation learning pipeline.
Code Reference
Source Location
| Script | File | Key Sections |
|---|---|---|
| Behavioral Cloning | examples/baselines/bc/bc.py | Args (L28-86), ManiSkillDataset (L129-189), Actor network (L192-205), Training loop (L317-365) |
| Diffusion Policy | examples/baselines/diffusion_policy/train.py | Args (L31-95), SmallDemoDataset_DiffusionPolicy (L97-166), Agent (L169-255), Training loop (L395-434) |
Signature
BC Args dataclass (key parameters):

```python
@dataclass
class Args:
    env_id: str = "PegInsertionSide-v0"
    demo_path: str = "data/ms2_official_demos/rigid_body/PegInsertionSide-v0/trajectory.state.pd_ee_delta_pose.h5"
    num_demos: Optional[int] = None    # number of trajectories to load
    total_iters: int = 1_000_000       # total training iterations
    batch_size: int = 1024             # mini-batch size
    lr: float = 3e-4                   # learning rate (Adam)
    normalize_states: bool = False     # normalize observations to mean=0, std=1
    control_mode: str = "pd_joint_delta_pos"  # must match demo control mode
    num_eval_episodes: int = 100
    num_eval_envs: int = 10
    eval_freq: int = 1000              # evaluate every N iterations
    log_freq: int = 1000
    sim_backend: str = "cpu"
```
Diffusion Policy Args dataclass (key parameters):

```python
@dataclass
class Args:
    env_id: str = "PegInsertionSide-v0"
    demo_path: str = "demos/PegInsertionSide-v1/trajectory.state.pd_ee_delta_pose.physx_cpu.h5"
    num_demos: Optional[int] = None
    total_iters: int = 1_000_000
    batch_size: int = 1024
    lr: float = 1e-4                   # learning rate (AdamW)
    obs_horizon: int = 2               # observation context window
    act_horizon: int = 8               # actions executed per planning step
    pred_horizon: int = 16             # total actions predicted per denoising pass
    diffusion_step_embed_dim: int = 64 # diffusion timestep embedding dim
    # a mutable default needs a factory; ~4.5M params at these dims
    unet_dims: List[int] = field(default_factory=lambda: [64, 128, 256])
    n_groups: int = 8                  # GroupNorm groups
    control_mode: str = "pd_joint_delta_pos"
    max_episode_steps: Optional[int] = None  # required for diffusion policy
    num_eval_episodes: int = 100
    num_eval_envs: int = 10
    eval_freq: int = 5000
    sim_backend: str = "physx_cpu"
```
BC Actor network:

```python
class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)
Diffusion Policy Agent (noise prediction):

```python
class Agent(nn.Module):
    def __init__(self, env, args):
        super().__init__()
        # act_dim / obs_dim are derived from the env's action and
        # observation spaces (elided here)
        self.noise_pred_net = ConditionalUnet1D(
            input_dim=act_dim,
            global_cond_dim=obs_horizon * obs_dim,
            diffusion_step_embed_dim=args.diffusion_step_embed_dim,
            down_dims=args.unet_dims,
            n_groups=args.n_groups,
        )
        self.num_diffusion_iters = 100
        self.noise_scheduler = DDPMScheduler(
            num_train_timesteps=100,
            beta_schedule="squaredcos_cap_v2",
            clip_sample=True,           # clip predicted samples to [-1, 1]
            prediction_type="epsilon",  # network predicts the added noise
        )
```
Import
BC external dependencies:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import tyro
from torch.utils.tensorboard import SummaryWriter
```
Diffusion Policy external dependencies:

```python
import torch
import torch.nn as nn
import torch.optim as optim
import tyro
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
from diffusers.training_utils import EMAModel
from diffusers.optimization import get_scheduler
from torch.utils.tensorboard import SummaryWriter
```
I/O Contract
Inputs:
| Input | Type | Description |
|---|---|---|
| demo_path | str (file path) | Path to a ManiSkill .h5 trajectory file with observations and actions in the desired mode. Must have a companion .json metadata file. |
| env_id | str | ManiSkill environment ID for creating evaluation environments. |
| control_mode | str | Control mode that must match the demonstration dataset's control mode. |
Outputs:
| Output | Type | Description |
|---|---|---|
| Checkpoint files | .pt files | Saved model weights in runs/{run_name}/checkpoints/. BC saves the actor state dict; Diffusion Policy saves both agent and ema_agent state dicts. |
| TensorBoard logs | event files | Training loss, learning rate, and evaluation metrics logged to runs/{run_name}/. |
| Evaluation videos | MP4 files | (Optional) Videos of policy rollouts saved to runs/{run_name}/videos/. |
Training loop structure (BC):

```python
for iteration, batch in enumerate(dataloader):
    obs, action, _ = batch
    pred_action = actor(obs)
    loss = F.mse_loss(pred_action, action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
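This loop runs for a fixed number of iterations rather than epochs, which is what IterationBasedBatchSampler provides: it yields exactly total_iters batches, reshuffling the dataset indices whenever they run out. A simplified stand-in (the function name and reshuffle details are illustrative, not the repository's class):

```python
import random

def iteration_based_batches(num_samples, batch_size, total_iters, seed=0):
    """Yield exactly total_iters batches of dataset indices, reshuffling
    whenever the current pass over the dataset is exhausted. A batch may
    therefore span an epoch boundary."""
    rng = random.Random(seed)
    order = []
    for _ in range(total_iters):
        while len(order) < batch_size:
            fresh = list(range(num_samples))
            rng.shuffle(fresh)
            order.extend(fresh)
        batch, order = order[:batch_size], order[batch_size:]
        yield batch

batches = list(iteration_based_batches(num_samples=1000, batch_size=256, total_iters=5))
```

Fixed-iteration training makes runs with different dataset sizes directly comparable on the iteration axis.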
Training loop structure (Diffusion Policy):

```python
for iteration, data_batch in enumerate(train_dataloader):
    total_loss = agent.compute_loss(
        obs_seq=data_batch["observations"],
        action_seq=data_batch["actions"],
    )
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    lr_scheduler.step()
    ema.step(agent.parameters())
```
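The ema.step() call maintains an exponential moving average of the weights; this averaged copy is what gets evaluated and saved as ema_agent. Stripped to its core (the decay value is illustrative, and diffusers' EMAModel additionally ramps the decay up over the first updates):

```python
def ema_step(ema_params, params, decay=0.999):
    """One EMA update: each averaged weight moves a small step
    (1 - decay) toward the corresponding live weight."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

# A live "parameter" pinned at 1.0: the EMA approaches it geometrically,
# reaching 1 - decay**n after n updates when started from 0.
ema = [0.0]
for _ in range(1000):
    ema = ema_step(ema, [1.0])
```

Evaluating the EMA weights smooths out the step-to-step noise of SGD, which matters for diffusion policies whose loss is itself stochastic in the sampled timestep and noise.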
Usage Examples
Example 1: Train BC on PickCube-v1
```bash
cd examples/baselines/bc
python bc.py \
  --env-id PickCube-v1 \
  --demo-path ~/.maniskill/demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
  --control-mode pd_joint_delta_pos \
  --total-iters 100000 \
  --batch-size 256 \
  --lr 3e-4
```
Example 2: Train Diffusion Policy on PegInsertionSide-v1
```bash
cd examples/baselines/diffusion_policy
python train.py \
  --env-id PegInsertionSide-v1 \
  --demo-path demos/PegInsertionSide-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
  --control-mode pd_joint_delta_pos \
  --max-episode-steps 300 \
  --total-iters 300000 \
  --obs-horizon 2 \
  --act-horizon 8 \
  --pred-horizon 16
```
Example 3: Train BC with WandB tracking
```bash
python bc.py \
  --env-id StackCube-v1 \
  --demo-path trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
  --control-mode pd_joint_delta_pos \
  --track \
  --wandb-project-name ManiSkill-IL \
  --total-iters 500000
```
Example 4: Train Diffusion Policy with custom U-Net architecture
```bash
python train.py \
  --env-id PushCube-v1 \
  --demo-path demos/PushCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
  --control-mode pd_joint_delta_pos \
  --max-episode-steps 200 \
  --unet-dims 128 256 512 \
  --diffusion-step-embed-dim 128 \
  --n-groups 8
```
Related Pages
- Principle:Haosulab_ManiSkill_Imitation_Policy_Training -- The principle describing behavioral cloning, diffusion policy theory, and training considerations.
- Implementation:Haosulab_ManiSkill_ManiSkillTrajectoryDataset -- The dataset class used to load training data.
- Implementation:Haosulab_ManiSkill_IL_Eval_Loop -- The evaluation loop used during and after training.
- Implementation:Haosulab_ManiSkill_Replay_Trajectory_CLI -- The preceding step: trajectory conversion.
- Environment:Haosulab_ManiSkill_GPU_CUDA_Simulation
- Heuristic:Haosulab_ManiSkill_Rendering_Memory_Optimization