
Implementation:Haosulab ManiSkill BC Diffusion Training

From Leeroopedia
Field Value
Source Repository haosulab/ManiSkill
Type Pattern Doc
Domains Imitation_Learning, Robotics, Machine_Learning, Deep_Learning
Last Updated 2026-02-15

Overview

Description

ManiSkill provides two reference training scripts for imitation learning: a Behavioral Cloning (BC) script and a Diffusion Policy script. Both are state-based (operating on compact numerical observation vectors) and follow a similar structure: load demonstration data, construct a policy network, train with mini-batch gradient descent, and periodically evaluate on the simulation environment.

The BC script (examples/baselines/bc/bc.py) trains a 3-layer MLP policy using MSE loss with an Adam optimizer. It uses a custom ManiSkillDataset class that loads HDF5 trajectory data and a custom IterationBasedBatchSampler for fixed-iteration training.
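The fixed-iteration sampler decouples training length from dataset size: instead of counting epochs, it re-iterates (and reshuffles) the index pool until a target number of mini-batches has been yielded. A minimal sketch of that idea, assuming the class name and behavior from the description above rather than the actual `IterationBasedBatchSampler` source:

```python
import random

class IterationBasedBatchSampler:
    """Yield mini-batches of indices until total_iters batches have been
    produced, reshuffling the index pool each time it is exhausted."""

    def __init__(self, num_samples, batch_size, total_iters, seed=0):
        self.num_samples = num_samples
        self.batch_size = batch_size
        self.total_iters = total_iters
        self.rng = random.Random(seed)

    def __iter__(self):
        produced = 0
        while produced < self.total_iters:
            order = list(range(self.num_samples))
            self.rng.shuffle(order)  # fresh shuffle per pass over the data
            for start in range(0, self.num_samples, self.batch_size):
                if produced == self.total_iters:
                    return
                yield order[start:start + self.batch_size]
                produced += 1

    def __len__(self):
        return self.total_iters

sampler = IterationBasedBatchSampler(num_samples=10, batch_size=4, total_iters=7)
batches = list(sampler)
```

Passed as a `batch_sampler` to a PyTorch `DataLoader`, this makes `enumerate(dataloader)` run for exactly `total_iters` steps regardless of dataset size.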

The Diffusion Policy script (examples/baselines/diffusion_policy/train.py) trains a ConditionalUnet1D noise prediction network using the DDPM framework from the HuggingFace diffusers library. It uses a specialized SmallDemoDataset_DiffusionPolicy class that pre-computes observation/action sequence slices according to the observation, action, and prediction horizons. Training uses AdamW with cosine LR scheduling and EMA for stable evaluation.
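The slice pre-computation can be pictured as enumerating windows over each trajectory: every sample conditions on the last `obs_horizon` observations and regresses `pred_horizon` future actions. A simplified sketch of that index math (edge padding, which the real `SmallDemoDataset_DiffusionPolicy` performs so every timestep is usable, is omitted here for clarity):

```python
def make_slices(traj_len, obs_horizon=2, pred_horizon=16):
    """Enumerate (obs_start, obs_end, act_start, act_end) windows for one
    trajectory, where the obs window ends at the current step t (inclusive)
    and the predicted action window starts at t."""
    slices = []
    for t in range(obs_horizon - 1, traj_len - pred_horizon + 1):
        obs_start = t - (obs_horizon - 1)
        slices.append((obs_start, t + 1, t, t + pred_horizon))
    return slices

slices = make_slices(traj_len=20, obs_horizon=2, pred_horizon=16)
```

For a 20-step trajectory this yields 4 windows; with padding, the usable count grows to roughly one window per timestep.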

Usage

These scripts are run after the demonstration dataset has been prepared (downloaded, replayed/converted to the desired control mode, and saved as .h5 files). They form the core training step of the imitation learning pipeline.

Code Reference

Source Location

Script File Key Sections
Behavioral Cloning examples/baselines/bc/bc.py Args (L28-86), ManiSkillDataset (L129-189), Actor network (L192-205), Training loop (L317-365)
Diffusion Policy examples/baselines/diffusion_policy/train.py Args (L31-95), SmallDemoDataset_DiffusionPolicy (L97-166), Agent (L169-255), Training loop (L395-434)

Signature

BC Args dataclass (key parameters):

@dataclass
class Args:
    env_id: str = "PegInsertionSide-v0"
    demo_path: str = "data/ms2_official_demos/rigid_body/PegInsertionSide-v0/trajectory.state.pd_ee_delta_pose.h5"
    num_demos: Optional[int] = None        # number of trajectories to load
    total_iters: int = 1_000_000           # total training iterations
    batch_size: int = 1024                 # mini-batch size
    lr: float = 3e-4                       # learning rate (Adam)
    normalize_states: bool = False         # normalize observations to mean=0, std=1
    control_mode: str = "pd_joint_delta_pos"  # must match demo control mode
    num_eval_episodes: int = 100
    num_eval_envs: int = 10
    eval_freq: int = 1000                  # evaluate every N iterations
    log_freq: int = 1000
    sim_backend: str = "cpu"
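The `normalize_states` flag standardizes observations before training. A minimal numpy sketch of what such normalization involves (this helper is illustrative, not the script's code); the key detail is that the fitted statistics must be reused on observations at evaluation time:

```python
import numpy as np

def normalize_states(states, eps=1e-8):
    """Standardize each observation dimension to zero mean and unit variance.
    Returns the normalized array plus the (mean, std) needed to apply the
    same transform to observations during evaluation."""
    mean = states.mean(axis=0)
    std = states.std(axis=0) + eps  # eps guards against constant dimensions
    return (states - mean) / std, mean, std

states = np.array([[0.0, 10.0], [2.0, 10.0], [4.0, 10.0]])
normed, mean, std = normalize_states(states)
```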

Diffusion Policy Args dataclass (key parameters):

@dataclass
class Args:
    env_id: str = "PegInsertionSide-v0"
    demo_path: str = "demos/PegInsertionSide-v1/trajectory.state.pd_ee_delta_pose.physx_cpu.h5"
    num_demos: Optional[int] = None
    total_iters: int = 1_000_000
    batch_size: int = 1024
    lr: float = 1e-4                       # learning rate (AdamW)
    obs_horizon: int = 2                   # observation context window
    act_horizon: int = 8                   # actions executed per planning step
    pred_horizon: int = 16                 # total actions predicted per denoising
    diffusion_step_embed_dim: int = 64     # diffusion timestep embedding dim
    unet_dims: List[int] = field(default_factory=lambda: [64, 128, 256])  # U-Net channel dims (~4.5M params)
    n_groups: int = 8                      # GroupNorm groups
    control_mode: str = "pd_joint_delta_pos"
    max_episode_steps: Optional[int] = None  # required for diffusion policy
    num_eval_episodes: int = 100
    num_eval_envs: int = 10
    eval_freq: int = 5000
    sim_backend: str = "physx_cpu"
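The gap between `act_horizon` and `pred_horizon` implies receding-horizon control at rollout time: each denoising pass predicts `pred_horizon` actions, but only the first `act_horizon` are executed before replanning. A stubbed sketch of that execution pattern (the planner here is a placeholder, not the actual policy call):

```python
def receding_horizon_rollout(episode_len, act_horizon=8, pred_horizon=16):
    """Count planner calls for one episode: each call predicts pred_horizon
    actions but only the first act_horizon are executed before replanning."""
    steps_done = 0
    planner_calls = 0
    executed = []
    while steps_done < episode_len:
        planner_calls += 1  # one full denoising pass per replanning step
        chunk = list(range(steps_done, steps_done + pred_horizon))  # stub plan
        executed.extend(chunk[:act_horizon])  # commit only the first act_horizon
        steps_done += act_horizon
    return planner_calls, executed[:episode_len]

calls, actions = receding_horizon_rollout(episode_len=20)
```

With the defaults, a 20-step episode needs only 3 denoising passes, which is why a larger `act_horizon` trades reactivity for inference speed.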

BC Actor network:

class Actor(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super(Actor, self).__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256),
            nn.ReLU(),
            nn.Linear(256, 256),
            nn.ReLU(),
            nn.Linear(256, action_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

Diffusion Policy Agent (noise prediction):

class Agent(nn.Module):
    def __init__(self, env, args):
        super().__init__()
        # obs_dim and act_dim are derived from the environment's observation
        # and action spaces (derivation omitted in this excerpt)
        self.noise_pred_net = ConditionalUnet1D(
            input_dim=act_dim,
            global_cond_dim=obs_horizon * obs_dim,
            diffusion_step_embed_dim=args.diffusion_step_embed_dim,
            down_dims=args.unet_dims,
            n_groups=args.n_groups,
        )
        self.num_diffusion_iters = 100
        self.noise_scheduler = DDPMScheduler(
            num_train_timesteps=100,
            beta_schedule='squaredcos_cap_v2',
            clip_sample=True,
            prediction_type='epsilon',
        )
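During training, the scheduler noises a clean action sequence at a random timestep and the network regresses the injected noise (`prediction_type='epsilon'`). The `squaredcos_cap_v2` setting selects a cosine noise schedule. A hedged numpy sketch of that forward-diffusion math, illustrating the schedule shape rather than reproducing the diffusers internals:

```python
import numpy as np

def cosine_alpha_bars(T=100, s=0.008):
    """'squaredcos_cap_v2'-style cumulative alphas: alpha_bar(t) follows a
    squared-cosine curve, giving a gentler noise ramp than a linear schedule."""
    t = np.arange(T + 1) / T
    f = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    return f[1:] / f[0]  # normalize so alpha_bar at t=0 is close to 1

def add_noise(x0, eps, t, alpha_bars):
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    Training minimizes the MSE between the network's output and eps."""
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps

abars = cosine_alpha_bars()
```

At t=0 the sample is nearly clean; by t=99 it is almost pure noise, matching the 100 train timesteps configured above.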

Import

BC external dependencies:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import tyro
from torch.utils.tensorboard import SummaryWriter

Diffusion Policy external dependencies:

import torch
import torch.nn as nn
import torch.optim as optim
import tyro
from diffusers.schedulers.scheduling_ddpm import DDPMScheduler
from diffusers.training_utils import EMAModel
from diffusers.optimization import get_scheduler
from torch.utils.tensorboard import SummaryWriter

I/O Contract

Inputs:

Input Type Description
demo_path str (file path) Path to a ManiSkill .h5 trajectory file with observations and actions in the desired mode. Must have a companion .json metadata file.
env_id str ManiSkill environment ID for creating evaluation environments.
control_mode str Control mode that must match the demonstration dataset's control mode.
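Because a control-mode mismatch between `--control-mode` and the demos silently produces unusable training data, it is worth validating against the companion .json before training. A self-contained sketch, assuming the metadata stores the mode under `env_info -> env_kwargs` (the layout ManiSkill's trajectory recorder typically uses):

```python
import json, os, tempfile

def check_control_mode(json_path, expected_mode):
    """Compare the requested control mode against the demo metadata."""
    with open(json_path) as f:
        meta = json.load(f)
    recorded = meta["env_info"]["env_kwargs"].get("control_mode")
    return recorded == expected_mode

# Self-contained demo: write a tiny metadata file and validate against it.
meta = {"env_info": {"env_id": "PickCube-v1",
                     "env_kwargs": {"control_mode": "pd_joint_delta_pos"}}}
path = os.path.join(tempfile.mkdtemp(), "trajectory.json")
with open(path, "w") as f:
    json.dump(meta, f)

ok = check_control_mode(path, "pd_joint_delta_pos")
bad = check_control_mode(path, "pd_ee_delta_pose")
```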

Outputs:

Output Type Description
Checkpoint files .pt files Saved model weights in runs/{run_name}/checkpoints/. BC saves actor state dict. Diffusion Policy saves agent and ema_agent state dicts.
TensorBoard logs event files Training loss, learning rate, and evaluation metrics logged to runs/{run_name}/.
Evaluation videos MP4 files (Optional) Videos of policy rollouts saved to runs/{run_name}/videos/.

Training loop structure (BC):

for iteration, batch in enumerate(dataloader):
    obs, action, _ = batch
    pred_action = actor(obs)
    loss = F.mse_loss(pred_action, action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Training loop structure (Diffusion Policy):

for iteration, data_batch in enumerate(train_dataloader):
    total_loss = agent.compute_loss(
        obs_seq=data_batch["observations"],
        action_seq=data_batch["actions"],
    )
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()
    lr_scheduler.step()
    ema.step(agent.parameters())
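The `ema.step(...)` call above maintains a slow-moving copy of the weights, which is what gets evaluated and saved as `ema_agent`. A minimal sketch of the underlying update rule (illustrative only, not the diffusers `EMAModel` implementation, which also warms up the decay):

```python
class SimpleEMA:
    """Exponential moving average of a parameter list:
    shadow = decay * shadow + (1 - decay) * param, applied after each step."""

    def __init__(self, params, decay=0.99):
        self.decay = decay
        self.shadow = list(params)  # initialized from the current weights

    def step(self, params):
        self.shadow = [self.decay * s + (1.0 - self.decay) * p
                       for s, p in zip(self.shadow, params)]

    def copy_to(self):
        """Return the averaged weights, e.g. to load into ema_agent for eval."""
        return list(self.shadow)

ema = SimpleEMA([0.0], decay=0.9)
for _ in range(3):
    ema.step([1.0])  # pretend the optimizer moved the weight to 1.0
smoothed = ema.copy_to()[0]
```

Evaluating the averaged weights smooths out the step-to-step jitter of the raw optimizer trajectory, which is why the EMA checkpoint is usually the one reported.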

Usage Examples

Example 1: Train BC on PickCube-v1

cd examples/baselines/bc
python bc.py \
    --env-id PickCube-v1 \
    --demo-path ~/.maniskill/demos/PickCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
    --control-mode pd_joint_delta_pos \
    --total-iters 100000 \
    --batch-size 256 \
    --lr 3e-4

Example 2: Train Diffusion Policy on PegInsertionSide-v1

cd examples/baselines/diffusion_policy
python train.py \
    --env-id PegInsertionSide-v1 \
    --demo-path demos/PegInsertionSide-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
    --control-mode pd_joint_delta_pos \
    --max-episode-steps 300 \
    --total-iters 300000 \
    --obs-horizon 2 \
    --act-horizon 8 \
    --pred-horizon 16

Example 3: Train BC with WandB tracking

python bc.py \
    --env-id StackCube-v1 \
    --demo-path trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
    --control-mode pd_joint_delta_pos \
    --track \
    --wandb-project-name ManiSkill-IL \
    --total-iters 500000

Example 4: Train Diffusion Policy with custom U-Net architecture

python train.py \
    --env-id PushCube-v1 \
    --demo-path demos/PushCube-v1/trajectory.state.pd_joint_delta_pos.physx_cpu.h5 \
    --control-mode pd_joint_delta_pos \
    --max-episode-steps 200 \
    --unet-dims 128 256 512 \
    --diffusion-step-embed-dim 128 \
    --n-groups 8
