Principle: ARISE Initiative Robosuite Demonstration Trajectory Collection
Metadata:
- robosuite
- DAgger
- Imitation_Learning
- Teleoperation
- last_updated: 2026-02-15 12:00 GMT
Overview
Process for collecting a single human demonstration trajectory by running a teleoperation loop that records states and actions until task completion or user reset.
Description
Demonstration trajectory collection is the core loop for gathering human expert data. The operator controls the robot via an input device while the DataCollectionWrapper records every state and action. Each trajectory runs until: (1) the task succeeds (_check_success), (2) the episode horizon is reached, or (3) the operator triggers a reset. Frame rate limiting ensures smooth human control.
The collection process captures the full state-action sequence of expert demonstrations, which forms the foundation for imitation learning algorithms. The DataCollectionWrapper transparently intercepts all environment interactions, storing observations, actions, rewards, and metadata without requiring modifications to the control loop itself.
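The interception pattern described above can be sketched with a minimal wrapper. This is an illustrative stand-in, not robosuite's actual `DataCollectionWrapper`: the class names and the tiny stub environment are hypothetical, and a real wrapper would also persist data to disk and record model/state metadata.

```python
class StubEnv:
    """Tiny stand-in environment used only to exercise the wrapper."""

    def reset(self):
        return 0.0

    def step(self, action):
        obs = float(action)
        return obs, 0.0, False, {}


class RecordingWrapper:
    """Sketch of a data-collection wrapper: it records every action and
    resulting observation while delegating control to the wrapped env,
    so the teleoperation loop needs no changes."""

    def __init__(self, env):
        self.env = env
        self.states = []
        self.actions = []

    def reset(self):
        self.states.clear()
        self.actions.clear()
        return self.env.reset()

    def step(self, action):
        # Record the action, then delegate to the wrapped environment.
        self.actions.append(action)
        obs, reward, done, info = self.env.step(action)
        self.states.append(obs)
        return obs, reward, done, info
```

Because the wrapper exposes the same `reset`/`step` interface as the environment it wraps, the operator-facing control loop is unchanged.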
Usage
Use when building demonstration datasets for imitation learning (behavioral cloning, DAgger) or curriculum RL. This principle applies whenever you need to gather expert trajectories from human operators controlling robotic systems through teleoperation devices.
Typical scenarios include:
- Building initial demonstration datasets for behavioral cloning
- Collecting on-policy corrections for DAgger-style training
- Gathering failure recovery demonstrations
- Creating curriculum datasets with varying task difficulties
Theoretical Basis
Learning from Demonstrations (LfD). A human expert provides state-action pairs (s, a). The collected trajectories form a dataset D = {(s_t, a_t)} used for supervised learning or to initialize RL policies.
Pseudocode for the collection loop:

    # Initialize environment and recording
    state = env.reset()
    device.start_control()
    trajectory = []
    done = False

    # Collection loop
    while not done:
        # Map the current human input to a robot action
        action = device.input2action(state)

        # Execute and record
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state

        # Check termination conditions
        if task_success(env) or episode_timeout or user_reset:
            done = True

        # Rate limiting for smooth control
        sleep_to_maintain_framerate(max_fr)

    # Save trajectory
    save_trajectory(trajectory)
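The `sleep_to_maintain_framerate` step in the pseudocode can be sketched as follows. This is a hypothetical helper, not a robosuite function: it spaces successive calls at least 1/max_fr seconds apart so the operator sees a steady control rate.

```python
import time

_last_tick = None  # timestamp of the previous loop iteration


def sleep_to_maintain_framerate(max_fr):
    """Sleep just long enough that successive calls are spaced at
    least 1/max_fr seconds apart (illustrative rate limiter)."""
    global _last_tick
    now = time.monotonic()
    if _last_tick is not None:
        remaining = (1.0 / max_fr) - (now - _last_tick)
        if remaining > 0:
            time.sleep(remaining)
    _last_tick = time.monotonic()
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot stretch or shrink a frame.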
The collected dataset D can then be used for:
- Behavioral Cloning: Learn policy π(a|s) via supervised learning to minimize L(θ) = E[(π_θ(s) - a_expert)²]
- DAgger: Iteratively collect demonstrations under the learned policy's state distribution
- RL Initialization: Bootstrap policy or value networks from demonstration data
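The behavioral-cloning objective above reduces, for a batch of continuous actions, to a mean squared error. A minimal sketch of that loss (plain Python, no learning framework assumed):

```python
def bc_mse_loss(predicted_actions, expert_actions):
    """Mean squared error between policy outputs and expert actions:
    the empirical form of L(theta) = E[(pi_theta(s) - a_expert)^2]."""
    n = len(predicted_actions)
    return sum((p - a) ** 2
               for p, a in zip(predicted_actions, expert_actions)) / n
```

In practice this scalar would be minimized over the policy parameters theta with gradient descent; the same pairs (s_t, a_t) drawn from D supply both inputs and regression targets.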