Principle: ARISE Initiative Robosuite Demonstration Trajectory Collection
Metadata:
- robosuite
- DAgger
- Imitation_Learning
- Teleoperation
- last_updated: 2026-02-15 12:00 GMT
Overview
Process for collecting a single human demonstration trajectory by running a teleoperation loop that records states and actions until task completion or user reset.
Description
Demonstration trajectory collection is the core loop for gathering human expert data. The operator controls the robot via an input device while the DataCollectionWrapper records every state and action. Each trajectory runs until: (1) the task succeeds (_check_success), (2) the episode horizon is reached, or (3) the operator triggers a reset. Frame rate limiting ensures smooth human control.
The collection process captures the full state-action sequence of expert demonstrations, which forms the foundation for imitation learning algorithms. The DataCollectionWrapper transparently intercepts all environment interactions, storing observations, actions, rewards, and metadata without requiring modifications to the control loop itself.
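The interception pattern described above can be sketched with a minimal wrapper. This is an illustrative stand-in, not robosuite's actual `DataCollectionWrapper`: the class names and the tiny stub environment are hypothetical, and a real wrapper would also persist data to disk and record model/state metadata.

```python
class StubEnv:
    """Tiny stand-in environment used only to exercise the wrapper."""

    def reset(self):
        return 0.0

    def step(self, action):
        obs = float(action)
        return obs, 0.0, False, {}


class RecordingWrapper:
    """Sketch of a data-collection wrapper: it records every action and
    resulting observation while delegating control to the wrapped env,
    so the teleoperation loop needs no changes."""

    def __init__(self, env):
        self.env = env
        self.states = []
        self.actions = []

    def reset(self):
        self.states.clear()
        self.actions.clear()
        return self.env.reset()

    def step(self, action):
        # Record the action, then delegate to the wrapped environment.
        self.actions.append(action)
        obs, reward, done, info = self.env.step(action)
        self.states.append(obs)
        return obs, reward, done, info
```

Because the wrapper exposes the same `reset`/`step` interface as the environment it wraps, the operator-facing control loop is unchanged.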
Usage
Use when building demonstration datasets for imitation learning (behavioral cloning, DAgger) or curriculum RL. This principle applies whenever you need to gather expert trajectories from human operators controlling robotic systems through teleoperation devices.
Typical scenarios include:
- Building initial demonstration datasets for behavioral cloning
- Collecting on-policy corrections for DAgger-style training
- Gathering failure recovery demonstrations
- Creating curriculum datasets with varying task difficulties
Theoretical Basis
Learning from Demonstrations (LfD). A human expert provides state-action pairs (s, a). The collected trajectories form a dataset D = {(s_t, a_t)} used for supervised learning or to initialize RL policies.
Pseudocode for the collection loop:

    # Initialize environment and recording
    state = env.reset()
    device.start_control()
    trajectory = []
    done = False

    # Collection loop
    while not done:
        # Map the current human input to a robot action
        action = device.input2action(state)

        # Execute and record
        next_state, reward, done = env.step(action)
        trajectory.append((state, action, reward))
        state = next_state

        # Check termination conditions
        if task_success(env) or episode_timeout or user_reset:
            done = True

        # Rate limiting for smooth control
        sleep_to_maintain_framerate(max_fr)

    # Save trajectory
    save_trajectory(trajectory)
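The `sleep_to_maintain_framerate` step in the pseudocode can be sketched as follows. This is a hypothetical helper, not a robosuite function: it spaces successive calls at least 1/max_fr seconds apart so the operator sees a steady control rate.

```python
import time

_last_tick = None  # timestamp of the previous loop iteration


def sleep_to_maintain_framerate(max_fr):
    """Sleep just long enough that successive calls are spaced at
    least 1/max_fr seconds apart (illustrative rate limiter)."""
    global _last_tick
    now = time.monotonic()
    if _last_tick is not None:
        remaining = (1.0 / max_fr) - (now - _last_tick)
        if remaining > 0:
            time.sleep(remaining)
    _last_tick = time.monotonic()
```

`time.monotonic()` is used rather than `time.time()` so that system clock adjustments cannot stretch or shrink a frame.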
The collected dataset D can then be used for:
- Behavioral Cloning: Learn policy π(a|s) via supervised learning to minimize L(θ) = E[(π_θ(s) - a_expert)²]
- DAgger: Iteratively collect demonstrations under the learned policy's state distribution
- RL Initialization: Bootstrap policy or value networks from demonstration data
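The behavioral-cloning objective above reduces, for a batch of continuous actions, to a mean squared error. A minimal sketch of that loss (plain Python, no learning framework assumed):

```python
def bc_mse_loss(predicted_actions, expert_actions):
    """Mean squared error between policy outputs and expert actions:
    the empirical form of L(theta) = E[(pi_theta(s) - a_expert)^2]."""
    n = len(predicted_actions)
    return sum((p - a) ** 2
               for p, a in zip(predicted_actions, expert_actions)) / n
```

In practice this scalar would be minimized over the policy parameters theta with gradient descent; the same pairs (s_t, a_t) drawn from D supply both inputs and regression targets.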