Principle: ARISE Initiative Robomimic Rollout Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Evaluation, Simulation |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
An environment rollout evaluation pattern that deploys trained policies in simulation environments to measure task performance metrics such as success rate, total return, and episode length (horizon).
Description
Rollout Evaluation is the primary method for measuring the quality of a trained robot manipulation policy. Unlike supervised learning metrics (e.g., MSE on held-out data), rollout evaluation tests the policy in closed-loop interaction with a simulation environment, which is the ground truth for task success.
During each rollout episode, the policy receives an observation from the environment, computes an action, and the environment advances by one step. This repeats until the maximum horizon is reached, the episode terminates, or a success condition is met. The evaluation collects per-episode statistics (Return, Horizon, Success_Rate) and averages them across multiple episodes per environment.
This principle supports:
- Multi-environment evaluation: Test across different task variants simultaneously
- Video recording: Record rollout videos for qualitative inspection
- Goal-conditioned evaluation: Support for goal-conditioned policies
- Early termination: Optionally stop episodes upon task success
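The multi-environment case can be sketched as a thin loop over task variants. This is an illustrative sketch, not robomimic's actual API: `env_factories` and `evaluate_fn` are hypothetical stand-ins for environment construction and single-environment rollout evaluation.

```python
def evaluate_multi_env(policy, env_factories, evaluate_fn,
                       horizon=400, num_episodes=50):
    """Evaluate one policy across several task variants.

    env_factories: dict mapping env name -> zero-arg callable that builds the env.
    evaluate_fn:   callable (policy, env, horizon, num_episodes) -> stats dict.
    """
    return {
        name: evaluate_fn(policy, factory(), horizon, num_episodes)
        for name, factory in env_factories.items()
    }
```

Keeping environment construction behind factories lets each variant be built fresh per evaluation run, so state from one task cannot leak into another.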
Usage
Use this principle during training (periodic evaluation checkpoints) or after training (final model evaluation). It requires a trained policy wrapped as a RolloutPolicy and one or more simulation environments. In the training workflow, it is called at regular epoch intervals to track learning progress.
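The periodic-evaluation pattern described above can be sketched as follows. `train_fn` and `eval_fn` are hypothetical placeholders for one epoch of policy training and one rollout evaluation pass; they are not robomimic function names.

```python
def train_with_periodic_eval(num_epochs, eval_interval, train_fn, eval_fn):
    """Train for num_epochs, running rollout evaluation every eval_interval epochs."""
    eval_history = {}
    for epoch in range(1, num_epochs + 1):
        train_fn(epoch)                      # one epoch of policy training
        if epoch % eval_interval == 0:
            eval_history[epoch] = eval_fn()  # e.g. averaged rollout stats
    return eval_history
```

The returned history maps evaluation epochs to their rollout statistics, which is enough to track learning progress or select the best checkpoint by Success_Rate.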
Theoretical Basis
Rollout evaluation implements closed-loop policy evaluation in a Markov Decision Process:
# Abstract rollout evaluation (not the real implementation)
def evaluate_policy(policy, env, horizon, num_episodes):
    all_stats = []
    for episode in range(num_episodes):
        obs = env.reset()
        policy.start_episode()
        total_reward = 0.0
        steps = 0
        for t in range(horizon):
            action = policy(obs)
            obs, reward, done, info = env.step(action)
            total_reward += reward
            steps = t + 1
            if done or env.is_success():
                break
        all_stats.append({
            "Return": total_reward,
            "Horizon": steps,
            "Success_Rate": float(env.is_success()),
        })
    # Average each statistic across episodes
    keys = all_stats[0].keys()
    return {k: sum(s[k] for s in all_stats) / num_episodes for k in keys}
The key metric is Success_Rate, which measures the fraction of episodes where the robot successfully completes the manipulation task.
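Concretely, Success_Rate is the mean of the per-episode success flags, for example:

```python
# Per-episode success flags from four hypothetical rollout episodes.
successes = [1.0, 0.0, 1.0, 1.0]
success_rate = sum(successes) / len(successes)  # 0.75
```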