Principle:ARISE Initiative Robomimic Results Collection

Knowledge Sources	robomimic robomimic
Domains	Robotics, Evaluation, Data_Processing
Last Updated	2026-02-15 08:00 GMT

Overview

A data aggregation pattern that transposes evaluation statistics from a list of per-episode dictionaries into a dictionary of per-metric lists, so that summary statistics can be computed directly over each metric.

Description

Results Collection addresses the need to aggregate evaluation data across multiple rollout episodes. Each rollout produces a flat dictionary of statistics (Return, Horizon, Success_Rate). After running many episodes, these individual dictionaries need to be transposed into a structure where each metric maps to a list of values across episodes, enabling vectorized statistical operations (mean, std, percentiles).

This transpose operation is fundamental in robomimic because evaluation always involves multiple episodes (typically 50+), and summary statistics are needed for logging, comparison, and paper reporting.
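
The transposition described above can be sketched in a few lines of plain Python. This is a minimal illustration, not robomimic's own code: the function name `transpose_episode_stats` is hypothetical, and the sketch assumes every episode dictionary is flat and uses the same keys.

```python
from collections import defaultdict


def transpose_episode_stats(episode_stats):
    """Turn a list of flat per-episode dicts into a dict of per-metric lists.

    Hypothetical helper for illustration only; it assumes each dict is flat
    (no nesting) and that all episodes report the same metrics.
    """
    aggregated = defaultdict(list)
    for stats in episode_stats:
        for metric, value in stats.items():
            # Episode order is preserved within each metric's list.
            aggregated[metric].append(value)
    return dict(aggregated)


episode_stats = [
    {"Return": 1.5, "Horizon": 200, "Success_Rate": 1.0},
    {"Return": 0.3, "Horizon": 400, "Success_Rate": 0.0},
]
aggregated = transpose_episode_stats(episode_stats)
```

If an episode omits a metric, the resulting lists have different lengths and no longer line up by episode index, so it is worth validating the keys first when that can happen.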

Usage

Use this principle after running multiple evaluation rollouts to aggregate per-episode results. It is used in both training-time evaluation (for logging) and post-training evaluation (for final reporting and HDF5 export).
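
Once the results are in per-metric form, collapsing them into summary values for logging or reporting is a single pass over the dictionary. The sketch below is an assumption about how such a summary might look, not the library's logging code: `summarize_metrics` is a hypothetical name, and the standard-library `statistics` module stands in for NumPy to keep the example dependency-free.

```python
import statistics


def summarize_metrics(aggregated):
    """Collapse a dict of per-metric lists into per-metric summary values.

    Hypothetical helper for illustration; pstdev is the population standard
    deviation, which matches the default behaviour of numpy.std.
    """
    summary = {}
    for metric, values in aggregated.items():
        summary[metric] = {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
            "num_episodes": len(values),
        }
    return summary


aggregated = {
    "Return": [1.5, 0.3, 1.2],
    "Horizon": [200, 400, 150],
    "Success_Rate": [1.0, 0.0, 1.0],
}
summary = summarize_metrics(aggregated)
```

Keeping the episode count alongside each mean makes it easier to compare results that were computed over different numbers of rollouts.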

Theoretical Basis

# Abstract data aggregation pattern (illustrative, not the real implementation)
import numpy as np

# Input: list of per-episode dicts
episode_stats = [
    {"Return": 1.5, "Horizon": 200, "Success_Rate": 1.0},
    {"Return": 0.3, "Horizon": 400, "Success_Rate": 0.0},
    {"Return": 1.2, "Horizon": 150, "Success_Rate": 1.0},
]

# Output: dict of per-metric lists
aggregated = {
    "Return": [1.5, 0.3, 1.2],
    "Horizon": [200, 400, 150],
    "Success_Rate": [1.0, 0.0, 1.0],
}

# With per-metric lists, summary statistics are a single call
mean_success = np.mean(aggregated["Success_Rate"])  # ~0.667 (2 of 3 episodes succeeded)

Related Pages

Implemented By

Implementation:ARISE_Initiative_Robomimic_TensorUtils_list_of_flat_dict_to_dict_of_list
