Principle:ARISE Initiative Robomimic Results Collection
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Evaluation, Data_Processing |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
A data aggregation pattern that transposes evaluation statistics from a list of per-episode dictionaries into a dictionary of per-metric lists, so that summary statistics can be computed over each metric directly.
Description
Results Collection addresses the need to aggregate evaluation data across multiple rollout episodes. Each rollout produces a flat dictionary of statistics (Return, Horizon, Success_Rate). After running many episodes, these individual dictionaries need to be transposed into a structure where each metric maps to a list of values across episodes, enabling vectorized statistical operations (mean, std, percentiles).
This transpose operation is fundamental in robomimic because evaluation always involves multiple episodes (typically 50+), and summary statistics are needed for logging, comparison, and paper reporting.
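The transposition described above can be sketched in a few lines of plain Python. This is a minimal illustration and not robomimic's actual code: the helper name `transpose_episode_stats` and the strict key check are assumptions made for the example.

```python
from typing import Any, Dict, List


def transpose_episode_stats(episode_stats: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Turn a list of per-episode dicts into a dict of per-metric lists.

    Hypothetical helper for illustration; it assumes every episode
    reports exactly the same set of metric names.
    """
    if not episode_stats:
        return {}

    # Use the first episode to fix the metric names and their order.
    keys = list(episode_stats[0].keys())

    # Fail loudly if an episode is missing a metric or has an extra one,
    # since a silent mismatch would misalign the per-metric lists.
    for i, stats in enumerate(episode_stats):
        if set(stats) != set(keys):
            raise ValueError(
                f"episode {i} has keys {sorted(stats)}, expected {sorted(keys)}"
            )

    # One list per metric, with values kept in episode order.
    return {k: [stats[k] for stats in episode_stats] for k in keys}


episode_stats = [
    {"Return": 1.5, "Horizon": 200, "Success_Rate": 1.0},
    {"Return": 0.3, "Horizon": 400, "Success_Rate": 0.0},
    {"Return": 1.2, "Horizon": 150, "Success_Rate": 1.0},
]
aggregated = transpose_episode_stats(episode_stats)
print(aggregated["Horizon"])
```

Keeping values in episode order means index `i` in every list refers to the same rollout, which is useful when a single episode needs to be inspected later.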
Usage
Use this principle after running multiple evaluation rollouts to aggregate per-episode results. It is used in both training-time evaluation (for logging) and post-training evaluation (for final reporting and HDF5 export).
Theoretical Basis
# Abstract data aggregation pattern (not real implementation)
import numpy as np

# Input: list of per-episode dicts
episode_stats = [
    {"Return": 1.5, "Horizon": 200, "Success_Rate": 1.0},
    {"Return": 0.3, "Horizon": 400, "Success_Rate": 0.0},
    {"Return": 1.2, "Horizon": 150, "Success_Rate": 1.0},
]

# Output: dict of per-metric lists (one entry per episode, in episode order)
aggregated = {
    "Return": [1.5, 0.3, 1.2],
    "Horizon": [200, 400, 150],
    "Success_Rate": [1.0, 0.0, 1.0],
}

# Now easy to compute statistics
mean_success = np.mean(aggregated["Success_Rate"])  # approximately 0.667
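Once the data is in per-metric form, the summary statistics mentioned in the Description (mean, std, percentiles) reduce to one pass over the dictionary. The sketch below is an illustration under stated assumptions: the function name `summarize` and the output key scheme (`<metric>_mean`, `<metric>_std`, `<metric>_p<N>`) are hypothetical and are not taken from robomimic.

```python
import numpy as np


def summarize(aggregated, percentiles=(25, 50, 75)):
    """Compute per-metric summary statistics from a dict of per-metric lists.

    Hypothetical helper; the output key names are an assumption
    made for this example.
    """
    summary = {}
    for name, values in aggregated.items():
        arr = np.asarray(values, dtype=float)
        summary[f"{name}_mean"] = float(arr.mean())
        # NumPy's default is the population standard deviation (ddof=0).
        summary[f"{name}_std"] = float(arr.std())
        for p in percentiles:
            summary[f"{name}_p{p}"] = float(np.percentile(arr, p))
    return summary


aggregated = {
    "Return": [1.5, 0.3, 1.2],
    "Horizon": [200, 400, 150],
    "Success_Rate": [1.0, 0.0, 1.0],
}
summary = summarize(aggregated)
print(summary["Horizon_mean"], summary["Horizon_p50"])
```

With only a handful of episodes the percentile values are interpolated between neighbouring samples, so they become meaningful mainly at the larger episode counts used for evaluation.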