Principle:Facebookresearch Habitat lab Evaluation Execution
| Knowledge Sources | |
|---|---|
| Domains | Evaluation, Embodied_AI |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
Execution of the standard evaluation loop that runs an agent through episodes and aggregates performance metrics across the evaluation set.
Description
Evaluation Execution is the core evaluation process: iterating over episodes, running the agent's reset/act loop until each episode terminates, collecting per-episode metrics, and computing aggregate statistics. The standard protocol evaluates every episode in the dataset split (or a specified number of episodes) and reports the mean of each metric over the evaluated episodes.
Usage
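Invoke evaluation after creating a Benchmark instance and an Agent instance; in Habitat this is the `Benchmark.evaluate` call, which takes the agent and an optional episode count. This is the primary evaluation entry point for the Habitat benchmarking framework.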
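The call order described here can be tried without a simulator. The following is a minimal sketch, not Habitat's implementation: `ToyEnv`, `GreedyAgent`, and the standalone `evaluate` function are hypothetical stand-ins that only mimic the reset/act/step/get_metrics interface, and the corridor task and its two metrics are assumptions made for illustration.

```python
from statistics import mean


class ToyEnv:
    """Hypothetical stand-in environment: a 1-D corridor with one goal per episode."""

    def __init__(self, goals, max_steps=20):
        self._goals = list(goals)   # one goal position per episode
        self._next = 0              # index of the next episode to serve
        self._max_steps = max_steps
        self.episode_over = True

    def reset(self):
        # Each reset moves on to the next episode in the list.
        self._goal = self._goals[self._next % len(self._goals)]
        self._next += 1
        self._pos, self._steps = 0, 0
        self.episode_over = False
        return {"pos": self._pos, "goal": self._goal}

    def step(self, action):
        self._pos += 1 if action == "forward" else -1
        self._steps += 1
        # The episode ends on reaching the goal or exhausting the step budget.
        self.episode_over = self._pos == self._goal or self._steps >= self._max_steps
        return {"pos": self._pos, "goal": self._goal}

    def get_metrics(self):
        return {
            "success": float(self._pos == self._goal),
            "distance_to_goal": float(abs(self._goal - self._pos)),
        }


class GreedyAgent:
    """Hypothetical agent that always moves toward the goal."""

    def reset(self):
        pass

    def act(self, obs):
        return "forward" if obs["pos"] < obs["goal"] else "backward"


def evaluate(env, agent, num_episodes):
    """Run num_episodes episodes and return the mean of each metric."""
    all_metrics = []
    for _ in range(num_episodes):
        obs = env.reset()
        agent.reset()
        while not env.episode_over:
            obs = env.step(agent.act(obs))
        all_metrics.append(env.get_metrics())
    return {k: mean(m[k] for m in all_metrics) for k in all_metrics[0]}


env = ToyEnv(goals=[3, 5, 30])   # the third goal is unreachable within 20 steps
agent = GreedyAgent()
results = evaluate(env, agent, num_episodes=3)
print(results)
```

Two of the three toy episodes succeed, so the mean success is 2/3; the failed episode stops 10 cells short of its goal, which gives a mean distance to goal of 10/3.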
Theoretical Basis
The evaluation loop:
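# Standard evaluation protocol (pseudocode: env, agent, and dataset are assumed to exist)
from statistics import mean

all_metrics = []
for episode in dataset:
    obs = env.reset()          # start the next episode
    agent.reset()              # clear any per-episode agent state
    while not env.episode_over:
        action = agent.act(obs)
        obs = env.step(action)
    all_metrics.append(env.get_metrics())

# Report aggregate: mean of each metric over all evaluated episodes
metric_keys = all_metrics[0].keys()
aggregate = {k: mean(m[k] for m in all_metrics) for k in metric_keys}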
Key metrics: Success, SPL (Success weighted by Path Length), SoftSPL, and DistanceToGoal.
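As an illustration of how one of these metrics aggregates, SPL is commonly defined as the mean over episodes of S_i * l_i / max(p_i, l_i), where S_i is the binary success flag, l_i the shortest-path length to the goal, and p_i the length of the path the agent actually took. The sketch below follows that definition; the function name `spl` and the sample values are hypothetical, and it assumes every shortest-path length is positive. Habitat's own measure may differ in detail.

```python
def spl(successes, shortest_lengths, agent_lengths):
    """Mean over episodes of S_i * l_i / max(p_i, l_i).

    Assumes all shortest-path lengths are positive.
    """
    terms = [
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, agent_lengths)
    ]
    return sum(terms) / len(terms)


# Three hypothetical episodes: optimal success, success with a path twice
# as long as the shortest one, and a failure.
episodes_spl = spl(
    successes=[1, 1, 0],
    shortest_lengths=[4.0, 5.0, 6.0],
    agent_lengths=[4.0, 10.0, 3.0],
)
print(episodes_spl)  # (1.0 + 0.5 + 0.0) / 3 = 0.5
```

A failed episode contributes zero regardless of path length, and a successful one is discounted by how much longer the agent's path was than the shortest path.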