Principle: facebookresearch/habitat-lab Checkpointing and Evaluation
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Evaluation |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
Systematic evaluation of trained navigation agents across held-out episodes, computing standard embodied AI metrics such as Success, SPL, and Distance-to-Goal.
Description
Checkpointing and Evaluation is the process of saving trained policy checkpoints during training and later evaluating them on a fixed set of episodes. Evaluation runs the agent in inference mode (no gradient computation) across episodes, collecting per-episode metrics and aggregating them into summary statistics.
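Habitat-lab itself serializes PyTorch checkpoints; as a library-agnostic sketch of the periodic-snapshot pattern described above, assuming a hypothetical `state` dict standing in for policy and optimizer state:

```python
import os
import pickle
import tempfile

def save_checkpoint(state, checkpoint_dir, update_idx):
    """Persist policy state so evaluation can later run on any snapshot."""
    path = os.path.join(checkpoint_dir, f"ckpt.{update_idx}.pkl")
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path

def load_checkpoint(path):
    """Restore a saved snapshot for inference-mode evaluation."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical training loop that snapshots every `interval` updates.
checkpoint_dir = tempfile.mkdtemp()
interval = 50
saved = []
for update_idx in range(200):
    if update_idx % interval == 0:
        state = {"update": update_idx, "policy_weights": [0.0] * 4}
        saved.append(save_checkpoint(state, checkpoint_dir, update_idx))

restored = load_checkpoint(saved[-1])
```

Saving on a fixed update interval (rather than only at the end of training) is what makes it possible to plot validation performance over the course of training and select the best checkpoint afterward.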
Key evaluation metrics in embodied navigation:
- Success: Binary indicator of whether the agent reached within a threshold distance of the goal
- SPL (Success weighted by Path Length): Success normalized by the ratio of shortest path to actual path length
- Distance to Goal: Distance from the agent's final position to the goal at episode termination (in Habitat this is typically the geodesic shortest-path distance rather than straight-line Euclidean distance)
- Soft SPL: Continuous relaxation of SPL using progress toward the goal
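The four metrics above can be computed per episode from a handful of quantities. A minimal sketch, assuming a hypothetical success threshold of 0.2 m and hypothetical argument names:

```python
SUCCESS_DISTANCE = 0.2  # assumed goal radius in meters

def episode_metrics(final_dist, shortest_path, agent_path, start_dist):
    """Compute the standard navigation metrics for one finished episode.

    final_dist    -- distance from agent to goal at termination
    shortest_path -- length of the shortest path from start to goal
    agent_path    -- length of the path the agent actually traveled
    start_dist    -- distance from the start position to the goal
    """
    success = 1.0 if final_dist <= SUCCESS_DISTANCE else 0.0
    # SPL: success weighted by the ratio of shortest to actual path length.
    spl = success * shortest_path / max(agent_path, shortest_path)
    # Soft SPL: replace binary success with fractional progress toward the goal.
    progress = max(0.0, 1.0 - final_dist / start_dist)
    soft_spl = progress * shortest_path / max(agent_path, shortest_path)
    return {
        "success": success,
        "spl": spl,
        "soft_spl": soft_spl,
        "distance_to_goal": final_dist,
    }
```

For example, an agent that stops 0.1 m from the goal after traveling 6.0 m when the shortest path was 5.0 m succeeds, but its SPL is 5/6 ≈ 0.83, penalizing the detour.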
Usage
Use this after training is complete (or at regular intervals during training) to measure agent performance. Standard practice evaluates on all episodes in the validation or test split.
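One common use of per-checkpoint validation results is model selection. A sketch, assuming a hypothetical mapping from checkpoint path to per-episode SPL scores already collected on the validation split:

```python
from statistics import mean

# Hypothetical validation results: checkpoint path -> per-episode SPL.
val_results = {
    "ckpt.10.pth": [0.2, 0.3, 0.1],
    "ckpt.20.pth": [0.5, 0.6, 0.4],
    "ckpt.30.pth": [0.4, 0.5, 0.45],
}

# Keep the checkpoint with the highest mean validation SPL for the
# final test-split evaluation.
best_ckpt = max(val_results, key=lambda path: mean(val_results[path]))
```

Selecting on the validation split and reporting only the chosen checkpoint's test-split numbers avoids overfitting the reported result to the test episodes.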
Theoretical Basis
SPL metric definition (Anderson et al., 2018):

$$\mathrm{SPL} = \frac{1}{N} \sum_{i=1}^{N} S_i \, \frac{\ell_i}{\max(p_i, \ell_i)}$$

where $S_i$ is the binary success indicator, $\ell_i$ is the shortest path length, and $p_i$ is the agent's actual path length for episode $i$, averaged over $N$ episodes.
Evaluation loop pseudo-code:
# Abstract evaluation process (inference mode, no gradient computation)
metrics = []
for episode in evaluation_episodes:
    observation = env.reset()
    agent.reset()
    done = False
    while not done:
        action = agent.act(observation)
        observation, reward, done, info = env.step(action)
    metrics.append(info["metrics"])
aggregated = {k: mean(m[k] for m in metrics) for k in metric_keys}
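The final aggregation step above reduces to a mean over per-episode dictionaries. A self-contained sketch with hypothetical per-episode results in place of a real evaluation run:

```python
from statistics import mean

# Hypothetical per-episode results from an evaluation run.
metrics = [
    {"success": 1.0, "spl": 0.83, "distance_to_goal": 0.1},
    {"success": 0.0, "spl": 0.0, "distance_to_goal": 2.4},
    {"success": 1.0, "spl": 0.71, "distance_to_goal": 0.15},
]

# Aggregate each metric across episodes into a single summary number.
metric_keys = metrics[0].keys()
aggregated = {k: mean(m[k] for m in metrics) for k in metric_keys}
```

Note that aggregated success is simply the fraction of successful episodes (here 2/3), while aggregated SPL averages in zeros for failed episodes, so it is always bounded above by the success rate.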