Principle:Facebookresearch Habitat lab Evaluation Execution

Knowledge Sources	On Evaluation of Embodied Navigation Agents, Habitat-Lab
Domains	Evaluation, Embodied_AI
Last Updated	2026-02-15 02:00 GMT

Overview

Executes the standard evaluation loop, which runs an agent through a set of episodes and aggregates its performance metrics across the evaluation set.

Description

Evaluation Execution is the core evaluation process: iterating over episodes, running the agent's reset/act loop until episode termination, collecting per-episode metrics, and computing aggregate statistics. The standard protocol evaluates all episodes in the dataset split (or a specified count) and reports mean metrics.

Usage

Call Benchmark.evaluate after creating a Benchmark instance and an Agent instance. This is the primary evaluation entry point for the Habitat benchmarking framework.

Theoretical Basis

The evaluation loop:

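# Standard evaluation protocol (env, agent, and dataset are created beforehand)
from statistics import mean

all_metrics = []
for episode in dataset:
    obs = env.reset()    # start the next episode and get its first observation
    agent.reset()        # clear any per-episode agent state
    while not env.episode_over:
        action = agent.act(obs)
        obs = env.step(action)
    all_metrics.append(env.get_metrics())  # per-episode metrics

# Report aggregate: mean of each metric over all evaluated episodes
metric_keys = all_metrics[0].keys() if all_metrics else []
aggregate = {k: mean(m[k] for m in all_metrics) for k in metric_keys}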
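The protocol above can be exercised end to end without a simulator. The following is a minimal sketch, not Habitat's implementation: ToyEnv, GreedyAgent, and the evaluate function are hypothetical stand-ins that only mirror the reset/act/step/get_metrics interface used in the loop, and the num_episodes parameter illustrates evaluating a specified count of episodes.

```python
from collections import defaultdict


class ToyEnv:
    """Hypothetical 1-D corridor: the agent starts at 0 and must reach a goal."""

    def __init__(self, goals, max_steps=20):
        self._goals = goals          # one goal position per episode
        self._max_steps = max_steps  # step budget per episode
        self._next = 0               # index of the next episode to load
        self.episode_over = True

    def reset(self):
        # Load the next episode and return its first observation.
        self._goal = self._goals[self._next % len(self._goals)]
        self._next += 1
        self._pos, self._steps = 0, 0
        self.episode_over = False
        return {"pos": self._pos, "goal": self._goal}

    def step(self, action):
        self._pos += action          # action is -1 or +1
        self._steps += 1
        self.episode_over = (
            self._pos == self._goal or self._steps >= self._max_steps
        )
        return {"pos": self._pos, "goal": self._goal}

    def get_metrics(self):
        return {
            "success": float(self._pos == self._goal),
            "distance_to_goal": float(abs(self._goal - self._pos)),
        }


class GreedyAgent:
    """Hypothetical agent that always moves toward the goal."""

    def reset(self):
        pass  # no per-episode state to clear

    def act(self, obs):
        return 1 if obs["goal"] > obs["pos"] else -1


def evaluate(env, agent, num_episodes):
    """Run num_episodes episodes and return the mean of each metric."""
    totals = defaultdict(float)
    for _ in range(num_episodes):
        obs = env.reset()
        agent.reset()
        while not env.episode_over:
            obs = env.step(agent.act(obs))
        for key, value in env.get_metrics().items():
            totals[key] += value
    return {key: total / num_episodes for key, total in totals.items()}


# The third goal lies beyond the 20-step budget, so that episode fails.
result = evaluate(ToyEnv(goals=[3, 5, 30]), GreedyAgent(), num_episodes=3)
print(result)
```

Accumulating running totals instead of storing every per-episode dictionary keeps memory use constant, which can matter for large evaluation sets; the reported means are the same either way.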

Key metrics: Success, SPL (Success weighted by Path Length), SoftSPL, and DistanceToGoal.
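As an illustration of how one of these metrics aggregates, SPL as defined in On Evaluation of Embodied Navigation Agents averages S_i * l_i / max(p_i, l_i) over episodes, where S_i is the binary success indicator, l_i the shortest-path length from start to goal, and p_i the length of the path the agent actually took. The sketch below is a standalone computation under that definition; the function name spl and its argument names are chosen here for illustration and are not Habitat-Lab's measure API. It assumes every l_i is positive.

```python
def spl(successes, shortest_lengths, path_lengths):
    """Mean over episodes of S_i * l_i / max(p_i, l_i)."""
    terms = [
        s * l / max(p, l)
        for s, l, p in zip(successes, shortest_lengths, path_lengths)
    ]
    return sum(terms) / len(terms)


# Episode 1: success along the shortest path      -> contributes 1.0
# Episode 2: success with a path twice as long    -> contributes 0.5
# Episode 3: failure                              -> contributes 0.0
score = spl(
    successes=[1, 1, 0],
    shortest_lengths=[10.0, 10.0, 8.0],
    path_lengths=[10.0, 20.0, 5.0],
)
print(score)
```

Because failed episodes contribute zero and successful ones are discounted by path inefficiency, SPL can never exceed the plain success rate.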

Related Pages

Implemented By

Implementation:Facebookresearch_Habitat_lab_Benchmark_evaluate
