Principle:Facebookresearch Habitat lab Evaluation Execution

From Leeroopedia
Knowledge Sources
Domains Evaluation, Embodied_AI
Last Updated 2026-02-15 02:00 GMT

Overview

Execution of the standard evaluation loop that runs an agent through episodes and aggregates performance metrics across the evaluation set.

Description

Evaluation Execution is the core evaluation process: it iterates over episodes, runs the agent's reset/act loop until each episode terminates, collects per-episode metrics, and computes aggregate statistics. The standard protocol evaluates every episode in the dataset split (or a specified number of episodes) and reports the mean of each metric across those episodes.

Usage

Run the evaluation after creating a Benchmark instance and an Agent instance. This is the primary evaluation entry point for the Habitat benchmarking framework.
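The call sequence described above might look like the following minimal sketch. It assumes the `habitat` package is installed and that `habitat.Benchmark(config_paths=...)`, `Benchmark.evaluate(agent, num_episodes=...)`, and the `habitat.Agent` base class behave as in the habitat-lab releases we are aware of. The `run_evaluation` helper, the `ConstantAgent` class, and the `"move_forward"` action are hypothetical illustrations; the valid config path and action format depend on the installed version and task.

```python
from typing import Any, Dict, Optional


def run_evaluation(config_path: str, num_episodes: Optional[int] = None) -> Dict[str, float]:
    """Hypothetical helper: build a Benchmark and an Agent, then evaluate."""
    # Imported lazily so this sketch can be loaded without Habitat installed.
    import habitat

    class ConstantAgent(habitat.Agent):
        """Illustrative agent that always returns the same action."""

        def reset(self) -> None:
            pass  # no per-episode state to clear

        def act(self, observations: Any) -> Dict[str, Any]:
            # Assumption: the configured task accepts this action name and format.
            return {"action": "move_forward"}

    # Assumption: config_path points at a task config valid for the installed version.
    benchmark = habitat.Benchmark(config_paths=config_path)
    # num_episodes=None is assumed to mean "evaluate the whole split".
    return benchmark.evaluate(ConstantAgent(), num_episodes=num_episodes)
```

The returned dictionary is expected to map metric names to their mean over the evaluated episodes.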

Theoretical Basis

The evaluation loop:

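# Standard evaluation protocol (sketch: env, agent, and dataset are assumed to exist)
from statistics import mean

all_metrics = []
for episode in dataset:
    obs = env.reset()              # start the next episode
    agent.reset()                  # clear any per-episode agent state
    while not env.episode_over:
        action = agent.act(obs)
        obs = env.step(action)
    all_metrics.append(env.get_metrics())

# Report aggregate: per-metric mean across all evaluated episodes
metric_keys = all_metrics[0].keys()
aggregate = {k: mean(m[k] for m in all_metrics) for k in metric_keys}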
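To make the protocol concrete without a simulator, here is a self-contained toy version of the loop. `ToyEnv`, `ForwardAgent`, and `evaluate` are hypothetical stand-ins written for this illustration, not Habitat classes; they only mirror the `reset` / `act` / `step` / `episode_over` / `get_metrics` interface used above. The sketch also shows the optional episode count with a basic range check.

```python
from collections import defaultdict
from typing import Dict, Optional


class ToyEnv:
    """Hypothetical stand-in environment: a 1-D corridor per episode."""

    def __init__(self, goal_distances):
        self.goal_distances = list(goal_distances)  # one entry per episode
        self._next = 0
        self.episode_over = True

    def reset(self) -> Dict[str, int]:
        self._remaining = self.goal_distances[self._next]
        self._next += 1
        self._steps = 0
        self.episode_over = False
        return {"remaining": self._remaining}

    def step(self, action: str) -> Dict[str, int]:
        self._steps += 1
        if action == "forward":
            self._remaining -= 1
        # End on reaching the goal or after a fixed budget of 5 steps.
        self.episode_over = self._remaining == 0 or self._steps >= 5
        return {"remaining": self._remaining}

    def get_metrics(self) -> Dict[str, float]:
        return {
            "success": float(self._remaining == 0),
            "distance_to_goal": float(self._remaining),
        }


class ForwardAgent:
    """Hypothetical agent that always moves forward."""

    def reset(self) -> None:
        pass

    def act(self, obs) -> str:
        return "forward"


def evaluate(env, agent, num_episodes: Optional[int] = None) -> Dict[str, float]:
    total = len(env.goal_distances)
    n = total if num_episodes is None else num_episodes
    if not 0 < n <= total:
        raise ValueError(f"num_episodes must be in 1..{total}, got {n}")
    sums = defaultdict(float)
    for _ in range(n):
        obs = env.reset()
        agent.reset()
        while not env.episode_over:
            obs = env.step(agent.act(obs))
        # Accumulate per-episode metrics into running sums.
        for key, value in env.get_metrics().items():
            sums[key] += value
    return {key: value / n for key, value in sums.items()}


result = evaluate(ToyEnv([2, 3, 8]), ForwardAgent())
```

Accumulating running sums instead of storing every per-episode dictionary keeps memory constant in the number of episodes, at the cost of losing per-episode values for later analysis.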

Key metrics for navigation tasks: Success, SPL (Success weighted by Path Length), SoftSPL, and DistanceToGoal.
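As an illustration of how two of these metrics are commonly defined per episode, the sketch below computes SPL as success times the ratio of shortest-path length to the longer of the agent's path and the shortest path, and SoftSPL by replacing binary success with fractional progress toward the goal. The function names and arguments are hypothetical, and both functions assume strictly positive reference distances; Habitat's own measure implementations may differ in detail.

```python
from statistics import mean


def spl(success: bool, shortest_path: float, agent_path: float) -> float:
    """Success weighted by Path Length: S * l / max(p, l), with l > 0."""
    if not success:
        return 0.0
    return shortest_path / max(agent_path, shortest_path)


def soft_spl(start_distance: float, final_distance: float, agent_path: float) -> float:
    """SPL variant where success is replaced by progress toward the goal."""
    # Progress is the fraction of the initial distance that was closed, floored at 0.
    progress = max(0.0, 1.0 - final_distance / start_distance)
    return progress * start_distance / max(agent_path, start_distance)


# Three illustrative episodes: (success, shortest path, agent path).
episodes = [(True, 10.0, 12.5), (False, 10.0, 20.0), (True, 4.0, 4.0)]
mean_spl = mean(spl(s, l, p) for s, l, p in episodes)
```

A failed episode contributes 0 to SPL no matter how close the agent came, whereas SoftSPL still credits partial progress.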

Related Pages

Implemented By
