Implementation: facebookresearch/habitat-lab HabitatEvaluator.evaluate_agent
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Evaluation |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
Concrete evaluation loop for RL agents in Habitat environments, provided by habitat-baselines. It computes navigation metrics across episodes, with optional video recording.
Description
The HabitatEvaluator.evaluate_agent method runs a trained agent through evaluation episodes in vectorized environments. It handles batched inference with recurrent hidden states, collects per-episode metrics (success, SPL, distance_to_goal, etc.), supports video recording for visualization, and aggregates statistics across all episodes.
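The batched-inference pattern described above can be sketched as follows. This is an illustrative simplification, not the habitat-baselines implementation: `agent.act`, the `envs` interface, and the tensor shapes are stand-ins. The key detail is that the recurrent hidden states and "not done" masks persist across steps, and the masks (False at episode boundaries) let the policy reset its recurrent state for only the environments that finished.

```python
import torch

def run_eval_loop(agent, envs, num_episodes, hidden_size, device):
    """Illustrative vectorized evaluation loop (simplified stand-in)."""
    num_envs = envs.num_envs
    # Recurrent hidden states persist across steps within an episode.
    hidden_states = torch.zeros(num_envs, hidden_size, device=device)
    # Masks start at False so the policy treats step 0 as an episode start.
    not_done_masks = torch.zeros(num_envs, 1, dtype=torch.bool, device=device)
    obs = envs.reset()
    episode_stats = []
    while len(episode_stats) < num_episodes:
        with torch.no_grad():  # inference only, no gradients needed
            actions, hidden_states = agent.act(obs, hidden_states, not_done_masks)
        obs, rewards, dones, infos = envs.step(actions)
        # A False mask for env i tells the policy to reset its recurrent
        # state for that env at the start of its next episode.
        not_done_masks = torch.tensor(
            [[not d] for d in dones], dtype=torch.bool, device=device
        )
        for i, done in enumerate(dones):
            if done:
                # Record this episode's metrics (success, spl, ...).
                episode_stats.append(infos[i])
    return episode_stats
```

Because all environments step in lockstep, the loop may overshoot `num_episodes` by up to `num_envs - 1` episodes; the real evaluator handles this cap per environment.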
Usage
Called by `PPOTrainer._eval_checkpoint` after loading a trained checkpoint. Used for final evaluation on validation/test splits and for mid-training evaluation checkpoints.
Code Reference
Source Location
- Repository: habitat-lab
- File: habitat-baselines/habitat_baselines/rl/ppo/habitat_evaluator.py
- Lines: L39-L340
Signature
class HabitatEvaluator(Evaluator):
    def evaluate_agent(
        self,
        agent,
        envs,
        config,
        checkpoint_index,
        step_id,
        writer,
        device,
        obs_transforms,
        env_spec,
        rank0_keys,
    ):
        """
        Evaluate agent across episodes.

        Args:
            agent: Trained policy agent
            envs: Vectorized evaluation environments
            config: Evaluation config
            checkpoint_index: Index of checkpoint being evaluated
            step_id: Training step for logging
            writer: TensorBoard/WandB writer
            device: torch device
            obs_transforms: Observation transforms to apply
            env_spec: Environment specification
            rank0_keys: Keys to log only on rank 0
        """
Import
from habitat_baselines.rl.ppo.habitat_evaluator import HabitatEvaluator
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| agent | PPO | Yes | Trained policy agent with `actor_critic` attribute |
| envs | VectorEnv | Yes | Vectorized evaluation environments |
| config | DictConfig | Yes | Evaluation configuration |
| checkpoint_index | int | Yes | Index of checkpoint being evaluated |
| step_id | int | Yes | Training step for logging |
| writer | TensorboardWriter | Yes | TensorBoard/WandB metrics writer |
| device | torch.device | Yes | Device for inference |
| obs_transforms | list | Yes | Observation transforms to apply |
| env_spec | EnvironmentSpec | Yes | Environment specification |
| rank0_keys | Set[str] | Yes | Keys to log only on rank 0 |
Outputs
| Name | Type | Description |
|---|---|---|
| Metrics | Dict[str, float] | Aggregated metrics: distance_to_goal, success, spl, soft_spl |
| Videos | .mp4 files | Optional evaluation videos saved to video_dir |
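For reference, the `spl` and `soft_spl` metrics follow the standard success-weighted-by-path-length definition from Anderson et al. (2018). The sketch below is illustrative and not the evaluator's own code; Habitat computes these as per-episode measures inside the simulator.

```python
def spl(success, shortest_path, agent_path):
    """SPL = S * l / max(p, l): binary success weighted by the ratio of
    the shortest (geodesic) path length l to the length p of the path
    the agent actually traveled."""
    return float(success) * shortest_path / max(agent_path, shortest_path)

def soft_spl(distance_to_goal, shortest_path, agent_path):
    """Soft variant: binary success is replaced by the fraction of the
    start-to-goal geodesic distance the agent covered."""
    soft_success = max(0.0, 1.0 - distance_to_goal / shortest_path)
    return soft_success * shortest_path / max(agent_path, shortest_path)
```

A successful episode along the shortest path scores 1.0; taking twice the shortest path halves the score; a failed episode scores 0.0 regardless of path length.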
Usage Examples
Evaluate a Checkpoint
import torch

from habitat_baselines.rl.ppo.habitat_evaluator import HabitatEvaluator

evaluator = HabitatEvaluator()
evaluator.evaluate_agent(
    agent=trained_agent,
    envs=eval_envs,
    config=eval_config,
    checkpoint_index=0,
    step_id=1000000,
    writer=tb_writer,
    device=torch.device("cuda"),
    obs_transforms=obs_transforms,
    env_spec=env_spec,
    rank0_keys=set(),
)
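Aggregate Per-Episode Metrics
The evaluator reduces per-episode metric dicts to means over all episodes before logging. The hypothetical helper below (`aggregate_metrics` is not a habitat-baselines function) shows that reduction for metrics like those in the Outputs table:

```python
from collections import defaultdict

def aggregate_metrics(episode_stats):
    """Mean of each metric across a list of per-episode stat dicts.
    Hypothetical helper illustrating the aggregation step."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for stats in episode_stats:
        for name, value in stats.items():
            sums[name] += value
            counts[name] += 1
    # Divide per-metric so episodes missing a key do not skew its mean.
    return {name: sums[name] / counts[name] for name in sums}
```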