Implementation:Facebookresearch Habitat lab IL Metrics
| Knowledge Sources | |
|---|---|
| Domains | Embodied_AI, Imitation_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
The IL Metrics module provides Metric, VqaMetric, and NavMetric classes for tracking, averaging, and logging training and evaluation metrics in imitation learning pipelines for Embodied Question Answering.
Description
The Metric base class tracks a list of named metrics and keeps three statistics for each: the cumulative mean (index 0), an exponential moving average with decay 0.95 (index 1), and the most recent value (index 2). The update method takes a list with one value per metric name and refreshes all three statistics. get_stat_string returns a formatted string of the metric values, and get_stats returns the current values; both select a statistic through a mode argument that corresponds to the indices above. dump_log writes the full history of statistics to a JSON file if a log path is configured.
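The three statistics can be illustrated with a small, dependency-free sketch. `RunningStat` is a hypothetical class written for this page, not part of habitat_baselines. It assumes that the first observation seeds all three slots and that the moving average is updated as 0.95 * old + 0.05 * new.

```python
class RunningStat:
    """Hypothetical stand-in that tracks one metric as the text describes."""

    DECAY = 0.95  # EMA decay stated in the description

    def __init__(self):
        self.count = 0
        # [cumulative mean, exponential moving average, most recent value]
        self.stats = [None, None, None]

    def update(self, value):
        value = float(value)
        self.count += 1
        if self.count == 1:
            # Assumption: the first value seeds all three statistics.
            self.stats = [value, value, value]
            return
        mean, ema, _ = self.stats
        # Incremental form of the arithmetic mean over all values seen so far.
        mean = (mean * (self.count - 1) + value) / self.count
        # The exponential moving average weights the new value by 1 - DECAY.
        ema = self.DECAY * ema + (1.0 - self.DECAY) * value
        self.stats = [mean, ema, value]


stat = RunningStat()
for v in [1.0, 0.0]:
    stat.update(v)

cumulative_mean, ema, latest = stat.stats
```

With a decay of 0.95 the moving average reacts slowly to a single outlier, while the cumulative mean weights every update equally.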
VqaMetric extends Metric with a compute_ranks method that calculates answer accuracy and ranking positions from prediction scores and ground-truth labels. NavMetric extends Metric without additional methods, serving as a type-distinguished metric tracker for navigation tasks.
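To make the ranking idea concrete, here is a minimal sketch in plain Python. `compute_ranks_sketch` is a hypothetical helper, not the library method. It assumes that the rank of the correct answer is one plus the number of answers scored strictly higher than it. It works on nested lists rather than tensors so that it runs without torch or NumPy.

```python
def compute_ranks_sketch(scores, labels):
    """Return (accuracy, ranks) for a batch of score rows and label indices."""
    accuracy, ranks = [], []
    for row, label in zip(scores, labels):
        # Assumption: rank = 1 + number of answers scored strictly higher
        # than the ground-truth answer.
        rank = 1 + sum(1 for s in row if s > row[label])
        ranks.append(rank)
        # A sample counts as correct only when its answer is ranked first.
        accuracy.append(1 if rank == 1 else 0)
    return accuracy, ranks


scores = [
    [0.1, 0.7, 0.2],  # correct answer (index 1) has the top score
    [0.5, 0.3, 0.2],  # correct answer (index 2) has the lowest score
]
labels = [1, 2]
accuracy, ranks = compute_ranks_sketch(scores, labels)
```

Under this definition, ties with the correct answer's score do not lower its rank, because only strictly higher scores are counted.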
Usage
Use VqaMetric during VQA training/evaluation to track loss and accuracy with ranking. Use NavMetric for navigation-specific metrics. Use the base Metric for general-purpose metric tracking.
Code Reference
Source Location
- Repository: Facebookresearch_Habitat_lab
- File: habitat-baselines/habitat_baselines/il/metrics.py
- Lines: 15-120
Signature
class Metric:
    def __init__(self, info=None, metric_names=None, log_json=None):

class VqaMetric(Metric):
    def __init__(self, info=None, metric_names=None, log_json=None):

    def compute_ranks(
        self, scores: torch.Tensor, labels: torch.Tensor
    ) -> Tuple[np.ndarray, np.ndarray]:

class NavMetric(Metric):
    def __init__(self, info=None, metric_names=None, log_json=None):
Import
from habitat_baselines.il.metrics import Metric, VqaMetric, NavMetric
I/O Contract
Inputs (Metric.__init__)
| Name | Type | Required | Description |
|---|---|---|---|
| info | dict | No | Metadata dictionary (e.g., epoch, split) displayed in stat strings |
| metric_names | list | No | Sorted list of metric names to track |
| log_json | str | No | File path for JSON log output; if None, logging to file is disabled |
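The optional nature of log_json can be sketched as follows. `dump_history` is a hypothetical helper, and both the JSON layout (a dictionary with "metric_names" and "stats" keys) and the boolean return value are assumptions made for illustration, not details confirmed by the table above.

```python
import json
import os
import tempfile


def dump_history(history, metric_names, log_json=None):
    """Write the history to log_json; do nothing when no path is configured."""
    if log_json is None:
        # No path configured: file logging is disabled.
        return False
    with open(log_json, "w") as f:
        json.dump({"metric_names": metric_names, "stats": history}, f)
    return True


# One entry per update call; each metric holds [mean, EMA, latest].
history = [[[0.9, 0.9, 0.9]], [[0.8, 0.89, 0.7]]]

skipped = dump_history(history, ["loss"])  # no path, so nothing is written

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "metrics.json")
    written = dump_history(history, ["loss"], log_json=path)
    with open(path) as f:
        restored = json.load(f)
```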
Inputs (VqaMetric.compute_ranks)
| Name | Type | Required | Description |
|---|---|---|---|
| scores | torch.Tensor | Yes | Prediction scores tensor of shape (batch, num_answers) |
| labels | torch.Tensor | Yes | Ground-truth label indices tensor of shape (batch,) |
Outputs (VqaMetric.compute_ranks)
| Name | Type | Description |
|---|---|---|
| accuracy | np.ndarray | Binary accuracy array (1 if rank == 1, else 0) |
| ranks | np.ndarray | Rank of the correct answer for each sample |
Usage Examples
Basic Usage
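from habitat_baselines.il.metrics import VqaMetric

# `dataloader`, `model`, `compute_loss`, and `compute_accuracy` are
# placeholders for objects defined by the surrounding training script.

# Names are listed in sorted order, as the I/O contract describes, so that
# each position lines up with the values passed to update().
metric = VqaMetric(
    info={"epoch": 1, "split": "train"},
    metric_names=["accuracy", "loss"],
    log_json="logs/vqa_train.json",
)

# During training loop
for batch_idx, batch in enumerate(dataloader):
    loss = compute_loss(model, batch)
    accuracy = compute_accuracy(model, batch)
    # Values follow the same order as metric_names.
    metric.update([accuracy, loss.item()])

# Print metrics
print(metric.get_stat_string(mode=1))  # EMA values

# Save log to JSON
metric.dump_log()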
Computing Ranks
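import torch
from habitat_baselines.il.metrics import VqaMetric

# `model`, `batch`, and `loss` are assumed to come from the surrounding
# evaluation loop.

# Names are listed in sorted order so that positions match update() values.
vqa_metric = VqaMetric(
    info={"split": "val"},
    metric_names=["accuracy", "loss", "mean_rank"],
)

scores = model(batch)     # (batch_size, num_answers)
labels = batch["answer"]  # (batch_size,)

accuracy, ranks = vqa_metric.compute_ranks(scores, labels)

# Reduce the per-sample arrays to scalars before updating the tracker.
vqa_metric.update([
    accuracy.mean(),
    loss.item(),
    ranks.mean(),
])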