Implementation: Facebookresearch Habitat-lab NavDataset
| Knowledge Sources | |
|---|---|
| Domains | Embodied_AI, Imitation_Learning, Navigation_Tasks |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
NavDataset is a PyTorch dataset extending webdataset.Dataset that provides navigation data for the PACMAN-based Embodied Question Answering (EQA) imitation learning pipeline, handling frame caching, hierarchical action decomposition, and CNN feature extraction.
Description
NavDataset manages the full data pipeline for training a hierarchical navigation model that decomposes flat action sequences into planner and controller actions. Key responsibilities include:
Data Loading and Caching:
- On first use, iterates through all scenes and episodes, rendering RGB frames at each position along the shortest path and saving them to disk as JPEG images.
- Compresses frame directories into tar archives using create_tar_archive() for efficient streaming via WebDataset.
- Checks for existing caches with cache_exists() to avoid redundant rendering.
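The check-then-archive flow above can be sketched as follows. The helper bodies here are illustrative stand-ins (the real cache_exists() and create_tar_archive() in nav_data.py may differ, e.g. in validation and naming):

```python
# Minimal sketch of the frame-cache flow (hypothetical helper bodies;
# the real implementations live in nav_data.py).
import os
import tarfile


def cache_exists(cache_path: str) -> bool:
    # The real check may also validate completeness; here we simply
    # test for the archive's presence on disk.
    return os.path.isfile(cache_path)


def create_tar_archive(cache_path: str, frame_dir: str) -> None:
    # Pack every cached JPEG frame into an uncompressed tar archive so
    # WebDataset can stream it sequentially.
    with tarfile.open(cache_path, "w") as tar:
        for name in sorted(os.listdir(frame_dir)):
            tar.add(os.path.join(frame_dir, name), arcname=name)
```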
Hierarchical Action Decomposition:
- flat_to_hierarchical_actions(): Converts a flat sequence of navigation actions (FWD, LEFT, RIGHT, STOP) into a two-level hierarchy:
- Planner actions: High-level direction changes (when the action type changes).
- Controller actions: Low-level continuation signals (1 = continue same action, 0 = action changed).
- Actions are preprocessed with an offset (0=NULL, 1=START, 2=FWD, 3=LEFT, 4=RIGHT, 5=STOP) and padded to max_action_len.
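The decomposition rule above can be illustrated with a simplified sketch. This is not the actual flat_to_hierarchical_actions() (which also applies the action offset, padding, and tensor handling); it only demonstrates the planner/controller split, including the forced planner switch after max_controller_actions consecutive continuations:

```python
# Simplified illustration of the planner/controller split; the real
# flat_to_hierarchical_actions() in nav_data.py also handles offsets
# and padding to max_action_len.
from typing import List, Tuple


def flat_to_hierarchical(
    actions: List[int], max_controller_actions: int = 5
) -> Tuple[List[int], List[int]]:
    planner: List[int] = []     # high-level decisions (action changes)
    controller: List[int] = []  # per-step continuation signals
    consecutive = 0
    for i, a in enumerate(actions):
        forced = consecutive == max_controller_actions
        if i == 0 or a != actions[i - 1] or forced:
            planner.append(a)       # new planner action
            controller.append(0)    # 0 = action changed / replanned
            consecutive = 0
        else:
            controller.append(1)    # 1 = continue the same action
            consecutive += 1
    return planner, controller
```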
Feature Extraction:
- Uses a pretrained MultitaskCNN (loaded from a checkpoint) to extract image features from RGB frames.
- get_img_features(): Preprocesses images (transpose, normalize to [0,1]) and runs them through the CNN encoder.
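The preprocessing step can be sketched in NumPy. This is illustrative only: the real get_img_features() operates on torch tensors and feeds the result through the MultitaskCNN encoder, which is omitted here:

```python
# Illustrative frame preprocessing (transpose + normalize), standing in
# for the tensor preprocessing inside get_img_features(); the CNN
# forward pass itself is omitted.
import numpy as np


def preprocess_frames(frames: np.ndarray) -> np.ndarray:
    # frames: (N, H, W, C) uint8 RGB -> (N, C, H, W) float32 in [0, 1]
    batch = frames.astype(np.float32) / 255.0
    return batch.transpose(0, 3, 1, 2)
```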
WebDataset Integration:
- group_by_keys_(): Custom WebDataset pipeline function that groups tar archive entries by episode, attaching question tokens and answer indices from the EQA dataset.
- map_dataset_sample(): Transforms raw WebDataset samples into training-ready tensors including planner/controller image features, action sequences, hidden index mappings, and masks.
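The grouping idea behind group_by_keys_() can be sketched as follows. The entry-naming convention here (episode key before the first dot) is an assumption for illustration; the real pipeline stage also attaches question tokens and answer indices from the EQA dataset:

```python
# Rough sketch of grouping tar entries into per-episode samples, in the
# spirit of group_by_keys_(). Entry naming (key before the first dot)
# is assumed for illustration.
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def group_by_episode(
    entries: Iterable[Tuple[str, bytes]]
) -> Iterator[Dict[str, bytes]]:
    # entries: (filename, data) pairs in archive order, e.g.
    # "ep0001.00003.jpg" -> episode key "ep0001", suffix "00003.jpg".
    def episode_key(entry: Tuple[str, bytes]) -> str:
        return entry[0].split(".", 1)[0]

    for key, group in itertools.groupby(entries, episode_key):
        sample: Dict[str, bytes] = {"__key__": key.encode()}
        for fname, data in group:
            sample[fname.split(".", 1)[1]] = data
        yield sample
```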
Vocabulary Management:
- restructure_ans_vocab(): Remaps answer vocabulary to consecutive integer indices.
- get_vocab_dicts(): Returns question and answer VocabDict instances.
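The remapping performed by restructure_ans_vocab() amounts to assigning consecutive indices; a minimal sketch (the real method works on VocabDict instances, whose internals may differ):

```python
# Minimal sketch of remapping an answer vocabulary to consecutive
# integer indices, as restructure_ans_vocab() does; the real method
# operates on VocabDict instances.
from typing import Dict


def restructure_vocab(word2idx: Dict[str, int]) -> Dict[str, int]:
    # Sort the words for determinism, then assign 0..N-1 in order,
    # discarding the original (possibly sparse) indices.
    return {word: i for i, word in enumerate(sorted(word2idx))}
```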
The class operates in two modes: train (returns full hierarchical features) and val (returns raw actions, action length, and goal position for evaluation with backtracking).
Usage
NavDataset is used within the EQA imitation learning pipeline. It is instantiated with a Habitat environment and configuration, automatically caching frames on first run. Subsequent runs load from the tar archive for fast streaming.
Code Reference
Source Location
- Repository: Facebookresearch_Habitat_lab
- File: habitat-baselines/habitat_baselines/il/data/nav_data.py
- Lines: 1-558
Signature
class NavDataset(wds.Dataset):
    def __init__(
        self,
        config: "DictConfig",
        env: habitat.Env,
        device: torch.device,
        max_controller_actions: int = 5,
    ):
Import
from habitat_baselines.il.data.nav_data import NavDataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | DictConfig | Yes | Full Habitat baselines configuration with dataset, frame path, and CNN checkpoint settings |
| env | habitat.Env | Yes | Habitat environment with an EQA dataset containing episodes with shortest_paths, questions, and answers |
| device | torch.device | Yes | Device for CNN feature extraction (CPU or CUDA) |
| max_controller_actions | int | No | Maximum consecutive controller actions before a forced planner switch (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| Train sample | Tuple | (idx, question, answer, planner_img_feats, planner_actions_in, planner_actions_out, planner_action_length, planner_mask, controller_img_feats, controller_actions_in, planner_hidden_idx, controller_out, controller_action_length, controller_mask) |
| Val sample | Tuple | (idx, question, answer, actions, action_length, goal_pos) |
| get_vocab_dicts() | Tuple[VocabDict, VocabDict] | Question and answer vocabulary dictionaries |
Usage Examples
Basic Usage
import torch
import habitat
from habitat_baselines.il.data.nav_data import NavDataset
# Setup environment
config = habitat.get_config("path/to/eqa_config.yaml")
env = habitat.Env(config=config)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Create dataset (frames are cached on first run)
dataset = NavDataset(
    config=config,
    env=env,
    device=device,
    max_controller_actions=5,
)
# Get vocabulary dictionaries
q_vocab, ans_vocab = dataset.get_vocab_dicts()
print(f"Question vocab size: {len(q_vocab)}")
print(f"Answer vocab size: {len(ans_vocab)}")
print(f"Number of episodes: {len(dataset)}")
print(f"Max question length: {dataset.max_q_len}")
print(f"Max action length: {dataset.max_action_len}")
# Use with a DataLoader via WebDataset pipeline
# The dataset returns preprocessed samples via map_dataset_sample
for sample in dataset:
    processed = dataset.map_dataset_sample(sample)
    # In train mode: unpack hierarchical features
    # In val mode: unpack raw actions and goal position
    break