Implementation: Facebookresearch Habitat-lab NavDataset
| Knowledge Sources | |
|---|---|
| Domains | Embodied_AI, Imitation_Learning, Navigation_Tasks |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
NavDataset is a PyTorch dataset extending webdataset.Dataset that provides navigation data for the PACMAN-based Embodied Question Answering (EQA) imitation learning pipeline, handling frame caching, hierarchical action decomposition, and CNN feature extraction.
Description
NavDataset manages the full data pipeline for training a hierarchical navigation model that decomposes flat action sequences into planner and controller actions. Key responsibilities include:
Data Loading and Caching:
- On first use, iterates through all scenes and episodes, rendering RGB frames at each position along the shortest path and saving them to disk as JPEG images.
- Compresses frame directories into tar archives using create_tar_archive() for efficient streaming via WebDataset.
- Checks for existing caches with cache_exists() to avoid redundant rendering.
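The check-then-archive flow above can be sketched as follows. The helper bodies here are illustrative stand-ins (the real cache_exists() and create_tar_archive() in nav_data.py may differ, e.g. in validation and naming):

```python
# Minimal sketch of the frame-cache flow (hypothetical helper bodies;
# the real implementations live in nav_data.py).
import os
import tarfile


def cache_exists(cache_path: str) -> bool:
    # The real check may also validate completeness; here we simply
    # test for the archive's presence on disk.
    return os.path.isfile(cache_path)


def create_tar_archive(cache_path: str, frame_dir: str) -> None:
    # Pack every cached JPEG frame into an uncompressed tar archive so
    # WebDataset can stream it sequentially.
    with tarfile.open(cache_path, "w") as tar:
        for name in sorted(os.listdir(frame_dir)):
            tar.add(os.path.join(frame_dir, name), arcname=name)
```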
Hierarchical Action Decomposition:
- flat_to_hierarchical_actions(): Converts a flat sequence of navigation actions (FWD, LEFT, RIGHT, STOP) into a two-level hierarchy:
- Planner actions: High-level direction changes (when the action type changes).
- Controller actions: Low-level continuation signals (1 = continue same action, 0 = action changed).
- Actions are preprocessed with an offset (0=NULL, 1=START, 2=FWD, 3=LEFT, 4=RIGHT, 5=STOP) and padded to max_action_len.
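The decomposition rule above can be illustrated with a simplified sketch. This is not the actual flat_to_hierarchical_actions() (which also applies the action offset, padding, and tensor handling); it only demonstrates the planner/controller split, including the forced planner switch after max_controller_actions consecutive continuations:

```python
# Simplified illustration of the planner/controller split; the real
# flat_to_hierarchical_actions() in nav_data.py also handles offsets
# and padding to max_action_len.
from typing import List, Tuple


def flat_to_hierarchical(
    actions: List[int], max_controller_actions: int = 5
) -> Tuple[List[int], List[int]]:
    planner: List[int] = []     # high-level decisions (action changes)
    controller: List[int] = []  # per-step continuation signals
    consecutive = 0
    for i, a in enumerate(actions):
        forced = consecutive == max_controller_actions
        if i == 0 or a != actions[i - 1] or forced:
            planner.append(a)       # new planner action
            controller.append(0)    # 0 = action changed / replanned
            consecutive = 0
        else:
            controller.append(1)    # 1 = continue the same action
            consecutive += 1
    return planner, controller
```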
Feature Extraction:
- Uses a pretrained MultitaskCNN (loaded from a checkpoint) to extract image features from RGB frames.
- get_img_features(): Preprocesses images (transpose, normalize to [0,1]) and runs them through the CNN encoder.
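The preprocessing step can be sketched in NumPy. This is illustrative only: the real get_img_features() operates on torch tensors and feeds the result through the MultitaskCNN encoder, which is omitted here:

```python
# Illustrative frame preprocessing (transpose + normalize), standing in
# for the tensor preprocessing inside get_img_features(); the CNN
# forward pass itself is omitted.
import numpy as np


def preprocess_frames(frames: np.ndarray) -> np.ndarray:
    # frames: (N, H, W, C) uint8 RGB -> (N, C, H, W) float32 in [0, 1]
    batch = frames.astype(np.float32) / 255.0
    return batch.transpose(0, 3, 1, 2)
```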
WebDataset Integration:
- group_by_keys_(): Custom WebDataset pipeline function that groups tar archive entries by episode, attaching question tokens and answer indices from the EQA dataset.
- map_dataset_sample(): Transforms raw WebDataset samples into training-ready tensors including planner/controller image features, action sequences, hidden index mappings, and masks.
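The grouping idea behind group_by_keys_() can be sketched as follows. The entry-naming convention here (episode key before the first dot) is an assumption for illustration; the real pipeline stage also attaches question tokens and answer indices from the EQA dataset:

```python
# Rough sketch of grouping tar entries into per-episode samples, in the
# spirit of group_by_keys_(). Entry naming (key before the first dot)
# is assumed for illustration.
import itertools
from typing import Dict, Iterable, Iterator, Tuple


def group_by_episode(
    entries: Iterable[Tuple[str, bytes]]
) -> Iterator[Dict[str, bytes]]:
    # entries: (filename, data) pairs in archive order, e.g.
    # "ep0001.00003.jpg" -> episode key "ep0001", suffix "00003.jpg".
    def episode_key(entry: Tuple[str, bytes]) -> str:
        return entry[0].split(".", 1)[0]

    for key, group in itertools.groupby(entries, episode_key):
        sample: Dict[str, bytes] = {"__key__": key.encode()}
        for fname, data in group:
            sample[fname.split(".", 1)[1]] = data
        yield sample
```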
Vocabulary Management:
- restructure_ans_vocab(): Remaps answer vocabulary to consecutive integer indices.
- get_vocab_dicts(): Returns question and answer VocabDict instances.
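The remapping performed by restructure_ans_vocab() amounts to assigning consecutive indices; a minimal sketch (the real method works on VocabDict instances, whose internals may differ):

```python
# Minimal sketch of remapping an answer vocabulary to consecutive
# integer indices, as restructure_ans_vocab() does; the real method
# operates on VocabDict instances.
from typing import Dict


def restructure_vocab(word2idx: Dict[str, int]) -> Dict[str, int]:
    # Sort the words for determinism, then assign 0..N-1 in order,
    # discarding the original (possibly sparse) indices.
    return {word: i for i, word in enumerate(sorted(word2idx))}
```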
The class operates in two modes: train (returns full hierarchical features) and val (returns raw actions, action length, and goal position for evaluation with backtracking).
Usage
NavDataset is used within the EQA imitation learning pipeline. It is instantiated with a Habitat environment and configuration, automatically caching frames on first run. Subsequent runs load from the tar archive for fast streaming.
Code Reference
Source Location
- Repository: Facebookresearch_Habitat_lab
- File: habitat-baselines/habitat_baselines/il/data/nav_data.py
- Lines: 1-558
Signature
class NavDataset(wds.Dataset):
    def __init__(
        self,
        config: "DictConfig",
        env: habitat.Env,
        device: torch.device,
        max_controller_actions: int = 5,
    ):
Import
from habitat_baselines.il.data.nav_data import NavDataset
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| config | DictConfig | Yes | Full Habitat baselines configuration with dataset, frame path, and CNN checkpoint settings |
| env | habitat.Env | Yes | Habitat environment with an EQA dataset containing episodes with shortest_paths, questions, and answers |
| device | torch.device | Yes | Device for CNN feature extraction (CPU or CUDA) |
| max_controller_actions | int | No | Maximum consecutive controller actions before a forced planner switch (default: 5) |
Outputs
| Name | Type | Description |
|---|---|---|
| Train sample | Tuple | (idx, question, answer, planner_img_feats, planner_actions_in, planner_actions_out, planner_action_length, planner_mask, controller_img_feats, controller_actions_in, planner_hidden_idx, controller_out, controller_action_length, controller_mask) |
| Val sample | Tuple | (idx, question, answer, actions, action_length, goal_pos) |
| get_vocab_dicts() | Tuple[VocabDict, VocabDict] | Question and answer vocabulary dictionaries |
Usage Examples
Basic Usage
import torch
import habitat
from habitat_baselines.il.data.nav_data import NavDataset
# Setup environment
config = habitat.get_config("path/to/eqa_config.yaml")
env = habitat.Env(config=config)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# Create dataset (frames are cached on first run)
dataset = NavDataset(
    config=config,
    env=env,
    device=device,
    max_controller_actions=5,
)
# Get vocabulary dictionaries
q_vocab, ans_vocab = dataset.get_vocab_dicts()
print(f"Question vocab size: {len(q_vocab)}")
print(f"Answer vocab size: {len(ans_vocab)}")
print(f"Number of episodes: {len(dataset)}")
print(f"Max question length: {dataset.max_q_len}")
print(f"Max action length: {dataset.max_action_len}")
# Use with a DataLoader via WebDataset pipeline
# The dataset returns preprocessed samples via map_dataset_sample
for sample in dataset:
    processed = dataset.map_dataset_sample(sample)
    # In train mode: unpack hierarchical features
    # In val mode: unpack raw actions and goal position
    break