Workflow: Facebookresearch Habitat-Lab Rearrangement HRL Training

Knowledge Sources	Habitat-Lab, Habitat Docs, Habitat 2.0, Habitat 3.0
Domains	Embodied_AI, Reinforcement_Learning, Manipulation, Hierarchical_RL
Last Updated	2026-02-15 02:00 GMT

Overview

End-to-end process for training hierarchical reinforcement learning (HRL) policies that decompose long-horizon rearrangement tasks into reusable low-level skills orchestrated by a high-level policy.

Description

This workflow covers training agents to perform object rearrangement tasks (pick, place, navigate, open/close) using a two-layer hierarchical policy architecture. The high-level policy selects which low-level skill to execute at each decision point, while low-level skills handle the specific motor commands for navigation, grasping, and placement. The system supports multiple high-level strategies (learned neural, fixed planner, PDDL-based) and multiple low-level skill types (learned, oracle, scripted). Training uses PPO/DD-PPO with a specialized HRL rollout storage that handles variable-length skill executions.
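The two-layer control flow can be illustrated with a toy loop: the high-level policy picks a skill, and that skill issues low-level actions until its own termination condition fires. This is a minimal sketch under stated assumptions, not Habitat-Lab's `HierarchicalPolicy` API. `Skill`, `run_episode`, the toy environment, and the `dist`/`holding` observation keys are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Obs = Dict[str, float]


@dataclass
class Skill:
    """Hypothetical low-level skill: an action function plus a termination test."""
    name: str
    act: Callable[[Obs], str]
    is_done: Callable[[Obs], bool]
    max_steps: int = 50


def run_episode(env_step: Callable[[Obs, str], Obs], obs: Obs,
                high_level: Callable[[Obs], Optional[str]],
                skills: Dict[str, Skill], max_decisions: int = 10) -> Tuple[Obs, List[Tuple[str, int]]]:
    """High-level picks a skill; the skill runs until it terminates or times out."""
    trace = []
    for _ in range(max_decisions):
        skill_name = high_level(obs)
        if skill_name is None:  # high-level policy decides the task is finished
            break
        skill = skills[skill_name]
        steps = 0
        while steps < skill.max_steps and not skill.is_done(obs):
            obs = env_step(obs, skill.act(obs))
            steps += 1
        trace.append((skill_name, steps))  # variable-length skill execution
    return obs, trace


def toy_env_step(obs: Obs, action: str) -> Obs:
    """Hypothetical 1-D world: move toward an object, then grasp it."""
    obs = dict(obs)
    if action == "move_forward":
        obs["dist"] = max(0.0, obs["dist"] - 1.0)
    elif action == "grasp" and obs["dist"] == 0.0:
        obs["holding"] = 1.0
    return obs


skills = {
    "nav": Skill("nav", lambda o: "move_forward", lambda o: o["dist"] == 0.0),
    "pick": Skill("pick", lambda o: "grasp", lambda o: o["holding"] == 1.0),
}


def scripted_high_level(obs: Obs) -> Optional[str]:
    # A fixed, rule-based selector standing in for a learned high-level policy.
    if obs["dist"] > 0.0:
        return "nav"
    if obs["holding"] == 0.0:
        return "pick"
    return None


final_obs, trace = run_episode(toy_env_step, {"dist": 3.0, "holding": 0.0},
                               scripted_high_level, skills)
print(trace)  # [('nav', 3), ('pick', 1)]
```

Swapping `scripted_high_level` for a learned network is, in spirit, the difference between a fixed and a neural high-level policy; the skills themselves stay unchanged.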

Usage

Execute this workflow when you need an agent to perform multi-step manipulation tasks in simulated indoor environments, such as picking up objects and placing them at target locations, or more complex composite tasks like setting a table. This is the primary workflow for Habitat 2.0/3.0 rearrangement research.

Execution Steps

Step 1: Dataset and Scene Preparation

Download the ReplicaCAD or HSSD scene dataset and corresponding rearrangement episode data. Episodes define initial object placements and target goal states. The PDDL domain specification defines the available predicates and actions for the task planner.

Key considerations:

ReplicaCAD provides articulated furniture (drawers, cabinets) for interactive tasks
Episode datasets specify which objects to rearrange and their goal locations
PDDL domain files define the planning language for oracle high-level policies
Multiple rearrangement tasks are available (pick, place, open/close, composite set_table)
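PDDL actions follow STRIPS-style semantics: an action applies when its preconditions hold, and it then deletes and adds predicates. The sketch below shows that semantics on a single-object episode. The predicate names (`at`, `holding`, `hand_empty`) and the `Action` type are hypothetical illustrations, not the contents of Habitat-Lab's PDDL domain files.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# A ground predicate is a tuple such as ("at", "apple", "counter").
Predicate = Tuple[str, ...]
State = FrozenSet[Predicate]


@dataclass(frozen=True)
class Action:
    """STRIPS-style grounded action: preconditions plus add/delete effects."""
    name: str
    preconditions: State
    add_effects: State
    del_effects: State


def applicable(state: State, action: Action) -> bool:
    # Applicable when every precondition holds in the current state.
    return action.preconditions <= state


def apply_action(state: State, action: Action) -> State:
    if not applicable(state, action):
        raise ValueError(f"preconditions of {action.name} not satisfied")
    # Remove deleted facts first, then add the new ones.
    return (state - action.del_effects) | action.add_effects


def goal_reached(state: State, goal: State) -> bool:
    return goal <= state


# Hypothetical episode: initial placement on the counter, goal state on the table.
init: State = frozenset({("at", "apple", "counter"), ("hand_empty",)})
goal: State = frozenset({("at", "apple", "table")})
pick = Action("pick(apple)",
              frozenset({("at", "apple", "counter"), ("hand_empty",)}),
              frozenset({("holding", "apple")}),
              frozenset({("at", "apple", "counter"), ("hand_empty",)}))
place = Action("place(apple, table)",
               frozenset({("holding", "apple")}),
               frozenset({("at", "apple", "table"), ("hand_empty",)}),
               frozenset({("holding", "apple")}))

final_state = apply_action(apply_action(init, pick), place)
print(goal_reached(final_state, goal))  # True
```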

Step 2: Configuration Composition

Select the appropriate HRL configuration combining task definition, agent embodiment (Fetch, Spot, humanoid), high-level policy type, and low-level skill definitions. The configuration system composes from multiple YAML fragments specifying the hierarchical policy architecture.

Key considerations:

Choose high-level policy type: `hl_neural` (learned), `hl_fixed` (scripted), or `hl_planner` (PDDL)
Choose skill definitions: `nn_skills` (learned) or `oracle_skills` (perfect)
Agent embodiment configs define robot morphology and action spaces
Task configs specify which rearrangement objective to train on
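Habitat-Lab builds its configs with Hydra/OmegaConf. The sketch below uses plain dictionaries to illustrate only the merge semantics of composing fragments: later fragments override earlier ones, and nested keys merge recursively. The fragment keys and values are hypothetical and do not reproduce Habitat-Lab's actual config schema.

```python
import copy
from typing import Any, Dict

Config = Dict[str, Any]


def deep_merge(base: Config, override: Config) -> Config:
    """Return a new dict in which keys from `override` recursively replace keys in `base`."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def compose(*fragments: Config) -> Config:
    config: Config = {}
    for fragment in fragments:  # order matters: later fragments win
        config = deep_merge(config, fragment)
    return config


# Hypothetical fragments for task, embodiment, high-level policy, and skills.
task = {"habitat": {"task": {"type": "set_table"}}}
embodiment = {"habitat": {"simulator": {"agent": "fetch"}}}
hl_policy = {"hierarchical_policy": {"high_level_policy": {"name": "hl_neural"}}}
skills = {"hierarchical_policy": {"skills": {"nav": "nn", "pick": "nn", "place": "nn"}}}
# Development override: swap in an oracle navigation skill.
override = {"hierarchical_policy": {"skills": {"nav": "oracle"}}}

cfg = compose(task, embodiment, hl_policy, skills, override)
print(cfg["hierarchical_policy"]["skills"])  # {'nav': 'oracle', 'pick': 'nn', 'place': 'nn'}
```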

Step 3: Low-Level Skill Training

Train individual low-level skills (navigation, pick, place, open/close drawer) independently using monolithic RL. Each skill has its own reward function and termination condition. Skills are trained on simplified sub-tasks before being composed into the hierarchical policy.

Key considerations:

Each skill is trained with task-specific reward shaping
Skill termination conditions determine when control returns to the high-level policy
Pre-trained skill checkpoints are loaded when training the high-level policy
Oracle skills can substitute for learned skills during development
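As an illustration of task-specific reward shaping and a termination condition, here is a sketch for a pick skill. It uses distance-difference shaping (reward for getting closer to the object), a one-time grasp bonus, and a small per-step penalty, and it terminates on success or timeout. The coefficients and function names are hypothetical, not Habitat-Lab's reward defaults.

```python
from dataclasses import dataclass


@dataclass
class PickRewardConfig:
    """Hypothetical coefficients, chosen only for illustration."""
    dist_scale: float = 1.0
    grasp_bonus: float = 5.0
    slack_penalty: float = 0.005
    max_steps: int = 200


def pick_reward(prev_dist: float, curr_dist: float, just_grasped: bool,
                cfg: PickRewardConfig) -> float:
    # Positive when the end-effector moved closer to the target object.
    reward = cfg.dist_scale * (prev_dist - curr_dist) - cfg.slack_penalty
    if just_grasped:
        reward += cfg.grasp_bonus
    return reward


def pick_terminated(is_holding_target: bool, step: int, cfg: PickRewardConfig) -> bool:
    # Control returns to the high-level policy on success or on timeout.
    return is_holding_target or step >= cfg.max_steps


cfg = PickRewardConfig()
print(pick_reward(0.5, 0.3, False, cfg), pick_terminated(False, 10, cfg))
```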

Step 4: Hierarchical Policy Assembly

Assemble the hierarchical policy by connecting the trained low-level skills with the high-level policy. The HierarchicalPolicy class manages skill selection, observation routing, and action space composition. Skill parameters (target objects, navigation goals) are passed from the high-level to low-level policies.
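Observation routing and skill-parameter passing can be pictured as filtering one shared observation dictionary into per-component views. In this sketch the high-level view contains only what it needs for skill selection, and the chosen skill receives its own keys plus the arguments (e.g. a target object) selected by the high-level. All observation keys and the `route` helper are hypothetical, not Habitat-Lab sensor names.

```python
from typing import Any, Dict, List, Sequence, Tuple

Obs = Dict[str, Any]

# Hypothetical observation keys visible to each component.
HL_OBS_KEYS: List[str] = ["predicate_state"]
SKILL_OBS_KEYS: Dict[str, List[str]] = {
    "nav": ["goal_rel_pos", "depth"],
    "pick": ["target_rel_pos", "arm_joints", "is_holding", "depth"],
}


def filter_obs(obs: Obs, keys: Sequence[str]) -> Obs:
    missing = [k for k in keys if k not in obs]
    if missing:
        raise KeyError(f"missing observation keys: {missing}")
    return {k: obs[k] for k in keys}


def route(obs: Obs, skill_name: str, skill_args: Sequence[str]) -> Tuple[Obs, Obs]:
    """Split one observation into a high-level view and a skill-specific view."""
    hl_view = filter_obs(obs, HL_OBS_KEYS)
    ll_view = filter_obs(obs, SKILL_OBS_KEYS[skill_name])
    ll_view["skill_args"] = list(skill_args)  # parameters chosen by the high-level
    return hl_view, ll_view


obs = {"predicate_state": [1, 0], "target_rel_pos": [0.1, 0.2, 0.3],
       "goal_rel_pos": [2.0, 0.0], "arm_joints": [0.0] * 7,
       "is_holding": 0, "depth": "depth_image"}
hl_view, ll_view = route(obs, "pick", ["apple_0"])
print(sorted(hl_view), sorted(ll_view))
```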

Key considerations:

The high-level policy observes a filtered observation space relevant to skill selection
Low-level skills receive skill-specific observations and action spaces
Skill chaining handles transitions between consecutive skills smoothly
The HRL rollout storage manages variable-length skill episode segments
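One common way to store variable-length skill executions is to treat the high-level problem as a semi-MDP: each buffer entry is one high-level decision, its reward is the discounted sum collected while the skill ran, and the next decision is discounted by gamma to the power of the skill's duration. This is a minimal sketch of that formulation; it is not a description of Habitat-Lab's rollout storage internals, and `HLRolloutBuffer` is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HLSegment:
    skill: str
    reward: float   # discounted reward accumulated while the skill ran
    duration: int   # number of low-level environment steps consumed
    done: bool      # whether the episode ended at the end of this segment


class HLRolloutBuffer:
    """Stores one entry per high-level decision rather than per environment step."""

    def __init__(self, gamma: float) -> None:
        self.gamma = gamma
        self.segments: List[HLSegment] = []
        self._skill: Optional[str] = None
        self._reward = 0.0
        self._steps = 0

    def start_skill(self, skill: str) -> None:
        self._skill, self._reward, self._steps = skill, 0.0, 0

    def add_env_step(self, reward: float) -> None:
        # Discount within the segment relative to the decision point.
        self._reward += (self.gamma ** self._steps) * reward
        self._steps += 1

    def end_skill(self, done: bool) -> None:
        assert self._skill is not None, "end_skill called without start_skill"
        self.segments.append(HLSegment(self._skill, self._reward, self._steps, done))
        self._skill = None

    def returns(self, bootstrap_value: float = 0.0) -> List[float]:
        """SMDP returns: G_i = R_i + gamma**k_i * G_{i+1}, reset at episode ends."""
        out = [0.0] * len(self.segments)
        running = bootstrap_value
        for i in reversed(range(len(self.segments))):
            seg = self.segments[i]
            if seg.done:
                running = 0.0
            running = seg.reward + (self.gamma ** seg.duration) * running
            out[i] = running
        return out


buf = HLRolloutBuffer(gamma=0.5)
buf.start_skill("nav")
for r in (1.0, 1.0):
    buf.add_env_step(r)
buf.end_skill(done=False)
buf.start_skill("pick")
buf.add_env_step(2.0)
buf.end_skill(done=True)
print(buf.returns())  # [2.0, 2.0]
```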

Step 5: High-Level Policy Training

Train the high-level policy using PPO/DD-PPO while the low-level skills are frozen. The high-level policy learns when to invoke each skill based on the current environment state. Training uses the composite task reward signal that reflects overall rearrangement progress.

Key considerations:

Low-level skill weights are frozen during high-level training
The high-level observes skill termination signals and environment state
Training reward reflects composite task completion (all objects in goal states)
Multi-agent variants (social rearrangement) train robot and humanoid policies jointly
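Because the low-level skills are frozen, the PPO update touches only the high-level policy, and its samples are high-level decisions rather than individual environment steps. The sketch below computes the standard clipped PPO surrogate over such decisions in plain Python; in a PyTorch setup the skills would typically be frozen with `requires_grad_(False)` or excluded from the optimizer. The function name and inputs are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_loss(new_log_probs: Sequence[float], old_log_probs: Sequence[float],
                  advantages: Sequence[float], clip_eps: float = 0.2) -> float:
    """Negated clipped PPO surrogate, averaged over high-level decisions (to be minimized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(skill|s) / pi_old(skill|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)


# Two high-level decisions; advantages would come from segment-level returns.
print(ppo_clip_loss([math.log(2.0), 0.0], [0.0, 0.0], [1.0, -1.0]))
```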

Step 6: Evaluation and Analysis

Evaluate the complete hierarchical policy on held-out episodes. Metrics include task success rate, composite task completion percentage, and per-skill execution statistics. Video recordings visualize the full rearrangement sequence.

Key considerations:

Evaluation runs the complete hierarchical policy end-to-end
Per-skill metrics help diagnose which components need improvement
Oracle baselines provide upper bounds for comparison
Multi-agent evaluation includes collaboration metrics
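Per-skill diagnostics are usually simple aggregations over evaluation episodes. The sketch below computes overall success rate, mean composite completion, and per-skill call counts, success rates, and mean durations. The episode record format (`success`, `frac_complete`, `skills`) is a hypothetical stand-in, not Habitat-Lab's metric keys.

```python
from collections import defaultdict
from typing import Any, Dict, List


def summarize(episodes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Each episode: {'success': bool, 'frac_complete': float, 'skills': [(name, ok, steps)]}."""
    n = len(episodes)
    summary: Dict[str, Any] = {
        "success_rate": sum(ep["success"] for ep in episodes) / n,
        "mean_completion": sum(ep["frac_complete"] for ep in episodes) / n,
        "skills": {},
    }
    calls: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    steps: Dict[str, int] = defaultdict(int)
    for ep in episodes:
        for name, ok, k in ep["skills"]:
            calls[name] += 1
            wins[name] += int(ok)
            steps[name] += k
    for name in calls:
        summary["skills"][name] = {
            "calls": calls[name],
            "success_rate": wins[name] / calls[name],
            "mean_steps": steps[name] / calls[name],
        }
    return summary


episodes = [
    {"success": True, "frac_complete": 1.0, "skills": [("nav", True, 30), ("pick", True, 20)]},
    {"success": False, "frac_complete": 0.5, "skills": [("nav", True, 40), ("pick", False, 50)]},
]
report = summarize(episodes)
print(report["success_rate"], report["skills"]["pick"])
```

In this toy data, navigation always succeeds while pick fails half the time, which points at the pick skill as the component to improve.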
