# Principle: facebookresearch/habitat-lab Hierarchical Policy Assembly
| Knowledge Sources | |
|---|---|
| Domains | Hierarchical_RL, Software_Architecture |
| Last Updated | 2026-02-15 02:00 GMT |
## Overview
Assembly of pre-trained skill sub-policies under a high-level controller into a single hierarchical policy that can solve multi-step manipulation tasks.
## Description
Hierarchical Policy Assembly composes independently trained skill policies (navigation, pick, place, etc.) into a two-level hierarchy. The high-level policy (neural, fixed, or PDDL-based) selects which skill to activate at each decision point. The active skill policy controls the agent's low-level actions until its termination condition triggers. Control then returns to the high-level policy.
This decomposition enables solving long-horizon tasks that are intractable for flat RL policies, by reusing modular, transferable skills.
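The two-level structure described above can be sketched as a small interface. This is a minimal sketch, not habitat-lab's actual API; the names `Skill`, `FixedHighLevelPolicy`, and `select_skill` are illustrative.

```python
from typing import Any, Protocol


class Skill(Protocol):
    """A low-level skill policy: controls the agent until it terminates."""

    def act(self, observation: dict) -> Any: ...
    def should_terminate(self, observation: dict) -> bool: ...


class FixedHighLevelPolicy:
    """A fixed high-level policy: steps through a predefined skill sequence.

    Neural or PDDL-based high-level policies would implement the same
    select_skill interface but choose the next skill from the observation.
    """

    def __init__(self, plan: list) -> None:
        self._plan = plan
        self._index = -1

    def select_skill(self, observation: dict):
        # Advance to the next skill in the fixed plan.
        self._index += 1
        return self._plan[self._index]
```

Because selection is behind a single `select_skill` call, swapping a fixed plan for a learned controller does not change the execution loop.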
## Usage
Use this pattern after the individual skills have been trained. The hierarchical policy is assembled during training initialization when the rl_hierarchical.yaml configs are used.
## Theoretical Basis
The options framework formalizes this as:
- Option: each skill is an option with an intra-option policy, a termination function, and an initiation set
- Policy over options: the high-level policy selects among the available options
- Call-and-return execution: the selected option runs until its termination function fires, then control returns to the high-level policy
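The three components of an option can be written down directly. This is an illustrative sketch of the options-framework interface, not habitat-lab code; the `Option` dataclass and the toy "pick" option below are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Observation = dict


@dataclass
class Option:
    """One option in the options framework (names are illustrative)."""

    # Intra-option policy: maps observations to low-level actions.
    policy: Callable[[Observation], Any]
    # Termination function: whether the option ends in this state
    # (a boolean stand-in for the usual termination probability).
    terminates: Callable[[Observation], bool]
    # Initiation set: states from which the option may be selected.
    can_initiate: Callable[[Observation], bool]


# A toy "pick" option: selectable only when an object is visible,
# and it terminates once the object is grasped.
pick = Option(
    policy=lambda obs: "close_gripper",
    terminates=lambda obs: obs.get("grasped", False),
    can_initiate=lambda obs: obs.get("object_visible", False),
)
```

The initiation set is what lets a high-level policy mask out skills that cannot sensibly start from the current state.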
Pseudo-code:

```python
# Hierarchical call-and-return execution
observation = env.reset()
while not episode_done:
    skill = high_level_policy.select_skill(observation)   # high-level decision
    while not skill.should_terminate(observation):        # skill keeps control
        action = skill.act(observation)
        observation = env.step(action)
    # Skill terminated; control returns to the high-level policy,
    # which selects the next skill on the next outer iteration.
```
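The call-and-return loop can be exercised end to end with a toy environment and scripted skills. Everything here (`ToyEnv`, `NavSkill`, `PickSkill`, the action strings) is a made-up illustration of the control flow, not habitat-lab's environment or skill classes.

```python
class ToyEnv:
    """Toy 1-D environment: reach position 3, then grasp the object."""

    def __init__(self):
        self.pos = 0
        self.grasped = False

    def obs(self):
        return {"pos": self.pos, "grasped": self.grasped}

    def step(self, action):
        if action == "move":
            self.pos += 1
        elif action == "grasp" and self.pos == 3:
            self.grasped = True
        return self.obs()


class NavSkill:
    """Navigate until the target position is reached."""

    def act(self, obs):
        return "move"

    def should_terminate(self, obs):
        return obs["pos"] >= 3


class PickSkill:
    """Grasp, terminating once the object is held."""

    def act(self, obs):
        return "grasp"

    def should_terminate(self, obs):
        return obs["grasped"]


def run_hierarchy(env, plan):
    """Call-and-return execution over a fixed skill plan."""
    obs = env.obs()
    for skill in plan:
        while not skill.should_terminate(obs):
            obs = env.step(skill.act(obs))
    return obs


final = run_hierarchy(ToyEnv(), [NavSkill(), PickSkill()])
# final == {"pos": 3, "grasped": True}
```

Neither skill knows about the other: the navigation skill only moves, the pick skill only grasps, and the outer loop is the only place the task's sequencing lives. That separation is what makes the skills reusable across tasks.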