
Principle:Facebookresearch Habitat lab Hierarchical Policy Assembly

From Leeroopedia
Knowledge Sources
Domains Hierarchical_RL, Software_Architecture
Last Updated 2026-02-15 02:00 GMT

Overview

Assembly of pre-trained skill sub-policies under a high-level controller into a single hierarchical policy that can solve multi-step manipulation tasks.

Description

Hierarchical Policy Assembly composes independently trained skill policies (navigation, pick, place, etc.) into a two-level hierarchy. The high-level policy (neural, fixed, or PDDL-based) selects which skill to activate at each decision point. The active skill policy controls the agent's low-level actions until its termination condition triggers. Control then returns to the high-level policy.

This decomposition enables solving long-horizon tasks that are intractable for flat RL policies, by reusing modular, transferable skills.
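The two-level decomposition described above can be sketched as a pair of interfaces. The `Skill` and `HighLevelPolicy` names here are illustrative, not Habitat-lab's actual classes: each skill exposes an action method and a termination check, and the high-level policy only ever chooses among skills.

```python
from abc import ABC, abstractmethod


class Skill(ABC):
    """A pre-trained low-level sub-policy (e.g. navigation, pick, place)."""

    @abstractmethod
    def act(self, observation):
        """Return a low-level action for the current observation."""

    @abstractmethod
    def should_terminate(self, observation) -> bool:
        """Termination condition: True when control should return upward."""


class HighLevelPolicy(ABC):
    """Selects which skill to activate at each decision point.

    Could be neural, a fixed sequence, or a PDDL-based planner.
    """

    @abstractmethod
    def select_skill(self, observation) -> Skill:
        """Pick the next skill; called whenever the active skill terminates."""
```

Because the high-level policy only sees the `Skill` interface, skills trained independently (or swapped for scripted controllers) plug in without retraining the rest of the hierarchy.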

Usage

Apply this pattern after the individual skills have been trained. The hierarchical policy is then assembled once, at training initialization, when the rl_hierarchical.yaml configs are used.
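Assembly at initialization can be illustrated with the following sketch. The `FixedHighLevelPolicy` and `assemble_hierarchical_policy` names, and the config dictionary shape, are hypothetical placeholders, not Habitat-lab's actual config machinery:

```python
class FixedHighLevelPolicy:
    """High-level policy that executes a fixed, ordered plan of skills
    (one of the three high-level policy types: neural, fixed, PDDL-based)."""

    def __init__(self, plan):
        self.plan = list(plan)   # ordered skill names
        self.skills = {}         # name -> skill instance, filled at assembly
        self._step = 0

    def set_skills(self, skills):
        self.skills = skills

    def select_skill(self, observation):
        skill = self.skills[self.plan[self._step]]
        self._step += 1
        return skill


def assemble_hierarchical_policy(skill_configs, plan):
    """Instantiate each pre-trained skill from its config entry and attach
    the resulting name -> skill mapping to the high-level policy."""
    skills = {name: cfg["cls"](**cfg.get("kwargs", {}))
              for name, cfg in skill_configs.items()}
    high_level = FixedHighLevelPolicy(plan)
    high_level.set_skills(skills)
    return high_level
```

The key design point is that assembly happens once: after initialization, training and evaluation only interact with the assembled hierarchy, never with the individual skill checkpoints.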

Theoretical Basis

The options framework formalizes this as:

  1. Option o = (π_o, β_o, I_o): each skill is an option with an intra-option policy π_o, a termination function β_o, and an initiation set I_o
  2. Policy over options π_Ω: the high-level policy selects among options
  3. Call-and-return execution: the selected option executes until its termination condition fires, then control returns to π_Ω

Pseudo-code:

# Hierarchical (call-and-return) execution
observation = env.reset()
while not task_complete(observation):
    skill = high_level_policy.select_skill(observation)  # high level picks a skill
    while not skill.should_terminate(observation):
        action = skill.act(observation)                  # active skill controls the agent
        observation = env.step(action)
    # termination fired: control returns to the high-level policy
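The loop above can be made concrete with a self-contained toy: a 1-D environment, two hand-coded "skills", and a fixed-sequence high-level policy. All names here are invented for the example:

```python
class LineEnv:
    """Toy 1-D world: the agent moves along an integer line."""
    def __init__(self):
        self.pos = 0

    def step(self, action):
        self.pos += action
        return self.pos  # the observation is just the position


class GoTo:
    """Hand-coded 'skill' that walks toward a target position."""
    def __init__(self, target):
        self.target = target

    def act(self, obs):
        return 1 if obs < self.target else -1

    def should_terminate(self, obs):
        return obs == self.target


class SequencePolicy:
    """High-level policy: a fixed skill sequence, executed call-and-return."""
    def __init__(self, skills):
        self._skills = iter(skills)

    def select_skill(self, obs):
        return next(self._skills, None)  # None = plan finished


env = LineEnv()
obs = env.pos
hl = SequencePolicy([GoTo(3), GoTo(0)])  # go to 3, then return to 0
trace = []

skill = hl.select_skill(obs)
while skill is not None:
    while not skill.should_terminate(obs):
        obs = env.step(skill.act(obs))   # low-level control by the active skill
    trace.append(obs)                    # skill terminated; record the milestone
    skill = hl.select_skill(obs)         # control returns to the high level

print(trace)  # -> [3, 0]
```

Each skill runs to its own termination condition before the high-level policy is consulted again, which is exactly the call-and-return semantics of the options framework.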

Related Pages

Implemented By

Page Connections
