# Principle: facebookresearch/habitat-lab Hierarchical Policy Assembly
| Knowledge Sources | |
|---|---|
| Domains | Hierarchical_RL, Software_Architecture |
| Last Updated | 2026-02-15 02:00 GMT |
## Overview
Assembly of pre-trained skill sub-policies under a high-level controller into a single hierarchical policy that can solve multi-step manipulation tasks.
## Description
Hierarchical Policy Assembly composes independently trained skill policies (navigation, pick, place, etc.) into a two-level hierarchy. The high-level policy (neural, fixed, or PDDL-based) selects which skill to activate at each decision point. The active skill policy controls the agent's low-level actions until its termination condition triggers. Control then returns to the high-level policy.
This decomposition enables solving long-horizon tasks that are intractable for flat RL policies, by reusing modular, transferable skills.
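The two-level structure described above can be sketched as a small interface. This is a minimal sketch, not habitat-lab's actual API; the names `Skill`, `FixedHighLevelPolicy`, and `select_skill` are illustrative.

```python
from typing import Any, Protocol


class Skill(Protocol):
    """A low-level skill policy: controls the agent until it terminates."""

    def act(self, observation: dict) -> Any: ...
    def should_terminate(self, observation: dict) -> bool: ...


class FixedHighLevelPolicy:
    """A fixed high-level policy: steps through a predefined skill sequence.

    Neural or PDDL-based high-level policies would implement the same
    select_skill interface but choose the next skill from the observation.
    """

    def __init__(self, plan: list) -> None:
        self._plan = plan
        self._index = -1

    def select_skill(self, observation: dict):
        # Advance to the next skill in the fixed plan.
        self._index += 1
        return self._plan[self._index]
```

Because selection is behind a single `select_skill` call, swapping a fixed plan for a learned controller does not change the execution loop.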
## Usage
Use this pattern after the individual skills have been trained. The hierarchical policy is assembled during training initialization when the rl_hierarchical.yaml configs are used.
## Theoretical Basis
The options framework formalizes this as:
- Option: each skill is an option with an intra-option policy, a termination function, and an initiation set
- Policy over options: the high-level policy selects among the available options
- Call-and-return execution: the selected option runs until its termination function fires, then control returns to the high-level policy
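The three components of an option can be written down directly. This is an illustrative sketch of the options-framework interface, not habitat-lab code; the `Option` dataclass and the toy "pick" option below are assumptions for demonstration.

```python
from dataclasses import dataclass
from typing import Any, Callable

Observation = dict


@dataclass
class Option:
    """One option in the options framework (names are illustrative)."""

    # Intra-option policy: maps observations to low-level actions.
    policy: Callable[[Observation], Any]
    # Termination function: whether the option ends in this state
    # (a boolean stand-in for the usual termination probability).
    terminates: Callable[[Observation], bool]
    # Initiation set: states from which the option may be selected.
    can_initiate: Callable[[Observation], bool]


# A toy "pick" option: selectable only when an object is visible,
# and it terminates once the object is grasped.
pick = Option(
    policy=lambda obs: "close_gripper",
    terminates=lambda obs: obs.get("grasped", False),
    can_initiate=lambda obs: obs.get("object_visible", False),
)
```

The initiation set is what lets a high-level policy mask out skills that cannot sensibly start from the current state.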
Pseudo-code:

```python
# Hierarchical call-and-return execution
observation = env.reset()
while not episode_done:
    skill = high_level_policy.select_skill(observation)   # high-level decision
    while not skill.should_terminate(observation):        # skill keeps control
        action = skill.act(observation)
        observation = env.step(action)
    # Skill terminated; control returns to the high-level policy,
    # which selects the next skill on the next outer iteration.
```
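The call-and-return loop can be exercised end to end with a toy environment and scripted skills. Everything here (`ToyEnv`, `NavSkill`, `PickSkill`, the action strings) is a made-up illustration of the control flow, not habitat-lab's environment or skill classes.

```python
class ToyEnv:
    """Toy 1-D environment: reach position 3, then grasp the object."""

    def __init__(self):
        self.pos = 0
        self.grasped = False

    def obs(self):
        return {"pos": self.pos, "grasped": self.grasped}

    def step(self, action):
        if action == "move":
            self.pos += 1
        elif action == "grasp" and self.pos == 3:
            self.grasped = True
        return self.obs()


class NavSkill:
    """Navigate until the target position is reached."""

    def act(self, obs):
        return "move"

    def should_terminate(self, obs):
        return obs["pos"] >= 3


class PickSkill:
    """Grasp, terminating once the object is held."""

    def act(self, obs):
        return "grasp"

    def should_terminate(self, obs):
        return obs["grasped"]


def run_hierarchy(env, plan):
    """Call-and-return execution over a fixed skill plan."""
    obs = env.obs()
    for skill in plan:
        while not skill.should_terminate(obs):
            obs = env.step(skill.act(obs))
    return obs


final = run_hierarchy(ToyEnv(), [NavSkill(), PickSkill()])
# final == {"pos": 3, "grasped": True}
```

Neither skill knows about the other: the navigation skill only moves, the pick skill only grasps, and the outer loop is the only place the task's sequencing lives. That separation is what makes the skills reusable across tasks.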