# Principle: facebookresearch/habitat-lab Low-Level Skill Training
| Knowledge Sources | |
|---|---|
| Domains | Hierarchical_RL, Reinforcement_Learning |
| Last Updated | 2026-02-15 02:00 GMT |
## Overview
Independent PPO training of individual manipulation skills (navigation, pick, place) that form the building blocks of a hierarchical policy.
## Description
Low-level Skill Training trains each atomic skill (navigate-to-object, pick, place, open, close) as an independent RL policy using PPO. Each skill operates on a subset of the full action space and receives task-specific reward shaping. The resulting checkpoints are later loaded into the hierarchical policy as frozen sub-policies.
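The "frozen sub-policies" step can be sketched independently of the training stack: after training, each checkpoint's parameters are marked non-trainable before the hierarchical policy uses them. The `Param` and `SubPolicy` classes below are minimal stand-ins for illustration, not habitat-lab types:

```python
class Param:
    """Minimal stand-in for a learnable parameter tensor."""

    def __init__(self, value):
        self.value = value
        self.requires_grad = True


class SubPolicy:
    """Stand-in for a trained skill policy loaded from a checkpoint."""

    def __init__(self, params):
        self.params = params

    def freeze(self):
        # Frozen: these parameters receive no gradient updates while
        # the high-level policy is trained on top of them.
        for p in self.params:
            p.requires_grad = False
        return self


def load_frozen_skills(checkpoints):
    """Map skill name -> frozen sub-policy from checkpointed parameters."""
    return {
        name: SubPolicy([Param(v) for v in values]).freeze()
        for name, values in checkpoints.items()
    }
```

In a real setup the freezing is typically done on the deep-learning framework's parameter objects; the mechanism (disable gradients, keep values fixed) is the same.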
Skills are defined by the SkillPolicy abstract base class, which specifies the observation filtering, action sub-space mapping, and termination condition interfaces.
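The three interfaces that `SkillPolicy` specifies can be sketched as an abstract base class. Method names, sensor keys, and the 10-dimensional action space below are illustrative assumptions, not the exact habitat-lab API:

```python
from abc import ABC, abstractmethod


class SkillPolicy(ABC):
    """Sketch of the skill interface described above; the real
    habitat-lab class uses different names and signatures."""

    @abstractmethod
    def filter_obs(self, obs: dict) -> dict:
        """Observation filtering: keep only the keys this skill consumes."""

    @abstractmethod
    def to_full_action(self, sub_action: list) -> list:
        """Action sub-space mapping into the full action vector."""

    @abstractmethod
    def should_terminate(self, obs: dict) -> bool:
        """Skill-specific termination condition."""


class PickSkill(SkillPolicy):
    OBS_KEYS = ("ee_pos", "obj_start_sensor")  # assumed sensor names

    def filter_obs(self, obs):
        return {k: v for k, v in obs.items() if k in self.OBS_KEYS}

    def to_full_action(self, sub_action):
        full = [0.0] * 10                      # assumed full action size
        full[: len(sub_action)] = sub_action   # arm joints occupy the prefix
        return full

    def should_terminate(self, obs):
        return bool(obs.get("is_holding", False))
```

The base class lets the hierarchical policy treat every skill uniformly: it filters observations, embeds the skill's actions, and polls termination without knowing which skill is running.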
## Usage
Train individual skills before assembling the hierarchical policy. Each skill is trained with its own config variant (e.g., rl_skill.yaml with task-specific overrides).
## Theoretical Basis
The skill decomposition follows the options framework:
- Skill = Option: Each skill is a temporally extended action with its own policy, initiation set, and termination condition
- Independent training: Skills are trained in isolation with shaped rewards specific to their sub-task
- Composability: Trained skills are composed by a high-level policy that selects which skill to execute
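The three bullets map onto a simple control loop: the high-level policy selects an option and cedes control to it until that option's termination condition fires. A toy sketch (skill names and step counts are made up for illustration):

```python
class ToySkill:
    """Illustrative option: emits actions for a fixed number of steps,
    then reports termination."""

    def __init__(self, name, duration):
        self.name = name
        self.duration = duration
        self.steps = 0

    def act(self, obs):
        self.steps += 1
        return f"{self.name}_action"

    def should_terminate(self, obs):
        return self.steps >= self.duration


def run_hierarchical(skill_sequence):
    """High-level policy as a fixed schedule: execute each option in
    order, switching only when its termination condition fires."""
    trace = []
    for skill in skill_sequence:
        while not skill.should_terminate(obs={}):
            trace.append(skill.act(obs={}))
    return trace
```

In the learned setting the fixed schedule is replaced by a high-level policy that chooses the next skill from the current observation; the temporally extended execution is the same.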
Pseudo-code:

```python
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer

# Train each skill independently with its own task override.
for skill_name in ["nav", "pick", "place"]:
    config = get_config(
        "rearrange/rl_skill.yaml", overrides=[f"task={skill_name}"]
    )
    trainer = PPOTrainer(config)
    trainer.train()
    # Produces checkpoint: checkpoints/{skill_name}.pth
```