Implementation: facebookresearch/habitat-lab HRLPPO update
| Knowledge Sources | |
|---|---|
| Domains | Hierarchical_RL, Reinforcement_Learning |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
A concrete PPO update variant for hierarchical RL, provided by habitat-baselines, that updates only the high-level policy parameters using skill-level transitions.
Description
HRLPPO._update_from_batch extends PPO's update to work with skill-level rollout storage. It filters gradients to only update the high-level policy (skills remain frozen), computes advantages over skill-level transitions stored in HrlRolloutStorage, and applies the standard PPO clipped objective at the temporal abstraction level.
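The clipped objective applied at the temporal abstraction level can be sketched in plain Python. `ppo_clipped_loss` below is a hypothetical helper for illustration, not a habitat-baselines function; each batch element stands for one high-level (skill-selection) decision rather than one environment step.

```python
import math

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over skill-level transitions.

    Each element corresponds to one high-level skill-selection decision,
    i.e. the temporal abstraction HRLPPO operates at.
    """
    losses = []
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Standard PPO objective: pessimistic min of unclipped and clipped terms
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

Because skills stay frozen, only the high-level policy's parameters receive gradients from this loss.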
Usage
Used automatically during hierarchical RL training when PPOTrainer._update_agent delegates to the HRLPPO (or HRLDDPPO) agent.
Code Reference
Source Location
- Repository: habitat-lab
- File: habitat-baselines/habitat_baselines/rl/hrl/hrl_ppo.py
- Lines: L19-131 (HRLPPO class)
Signature
class HRLPPO(PPO):
    def _update_from_batch(
        self,
        batch,
        epoch,
        rollouts,
        learner_metrics,
    ):
        """
        PPO update on skill-level transitions.
        Only updates high-level policy parameters.
        """

class HRLDDPPO(DecentralizedDistributedMixin, HRLPPO):
    """Distributed variant of HRLPPO for multi-GPU training."""
Import
from habitat_baselines.rl.hrl.hrl_ppo import HRLPPO, HRLDDPPO
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | TensorDict | Yes | Mini-batch from HrlRolloutStorage with skill-level transitions |
| epoch | int | Yes | Current PPO epoch |
| rollouts | HrlRolloutStorage | Yes | Skill-level rollout buffer |
| learner_metrics | Dict | Yes | Accumulates loss values |
Outputs
| Name | Type | Description |
|---|---|---|
| learner_metrics | Dict[str, float] | Updated with value_loss, action_loss, entropy |
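The accumulation contract above can be illustrated with a minimal sketch. `record_metrics` and the averaging step are hypothetical, mirroring only the I/O table (habitat-baselines' actual bookkeeping may differ): each mini-batch appends its scalars, and averages are taken after all epochs.

```python
from collections import defaultdict

def record_metrics(learner_metrics, value_loss, action_loss, entropy):
    # Append per-mini-batch scalars; averaged once all PPO epochs finish.
    learner_metrics["value_loss"].append(value_loss)
    learner_metrics["action_loss"].append(action_loss)
    learner_metrics["entropy"].append(entropy)

metrics = defaultdict(list)
record_metrics(metrics, 0.5, -0.1, 1.3)
record_metrics(metrics, 0.3, -0.2, 1.1)
averaged = {k: sum(v) / len(v) for k, v in metrics.items()}
```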
Usage Examples
HRL Training (Automated)
# HRLPPO is selected automatically when using hierarchical policy config
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer
config = get_config("rearrange/rl_hierarchical.yaml")
trainer = PPOTrainer(config)
trainer.train() # Uses HRLPPO internally
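The advantage computation over skill-level transitions mentioned in the description can be sketched as GAE applied at the temporal abstraction level. `skill_level_gae` is a simplified illustration, not the HrlRolloutStorage implementation: each skill boundary is treated as one step of the high-level MDP, and `values` carries one extra bootstrap entry at the end.

```python
def skill_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE over skill-level transitions (one entry per high-level decision).

    rewards: reward accumulated over each skill's execution.
    values: high-level value estimates, len(rewards) + 1 (bootstrap at end).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual of the high-level value function at skill boundaries
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```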