Implementation: facebookresearch/habitat-lab HRLPPO update
| Knowledge Sources | |
|---|---|
| Domains | Hierarchical_RL, Reinforcement_Learning |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
A concrete PPO update variant for hierarchical RL, provided by habitat-baselines, that updates only the high-level policy parameters using skill-level transitions.
Description
HRLPPO._update_from_batch extends PPO's update to work with skill-level rollout storage. It filters gradients to only update the high-level policy (skills remain frozen), computes advantages over skill-level transitions stored in HrlRolloutStorage, and applies the standard PPO clipped objective at the temporal abstraction level.
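The clipped objective applied at the temporal abstraction level can be sketched in plain Python. `ppo_clipped_loss` below is a hypothetical helper for illustration, not a habitat-baselines function; each batch element stands for one high-level (skill-selection) decision rather than one environment step.

```python
import math

def ppo_clipped_loss(log_probs_new, log_probs_old, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss over skill-level transitions.

    Each element corresponds to one high-level skill-selection decision,
    i.e. the temporal abstraction HRLPPO operates at.
    """
    losses = []
    for lp_new, lp_old, adv in zip(log_probs_new, log_probs_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # importance ratio pi_new / pi_old
        clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
        # Standard PPO objective: pessimistic min of unclipped and clipped terms
        losses.append(-min(ratio * adv, clipped * adv))
    return sum(losses) / len(losses)
```

Because skills stay frozen, only the high-level policy's parameters receive gradients from this loss.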
Usage
Used automatically during hierarchical RL training when PPOTrainer._update_agent delegates to the HRLPPO (or HRLDDPPO) agent.
Code Reference
Source Location
- Repository: habitat-lab
- File: habitat-baselines/habitat_baselines/rl/hrl/hrl_ppo.py
- Lines: L19-131 (HRLPPO class)
Signature
class HRLPPO(PPO):
    def _update_from_batch(
        self,
        batch,
        epoch,
        rollouts,
        learner_metrics,
    ):
        """
        PPO update on skill-level transitions.
        Only updates high-level policy parameters.
        """

class HRLDDPPO(DecentralizedDistributedMixin, HRLPPO):
    """Distributed variant of HRLPPO for multi-GPU training."""
Import
from habitat_baselines.rl.hrl.hrl_ppo import HRLPPO, HRLDDPPO
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| batch | TensorDict | Yes | Mini-batch from HrlRolloutStorage with skill-level transitions |
| epoch | int | Yes | Current PPO epoch |
| rollouts | HrlRolloutStorage | Yes | Skill-level rollout buffer |
| learner_metrics | Dict | Yes | Accumulates loss values |
Outputs
| Name | Type | Description |
|---|---|---|
| learner_metrics | Dict[str, float] | Updated with value_loss, action_loss, entropy |
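The accumulation contract above can be illustrated with a minimal sketch. `record_metrics` and the averaging step are hypothetical, mirroring only the I/O table (habitat-baselines' actual bookkeeping may differ): each mini-batch appends its scalars, and averages are taken after all epochs.

```python
from collections import defaultdict

def record_metrics(learner_metrics, value_loss, action_loss, entropy):
    # Append per-mini-batch scalars; averaged once all PPO epochs finish.
    learner_metrics["value_loss"].append(value_loss)
    learner_metrics["action_loss"].append(action_loss)
    learner_metrics["entropy"].append(entropy)

metrics = defaultdict(list)
record_metrics(metrics, 0.5, -0.1, 1.3)
record_metrics(metrics, 0.3, -0.2, 1.1)
averaged = {k: sum(v) / len(v) for k, v in metrics.items()}
```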
Usage Examples
HRL Training (Automated)
# HRLPPO is selected automatically when using hierarchical policy config
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer
config = get_config("rearrange/rl_hierarchical.yaml")
trainer = PPOTrainer(config)
trainer.train() # Uses HRLPPO internally
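The advantage computation over skill-level transitions mentioned in the description can be sketched as GAE applied at the temporal abstraction level. `skill_level_gae` is a simplified illustration, not the HrlRolloutStorage implementation: each skill boundary is treated as one step of the high-level MDP, and `values` carries one extra bootstrap entry at the end.

```python
def skill_level_gae(rewards, values, gamma=0.99, lam=0.95):
    """GAE over skill-level transitions (one entry per high-level decision).

    rewards: reward accumulated over each skill's execution.
    values: high-level value estimates, len(rewards) + 1 (bootstrap at end).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual of the high-level value function at skill boundaries
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```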