
Implementation:Facebookresearch Habitat lab HRLPPO update

From Leeroopedia
Knowledge Sources
Domains Hierarchical_RL, Reinforcement_Learning
Last Updated 2026-02-15 02:00 GMT

Overview

A concrete PPO update variant for hierarchical RL, provided by habitat-baselines, that updates only the high-level policy parameters using skill-level transitions.

Description

HRLPPO._update_from_batch extends PPO's update to work with skill-level rollout storage. It filters gradients to only update the high-level policy (skills remain frozen), computes advantages over skill-level transitions stored in HrlRolloutStorage, and applies the standard PPO clipped objective at the temporal abstraction level.
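The clipped objective applied at the temporal abstraction level is the standard PPO surrogate, just evaluated on skill-level transitions. A dependency-free sketch (plain Python for illustration; not the habitat-baselines source):

```python
import math

def ppo_clipped_loss(new_logp, old_logp, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one skill-level transition.

    new_logp / old_logp: log-probability of the chosen skill under the
    current and behaviour high-level policy; advantage: skill-level
    advantage estimate. Returns the negated surrogate (to minimise).
    """
    ratio = math.exp(new_logp - old_logp)
    clipped = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    return -min(ratio * advantage, clipped * advantage)

# With a positive advantage, the ratio is capped at 1 + clip_eps,
# so large policy steps on a single skill choice are damped.
loss = ppo_clipped_loss(new_logp=0.5, old_logp=0.0, advantage=2.0)
```

In HRLPPO the same objective is averaged over a mini-batch drawn from HrlRolloutStorage rather than evaluated per transition.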

Usage

Used automatically during hierarchical RL training when PPOTrainer._update_agent delegates to the HRLPPO (or HRLDDPPO) agent.

Code Reference

Source Location

  • Repository: habitat-lab
  • File: habitat-baselines/habitat_baselines/rl/hrl/hrl_ppo.py
  • Lines: L19-131 (HRLPPO class)

Signature

class HRLPPO(PPO):
    def _update_from_batch(
        self,
        batch,
        epoch,
        rollouts,
        learner_metrics,
    ):
        """
        PPO update on skill-level transitions.
        Only updates high-level policy parameters.
        """

class HRLDDPPO(DecentralizedDistributedMixin, HRLPPO):
    """Distributed variant of HRLPPO for multi-GPU training."""
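The "skills remain frozen" behaviour described above amounts to handing the optimizer only the high-level parameters. A generic sketch of that filtering (the `hl_policy.` naming convention is a hypothetical example, not the actual habitat-baselines parameter layout):

```python
def select_high_level_params(named_params, hl_prefix="hl_policy."):
    """Keep only parameters belonging to the high-level policy, so the
    optimizer never touches the frozen skill (low-level) weights.

    `named_params` is an iterable of (name, param) pairs, in the style of
    torch's Module.named_parameters(); `hl_prefix` is an assumed naming
    convention used purely for illustration.
    """
    return [p for name, p in named_params if name.startswith(hl_prefix)]

# Hypothetical parameter list: two high-level weights, one skill weight.
params = [("hl_policy.fc.weight", "W1"),
          ("hl_policy.fc.bias", "b1"),
          ("skills.pick.weight", "W2")]
trainable = select_high_level_params(params)  # excludes the skill weight
```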

Import

from habitat_baselines.rl.hrl.hrl_ppo import HRLPPO, HRLDDPPO

I/O Contract

Inputs

  • batch (TensorDict, required): Mini-batch from HrlRolloutStorage with skill-level transitions
  • epoch (int, required): Current PPO epoch
  • rollouts (HrlRolloutStorage, required): Skill-level rollout buffer
  • learner_metrics (Dict, required): Accumulates loss values

Outputs

  • learner_metrics (Dict[str, float]): Updated with value_loss, action_loss, entropy
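Because each stored transition spans a whole skill execution, returns are discounted by the number of environment steps the skill ran. A simplified sketch of that bookkeeping (an illustration under stated assumptions; HrlRolloutStorage's exact accounting may differ):

```python
def skill_level_returns(rewards, durations, bootstrap_value, gamma=0.99):
    """Discounted returns over skill-level transitions.

    rewards[i]   : accumulated reward collected while skill i executed
    durations[i] : number of env steps skill i ran, so the effective
                   discount between skill decisions is gamma ** durations[i]
    bootstrap_value : value estimate for the state after the last skill
    """
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for i in reversed(range(len(rewards))):
        running = rewards[i] + (gamma ** durations[i]) * running
        returns[i] = running
    return returns

# Two skill decisions, the second running for 2 env steps (gamma=0.5):
rets = skill_level_returns([1.0, 1.0], [1, 2], bootstrap_value=0.0, gamma=0.5)
```

Advantages are then returns minus the high-level critic's value estimates, as in ordinary PPO.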

Usage Examples

HRL Training (Automated)

# HRLPPO is selected automatically when using hierarchical policy config
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer

config = get_config("rearrange/rl_hierarchical.yaml")
trainer = PPOTrainer(config)
trainer.train()  # Uses HRLPPO internally

Related Pages

Implements Principle

Requires Environment
