
Implementation:Facebookresearch Habitat lab PPOTrainer train

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Training
Last Updated 2026-02-15 02:00 GMT

Overview

Concrete training loop for PPO/DD-PPO agents in Habitat environments, provided by habitat-baselines. This is the primary PointNav training entry point.

Description

The PPOTrainer.train method implements the complete DD-PPO training loop for navigation agents. It initializes environments, builds the policy, and then enters the main loop: collecting rollouts via `_compute_actions_and_step_envs`, updating the policy via `PPO.update`, logging metrics, and saving checkpoints. The method also supports distributed training, resuming from checkpoints, and SLURM job requeuing.
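The rollout/update cycle described above can be pictured with a minimal sketch. All names here (`DummyVecEnv`, `collect_rollout`, `ppo_update`) are illustrative stand-ins, not the actual habitat-baselines API:

```python
import random

class DummyVecEnv:
    """Stand-in for Habitat's VectorEnv: each step returns one reward
    per parallel environment."""
    def __init__(self, num_envs):
        self.num_envs = num_envs

    def step(self):
        return [random.random() for _ in range(self.num_envs)]

def collect_rollout(envs, num_steps):
    """Gather num_steps of experience from every environment
    (the role _compute_actions_and_step_envs plays in the real loop)."""
    return [envs.step() for _ in range(num_steps)]

def ppo_update(rollout):
    """Placeholder for PPO.update: here it just averages the rewards."""
    flat = [r for step in rollout for r in step]
    return sum(flat) / len(flat)

envs = DummyVecEnv(num_envs=4)
for update in range(3):  # NUM_UPDATES in the real config
    rollout = collect_rollout(envs, num_steps=8)
    mean_reward = ppo_update(rollout)
```

The real loop additionally computes GAE advantages, runs several PPO epochs per rollout, and periodically logs and checkpoints.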

Usage

Called by `execute_exp` when the run type is `train`. This is the default training method for PointNav, ObjectNav, and other single-policy RL tasks.
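The run-type dispatch can be sketched as follows; this `execute_exp` is a simplified stand-in for illustration, not the actual function body in habitat-baselines (the real one also constructs the trainer from the registry using the config):

```python
def execute_exp(trainer, run_type):
    """Simplified stand-in: route a run type to the trainer method."""
    if run_type == "train":
        return trainer.train()
    elif run_type == "eval":
        return trainer.eval()
    raise ValueError(f"unknown run type: {run_type}")

class FakeTrainer:
    """Toy trainer used only to exercise the dispatch above."""
    def train(self):
        return "trained"
    def eval(self):
        return "evaluated"

result = execute_exp(FakeTrainer(), "train")
```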

Code Reference

Source Location

  • Repository: habitat-lab
  • File: habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py
  • Lines: L655-801 (train method), L343-399 (_compute_actions_and_step_envs), L489-522 (_update_agent)

Signature

class PPOTrainer(BaseRLTrainer):
    def train(self) -> None:
        """
        Main method for training DD/PPO.

        Initializes environments and policy, then enters the training loop:
        1. Collect rollouts from vectorized environments
        2. Compute advantages (GAE)
        3. Update policy via PPO clipped objective
        4. Log metrics and save checkpoints
        """

Import

from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer

I/O Contract

Inputs

Name | Type | Required | Description
self.config | DictConfig | Yes | Complete experiment config (set during __init__)
self.envs | VectorEnv | Yes | Vectorized environments (created in _init_train)
self._agent | PPO/DDPPO | Yes | PPO agent wrapping the policy (created in _init_train)

Outputs

Name | Type | Description
Checkpoints | .pth files | Saved policy checkpoints at `ckpt.{N}.pth`
Logs | TensorBoard/WandB | Training metrics: value_loss, action_loss, entropy, reward, SPL, etc.
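Since checkpoints follow the `ckpt.{N}.pth` naming shown above, resuming amounts to finding the highest N in the checkpoint folder. A minimal sketch (the helper name is hypothetical; the real resume logic also restores optimizer state and step counters):

```python
import os
import re
import tempfile

def latest_checkpoint(folder):
    """Return the path of the highest-numbered ckpt.{N}.pth, or None."""
    pattern = re.compile(r"^ckpt\.(\d+)\.pth$")
    best_n, best_path = -1, None
    for name in os.listdir(folder):
        m = pattern.match(name)
        if m and int(m.group(1)) > best_n:
            best_n, best_path = int(m.group(1)), os.path.join(folder, name)
    return best_path

# demonstrate on a scratch directory with out-of-order checkpoint numbers
with tempfile.TemporaryDirectory() as d:
    for n in (0, 2, 10):
        open(os.path.join(d, f"ckpt.{n}.pth"), "w").close()
    latest_name = os.path.basename(latest_checkpoint(d))
```

Numeric comparison matters here: a lexicographic sort would rank `ckpt.2.pth` above `ckpt.10.pth`.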

Usage Examples

Launch PPO Training

from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer

# Load config
config = get_config("pointnav/ppo_pointnav.yaml")

# Create trainer and run
trainer = PPOTrainer(config)
trainer.train()

Via Command Line

python -u habitat-baselines/habitat_baselines/run.py \
    --exp-config habitat-baselines/habitat_baselines/config/pointnav/ppo_pointnav.yaml \
    --run-type train

Related Pages

Implements Principle

Requires Environment

Uses Heuristic
