Implementation:Facebookresearch Habitat lab PPOTrainer train
| Knowledge Sources | |
|---|---|
| Domains | Reinforcement_Learning, Training |
| Last Updated | 2026-02-15 02:00 GMT |
Overview
Concrete training loop for PPO/DD-PPO agents in Habitat environments, provided by habitat-baselines. This is the primary PointNav training entry point.
Description
The `PPOTrainer.train` method implements the complete DD-PPO training loop for navigation agents. It initializes environments, builds the policy, and enters the main loop: collecting rollouts via `_compute_actions_and_step_envs`, updating the policy via `PPO.update`, logging metrics, and saving checkpoints. It supports distributed training, resuming from a checkpoint, and SLURM job requeuing.
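The shape of that loop can be sketched schematically. Everything below is illustrative: the stub classes and names (`ToyEnvs`, `ToyAgent`, `rollout_len`) are stand-ins, not the habitat-baselines API, which operates on a `VectorEnv` and a `PPO`/`DDPPO` agent.

```python
import random

class ToyEnvs:
    """Stub for a vectorized environment batch."""
    def step(self, actions):
        # one (obs, reward, done) triple per environment
        return [(0.0, random.random(), False) for _ in actions]

class ToyAgent:
    """Stub agent: acts on all envs, then 'updates' from a rollout."""
    def act(self, n_envs):
        return [0] * n_envs
    def update(self, rollout):
        rewards = [r for _, r, _ in rollout]
        return {"reward": sum(rewards) / len(rewards)}

def train(num_updates=3, rollout_len=4, n_envs=2):
    envs, agent = ToyEnvs(), ToyAgent()
    metrics = []
    for _ in range(num_updates):
        # 1. collect a rollout (cf. _compute_actions_and_step_envs)
        rollout = []
        for _ in range(rollout_len):
            rollout.extend(envs.step(agent.act(n_envs)))
        # 2-3. advantage estimation + policy update (cf. _update_agent)
        metrics.append(agent.update(rollout))
        # 4. logging and checkpointing would happen here
    return metrics
```

The real method interleaves these phases with distributed synchronization and preemption checks, but the collect/update/log cycle is the same.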
Usage
Called by `execute_exp` when the run type is `train`. This is the default training method for PointNav, ObjectNav, and other single-policy RL tasks.
Code Reference
Source Location
- Repository: habitat-lab
- File: habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py
- Lines: L655-801 (train method), L343-399 (_compute_actions_and_step_envs), L489-522 (_update_agent)
Signature
class PPOTrainer(BaseRLTrainer):
    def train(self) -> None:
        """
        Main method for training DD/PPO.

        Initializes environments and policy, then enters the training loop:
        1. Collect rollouts from vectorized environments
        2. Compute advantages (GAE)
        3. Update policy via PPO clipped objective
        4. Log metrics and save checkpoints
        """
Import
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| self.config | DictConfig | Yes | Complete experiment config (set during __init__) |
| self.envs | VectorEnv | Yes | Vectorized environments (created in _init_train) |
| self._agent | PPO/DDPPO | Yes | PPO agent wrapping the policy (created in _init_train) |
Outputs
| Name | Type | Description |
|---|---|---|
| Checkpoints | .pth files | Saved policy checkpoints at `ckpt.{N}.pth` |
| Logs | TensorBoard/WandB | Training metrics: value_loss, action_loss, entropy, reward, SPL, etc. |
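Since checkpoints follow the `ckpt.{N}.pth` pattern, the most recent one can be picked by parsing the index rather than sorting names lexically (which would put `ckpt.10.pth` before `ckpt.2.pth`). A small stdlib sketch; the helper name is illustrative, not part of habitat-baselines:

```python
import re

def latest_checkpoint(filenames):
    """Return the ckpt.{N}.pth name with the highest N, or None."""
    pattern = re.compile(r"^ckpt\.(\d+)\.pth$")
    best = None
    for name in filenames:
        m = pattern.match(name)
        if m and (best is None or int(m.group(1)) > best[0]):
            best = (int(m.group(1)), name)
    return best[1] if best else None
```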
Usage Examples
Launch PPO Training
from habitat_baselines.config.default import get_config
from habitat_baselines.rl.ppo.ppo_trainer import PPOTrainer
# Load config
config = get_config("pointnav/ppo_pointnav.yaml")
# Create trainer and run
trainer = PPOTrainer(config)
trainer.train()
Via Command Line
python -u habitat-baselines/habitat_baselines/run.py \
--exp-config habitat-baselines/habitat_baselines/config/pointnav/ppo_pointnav.yaml \
--run-type train
Related Pages
Implements Principle
Requires Environment
- Environment:Facebookresearch_Habitat_lab_CUDA_GPU_Training_Environment
- Environment:Facebookresearch_Habitat_lab_SLURM_Distributed_Environment
Uses Heuristic
- Heuristic:Facebookresearch_Habitat_lab_Force_Single_Threaded_PyTorch
- Heuristic:Facebookresearch_Habitat_lab_Mini_Batch_Environment_Divisibility
- Heuristic:Facebookresearch_Habitat_lab_DDPPO_Straggler_Preemption
- Heuristic:Facebookresearch_Habitat_lab_VER_Tuning_Guidelines
- Heuristic:Facebookresearch_Habitat_lab_Resume_State_Config_Override