Heuristic: facebookresearch/habitat-lab DD-PPO Straggler Preemption
| Knowledge Sources | |
|---|---|
| Domains | Distributed_Training, Optimization |
| Last Updated | 2026-02-15 00:00 GMT |
Overview
DD-PPO workers self-preempt when they detect they are stragglers, preventing slow workers from blocking the entire distributed training loop.
Description
In distributed DD-PPO training, environment simulation time varies across workers due to differences in scene complexity. Without mitigation, every worker must wait for the slowest one to complete its rollout before the gradient synchronization step. DD-PPO implements a straggler preemption mechanism: a worker ends its rollout early if (a) it has completed at least 25% of its rollout steps and (b) a configurable fraction (default 60%) of all workers have already finished their rollouts.
Usage
This heuristic is automatically active in all DD-PPO distributed training. Tune the `sync_frac` parameter if you observe either excessive idle time (decrease it, so stragglers preempt sooner) or too many truncated rollouts (increase it). Monitor the `fraction_stale` metric to assess staleness.
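A minimal override sketch for this knob, assuming the standard habitat-baselines YAML layout (the key path `habitat_baselines.rl.ddppo.sync_frac` is taken from the code evidence in this card; surrounding keys are elided):

```yaml
# Illustrative fragment only -- not a complete config.
habitat_baselines:
  rl:
    ddppo:
      sync_frac: 0.6  # lower to cut idle time, raise to avoid truncated rollouts
```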
The Insight (Rule of Thumb)
- Action: Leave `SHORT_ROLLOUT_THRESHOLD = 0.25` (hardcoded) and tune `sync_frac` (configurable, default 0.6).
- Value: `sync_frac = 0.6` means a worker self-preempts once 60% of all workers are done, provided it has already cleared the 25% step threshold.
- Trade-off: Higher `sync_frac` = more workers wait (slower throughput). Lower `sync_frac` = more truncated rollouts (less data per update). The 0.6 default balances throughput and data quality.
- Seed management: Each rank gets seed offset `rank * num_environments` to ensure unique environment seeds across workers.
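The rule above can be sketched as a standalone predicate. This is a simplified illustration, not habitat-lab's API: `num_done` stands in for the value read from the distributed rollout counter, and `world_size` for `torch.distributed.get_world_size()`.

```python
# Simplified, standalone sketch of DD-PPO's self-preemption rule.
SHORT_ROLLOUT_THRESHOLD = 0.25  # hardcoded minimum fraction of rollout steps

def should_end_early(rollout_step, num_steps, num_done, world_size,
                     sync_frac=0.6):
    """True when a straggler should truncate its rollout now."""
    # Condition (a): at least 25% of the rollout is complete.
    past_min_steps = rollout_step >= num_steps * SHORT_ROLLOUT_THRESHOLD
    # Condition (b): at least sync_frac of all workers have finished.
    enough_peers_done = num_done >= sync_frac * world_size
    return past_min_steps and enough_peers_done
```

For example, with `num_steps = 128` and 8 workers, a worker at step 40 (past the 32-step minimum) preempts only once 5 peers have finished, since the integer count must reach 0.6 × 8 = 4.8.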
Reasoning
The preemption mechanism is a core innovation of DD-PPO. Without it, the slowest worker dictates the training speed of the entire cluster. By allowing stragglers to submit partial rollouts (minimum 25% of steps), the system maintains data flow while sacrificing a small amount of data from the slowest environments. The 25% threshold ensures that very short rollouts (which would provide low-quality gradient estimates) are not submitted.
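As a quick sanity check on how the peer-count threshold scales with cluster size, a hypothetical helper (the function name is illustrative, not from habitat-lab):

```python
import math

def peers_needed(world_size, sync_frac=0.6):
    # A straggler preempts once int(num_done) >= sync_frac * world_size,
    # i.e. ceil(sync_frac * world_size) workers when the product is
    # fractional.
    return math.ceil(sync_frac * world_size)
```

At the default `sync_frac = 0.6`, this gives 3 of 4 workers, 5 of 8, and 39 of 64: the fraction of the cluster that must finish before a straggler gives up stays roughly constant as the cluster grows.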
Code evidence from `habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py:641-653`:
```python
def should_end_early(self, rollout_step) -> bool:
    if not self._is_distributed:
        return False
    # This is where the preemption of workers happens.  If a
    # worker detects it will be a straggler, it preempts itself!
    return (
        rollout_step
        >= self.config.habitat_baselines.rl.ppo.num_steps
        * self.SHORT_ROLLOUT_THRESHOLD
    ) and int(self.num_rollouts_done_store.get("num_done")) >= (
        self.config.habitat_baselines.rl.ddppo.sync_frac
        * torch.distributed.get_world_size()
    )
```
Distributed seed management from `habitat-baselines/habitat_baselines/rl/ppo/ppo_trainer.py:207-211`:
```python
# Multiply by the number of simulators to make sure they also get unique seeds
self.config.habitat.seed += (
    torch.distributed.get_rank()
    * self.config.habitat_baselines.num_environments
)
```
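The offset scheme above gives each rank a disjoint, contiguous block of seeds. A hypothetical helper making that explicit (not part of habitat-lab):

```python
def env_seeds(base_seed, rank, num_environments):
    """Seeds assigned to one rank: [base + rank*n, base + (rank+1)*n)."""
    start = base_seed + rank * num_environments
    return [start + i for i in range(num_environments)]
```

With `base_seed = 100` and 4 environments per rank, rank 0 gets 100-103 and rank 1 gets 104-107, so no environment anywhere in the cluster repeats a seed.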