# Heuristic: Force Single-Threaded PyTorch in facebookresearch/habitat-lab
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reinforcement_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
Counter-intuitive performance optimization: forcing PyTorch to single-threaded mode significantly speeds up RL training by avoiding parallel memory copy overhead.
## Description
PyTorch increasingly parallelizes internal memory copy operations across threads. In Habitat-Lab RL training, CPU-side operations are dominated by simple memory copies (rollout buffer management, observation transfers) rather than compute-heavy operations. The parallelization overhead (thread creation, synchronization) dramatically slows down these lightweight operations. Setting `force_torch_single_threaded=True` eliminates this overhead.
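Habitat-Lab applies this flag by capping PyTorch's thread pools at process startup. A minimal sketch of the equivalent calls (assuming the flag maps onto PyTorch's standard threading knobs, `torch.set_num_threads` and `torch.set_num_interop_threads`):

```python
import torch

def force_single_threaded() -> None:
    """Cap PyTorch's CPU thread pools at 1, as the Habitat flag does."""
    # Inter-op pool: parallel execution of independent ops. Must be set
    # before any inter-op parallel work runs, so call this early.
    torch.set_num_interop_threads(1)
    # Intra-op pool: parallelism *inside* one op (memory copies,
    # elementwise kernels) - the overhead source this heuristic targets.
    torch.set_num_threads(1)

force_single_threaded()
print(torch.get_num_threads(), torch.get_num_interop_threads())
```

Call this once, before constructing models or rollout buffers; setting the inter-op pool after parallel work has started raises a `RuntimeError`.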
## Usage
Use this heuristic whenever running RL training in Habitat-Lab (PPO, DD-PPO, VER). The config default is `False` (to match standard PyTorch behavior), but all provided training configs set it to `True`, and custom configs should do the same.
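In a custom training config, the override is a single line (the file name here is hypothetical; the key path follows the structured config shown below):

```yaml
# my_experiment.yaml
habitat_baselines:
  force_torch_single_threaded: True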
## The Insight (Rule of Thumb)
- Action: Set `habitat_baselines.force_torch_single_threaded: True` in your training config.
- Value: `True` (boolean flag).
- Trade-off: None observed in practice. The default is `False` only for compatibility with standard PyTorch behavior, not because `True` has downsides.
- Scope: Affects all CPU-side PyTorch operations in the training process.
## Reasoning
The Habitat-Lab team documented this directly in the config dataclass with an explanatory comment. The insight is that RL training workloads differ fundamentally from typical deep learning training: the CPU is used primarily for environment simulation and data movement, not matrix math. Parallel memory copies add synchronization overhead that exceeds the copy time itself, making single-threaded execution faster.
Code evidence from `habitat-baselines/habitat_baselines/config/default_structured_configs.py:479-487`:
```python
# For our use case, the CPU side things are mainly memory copies
# and nothing of substantive compute. PyTorch has been making
# more and more memory copies parallel, but that just ends up
# slowing those down dramatically and reducing our perf.
# This forces it to be single threaded. The default
# value is left as false as it's different from how
# PyTorch normally behaves, but all configs we provide
# set it to true and yours likely should too
force_torch_single_threaded: bool = False
```
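The claim that parallel memory copies can be slower than serial ones is easy to sanity-check outside Habitat with a micro-benchmark. A hedged sketch (results vary by machine, core count, and tensor size; on a single-core runner the two timings will be identical):

```python
import time
import torch

def time_copies(num_threads: int, n_copies: int = 200,
                shape=(64, 128, 128)) -> float:
    """Time repeated CPU tensor copies, the kind of op that dominates
    rollout-buffer management in RL training."""
    torch.set_num_threads(num_threads)
    src = torch.randn(shape)
    dst = torch.empty_like(src)
    start = time.perf_counter()
    for _ in range(n_copies):
        dst.copy_(src)  # plain memory copy, no substantive compute
    return time.perf_counter() - start

default_threads = torch.get_num_threads()
t_multi = time_copies(default_threads)
t_single = time_copies(1)
print(f"{default_threads} threads: {t_multi:.3f}s | 1 thread: {t_single:.3f}s")
```

Run this on your training node: if the single-threaded timing matches or beats the multi-threaded one for copy-sized tensors, the heuristic applies to your hardware.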