# Heuristic: facebookresearch/habitat-lab VER Tuning Guidelines
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reinforcement_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
Configuration guidelines for the Variable Experience Rollout (VER) trainer covering inference worker count, overlapped learning, cuDNN settings, and when to disable variable experience.
## Description
VER (Variable Experience Rollout) decouples environment stepping from policy inference by running them in separate processes. This architecture introduces several tunable parameters that significantly affect throughput: the number of inference workers, whether to overlap experience collection with learning, queue implementation performance, and cuDNN benchmark settings. The VER README and source code contain specific guidance on when and how to tune these parameters.
## Usage
Apply these guidelines when configuring VER-based training (as opposed to standard PPO). VER is particularly beneficial when environments have heterogeneous simulation times or when the policy is large relative to environment compute.
## The Insight (Rule of Thumb)
- Variable experience: Disable (`variable_experience: False`) when environment simulation times vary by more than 100x between fastest and slowest environments.
- Inference workers: Small policy models → reduce `num_inference_workers` (default 2). Large policy models → increase workers to keep GPUs busy.
- Overlapped collection: Only enable `overlap_rollouts_and_learn: True` when environment is CPU-bound. Trades sample efficiency for throughput; increases memory usage (each inference worker gets its own weight copy).
- cuDNN benchmark: VER sets `benchmark=False` in inference workers (deterministic, better for small/variable batches) and `benchmark=True` in the learner (optimized for consistent large batches).
- TF32 precision: Enabled globally in VER trainer. Negligible accuracy loss with speedup on Ampere+ GPUs.
- faster_fifo: Install `faster_fifo>=1.4.2` for production VER training. Without it, inter-process queue operations fall back to a slower Python implementation.
- Batch renderer: VER does not support batch rendering. This is a hard constraint.
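The knobs above live under the VER trainer's config group. A minimal override sketch in Hydra-style YAML (key names are taken from habitat-baselines' structured VER config; verify them against your installed version):

```yaml
habitat_baselines:
  rl:
    ver:
      variable_experience: True          # set False when step times vary by >100x
      num_inference_workers: 2           # lower for small policies, raise for large ones
      overlap_rollouts_and_learn: False  # enable only for CPU-bound environments
```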
## Reasoning
VER decouples the tightly coupled PPO training loop into asynchronous producers (environments + inference workers) and a consumer (learner). This architecture is optimal when environments are the bottleneck, but introduces complexity in weight synchronization and importance sampling.
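The producer/consumer decoupling can be pictured with a tiny sketch, where threads and a plain `queue.Queue` stand in for VER's worker processes and inter-process queues (all names here are illustrative, not VER's actual API):

```python
import queue
import threading

experience_q = queue.Queue(maxsize=8)  # stands in for VER's inter-process queue

def env_worker(env_id, n_steps):
    # Producer: steps its environment and publishes transitions.
    for t in range(n_steps):
        experience_q.put((env_id, t))
    experience_q.put((env_id, None))  # sentinel: this env is done

def learner(n_envs):
    # Consumer: drains transitions until every producer has finished.
    done, batch = 0, []
    while done < n_envs:
        env_id, step = experience_q.get()
        if step is None:
            done += 1
        else:
            batch.append((env_id, step))
    return batch

workers = [threading.Thread(target=env_worker, args=(i, 4)) for i in range(3)]
for w in workers:
    w.start()
batch = learner(n_envs=3)
for w in workers:
    w.join()
```

Because producers run independently, a slow environment no longer stalls the others; the learner simply consumes whatever experience is ready.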
Code evidence for the faster_fifo fallback from `habitat-baselines/habitat_baselines/rl/ver/queue.py:31-34`:

```python
warnings.warn(
    "Unable to import faster_fifo."
    " Using the fallback. This may reduce performance."
)
```
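The surrounding import logic follows the standard optional-dependency pattern. A self-contained sketch of that pattern (not the exact `queue.py` code):

```python
import warnings

try:
    # faster_fifo provides a C++-backed multiprocessing queue.
    from faster_fifo import Queue
except ImportError:
    from multiprocessing import Queue  # slower fallback
    warnings.warn(
        "Unable to import faster_fifo."
        " Using the fallback. This may reduce performance."
    )

q = Queue()
```

Either branch yields a queue with the same basic `put`/`get` interface, so the rest of the trainer is unaffected; only throughput changes.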
Code evidence for the batch renderer assertion from `habitat-baselines/habitat_baselines/rl/ver/ver_trainer.py:84-86`:

```python
assert (
    not self.config.habitat.simulator.renderer.enable_batch_renderer
), "VER trainer does not support batch rendering."
```
Code evidence for TF32 settings (not the cuDNN benchmark flags) from `habitat-baselines/habitat_baselines/rl/ver/ver_trainer.py:59-63`:

```python
try:
    torch.backends.cudnn.allow_tf32 = True
    torch.backends.cuda.matmul.allow_tf32 = True
except AttributeError:
    pass
```
VER README guidance from `habitat-baselines/habitat_baselines/rl/ver/README.md`:

> If you have environments with extreme differences in simulation time (i.e. the fastest environment is more than 100x faster than the slowest), consider disabling variable experience rollouts.
>
> If you have a very small policy, consider reducing the number of inference workers. If you have a very large model, consider increasing the number of inference workers.
>
> If your environment is largely dominated by CPU time, consider overlapping experience collection and learning.
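These three rules can be collapsed into a small decision helper. This is a hypothetical illustration: the 100x ratio comes from the README, but the parameter-count thresholds are arbitrary placeholders you should tune for your model:

```python
def suggest_ver_settings(step_times_s, policy_params, cpu_bound):
    """Map the README heuristics to VER config values.

    step_times_s: per-environment mean step times (seconds)
    policy_params: parameter count of the policy network
    cpu_bound: True if environment simulation dominates wall-clock time
    """
    fastest, slowest = min(step_times_s), max(step_times_s)
    if policy_params < 1e6:        # "very small" policy (arbitrary threshold)
        n_workers = 1
    elif policy_params > 1e8:      # "very large" policy (arbitrary threshold)
        n_workers = 4
    else:
        n_workers = 2              # the VER default
    return {
        # Disable variable experience under extreme (>100x) step-time spread.
        "variable_experience": slowest / fastest <= 100.0,
        "num_inference_workers": n_workers,
        # Overlap collection and learning only when CPU-bound.
        "overlap_rollouts_and_learn": cpu_bound,
    }
```

For example, a mix of 10 ms and 2 s environments (a 200x spread) would suggest disabling variable experience while keeping the other defaults.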