# Heuristic: facebookresearch/habitat-lab VER Tuning Guidelines
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reinforcement_Learning |
| Last Updated | 2026-02-15 00:00 GMT |
## Overview
Configuration guidelines for the Variable Experience Rollout (VER) trainer covering inference worker count, overlapped learning, cuDNN settings, and when to disable variable experience.
## Description
VER (Variable Experience Rollout) decouples environment stepping from policy inference by running them in separate processes. This architecture introduces several tunable parameters that significantly affect throughput: the number of inference workers, whether to overlap experience collection with learning, queue implementation performance, and cuDNN benchmark settings. The VER README and source code contain specific guidance on when and how to tune these parameters.
## Usage
Apply these guidelines when configuring VER-based training (as opposed to standard PPO). VER is particularly beneficial when environments have heterogeneous simulation times or when the policy is large relative to environment compute.
## The Insight (Rule of Thumb)
- Variable experience: Disable (`variable_experience: False`) when environment simulation times vary by more than 100x between fastest and slowest environments.
- Inference workers: Small policy models → reduce `num_inference_workers` (default 2). Large policy models → increase workers to keep GPUs busy.
- Overlapped collection: Only enable `overlap_rollouts_and_learn: True` when environment is CPU-bound. Trades sample efficiency for throughput; increases memory usage (each inference worker gets its own weight copy).
- cuDNN benchmark: VER sets `benchmark=False` in inference workers (deterministic, better for small/variable batches) and `benchmark=True` in the learner (optimized for consistent large batches).
- TF32 precision: Enabled globally in VER trainer. Negligible accuracy loss with speedup on Ampere+ GPUs.
- faster_fifo: Install `faster_fifo>=1.4.2` for production VER training. Without it, inter-process queue operations fall back to a slower Python implementation.
- Batch renderer: VER does not support batch rendering. This is a hard constraint.
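The knobs above live under the VER trainer's config group. A minimal override sketch in Hydra-style YAML (key names are taken from habitat-baselines' structured VER config; verify them against your installed version):

```yaml
habitat_baselines:
  rl:
    ver:
      variable_experience: True          # set False when step times vary by >100x
      num_inference_workers: 2           # lower for small policies, raise for large ones
      overlap_rollouts_and_learn: False  # enable only for CPU-bound environments
```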
## Reasoning
VER decouples the tightly coupled PPO training loop into asynchronous producers (environments + inference workers) and a consumer (learner). This architecture is optimal when environments are the bottleneck, but introduces complexity in weight synchronization and importance sampling.
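The producer/consumer decoupling can be pictured with a tiny sketch, where threads and a plain `queue.Queue` stand in for VER's worker processes and inter-process queues (all names here are illustrative, not VER's actual API):

```python
import queue
import threading

experience_q = queue.Queue(maxsize=8)  # stands in for VER's inter-process queue

def env_worker(env_id, n_steps):
    # Producer: steps its environment and publishes transitions.
    for t in range(n_steps):
        experience_q.put((env_id, t))
    experience_q.put((env_id, None))  # sentinel: this env is done

def learner(n_envs):
    # Consumer: drains transitions until every producer has finished.
    done, batch = 0, []
    while done < n_envs:
        env_id, step = experience_q.get()
        if step is None:
            done += 1
        else:
            batch.append((env_id, step))
    return batch

workers = [threading.Thread(target=env_worker, args=(i, 4)) for i in range(3)]
for w in workers:
    w.start()
batch = learner(n_envs=3)
for w in workers:
    w.join()
```

Because producers run independently, a slow environment no longer stalls the others; the learner simply consumes whatever experience is ready.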
Code evidence for the faster_fifo fallback from `habitat-baselines/habitat_baselines/rl/ver/queue.py:31-34`:

```python
warnings.warn(
    "Unable to import faster_fifo."
    " Using the fallback. This may reduce performance."
)
```
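The surrounding import logic follows the standard optional-dependency pattern. A self-contained sketch of that pattern (not the exact `queue.py` code):

```python
import warnings

try:
    # faster_fifo provides a C++-backed multiprocessing queue.
    from faster_fifo import Queue
except ImportError:
    from multiprocessing import Queue  # slower fallback
    warnings.warn(
        "Unable to import faster_fifo."
        " Using the fallback. This may reduce performance."
    )

q = Queue()
```

Either branch yields a queue with the same basic `put`/`get` interface, so the rest of the trainer is unaffected; only throughput changes.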
Code evidence for the batch renderer assertion from `habitat-baselines/habitat_baselines/rl/ver/ver_trainer.py:84-86`:

```python
assert (
    not self.config.habitat.simulator.renderer.enable_batch_renderer
), "VER trainer does not support batch rendering."
```
Code evidence for TF32 settings (not the cuDNN benchmark flags) from `habitat-baselines/habitat_baselines/rl/ver/ver_trainer.py:59-63`:

```python
try:
    torch.backends.cudnn.allow_tf32 = True
    torch.backends.cuda.matmul.allow_tf32 = True
except AttributeError:
    pass
```
VER README guidance from `habitat-baselines/habitat_baselines/rl/ver/README.md`:

> If you have environments with extreme differences in simulation time (i.e. the fastest environment is more than 100x faster than the slowest), consider disabling variable experience rollouts.
>
> If you have a very small policy, consider reducing the number of inference workers. If you have a very large model, consider increasing the number of inference workers.
>
> If your environment is largely dominated by CPU time, consider overlapping experience collection and learning.
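These three rules can be collapsed into a small decision helper. This is a hypothetical illustration: the 100x ratio comes from the README, but the parameter-count thresholds are arbitrary placeholders you should tune for your model:

```python
def suggest_ver_settings(step_times_s, policy_params, cpu_bound):
    """Map the README heuristics to VER config values.

    step_times_s: per-environment mean step times (seconds)
    policy_params: parameter count of the policy network
    cpu_bound: True if environment simulation dominates wall-clock time
    """
    fastest, slowest = min(step_times_s), max(step_times_s)
    if policy_params < 1e6:        # "very small" policy (arbitrary threshold)
        n_workers = 1
    elif policy_params > 1e8:      # "very large" policy (arbitrary threshold)
        n_workers = 4
    else:
        n_workers = 2              # the VER default
    return {
        # Disable variable experience under extreme (>100x) step-time spread.
        "variable_experience": slowest / fastest <= 100.0,
        "num_inference_workers": n_workers,
        # Overlap collection and learning only when CPU-bound.
        "overlap_rollouts_and_learn": cpu_bound,
    }
```

For example, a mix of 10 ms and 2 s environments (a 200x spread) would suggest disabling variable experience while keeping the other defaults.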