Heuristic: Isaac Sim IsaacGymEnvs Determinism vs. Performance Tradeoff
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Reproducibility |
| Last Updated | 2026-02-15 09:00 GMT |
Overview
Keep `torch_deterministic=False` (the default) for maximum training speed; enable it only when debugging reproducibility issues, and avoid PyTorch 1.9/1.9.1, which have determinism bugs that cause crashes in this mode.
Description
IsaacGymEnvs defaults to non-deterministic training (`torch_deterministic: False`), which enables `cudnn.benchmark=True` so cuDNN can auto-select the fastest convolution algorithms. Enabling determinism (`torch_deterministic: True`) forces `cudnn.benchmark=False` and `cudnn.deterministic=True`, calls `torch.use_deterministic_algorithms(True)`, and sets `CUBLAS_WORKSPACE_CONFIG=':4096:8'`. This trades significant training speed for bit-exact reproducibility across runs. Even with determinism enabled, however, GPU work scheduling during domain randomization can still cause divergence.
Usage
Use this heuristic when deciding between training speed and reproducibility. For regular training and hyperparameter search, keep the default (`False`). Only enable determinism when debugging non-reproducible results or when exact comparison between runs is required. Be aware that PyTorch 1.9 and 1.9.1 have known bugs that cause crashes with `torch_deterministic=True`.
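In practice this comes down to two keys in the training config. A minimal sketch of the relevant fragment (key names match the quoted behavior; the surrounding file layout is an assumption):

```yaml
# config.yaml (fragment) -- relevant keys only
seed: 42                    # fixed seed for consistent initialization
torch_deterministic: False  # keep False for production training speed
```

With IsaacGymEnvs' Hydra-based launcher these can also be overridden on the command line, e.g. `python train.py task=Ant torch_deterministic=True`, assuming the standard config layout.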
The Insight (Rule of Thumb)
- Action: Leave `torch_deterministic: False` in `config.yaml` for production training. Set `seed: 42` for consistent initialization. Use `torch_deterministic: True` only for debugging.
- Value: Default seed is 42. If `torch_deterministic=True` and `seed=-1`, the seed is forced to 42.
- Trade-off: Deterministic mode sets `cudnn.benchmark=False`, which prevents cuDNN from auto-tuning convolution algorithms per input size and so reduces training speed. It also sets `CUBLAS_WORKSPACE_CONFIG`, which fixes the cuBLAS workspace configuration (required for deterministic cuBLAS operations).
- Caveat: Even with full determinism enabled, runtime domain randomization of object scales and masses can still cause non-deterministic behavior due to CPU-to-GPU parameter passing in lower-level APIs.
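The seed fallback rule above can be sketched as a standalone function mirroring the `set_seed` logic quoted below (`pick_seed` is a hypothetical name for illustration):

```python
import random

def pick_seed(seed, torch_deterministic=False, rank=0):
    """Resolve the effective seed using IsaacGymEnvs' fallback rule."""
    if seed == -1 and torch_deterministic:
        # Deterministic runs must be repeatable, so -1 falls back to a
        # fixed base seed (42) offset by the process rank.
        return 42 + rank
    if seed == -1:
        # Non-deterministic runs may pick a fresh random seed.
        return random.randint(0, 9999)
    return seed

print(pick_seed(-1, torch_deterministic=True))  # deterministic fallback: 42
print(pick_seed(7))                             # explicit seeds pass through: 7
```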
Reasoning
GPU parallel execution is inherently non-deterministic at the floating-point level. Operations on thousands of parallel environments are scheduled by the GPU hardware, and small differences in execution order cause least-significant-bit variations that accumulate over thousands of frames. The `cudnn.benchmark=True` setting compounds this by selecting different (potentially non-deterministic) algorithms per input size.
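The accumulation effect is ordinary floating-point non-associativity; a minimal Python illustration of the same mechanism (not GPU code):

```python
# Floating-point addition is not associative: different execution orders
# disagree in the least-significant bits, which a training loop then
# amplifies over thousands of frames.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)      # False: the two orders disagree in the last bit
print(abs(left - right))  # tiny, but nonzero
```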
From `utils.py:87-113`:
```python
def set_seed(seed, torch_deterministic=False, rank=0):
    if seed == -1 and torch_deterministic:
        seed = 42 + rank
    elif seed == -1:
        seed = np.random.randint(0, 10000)
    # ...
    if torch_deterministic:
        os.environ['CUBLAS_WORKSPACE_CONFIG'] = ':4096:8'
        torch.backends.cudnn.benchmark = False
        torch.backends.cudnn.deterministic = True
        torch.use_deterministic_algorithms(True)
    else:
        torch.backends.cudnn.benchmark = True
        torch.backends.cudnn.deterministic = False
```
From `docs/reproducibility.md` on DR limitations:
> Runtime domain randomization of object scales or masses are known to cause both determinacy and simulation issues when running on the GPU due to the way those parameters are passed from CPU to GPU in lower level APIs. By default, we use the `setup_only` flag to only randomize scales and masses once before simulation starts.
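The `setup_only` mitigation is a per-parameter flag in the task's randomization config. A hedged sketch of the shape (the actor name and exact nesting are assumptions; consult the task's YAML for the real layout):

```yaml
task:
  randomization_params:
    actor_params:
      object:                  # hypothetical actor name
        scale:
          range: [0.95, 1.05]
          operation: "scaling"
          distribution: "uniform"
          setup_only: True     # randomize once, before simulation starts
```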
From `docs/reproducibility.md` on PyTorch version bugs:
> In PyTorch version 1.9 and 1.9.1 there appear to be bugs affecting the `torch_deterministic` setting, and using this mode will result in a crash.