Heuristic: ARISE Initiative Robomimic Data Worker Tuning By Modality
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Data_Management |
| Last Updated | 2026-02-15 07:30 GMT |
Overview
Set `config.train.num_data_workers` to `0` for low-dimensional datasets and `2` for image datasets to balance data-loading throughput against worker-process overhead.
Description
PyTorch's `DataLoader` supports multi-process data loading via its `num_workers` parameter, which robomimic sets from `config.train.num_data_workers`. For low-dimensional observation datasets (state vectors, joint positions), each sample is small enough that the overhead of spawning and communicating with worker processes exceeds any benefit — setting the value to 0 loads data in the main process with no inter-process communication cost. For image-based datasets, file I/O and image decoding become the bottleneck, and parallel workers provide a genuine speedup. The robomimic codebase recommends 2 workers for image datasets as a balanced default.
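As a concrete illustration of the rule, a minimal helper could map a dataset's observation modalities to a worker count. The function and modality names below are hypothetical, not robomimic API:

```python
# Hypothetical helper: pick a DataLoader worker count from the observation
# modalities a dataset contains. Modality names are illustrative only.
IMAGE_MODALITIES = {"rgb", "rgbd", "depth"}

def pick_num_workers(obs_modalities):
    """Return 2 when any modality requires image I/O and decoding, else 0."""
    return 2 if IMAGE_MODALITIES & set(obs_modalities) else 0
```

A training script could then pass the result straight to `DataLoader(num_workers=...)`.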
Usage
Apply this heuristic when configuring `config.train.num_data_workers`. The default of 0 is correct for the majority of robomimic experiments, which use low-dimensional observations. Switch to 2 when training with image observations (RGB cameras) or depth observations.
The Insight (Rule of Thumb)
- Action: Set `config.train.num_data_workers` based on observation type.
- Value:
- `0` — Low-dimensional datasets (state vectors, joint positions, object positions)
- `2` — Image datasets (camera observations, depth images)
- Trade-off: More workers increase memory usage and process-management overhead. For low-dim data, where each sample fetch takes microseconds, the multiprocessing overhead dominates.
- Companion setting: Ensure `hdf5_use_swmr=True` when using workers > 0 to prevent HDF5 file locking issues.
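Taken together, the rule and its companion setting might look like this in a robomimic training script. This is a hedged sketch: `config_factory` is robomimic's config constructor, and the attribute paths follow `base_config.py`, but the exact experiment setup (a BC image run) is assumed for illustration:

```python
from robomimic.config import config_factory

# Assumed scenario: behavior cloning with RGB camera observations.
config = config_factory(algo_name="bc")
config.train.num_data_workers = 2   # image observations: parallel I/O helps
config.train.hdf5_use_swmr = True   # safe concurrent HDF5 reads with workers > 0
```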
Reasoning
From `robomimic/config/base_config.py:163-164`:
```python
# num workers for loading data - generally set to 0 for low-dim datasets, and 2 for image datasets
self.train.num_data_workers = 0
```
Low-dimensional observations are typically a few hundred bytes per sample and are often fully cached in memory (via `hdf5_cache_mode="all"`), meaning each `__getitem__` call is just a numpy array slice — no I/O at all. Adding worker processes in this scenario only adds serialization overhead from the inter-process queue. Image observations require reading and potentially decompressing JPEG/PNG data from HDF5, where parallel I/O provides genuine throughput improvement.
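The no-I/O claim can be made concrete with a toy dataset that mimics a fully cached low-dim dataset: every fetch is a pure in-memory slice, so there is nothing for worker processes to parallelize. The class and array sizes here are illustrative, not robomimic code:

```python
import numpy as np

# Toy stand-in for a fully cached low-dim robomimic dataset: the whole
# dataset lives in memory (as with hdf5_cache_mode="all"), so __getitem__
# is just an array slice with no file I/O at all.
class CachedLowDimDataset:
    def __init__(self, n=1000, obs_dim=23):
        # entire dataset held in memory as a single float32 array
        self.cache = np.zeros((n, obs_dim), dtype=np.float32)

    def __len__(self):
        return len(self.cache)

    def __getitem__(self, idx):
        return self.cache[idx]  # pure memory slice — microseconds per fetch
```

With fetches this cheap, routing them through a worker process only adds queue serialization on top of the slice.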