Heuristic: ARISE Initiative Robomimic Data Worker Tuning By Modality
| Knowledge Sources | |
|---|---|
| Domains | Optimization, Data_Management |
| Last Updated | 2026-02-15 07:30 GMT |
Overview
Set `config.train.num_data_workers` to `0` for low-dimensional datasets and `2` for image datasets to balance data-loading throughput against worker-process overhead.
Description
PyTorch's `DataLoader` supports multi-process data loading via its `num_workers` parameter, which robomimic sets from `config.train.num_data_workers`. For low-dimensional observation datasets (state vectors, joint positions), each sample is small enough that the overhead of spawning and communicating with worker processes exceeds any benefit — setting the value to 0 loads data in the main process with no inter-process communication cost. For image-based datasets, file I/O and image decoding become the bottleneck, and parallel workers provide a genuine speedup. The robomimic codebase recommends 2 workers for image datasets as a balanced default.
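As a concrete illustration of the rule, a minimal helper could map a dataset's observation modalities to a worker count. The function and modality names below are hypothetical, not robomimic API:

```python
# Hypothetical helper: pick a DataLoader worker count from the observation
# modalities a dataset contains. Modality names are illustrative only.
IMAGE_MODALITIES = {"rgb", "rgbd", "depth"}

def pick_num_workers(obs_modalities):
    """Return 2 when any modality requires image I/O and decoding, else 0."""
    return 2 if IMAGE_MODALITIES & set(obs_modalities) else 0
```

A training script could then pass the result straight to `DataLoader(num_workers=...)`.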
Usage
Apply this heuristic when configuring `config.train.num_data_workers`. The default of 0 is correct for the majority of robomimic experiments, which use low-dimensional observations. Switch to 2 when training with image observations (RGB cameras) or depth observations.
The Insight (Rule of Thumb)
- Action: Set `config.train.num_data_workers` based on observation type.
- Value:
- `0` — Low-dimensional datasets (state vectors, joint positions, object positions)
- `2` — Image datasets (camera observations, depth images)
- Trade-off: More workers increase memory usage and process-management overhead. For low-dim data, where each sample fetch takes microseconds, the multiprocessing overhead dominates.
- Companion setting: Ensure `hdf5_use_swmr=True` when using workers > 0 to prevent HDF5 file locking issues.
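Taken together, the rule and its companion setting might look like this in a robomimic training script. This is a hedged sketch: `config_factory` is robomimic's config constructor, and the attribute paths follow `base_config.py`, but the exact experiment setup (a BC image run) is assumed for illustration:

```python
from robomimic.config import config_factory

# Assumed scenario: behavior cloning with RGB camera observations.
config = config_factory(algo_name="bc")
config.train.num_data_workers = 2   # image observations: parallel I/O helps
config.train.hdf5_use_swmr = True   # safe concurrent HDF5 reads with workers > 0
```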
Reasoning
From `robomimic/config/base_config.py:163-164`:
```python
# num workers for loading data - generally set to 0 for low-dim datasets, and 2 for image datasets
self.train.num_data_workers = 0
```
Low-dimensional observations are typically a few hundred bytes per sample and are often fully cached in memory (via `hdf5_cache_mode="all"`), meaning each `__getitem__` call is just a numpy array slice — no I/O at all. Adding worker processes in this scenario only adds serialization overhead from the inter-process queue. Image observations require reading and potentially decompressing JPEG/PNG data from HDF5, where parallel I/O provides genuine throughput improvement.
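The no-I/O claim can be made concrete with a toy dataset that mimics a fully cached low-dim dataset: every fetch is a pure in-memory slice, so there is nothing for worker processes to parallelize. The class and array sizes here are illustrative, not robomimic code:

```python
import numpy as np

# Toy stand-in for a fully cached low-dim robomimic dataset: the whole
# dataset lives in memory (as with hdf5_cache_mode="all"), so __getitem__
# is just an array slice with no file I/O at all.
class CachedLowDimDataset:
    def __init__(self, n=1000, obs_dim=23):
        # entire dataset held in memory as a single float32 array
        self.cache = np.zeros((n, obs_dim), dtype=np.float32)

    def __len__(self):
        return len(self.cache)

    def __getitem__(self, idx):
        return self.cache[idx]  # pure memory slice — microseconds per fetch
```

With fetches this cheap, routing them through a worker process only adds queue serialization on top of the slice.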