Environment: ARISE Initiative Robomimic HDF5 Data Dependencies
| Knowledge Sources | |
|---|---|
| Domains | Infrastructure, Data_Management |
| Last Updated | 2026-02-15 07:30 GMT |
Overview
HDF5 data storage stack with h5py, numpy, and SWMR (Single-Writer-Multiple-Reader) support for efficient dataset loading and parallel data access.
Description
Robomimic stores all demonstration datasets in HDF5 format. The `h5py` library is the primary interface for reading and writing these files. Datasets contain observation trajectories, actions, rewards, and metadata organized by demonstration episodes (e.g., `data/demo_0`, `data/demo_1`). The framework supports SWMR mode for safe parallel access from multiple DataLoader workers, and three caching modes (`"all"`, `"low_dim"`, `None`) that trade memory for I/O speed. Additional utilities include `imageio` and `imageio-ffmpeg` for video rendering, and `matplotlib` for visualization.
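The episode layout described above can be sketched with plain `h5py`. This is a minimal mock-up, not a real robomimic dataset: the file below contains only `actions` and one observation key, whereas real datasets also carry rewards, dones, and environment metadata.

```python
import os
import tempfile

import h5py
import numpy as np

# Build a tiny file mimicking the robomimic layout (a sketch; the
# observation keys vary by environment and dataset).
path = os.path.join(tempfile.mkdtemp(), "demo.hdf5")
with h5py.File(path, "w") as f:
    for i in range(2):
        grp = f.create_group("data/demo_{}".format(i))
        grp.create_dataset("actions", data=np.zeros((10, 7)))
        grp.create_dataset("obs/robot0_eef_pos", data=np.zeros((10, 3)))

# Episodes are enumerated as groups under the top-level "data" group.
with h5py.File(path, "r") as f:
    demos = sorted(f["data"].keys())
print(demos)  # ['demo_0', 'demo_1']
```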
Usage
Use this environment for all data-related operations: loading training datasets, creating train/validation splits via filter keys, extracting observations from simulation states, filtering datasets by size, and inspecting dataset contents. Every robomimic workflow that touches HDF5 files requires these dependencies.
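For the "inspecting dataset contents" case, a quick structural dump needs nothing beyond `h5py` itself. The snippet below (a generic h5py idiom, not a robomimic utility) walks a file and records the path and shape of every dataset:

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "inspect.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("data/demo_0/actions", data=np.zeros((5, 7)))

# visititems calls the callback for every group and dataset; we keep
# only datasets, mapping their full path to their shape.
shapes = {}
with h5py.File(path, "r") as f:
    f.visititems(
        lambda name, obj: shapes.update({name: obj.shape})
        if isinstance(obj, h5py.Dataset)
        else None
    )
print(shapes)  # {'data/demo_0/actions': (5, 7)}
```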
System Requirements
| Category | Requirement | Notes |
|---|---|---|
| OS | macOS or Linux | Cross-platform via Python |
| RAM | Sufficient for dataset caching | `hdf5_cache_mode="all"` loads entire dataset into RAM |
| Disk | Varies by dataset | Low-dim datasets: ~100MB; Image datasets: ~10GB+ |
Dependencies
Python Packages
- `h5py` (any recent version)
- `numpy` >= 1.13.3
- `psutil` (system resource monitoring)
- `tqdm` (progress bars)
- `termcolor` (colored terminal output)
- `imageio` (video/image I/O)
- `imageio-ffmpeg` (FFmpeg backend for video writing)
- `matplotlib` (visualization)
- `tensorboard` (training metrics logging)
- `tensorboardX` (TensorBoard SummaryWriter)
Credentials
No credentials required for HDF5 data operations.
Quick Install
# All dependencies are installed automatically with robomimic
pip install robomimic
# Or install individually
pip install h5py numpy psutil tqdm termcolor imageio imageio-ffmpeg matplotlib tensorboard tensorboardX
Code Evidence
SWMR mode usage from `robomimic/utils/dataset.py:81-82`:
hdf5_use_swmr (bool): whether to use swmr feature when opening the hdf5 file. This ensures
that multiple Dataset instances can all access the same hdf5 file without problems.
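In raw `h5py` terms, SWMR reading looks like the sketch below (assumed behavior per the h5py SWMR docs, not robomimic code): the file is written with `libver="latest"`, and each reader opens its own handle with `swmr=True`.

```python
import os
import tempfile

import h5py
import numpy as np

# SWMR requires the newer file format, so write with libver="latest".
path = os.path.join(tempfile.mkdtemp(), "swmr.hdf5")
with h5py.File(path, "w", libver="latest") as f:
    f.create_dataset("data/demo_0/actions", data=np.ones((4, 7)))

# Each DataLoader worker can open its own read-only SWMR handle;
# HDF5 then tolerates concurrent readers alongside a single writer.
with h5py.File(path, "r", swmr=True, libver="latest") as f:
    acts = f["data/demo_0/actions"][()]
print(acts.shape)  # (4, 7)
```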
Cache mode documentation from `robomimic/config/base_config.py:166-170`:
# One of ["all", "low_dim", or None]. Set to "all" to cache entire hdf5 in memory - this is
# by far the fastest for data loading. Set to "low_dim" to cache all non-image data. Set
# to None to use no caching - in this case, every batch sample is retrieved via file i/o.
# You should almost never set this to None, even for large image datasets.
self.train.hdf5_cache_mode = "all"
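The trade-off between the three modes can be summarized as a key-selection rule. The function below is an illustrative sketch, not robomimic's implementation: it assumes image observations are identifiable by an `"image"` substring in the key name.

```python
# Sketch of the caching decision: "all" keeps every key in RAM,
# "low_dim" keeps everything except image observations, and None
# caches nothing, so every batch sample costs a file read.
def keys_to_cache(keys, cache_mode):
    if cache_mode == "all":
        return list(keys)
    if cache_mode == "low_dim":
        return [k for k in keys if "image" not in k]
    return []  # cache_mode is None: pure file i/o


keys = ["actions", "obs/agentview_image", "obs/robot0_eef_pos"]
print(keys_to_cache(keys, "low_dim"))  # ['actions', 'obs/robot0_eef_pos']
```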
Filter key mechanism from `robomimic/utils/file_utils.py:28-67`:
def create_hdf5_filter_key(hdf5_path, demo_keys, key_name):
    f = h5py.File(hdf5_path, "a")
    demos = sorted(list(f["data"].keys()))
    # store list of filtered keys under mask group
    k = "mask/{}".format(key_name)
    if k in f:
        del f[k]
    f[k] = np.array(demo_keys, dtype='S')
    f.close()
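End to end, the filter-key mechanism amounts to writing a list of demo names as byte strings under `mask/` and decoding them on read. The sketch below mirrors the snippet above on a throwaway file; the `train`/`valid` key names are illustrative, chosen here to match the common split convention.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "filter.hdf5")
with h5py.File(path, "w") as f:
    for i in range(4):
        f.create_dataset("data/demo_{}/actions".format(i), data=np.zeros((3, 7)))

# Store named demo lists under the "mask/" group, as in the snippet above.
with h5py.File(path, "a") as f:
    demos = sorted(f["data"].keys())
    f["mask/train"] = np.array(demos[:3], dtype="S")
    f["mask/valid"] = np.array(demos[3:], dtype="S")

# Readers decode the fixed-length byte strings back into demo names.
with h5py.File(path, "r") as f:
    train = [d.decode("utf-8") for d in f["mask/train"][:]]
print(train)  # ['demo_0', 'demo_1', 'demo_2']
```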
Common Errors
| Error Message | Cause | Solution |
|---|---|---|
| `OSError: Unable to open file` | HDF5 file path incorrect or file corrupted | Verify file path; re-download dataset |
| `MemoryError` during caching | Dataset too large for `hdf5_cache_mode="all"` | Switch to `hdf5_cache_mode="low_dim"` or `None` |
| `RuntimeError: unable to open file (file locking)` | Multiple processes opened the same HDF5 file without SWMR | Open readers with `hdf5_use_swmr=True`; keep at most one writer |
Compatibility Notes
- SWMR mode: Requires HDF5 >= 1.10. Enabled by default (`hdf5_use_swmr=True`) for safe multi-worker data loading.
- Large image datasets: Use `hdf5_cache_mode="low_dim"` to avoid OOM. Never use `None` — even for large datasets, caching non-image data significantly improves loading speed.
- Filter keys: Stored under `mask/` group in HDF5 files. Used for train/validation splits and dataset size filtering.