
Environment: ARISE Initiative Robomimic HDF5 Data Dependencies

From Leeroopedia
Domains: Infrastructure, Data_Management
Last Updated: 2026-02-15 07:30 GMT

Overview

HDF5 data storage stack with h5py, numpy, and SWMR (Single-Writer-Multiple-Reader) support for efficient dataset loading and parallel data access.

Description

Robomimic stores all demonstration datasets in HDF5 format. The `h5py` library is the primary interface for reading and writing these files. Datasets contain observation trajectories, actions, rewards, and metadata organized by demonstration episodes (e.g., `data/demo_0`, `data/demo_1`). The framework supports SWMR mode for safe parallel access from multiple DataLoader workers, and three caching modes (`"all"`, `"low_dim"`, `None`) that trade memory for I/O speed. Additional utilities include `imageio` and `imageio-ffmpeg` for video rendering, and `matplotlib` for visualization.
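The episode layout described above can be sketched with plain `h5py`. This is a minimal toy example, not a real robomimic dataset: the `data/demo_N` group names come from the docs, while the dataset keys (`actions`, `obs/eef_pos`) and shapes are illustrative placeholders.

```python
import h5py
import numpy as np

# Build a toy file mimicking the episode layout described above.
# Group names ("data/demo_N") match the docs; dataset keys and
# shapes are illustrative assumptions.
with h5py.File("toy_demos.hdf5", "w") as f:
    for i in range(2):
        grp = f.create_group("data/demo_{}".format(i))
        grp.create_dataset("actions", data=np.zeros((10, 7)))
        grp.create_dataset("obs/eef_pos", data=np.zeros((10, 3)))

# Inspect the episode structure.
with h5py.File("toy_demos.hdf5", "r") as f:
    demos = sorted(f["data"].keys())
    n_steps = f["data/demo_0/actions"].shape[0]

print(demos)    # ['demo_0', 'demo_1']
print(n_steps)  # 10
```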

Usage

Use this environment for all data-related operations: loading training datasets, creating train/validation splits via filter keys, extracting observations from simulation states, filtering datasets by size, and inspecting dataset contents. Every robomimic workflow that touches HDF5 files requires these dependencies.

System Requirements

Category | Requirement | Notes
OS | macOS or Linux | Cross-platform via Python
RAM | Sufficient for dataset caching | `hdf5_cache_mode="all"` loads the entire dataset into RAM
Disk | Varies by dataset | Low-dim datasets: ~100 MB; image datasets: ~10 GB+

Dependencies

Python Packages

  • `h5py` (any recent version)
  • `numpy` >= 1.13.3
  • `psutil` (system resource monitoring)
  • `tqdm` (progress bars)
  • `termcolor` (colored terminal output)
  • `imageio` (video/image I/O)
  • `imageio-ffmpeg` (FFmpeg backend for video writing)
  • `matplotlib` (visualization)
  • `tensorboard` (training metrics logging)
  • `tensorboardX` (TensorBoard SummaryWriter)

Credentials

No credentials required for HDF5 data operations.

Quick Install

# All dependencies are installed automatically with robomimic
pip install robomimic

# Or install individually
pip install h5py numpy psutil tqdm termcolor imageio imageio-ffmpeg matplotlib tensorboard tensorboardX
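A quick post-install sanity check can confirm that the core packages import and report their versions; the printed HDF5 library version matters because SWMR requires HDF5 >= 1.10 (see Compatibility Notes below).

```python
import h5py
import numpy as np

# Verify the install and the documented numpy floor (>= 1.13.3).
print("h5py :", h5py.__version__)
print("numpy:", np.__version__)
# SWMR support depends on the underlying HDF5 C library version.
print("HDF5 :", h5py.version.hdf5_version)
```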

Code Evidence

SWMR mode usage from `robomimic/utils/dataset.py:81-82`:

hdf5_use_swmr (bool): whether to use swmr feature when opening the hdf5 file. This ensures
    that multiple Dataset instances can all access the same hdf5 file without problems.
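At the `h5py` level, the SWMR pattern that this flag enables looks like the following sketch: one writer opens the file with `libver="latest"` and switches into SWMR mode, after which readers can open the same file concurrently with `swmr=True`. The file name and dataset key here are illustrative.

```python
import h5py
import numpy as np

path = "swmr_example.hdf5"

# Writer: SWMR requires libver="latest" and an explicit switch into
# SWMR mode after all groups/datasets have been created.
writer = h5py.File(path, "w", libver="latest")
dset = writer.create_dataset("series", shape=(0,), maxshape=(None,), dtype="f8")
writer.swmr_mode = True

# Reader: opening with swmr=True is the access mode that robomimic's
# hdf5_use_swmr flag enables for concurrent DataLoader workers.
reader = h5py.File(path, "r", swmr=True)

# Writer appends and flushes; reader refreshes to see the new rows.
dset.resize((3,))
dset[:] = np.arange(3)
dset.flush()

rdset = reader["series"]
rdset.refresh()
values = rdset[:].tolist()
print(values)  # [0.0, 1.0, 2.0]

reader.close()
writer.close()
```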

Cache mode documentation from `robomimic/config/base_config.py:166-170`:

# One of ["all", "low_dim", or None]. Set to "all" to cache entire hdf5 in memory - this is
# by far the fastest for data loading. Set to "low_dim" to cache all non-image data. Set
# to None to use no caching - in this case, every batch sample is retrieved via file i/o.
# You should almost never set this to None, even for large image datasets.
self.train.hdf5_cache_mode = "all"
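The idea behind `"low_dim"` caching can be sketched in a few lines: preload every small, non-image array into RAM once and leave image datasets on disk. This is an illustrative sketch only, not robomimic's actual `SequenceDataset` implementation, and the `"_image"` suffix convention used to detect image data is an assumption.

```python
import h5py
import numpy as np

def cache_low_dim(path, image_suffix="_image"):
    """Sketch of hdf5_cache_mode="low_dim": preload non-image arrays
    into memory, leave image datasets on disk. The "_image" suffix
    check is an assumed convention, not robomimic's detection logic."""
    cache = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and not name.endswith(image_suffix):
                cache[name] = obj[()]  # read the full array into RAM
        f["data"].visititems(visit)
    return cache
```

Subsequent batch sampling would then hit the in-memory `cache` for low-dim keys and fall back to file I/O only for images.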

Filter key mechanism (abridged) from `robomimic/utils/file_utils.py:28-67`:

import h5py
import numpy as np

def create_hdf5_filter_key(hdf5_path, demo_keys, key_name):
    f = h5py.File(hdf5_path, "a")
    demos = sorted(list(f["data"].keys()))
    # store list of filtered keys under mask group
    k = "mask/{}".format(key_name)
    if k in f:
        del f[k]
    f[k] = np.array(demo_keys, dtype='S')
    f.close()
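A stored filter key can be read back with plain `h5py`; note that the demo names come back as byte strings and must be decoded. The sketch below builds a toy file matching the `mask/` layout the function above produces (file name and demo names are illustrative).

```python
import h5py
import numpy as np

# Toy file with two demos and a "train" filter key under mask/,
# matching the layout create_hdf5_filter_key produces.
with h5py.File("filter_toy.hdf5", "w") as f:
    f.create_group("data/demo_0")
    f.create_group("data/demo_1")
    f["mask/train"] = np.array(["demo_0"], dtype="S")

# Filter keys are stored as byte strings; decode them on read.
with h5py.File("filter_toy.hdf5", "r") as f:
    train_demos = [k.decode("utf-8") for k in f["mask/train"][:]]

print(train_demos)  # ['demo_0']
```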

Common Errors

Error Message | Cause | Solution
`OSError: Unable to open file` | Incorrect HDF5 file path, or a corrupted file | Verify the path; re-download the dataset
`MemoryError` during caching | Dataset too large for `hdf5_cache_mode="all"` | Switch to `hdf5_cache_mode="low_dim"` or `None`
`RuntimeError: unable to open file (file locking)` | Multiple processes accessing the same HDF5 file | Open read-only with `hdf5_use_swmr=True`; if locking persists, set `HDF5_USE_FILE_LOCKING=FALSE` in the environment

Compatibility Notes

  • SWMR mode: Requires HDF5 >= 1.10. Enabled by default (`hdf5_use_swmr=True`) for safe multi-worker data loading.
  • Large image datasets: Use `hdf5_cache_mode="low_dim"` to avoid OOM. Avoid `None` where possible — even for large image datasets, caching the non-image data significantly improves loading speed.
  • Filter keys: Stored under `mask/` group in HDF5 files. Used for train/validation splits and dataset size filtering.
