
Environment: ARISE Initiative Robomimic HDF5 Data Dependencies

From Leeroopedia
Domains: Infrastructure, Data_Management
Last Updated: 2026-02-15 07:30 GMT

Overview

HDF5 data storage stack with h5py, numpy, and SWMR (Single-Writer-Multiple-Reader) support for efficient dataset loading and parallel data access.

Description

Robomimic stores all demonstration datasets in HDF5 format. The `h5py` library is the primary interface for reading and writing these files. Datasets contain observation trajectories, actions, rewards, and metadata organized by demonstration episodes (e.g., `data/demo_0`, `data/demo_1`). The framework supports SWMR mode for safe parallel access from multiple DataLoader workers, and three caching modes (`"all"`, `"low_dim"`, `None`) that trade memory for I/O speed. Additional utilities include `imageio` and `imageio-ffmpeg` for video rendering, and `matplotlib` for visualization.
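The episode layout described above can be sketched with plain `h5py`. This is a minimal toy example, not a real robomimic dataset: the `data/demo_N` group names come from the docs, while the dataset keys (`actions`, `obs/eef_pos`) and shapes are illustrative placeholders.

```python
import h5py
import numpy as np

# Build a toy file mimicking the episode layout described above.
# Group names ("data/demo_N") match the docs; dataset keys and
# shapes are illustrative assumptions.
with h5py.File("toy_demos.hdf5", "w") as f:
    for i in range(2):
        grp = f.create_group("data/demo_{}".format(i))
        grp.create_dataset("actions", data=np.zeros((10, 7)))
        grp.create_dataset("obs/eef_pos", data=np.zeros((10, 3)))

# Inspect the episode structure.
with h5py.File("toy_demos.hdf5", "r") as f:
    demos = sorted(f["data"].keys())
    n_steps = f["data/demo_0/actions"].shape[0]

print(demos)    # ['demo_0', 'demo_1']
print(n_steps)  # 10
```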

Usage

Use this environment for all data-related operations: loading training datasets, creating train/validation splits via filter keys, extracting observations from simulation states, filtering datasets by size, and inspecting dataset contents. Every robomimic workflow that touches HDF5 files requires these dependencies.

System Requirements

Category | Requirement | Notes
OS | macOS or Linux | Cross-platform via Python
RAM | Sufficient for dataset caching | `hdf5_cache_mode="all"` loads the entire dataset into RAM
Disk | Varies by dataset | Low-dim datasets: ~100 MB; image datasets: ~10 GB+

Dependencies

Python Packages

  • `h5py` (any recent version)
  • `numpy` >= 1.13.3
  • `psutil` (system resource monitoring)
  • `tqdm` (progress bars)
  • `termcolor` (colored terminal output)
  • `imageio` (video/image I/O)
  • `imageio-ffmpeg` (FFmpeg backend for video writing)
  • `matplotlib` (visualization)
  • `tensorboard` (training metrics logging)
  • `tensorboardX` (TensorBoard SummaryWriter)

Credentials

No credentials required for HDF5 data operations.

Quick Install

# All dependencies are installed automatically with robomimic
pip install robomimic

# Or install individually
pip install h5py numpy psutil tqdm termcolor imageio imageio-ffmpeg matplotlib tensorboard tensorboardX
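A quick post-install sanity check can confirm that the core packages import and report their versions; the printed HDF5 library version matters because SWMR requires HDF5 >= 1.10 (see Compatibility Notes below).

```python
import h5py
import numpy as np

# Verify the install and the documented numpy floor (>= 1.13.3).
print("h5py :", h5py.__version__)
print("numpy:", np.__version__)
# SWMR support depends on the underlying HDF5 C library version.
print("HDF5 :", h5py.version.hdf5_version)
```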

Code Evidence

SWMR mode usage from `robomimic/utils/dataset.py:81-82`:

hdf5_use_swmr (bool): whether to use swmr feature when opening the hdf5 file. This ensures
    that multiple Dataset instances can all access the same hdf5 file without problems.
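At the `h5py` level, the SWMR pattern that this flag enables looks like the following sketch: one writer opens the file with `libver="latest"` and switches into SWMR mode, after which readers can open the same file concurrently with `swmr=True`. The file name and dataset key here are illustrative.

```python
import h5py
import numpy as np

path = "swmr_example.hdf5"

# Writer: SWMR requires libver="latest" and an explicit switch into
# SWMR mode after all groups/datasets have been created.
writer = h5py.File(path, "w", libver="latest")
dset = writer.create_dataset("series", shape=(0,), maxshape=(None,), dtype="f8")
writer.swmr_mode = True

# Reader: opening with swmr=True is the access mode that robomimic's
# hdf5_use_swmr flag enables for concurrent DataLoader workers.
reader = h5py.File(path, "r", swmr=True)

# Writer appends and flushes; reader refreshes to see the new rows.
dset.resize((3,))
dset[:] = np.arange(3)
dset.flush()

rdset = reader["series"]
rdset.refresh()
values = rdset[:].tolist()
print(values)  # [0.0, 1.0, 2.0]

reader.close()
writer.close()
```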

Cache mode documentation from `robomimic/config/base_config.py:166-170`:

# One of ["all", "low_dim", or None]. Set to "all" to cache entire hdf5 in memory - this is
# by far the fastest for data loading. Set to "low_dim" to cache all non-image data. Set
# to None to use no caching - in this case, every batch sample is retrieved via file i/o.
# You should almost never set this to None, even for large image datasets.
self.train.hdf5_cache_mode = "all"
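The idea behind `"low_dim"` caching can be sketched in a few lines: preload every small, non-image array into RAM once and leave image datasets on disk. This is an illustrative sketch only, not robomimic's actual `SequenceDataset` implementation, and the `"_image"` suffix convention used to detect image data is an assumption.

```python
import h5py
import numpy as np

def cache_low_dim(path, image_suffix="_image"):
    """Sketch of hdf5_cache_mode="low_dim": preload non-image arrays
    into memory, leave image datasets on disk. The "_image" suffix
    check is an assumed convention, not robomimic's detection logic."""
    cache = {}
    with h5py.File(path, "r") as f:
        def visit(name, obj):
            if isinstance(obj, h5py.Dataset) and not name.endswith(image_suffix):
                cache[name] = obj[()]  # read the full array into RAM
        f["data"].visititems(visit)
    return cache
```

Subsequent batch sampling would then hit the in-memory `cache` for low-dim keys and fall back to file I/O only for images.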

Filter key mechanism (abridged) from `robomimic/utils/file_utils.py:28-67`:

import h5py
import numpy as np

def create_hdf5_filter_key(hdf5_path, demo_keys, key_name):
    f = h5py.File(hdf5_path, "a")
    demos = sorted(list(f["data"].keys()))
    # store list of filtered keys under mask group
    k = "mask/{}".format(key_name)
    if k in f:
        del f[k]
    f[k] = np.array(demo_keys, dtype='S')
    f.close()
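A stored filter key can be read back with plain `h5py`; note that the demo names come back as byte strings and must be decoded. The sketch below builds a toy file matching the `mask/` layout the function above produces (file name and demo names are illustrative).

```python
import h5py
import numpy as np

# Toy file with two demos and a "train" filter key under mask/,
# matching the layout create_hdf5_filter_key produces.
with h5py.File("filter_toy.hdf5", "w") as f:
    f.create_group("data/demo_0")
    f.create_group("data/demo_1")
    f["mask/train"] = np.array(["demo_0"], dtype="S")

# Filter keys are stored as byte strings; decode them on read.
with h5py.File("filter_toy.hdf5", "r") as f:
    train_demos = [k.decode("utf-8") for k in f["mask/train"][:]]

print(train_demos)  # ['demo_0']
```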

Common Errors

Error Message | Cause | Solution
`OSError: Unable to open file` | Incorrect HDF5 file path, or a corrupted file | Verify the path; re-download the dataset
`MemoryError` during caching | Dataset too large for `hdf5_cache_mode="all"` | Switch to `hdf5_cache_mode="low_dim"` or `None`
`RuntimeError: unable to open file (file locking)` | Multiple processes accessing the same HDF5 file | Open read-only with `hdf5_use_swmr=True`; if locking persists, set `HDF5_USE_FILE_LOCKING=FALSE` in the environment

Compatibility Notes

  • SWMR mode: Requires HDF5 >= 1.10. Enabled by default (`hdf5_use_swmr=True`) for safe multi-worker data loading.
  • Large image datasets: Use `hdf5_cache_mode="low_dim"` to avoid OOM. Avoid `None` where possible — even for large image datasets, caching the non-image data significantly improves loading speed.
  • Filter keys: Stored under `mask/` group in HDF5 files. Used for train/validation splits and dataset size filtering.
