Implementation:ARISE Initiative Robomimic Custom Observation Modality Extension
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Perception, Extensibility |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
A concrete example of extending the robomimic observation framework with custom sensor modalities, encoder networks, and data augmentation randomizers.
Description
This example module demonstrates the full pattern for adding a new observation modality to robomimic. It defines three extensibility points:
- CustomImageModality: a subclass of Modality that registers a new "custom_image" type with custom processing and unprocessing functions.
- CustomImageEncoderCore: a subclass of EncoderCore that defines how observations of the custom modality are encoded.
- CustomImageRandomizer: a subclass of Randomizer that provides data augmentation by creating noisy copies of each input image and averaging the network outputs across those copies.
It also shows how to override the default processor on an existing modality (ScanModality).
Usage
Use this pattern when adapting robomimic to a custom robot setup with non-standard sensors (e.g., tactile arrays, thermal cameras, custom depth sensors). Import and subclass the base classes, then reference the new encoder and randomizer classes under config.observation.encoder and assign observation keys to the new modality under config.observation.modalities.
Code Reference
Source Location
- Repository: robomimic
- File: examples/add_new_modality.py
- Lines: 1-215
Signature
class CustomImageModality(Modality):
    """
    Custom modality for single-frame images with raw shape (H, W) in range [0, 255].
    """
    name = "custom_image"

    @classmethod
    def _default_obs_processor(cls, obs):
        """Normalize to [-1, 1] range."""

    @classmethod
    def _default_obs_unprocessor(cls, obs):
        """Reverse normalization back to [0, 255]."""


class CustomImageEncoderCore(EncoderCore):
    """
    Custom encoder core for processing custom image modality observations.
    """
    def __init__(self, input_shape, welcome_str):
        """
        Args:
            input_shape (tuple): shape of input, inferred automatically at runtime
            welcome_str (str): arbitrary custom argument
        """

    def output_shape(self, input_shape=None):
        """Returns output shape given input shape."""

    def forward(self, inputs):
        """Forward pass through the encoder."""


class CustomImageRandomizer(Randomizer):
    """
    Data augmentation randomizer that creates N noisy copies of each image
    and pools outputs by averaging.
    """
    def __init__(self, input_shape, num_rand=1, noise_scale=0.01):
        """
        Args:
            input_shape (tuple): shape of input (C, H, W)
            num_rand (int): number of random copies per input
            noise_scale (float): magnitude of uniform noise
        """

    def forward_in(self, inputs):
        """Create N noisy copies, reshape to (B*N, C, H, W)."""

    def forward_out(self, inputs):
        """Split (B*N, ...) -> (B, N, ...) and average across N."""
Import
from robomimic.models import EncoderCore, Randomizer
from robomimic.utils.obs_utils import Modality, ScanModality
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| input_shape | tuple | Yes | Shape of the observation tensor (without batch dimension) |
| welcome_str | str | Yes | Custom argument for CustomImageEncoderCore (arbitrary kwargs) |
| num_rand | int | No | Number of random augmented copies (default: 1) |
| noise_scale | float | No | Magnitude of uniform noise for augmentation (default: 0.01) |
Outputs
| Name | Type | Description |
|---|---|---|
| CustomImageModality._default_obs_processor | np.ndarray or torch.Tensor | Normalized observation in [-1, 1] range |
| CustomImageEncoderCore.forward | torch.Tensor | Encoded observation (same shape as input for pass-through) |
| CustomImageRandomizer.forward_in | torch.Tensor | Augmented copies reshaped to (B*N, C, H, W) |
| CustomImageRandomizer.forward_out | torch.Tensor | Pooled output averaged across N copies, shape (B, ...) |
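The shape contract of the randomizer (copy, perturb, fold into the batch, then unfold and average) can be illustrated without robomimic. The following is a minimal NumPy sketch, not the example module's code: the real randomizer operates on torch tensors inside a Randomizer subclass, the free functions and the `rng` argument are hypothetical, and the symmetric noise range [-noise_scale, noise_scale] is an assumption about what "magnitude of uniform noise" means.

```python
import numpy as np


def forward_in(inputs, num_rand, noise_scale, rng):
    """(B, C, H, W) -> (B * N, C, H, W) with additive uniform noise (assumed symmetric)."""
    batch_size = inputs.shape[0]
    # Insert a copy axis and repeat each sample N times: (B, N, C, H, W)
    copies = np.repeat(inputs[:, None], num_rand, axis=1)
    # Uniform noise in [-noise_scale, noise_scale)
    noise = noise_scale * (2.0 * rng.random(copies.shape) - 1.0)
    noisy = copies + noise
    # Fold the copy axis into the batch axis so downstream layers see a plain batch
    return noisy.reshape(batch_size * num_rand, *inputs.shape[1:])


def forward_out(outputs, num_rand):
    """(B * N, ...) -> (B, ...) by averaging over the N copies of each sample."""
    batch_size = outputs.shape[0] // num_rand
    return outputs.reshape(batch_size, num_rand, *outputs.shape[1:]).mean(axis=1)


rng = np.random.default_rng(0)
batch = np.zeros((2, 3, 4, 4))  # B=2, C=3, H=W=4
augmented = forward_in(batch, num_rand=3, noise_scale=0.05, rng=rng)
pooled = forward_out(augmented, num_rand=3)
print(augmented.shape, pooled.shape)
```

Because the N copies of each sample are contiguous along the batch axis after `forward_in`, the reshape in `forward_out` regroups them correctly before averaging.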
Usage Examples
Defining a Custom Modality
from robomimic.utils.obs_utils import Modality

class CustomImageModality(Modality):
    name = "custom_image"

    @classmethod
    def _default_obs_processor(cls, obs):
        # Normalize from [0, 255] to [-1, 1]
        return (obs / 255.0 - 0.5) * 2

    @classmethod
    def _default_obs_unprocessor(cls, obs):
        # Reverse: from [-1, 1] back to [0, 255]
        return ((obs / 2) + 0.5) * 255.0
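As a quick sanity check, the two formulas should be inverses of each other. The sketch below copies the arithmetic into standalone functions (`process` and `unprocess` are hypothetical names used only for this check) so it runs without robomimic; the same expressions apply elementwise to NumPy arrays and torch tensors.

```python
def process(obs):
    # Map raw pixel values in [0, 255] to [-1, 1]
    return (obs / 255.0 - 0.5) * 2


def unprocess(obs):
    # Map normalized values in [-1, 1] back to [0, 255]
    return ((obs / 2) + 0.5) * 255.0


low, high = process(0), process(255)   # endpoints of the normalized range
round_trip = unprocess(process(128))   # should recover 128 up to float error
print(low, high, round_trip)
```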
Overriding an Existing Modality Processor
import numpy as np
import torch
from robomimic.utils.obs_utils import ScanModality

def custom_scan_processor(obs):
    # Trim padded ends from scan data (assumes obs is a flat 1D array)
    return obs[1:-1]

def custom_scan_unprocessor(obs):
    # Re-add one zero of padding at each end, matching the input type
    if isinstance(obs, np.ndarray):
        return np.concatenate([np.zeros(1), obs, np.zeros(1)])
    return torch.concat([torch.zeros(1), obs, torch.zeros(1)])

# Override the default processor and unprocessor on the existing modality
ScanModality.set_obs_processor(processor=custom_scan_processor)
ScanModality.set_obs_unprocessor(unprocessor=custom_scan_unprocessor)
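The trim and pad logic can be checked in isolation before it is installed on ScanModality. This is a small NumPy-only sketch with hypothetical helper names (`trim`, `pad`) and a made-up scan; it covers the NumPy branch only.

```python
import numpy as np


def trim(obs):
    # Drop the first and last entries of a flat 1D scan
    return obs[1:-1]


def pad(obs):
    # Put a single zero back on each end
    return np.concatenate([np.zeros(1), obs, np.zeros(1)])


scan = np.array([0.0, 1.5, 2.5, 3.5, 0.0])  # illustrative zero-padded scan
trimmed = trim(scan)
restored = pad(trimmed)
print(trimmed.shape, restored)
```

The pair is an exact inverse only when the first and last entries of the raw scan are zero; any other end values are discarded by the processor and cannot be recovered.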
Registering Custom Modality in Config
from robomimic.config.bc_config import BCConfig
config = BCConfig()
# Set custom encoder for the new modality
config.observation.encoder.custom_image.core_class = "CustomImageEncoderCore"
config.observation.encoder.custom_image.core_kwargs.welcome_str = "hi there!"
config.observation.encoder.custom_image.obs_randomizer_class = "CustomImageRandomizer"
config.observation.encoder.custom_image.obs_randomizer_kwargs.num_rand = 3
config.observation.encoder.custom_image.obs_randomizer_kwargs.noise_scale = 0.05
# Associate observation keys with the custom modality
config.observation.modalities.obs.custom_image = ["my_image1", "my_image2"]
config.observation.modalities.goal.custom_image = ["my_image2", "my_image3"]
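The config above refers to the encoder and randomizer by class-name strings, which implies that the custom classes must be defined or imported before the model is built so that a name-to-class lookup can succeed. The following is a generic sketch of that pattern and not robomimic's actual implementation; `REGISTRY`, `Registered`, `PassThroughCore`, and `build` are hypothetical names.

```python
# Hypothetical registry sketch: illustrates name-based class lookup in general,
# not the mechanism robomimic uses internally.
REGISTRY = {}


class Registered:
    """Base class whose subclasses are recorded under their class name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        REGISTRY[cls.__name__] = cls


class PassThroughCore(Registered):
    def __init__(self, input_shape, welcome_str):
        self.input_shape = input_shape
        self.welcome_str = welcome_str


def build(class_name, **kwargs):
    """Look up a class by its string name and construct it with the given kwargs."""
    return REGISTRY[class_name](**kwargs)


# Mirrors the shape of core_class / core_kwargs in the config
core = build("PassThroughCore", input_shape=(3, 84, 84), welcome_str="hi there!")
print(type(core).__name__, core.welcome_str)
```

Under this pattern, a string that names a class which was never defined or imported fails at lookup time, which is why the subclass definitions need to run before the config is consumed.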