Implementation:ARISE Initiative Robomimic Custom Observation Modality Extension

Knowledge Sources	robomimic, robomimic Observations
Domains	Robotics, Perception, Extensibility
Last Updated	2026-02-15 08:00 GMT

Overview

A concrete example of extending the robomimic observation framework with custom sensor modalities, encoder networks, and data-augmentation randomizers.

Description

This example module demonstrates the full pattern for adding a new observation modality to robomimic. It defines three extension points. CustomImageModality subclasses Modality and registers a new "custom_image" type with its own processing and unprocessing functions. CustomImageEncoderCore subclasses EncoderCore and defines how observations of that modality are encoded; in this example it is a pass-through. CustomImageRandomizer subclasses Randomizer and augments data by creating noisy copies of each input image and averaging the network outputs across those copies. The module also shows how to override the default processor and unprocessor on an existing modality (ScanModality).

Usage

Use this pattern when adapting robomimic to a custom robot setup with non-standard sensors (e.g., tactile arrays, thermal cameras, custom depth sensors). Import and subclass the base classes, then reference the new classes by name in the config's observation.encoder section and list the observation keys that use the new modality under observation.modalities.

Code Reference

Source Location

Repository: robomimic
File: examples/add_new_modality.py
Lines: 1-215

Signature

class CustomImageModality(Modality):
    """
    Custom modality for single-frame images with raw shape (H, W) in range [0, 255].
    """
    name = "custom_image"

    @classmethod
    def _default_obs_processor(cls, obs):
        """Normalize to [-1, 1] range."""

    @classmethod
    def _default_obs_unprocessor(cls, obs):
        """Reverse normalization back to [0, 255]."""


class CustomImageEncoderCore(EncoderCore):
    """
    Custom encoder core for processing custom image modality observations.
    """
    def __init__(self, input_shape, welcome_str):
        """
        Args:
            input_shape (tuple): shape of input, inferred automatically at runtime
            welcome_str (str): arbitrary custom argument
        """

    def output_shape(self, input_shape=None):
        """Returns output shape given input shape."""

    def forward(self, inputs):
        """Forward pass through the encoder."""


class CustomImageRandomizer(Randomizer):
    """
    Data augmentation randomizer that creates N noisy copies of each image
    and pools outputs by averaging.
    """
    def __init__(self, input_shape, num_rand=1, noise_scale=0.01):
        """
        Args:
            input_shape (tuple): shape of input (C, H, W)
            num_rand (int): number of random copies per input
            noise_scale (float): magnitude of uniform noise
        """

    def forward_in(self, inputs):
        """Create N noisy copies, reshape to (B*N, C, H, W)."""

    def forward_out(self, inputs):
        """Split (B*N, ...) -> (B, N, ...) and average across N."""

Import

from robomimic.models import EncoderCore, Randomizer
from robomimic.utils.obs_utils import Modality, ScanModality

I/O Contract

Inputs

Name	Type	Required	Description
input_shape	tuple	Yes	Shape of the observation tensor without the batch dimension, e.g. (C, H, W) for CustomImageRandomizer; accepted by both the encoder core and the randomizer
welcome_str	str	Yes	Example of an arbitrary custom argument to CustomImageEncoderCore
num_rand	int	No	Number of noisy copies CustomImageRandomizer creates per input (default: 1)
noise_scale	float	No	Magnitude of the uniform noise CustomImageRandomizer adds (default: 0.01)

Outputs

Name	Type	Description
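CustomImageModality._default_obs_processor	np.ndarray or torch.Tensor	Normalized observation in [-1, 1] range
CustomImageModality._default_obs_unprocessor	np.ndarray or torch.Tensor	Observation mapped back to the [0, 255] range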
CustomImageEncoderCore.forward	torch.Tensor	Encoded observation (same shape as input for pass-through)
CustomImageRandomizer.forward_in	torch.Tensor	Augmented copies reshaped to (B*N, C, H, W)
CustomImageRandomizer.forward_out	torch.Tensor	Pooled output averaged across N copies, shape (B, ...)
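
The shape contract of the two randomizer methods can be illustrated without robomimic. The following is a minimal NumPy sketch, not the module's actual implementation: `forward_in_sketch` and `forward_out_sketch` are hypothetical names, the noise is assumed to be drawn uniformly from [-noise_scale, noise_scale], and the encoder between the two calls is assumed to be a pass-through.

```python
import numpy as np

def forward_in_sketch(inputs, num_rand, noise_scale, rng):
    # (B, C, H, W) -> (B, N, C, H, W): repeat each image N times
    copies = np.repeat(inputs[:, None], num_rand, axis=1)
    # Add uniform noise in [-noise_scale, noise_scale] (assumed noise range)
    noise = rng.uniform(-noise_scale, noise_scale, size=copies.shape)
    noisy = copies + noise
    # (B, N, C, H, W) -> (B*N, C, H, W): fold the copies into the batch axis
    return noisy.reshape(-1, *inputs.shape[1:])

def forward_out_sketch(outputs, num_rand):
    # (B*N, ...) -> (B, N, ...), then average over N to get (B, ...)
    batch_size = outputs.shape[0] // num_rand
    split = outputs.reshape(batch_size, num_rand, *outputs.shape[1:])
    return split.mean(axis=1)

rng = np.random.default_rng(0)
images = np.zeros((2, 1, 4, 4))  # hypothetical batch: B=2, C=1, H=W=4
augmented = forward_in_sketch(images, num_rand=3, noise_scale=0.05, rng=rng)
pooled = forward_out_sketch(augmented, num_rand=3)  # pass-through encoder assumed

assert augmented.shape == (6, 1, 4, 4)
assert pooled.shape == (2, 1, 4, 4)
```

Because the copies of one image are adjacent along the folded batch axis, the reshape in `forward_out_sketch` regroups them correctly before averaging.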

Usage Examples

Defining a Custom Modality

from robomimic.utils.obs_utils import Modality

class CustomImageModality(Modality):
    name = "custom_image"

    @classmethod
    def _default_obs_processor(cls, obs):
        # Normalize from [0, 255] to [-1, 1]
        return (obs / 255.0 - 0.5) * 2

    @classmethod
    def _default_obs_unprocessor(cls, obs):
        # Reverse: from [-1, 1] back to [0, 255]
        return ((obs / 2) + 0.5) * 255.0
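
A quick way to sanity-check a processor/unprocessor pair is to confirm that the two functions are inverses on sample data. The sketch below uses plain NumPy and two standalone functions, `process` and `unprocess`, which are hypothetical stand-ins that copy the arithmetic of the classmethods above so the check runs without robomimic installed.

```python
import numpy as np

def process(obs):
    # Same arithmetic as _default_obs_processor: [0, 255] -> [-1, 1]
    return (obs / 255.0 - 0.5) * 2

def unprocess(obs):
    # Same arithmetic as _default_obs_unprocessor: [-1, 1] -> [0, 255]
    return ((obs / 2) + 0.5) * 255.0

# Hypothetical 4x6 single-channel frame with raw values in [0, 255]
rng = np.random.default_rng(0)
raw = rng.integers(0, 256, size=(4, 6)).astype(np.float32)

normalized = process(raw)
restored = unprocess(normalized)

# Values stay inside [-1, 1], and the round trip recovers the input
assert normalized.min() >= -1.0 and normalized.max() <= 1.0
assert np.allclose(restored, raw, atol=1e-3)
```

The tolerance in the final check allows for float32 rounding; the mapping itself is exactly invertible.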

Overriding an Existing Modality Processor

import numpy as np
import torch
from robomimic.utils.obs_utils import ScanModality

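def custom_scan_processor(obs):
    # Trim the padded ends from scan data (assumes a flat 1D array)
    return obs[1:-1]

def custom_scan_unprocessor(obs):
    # Re-add one zero of padding at each end, matching the input's
    # dtype (and device, for tensors) so the concatenation is well-defined
    if isinstance(obs, np.ndarray):
        pad = np.zeros(1, dtype=obs.dtype)
        return np.concatenate([pad, obs, pad])
    pad = torch.zeros(1, dtype=obs.dtype, device=obs.device)
    return torch.cat([pad, obs, pad])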

ScanModality.set_obs_processor(processor=custom_scan_processor)
ScanModality.set_obs_unprocessor(unprocessor=custom_scan_unprocessor)
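
The trim and re-pad pair can be checked in isolation before it is installed on ScanModality. This is a NumPy-only sketch under the assumption that each scan is a flat 1D array whose first and last entries are zero padding; the sample values are made up.

```python
import numpy as np

def custom_scan_processor(obs):
    # Drop the first and last entries (the zero padding)
    return obs[1:-1]

def custom_scan_unprocessor(obs):
    # Restore one zero at each end (NumPy-only variant for this sketch)
    pad = np.zeros(1, dtype=obs.dtype)
    return np.concatenate([pad, obs, pad])

# Hypothetical 1D scan whose ends are padded with 0
scan = np.array([0.0, 1.5, 2.0, 0.7, 0.0])

trimmed = custom_scan_processor(scan)
restored = custom_scan_unprocessor(trimmed)

assert trimmed.shape == (3,)
assert np.array_equal(restored, scan)
```

Note that `obs[1:-1]` slices the first axis, so on a batched array of shape (B, L) it would drop the first and last batch entries rather than the padding; a batched variant would need to slice the last axis instead.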

Registering Custom Modality in Config

from robomimic.config.bc_config import BCConfig

config = BCConfig()

# Set custom encoder for the new modality
config.observation.encoder.custom_image.core_class = "CustomImageEncoderCore"
config.observation.encoder.custom_image.core_kwargs.welcome_str = "hi there!"
config.observation.encoder.custom_image.obs_randomizer_class = "CustomImageRandomizer"
config.observation.encoder.custom_image.obs_randomizer_kwargs.num_rand = 3
config.observation.encoder.custom_image.obs_randomizer_kwargs.noise_scale = 0.05

# Associate observation keys with the custom modality
config.observation.modalities.obs.custom_image = ["my_image1", "my_image2"]
config.observation.modalities.goal.custom_image = ["my_image2", "my_image3"]
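
Each attribute path above corresponds to a nested key in the config. As an illustration only, the sketch below mirrors those assignments in a plain Python dict and collects the observation keys assigned to "custom_image"; it does not import robomimic or validate against it, and the `custom_keys` helper variable is hypothetical.

```python
import json

# Plain-dict mirror of the attribute assignments above (illustration only)
observation = {
    "encoder": {
        "custom_image": {
            "core_class": "CustomImageEncoderCore",
            "core_kwargs": {"welcome_str": "hi there!"},
            "obs_randomizer_class": "CustomImageRandomizer",
            "obs_randomizer_kwargs": {"num_rand": 3, "noise_scale": 0.05},
        }
    },
    "modalities": {
        "obs": {"custom_image": ["my_image1", "my_image2"]},
        "goal": {"custom_image": ["my_image2", "my_image3"]},
    },
}

# Every observation key treated as a "custom_image", across obs and goal
custom_keys = sorted(
    {
        key
        for group in observation["modalities"].values()
        for key in group.get("custom_image", [])
    }
)

print(json.dumps({"observation": observation}, indent=4))
print(custom_keys)
```

Viewing the structure this way makes it easy to spot an observation key that is listed under a modality with no matching encoder entry.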

Related Pages

Implements Principle

Principle:ARISE_Initiative_Robomimic_Custom_Observation_Modality_Extension

Related Implementations

Implementation:ARISE_Initiative_Robomimic_ObsUtils_initialize_obs_utils_with_config
