Principle:ARISE Initiative Robomimic Custom Observation Modality Extension

Knowledge Sources	robomimic robomimic Observations robomimic
Domains	Robotics, Perception, Extensibility
Last Updated	2026-02-15 08:00 GMT

Overview

A modular extension pattern that enables users to add custom sensor modalities, observation encoders, and data augmentation randomizers to the robomimic observation processing pipeline.

Description

Robot learning frameworks must handle diverse sensor types, from RGB cameras and depth sensors to tactile arrays and proprioceptive state. Custom Observation Modality Extension is the design pattern by which robomimic allows users to define new observation types without modifying the core framework code. It relies on three abstract base classes: Modality (defines processing/unprocessing for a sensor type), EncoderCore (defines the neural network encoder for that modality), and Randomizer (defines data augmentation strategies).

This principle solves the closed-world problem where a framework only supports a fixed set of sensor types. By providing subclass hooks at three levels (data preprocessing, neural encoding, and augmentation), new sensors can be integrated by following a well-defined interface contract. The config system then binds these custom classes to specific observation keys via string-based class resolution.
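The string-based class resolution described above can be sketched with a small class registry. This is a minimal, self-contained illustration in plain Python, not robomimic's actual code: `EncoderBase`, `ENCODER_REGISTRY`, `ThermalEncoder`, and `build_encoder` are hypothetical names, and the config here is a plain dictionary rather than a robomimic config object.

```python
# Hypothetical stand-ins: none of these names are robomimic's real classes.
ENCODER_REGISTRY = {}  # maps class name -> class


class EncoderBase:
    """Stand-in base class; every subclass registers itself by name."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        ENCODER_REGISTRY[cls.__name__] = cls

    def __init__(self, input_shape):
        self.input_shape = tuple(input_shape)


class ThermalEncoder(EncoderBase):
    """Toy user-defined encoder, written outside the 'framework' code."""

    def __init__(self, input_shape, feature_dim=8):
        super().__init__(input_shape)
        self.feature_dim = feature_dim

    def output_shape(self):
        return (self.feature_dim,)


def build_encoder(encoder_config, input_shape):
    """Resolve the class named in the config and instantiate it."""
    class_name = encoder_config["core_class"]
    if class_name not in ENCODER_REGISTRY:
        raise KeyError(f"unknown encoder class: {class_name!r}")
    kwargs = encoder_config.get("core_kwargs", {})
    return ENCODER_REGISTRY[class_name](input_shape, **kwargs)


# The config refers to the encoder only by its class-name string.
config = {"core_class": "ThermalEncoder", "core_kwargs": {"feature_dim": 16}}
encoder = build_encoder(config, input_shape=(1, 32, 32))
```

In a registry of this kind, the lookup goes through the class name, so a new encoder only needs to be defined (and its module imported) before the config is resolved; the builder function itself never changes.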

Usage

Use this principle when adapting robomimic to a robot setup with non-standard sensors (e.g., thermal cameras, tactile grids, lidar) or when the default processing for a built-in modality (e.g., RGB, scan, low-dim) does not suit your data format. It is also the correct approach for implementing custom data augmentation strategies.
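As an illustration of the data-level layer for a non-standard sensor, the sketch below shows a process/unprocess pair for an imagined thermal camera. It is plain Python with nested lists rather than tensors. `ThermalModality`, its temperature range, and the method names `process` and `unprocess` (borrowed from the abstract sketch on this page) are assumptions for illustration, not robomimic's API.

```python
class ThermalModality:
    """Hypothetical modality for a thermal camera reporting degrees Celsius."""

    name = "thermal"
    MIN_C = -20.0   # assumed sensor range, for illustration only
    MAX_C = 120.0

    @classmethod
    def process(cls, obs):
        """Map raw temperatures to [0, 1], clamping out-of-range readings."""
        span = cls.MAX_C - cls.MIN_C
        return [
            [min(max((v - cls.MIN_C) / span, 0.0), 1.0) for v in row]
            for row in obs
        ]

    @classmethod
    def unprocess(cls, obs):
        """Map normalized values back to degrees Celsius."""
        span = cls.MAX_C - cls.MIN_C
        return [[v * span + cls.MIN_C for v in row] for row in obs]


raw_frame = [[-20.0, 50.0], [85.0, 120.0]]
normalized = ThermalModality.process(raw_frame)
restored = ThermalModality.unprocess(normalized)
```

For readings inside the assumed range, `unprocess` inverts `process`; readings outside it are clamped and so cannot be recovered exactly.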

Theoretical Basis

The extension follows a three-layer modular architecture:

# Abstract sketch (not the real implementation; signatures are simplified)

# Layer 1: Modality registration
# Defines how raw sensor data is preprocessed for learning
class MyModality(Modality):
    name = "my_sensor"

    @classmethod
    def process(cls, obs): ...      # raw obs -> normalized obs

    @classmethod
    def unprocess(cls, obs): ...    # normalized obs -> raw obs

# Layer 2: Encoder network
# Defines the neural network that encodes observations of this modality
class MyEncoder(EncoderCore):
    def __init__(self, input_shape, **kwargs): ...
    def output_shape(self, input_shape): ...   # -> shape of the encoded output
    def forward(self, inputs): ...             # -> encoded features

# Layer 3: Data augmentation (optional)
# Defines stochastic transformations applied during training
class MyRandomizer(Randomizer):
    def forward_in(self, inputs): ...     # before the encoder -> augmented inputs
    def forward_out(self, outputs): ...   # after the encoder -> pooled outputs

# Binding: the config links obs keys to modality + encoder + randomizer
config.observation.modalities.obs.my_sensor = ["sensor_key_1"]
config.observation.encoder.my_sensor.core_class = "MyEncoder"
config.observation.encoder.my_sensor.obs_randomizer_class = "MyRandomizer"

The key invariant is a separation of concerns: the Modality defines data-level preprocessing (dtype, normalization), the EncoderCore defines the learned representation, and the Randomizer defines training-time stochastic augmentation. The three layers compose independently, so any modality can pair with any encoder and any randomizer, provided the encoder and randomizer accept the shape of the processed observation.
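The way the three layers compose can be sketched end to end in a few lines. The example below is a toy in plain Python, assuming flat lists of numbers in place of batched tensors and a fixed (non-learned) encoder. `process`, `MeanMaxEncoder`, `NoiseRandomizer`, and `encode` are hypothetical names chosen for this sketch; only the `forward_in` / `forward_out` split mirrors the abstract sketch on this page.

```python
import random


def process(obs, scale=255.0):
    """Modality layer: scale raw integer readings to [0, 1]."""
    return [v / scale for v in obs]


class MeanMaxEncoder:
    """Encoder layer (toy, not learned): summarize a vector as [mean, max]."""

    def forward(self, inputs):
        return [sum(inputs) / len(inputs), max(inputs)]


class NoiseRandomizer:
    """Randomizer layer: N noisy copies before the encoder, averaged after it."""

    def __init__(self, num_samples=4, noise=0.01, seed=0):
        self.num_samples = num_samples
        self.noise = noise
        self.rng = random.Random(seed)  # seeded for reproducibility

    def forward_in(self, inputs):
        # Expand one observation into num_samples perturbed copies.
        return [
            [v + self.rng.uniform(-self.noise, self.noise) for v in inputs]
            for _ in range(self.num_samples)
        ]

    def forward_out(self, outputs):
        # Pool the per-copy encodings back into a single feature vector.
        n = len(outputs)
        return [sum(col) / n for col in zip(*outputs)]


def encode(raw_obs, encoder, randomizer=None):
    """Compose the layers: process -> forward_in -> encoder -> forward_out."""
    x = process(raw_obs)
    if randomizer is None:
        return encoder.forward(x)
    samples = randomizer.forward_in(x)
    return randomizer.forward_out([encoder.forward(s) for s in samples])


raw = [0, 51, 102, 255]
plain = encode(raw, MeanMaxEncoder())
augmented = encode(raw, MeanMaxEncoder(), NoiseRandomizer())
```

Because `encode` depends only on the three small interfaces, the encoder or the randomizer can be swapped without touching the preprocessing, which is the independence the invariant describes.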

Related Pages

Implemented By

Implementation:ARISE_Initiative_Robomimic_Custom_Observation_Modality_Extension

Related Principles

Principle:ARISE_Initiative_Robomimic_Observation_Initialization
