Principle:ARISE Initiative Robomimic Observation Initialization

Knowledge Sources	robomimic robomimic Observations robomimic
Domains	Robotics, Perception, Data_Processing
Last Updated	2026-02-15 08:00 GMT

Overview

A global registry initialization pattern that maps observation keys to their sensory modalities and configures default encoder architectures for multi-modal robot learning.

Description

Observation Initialization sets up the global observation processing infrastructure that must exist before any data loading or model creation. In robot learning, observations come from multiple modalities: low-dimensional proprioceptive state (joint positions, velocities), RGB images from cameras, depth maps, and scan data. Each modality requires a different processing pipeline (e.g., images are encoded with CNNs, while low-dimensional data is typically fed to MLPs).
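To make the "different pipeline per modality" idea concrete, here is a minimal sketch of a dispatch table. The function names (process_low_dim, process_rgb, process) and the PROCESSORS table are hypothetical and are not part of robomimic; the only assumption is that image pixels arrive as 8-bit intensities and low-dimensional state arrives as plain numbers.

```python
# Hypothetical per-modality preprocessors; names are illustrative only.

def process_low_dim(values):
    """Pass low-dimensional state through as a flat list of floats."""
    return [float(v) for v in values]


def process_rgb(pixels):
    """Scale 8-bit pixel intensities from [0, 255] to [0.0, 1.0]."""
    return [p / 255.0 for p in pixels]


# One entry per modality: the modality name selects the pipeline.
PROCESSORS = {"low_dim": process_low_dim, "rgb": process_rgb}


def process(modality, raw):
    """Apply the pipeline registered for `modality` to the raw values."""
    if modality not in PROCESSORS:
        raise ValueError(f"no processor registered for modality {modality!r}")
    return PROCESSORS[modality](raw)
```

A caller only needs the modality name, for example process("rgb", [0, 255]), and never has to know which pipeline sits behind it.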

This principle solves the problem of consistently routing observation keys to the correct processing pipeline across the entire framework. Without centralized initialization, each component would have to decide independently how to handle each observation type, so the dataset, model, and rollout code could disagree about how the same key should be processed.

The initialization populates three global registries: OBS_KEYS_TO_MODALITIES (mapping keys like "robot0_eef_pos" to "low_dim" or "agentview_image" to "rgb"), OBS_MODALITIES_TO_KEYS (reverse mapping), and DEFAULT_ENCODER_KWARGS (default encoder network configurations per modality).
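As a rough illustration of what the three registries might hold after initialization, the sketch below writes them out as plain dictionaries. The contents are illustrative assumptions rather than values read from a real config, and invert is a hypothetical helper showing that the reverse mapping can be derived from the forward one.

```python
# Illustrative contents only; a real run fills these from the config.
OBS_KEYS_TO_MODALITIES = {
    "robot0_eef_pos": "low_dim",
    "robot0_eef_quat": "low_dim",
    "agentview_image": "rgb",
}

OBS_MODALITIES_TO_KEYS = {
    "low_dim": ["robot0_eef_pos", "robot0_eef_quat"],
    "rgb": ["agentview_image"],
}

# Assumed shape: one dictionary of encoder settings per modality.
DEFAULT_ENCODER_KWARGS = {
    "low_dim": {"core_class": None},
    "rgb": {"core_class": "VisualCore"},
}


def invert(key_to_modality):
    """Hypothetical helper: build modality -> [keys] from key -> modality."""
    modality_to_keys = {}
    for key, modality in key_to_modality.items():
        modality_to_keys.setdefault(modality, []).append(key)
    return modality_to_keys
```

Checking that invert(OBS_KEYS_TO_MODALITIES) equals OBS_MODALITIES_TO_KEYS is a cheap way to confirm that the two mappings have not drifted apart.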

Usage

Apply this principle immediately after configuration setup and before any dataset loading or model creation. The initialization must run once at the start of every training or evaluation workflow. For hierarchical algorithms (HBC, IRIS), it handles multiple observation specification groups (planner, actor, value).
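For the hierarchical case, one way to picture the handling of several specification groups is a merge into a single key-to-modality mapping. This is a sketch under stated assumptions, not the library's implementation: merge_obs_specs is a hypothetical function, and the rule that a key shared between groups must keep the same modality is an assumption made here for illustration.

```python
def merge_obs_specs(spec_groups):
    """Merge per-group specs ({group: {modality: [keys]}}) into key -> modality.

    A key may appear in several groups, but (by assumption in this sketch)
    it must be assigned the same modality in every group.
    """
    key_to_modality = {}
    for group_name, spec in spec_groups.items():
        for modality, keys in spec.items():
            for key in keys:
                previous = key_to_modality.setdefault(key, modality)
                if previous != modality:
                    raise ValueError(
                        f"key {key!r} is {previous!r} elsewhere "
                        f"but {modality!r} in group {group_name!r}"
                    )
    return key_to_modality


# Example groups for a planner/actor split (illustrative keys).
merged = merge_obs_specs({
    "planner": {"low_dim": ["robot0_eef_pos"]},
    "actor": {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]},
})
```

Here merged contains one entry per distinct key, so downstream code sees a single registry regardless of how many groups the algorithm defines.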

Theoretical Basis

The principle implements a modality-driven observation routing pattern:

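# Abstract algorithm (not the real implementation); a plain dict stands in for the config
config = {"observation": {
    "modalities": {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]},
    "encoder": {"low_dim": {}, "rgb": {}},  # placeholder encoder settings
}}
GLOBAL_KEY_TO_MODALITY, GLOBAL_MODALITY_TO_KEYS, DEFAULT_ENCODERS = {}, {}, {}

# Step 1: Parse config to extract which obs keys belong to which modality
obs_specs = config["observation"]["modalities"]

# Step 2: Register mappings globally (forward and reverse)
for modality, keys in obs_specs.items():
    GLOBAL_MODALITY_TO_KEYS[modality] = list(keys)
    for key in keys:
        GLOBAL_KEY_TO_MODALITY[key] = modality

# Step 3: Configure default encoders per modality
for modality in obs_specs:
    DEFAULT_ENCODERS[modality] = config["observation"]["encoder"][modality]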

This enables downstream components (dataset, model, rollout) to query observation types by key name, ensuring consistent handling everywhere.
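The lookup described above could be used by a downstream component roughly as follows. This is a minimal sketch: KEY_TO_MODALITY stands in for the populated global registry, and group_by_modality is a hypothetical helper, not a robomimic function.

```python
# Stand-in for the populated global registry (illustrative contents).
KEY_TO_MODALITY = {
    "robot0_eef_pos": "low_dim",
    "agentview_image": "rgb",
}


def group_by_modality(obs):
    """Split an observation dict into {modality: {key: value}}.

    Keys that were never registered are rejected, which surfaces a
    missing or incomplete initialization early.
    """
    grouped = {}
    for key, value in obs.items():
        if key not in KEY_TO_MODALITY:
            raise KeyError(f"observation key {key!r} was never registered")
        grouped.setdefault(KEY_TO_MODALITY[key], {})[key] = value
    return grouped


# Placeholder values; real observations would be arrays or tensors.
batch = {"robot0_eef_pos": [0.1, 0.2, 0.3], "agentview_image": "<image>"}
grouped = group_by_modality(batch)
```

Because every component consults the same mapping, a key such as "agentview_image" ends up in the same group whether it is seen by the dataset, the model, or the rollout code.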

Related Pages

Implemented By

Implementation:ARISE_Initiative_Robomimic_ObsUtils_initialize_obs_utils_with_config
