Principle:ARISE Initiative Robomimic Observation Initialization
| Knowledge Sources | |
|---|---|
| Domains | Robotics, Perception, Data_Processing |
| Last Updated | 2026-02-15 08:00 GMT |
Overview
A global registry initialization pattern that maps observation keys to their sensory modalities and configures default encoder architectures for multi-modal robot learning.
Description
Observation Initialization sets up the global observation-processing infrastructure that must exist before any data loading or model creation. In robot learning, observations come from multiple modalities: low-dimensional proprioceptive state (joint positions, velocities), RGB images from cameras, depth maps, and 3D scan data. Each modality needs its own processing pipeline (e.g., images are encoded with CNNs, while low-dimensional data is encoded with MLPs).
This principle solves the problem of routing observation keys to the correct processing pipeline consistently across the entire framework. Without centralized initialization, each component would have to decide independently how to handle each observation key, so the same key could end up being treated differently by, say, the dataset and the model.
The initialization populates three global registries: OBS_KEYS_TO_MODALITIES (mapping keys like "robot0_eef_pos" to "low_dim" or "agentview_image" to "rgb"), OBS_MODALITIES_TO_KEYS (reverse mapping), and DEFAULT_ENCODER_KWARGS (default encoder network configurations per modality).
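As a rough illustration of how the three registries relate to one another, the sketch below fills them in by hand with plain dictionaries. The second low-dimensional key, the `invert_registry` helper, and the encoder field names are assumptions made for this example and are not taken from the framework's schema.

```python
# Forward mapping: observation key -> modality.
# "robot0_eef_quat" is an assumed extra key, added only to show grouping.
OBS_KEYS_TO_MODALITIES = {
    "robot0_eef_pos": "low_dim",
    "robot0_eef_quat": "low_dim",
    "agentview_image": "rgb",
}


def invert_registry(keys_to_modalities):
    """Build the modality -> [keys] mapping from the key -> modality mapping."""
    modalities_to_keys = {}
    for key, modality in keys_to_modalities.items():
        modalities_to_keys.setdefault(modality, []).append(key)
    return modalities_to_keys


# Reverse mapping: modality -> list of keys, in registration order.
OBS_MODALITIES_TO_KEYS = invert_registry(OBS_KEYS_TO_MODALITIES)

# Per-modality encoder defaults. The field names and values below are
# placeholders for this sketch, not the framework's actual defaults.
DEFAULT_ENCODER_KWARGS = {
    "low_dim": {"encoder": "mlp", "kwargs": {}},
    "rgb": {"encoder": "cnn", "kwargs": {}},
}
```

Deriving the reverse mapping from the forward one, rather than maintaining both by hand, keeps the two registries from drifting apart.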
Usage
Apply this principle immediately after configuration setup and before any dataset loading or model creation. The initialization must run once at the start of every training or evaluation workflow. For hierarchical algorithms (HBC, IRIS), it handles multiple observation specification groups (planner, actor, value).
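One way to picture the hierarchical case is as a merge of several per-component specifications into a single registry. The sketch below assumes each group is a plain `{modality: [keys]}` dictionary; `merge_obs_specs`, the example key names, and the conflict check are hypothetical and only illustrate the idea, not the framework's own code.

```python
def merge_obs_specs(spec_groups):
    """Merge several {modality: [keys]} specs into one registry.

    Keys keep first-seen order, duplicates are dropped, and a key that
    appears under two different modalities is rejected.
    """
    merged = {}
    key_owner = {}  # key -> modality it was first registered under
    for spec in spec_groups:
        for modality, keys in spec.items():
            bucket = merged.setdefault(modality, [])
            for key in keys:
                owner = key_owner.setdefault(key, modality)
                if owner != modality:
                    raise ValueError(
                        f"key {key!r} registered as both {owner!r} and {modality!r}"
                    )
                if key not in bucket:
                    bucket.append(key)
    return merged


# Hypothetical specs for the three groups of a hierarchical algorithm.
planner_spec = {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]}
actor_spec = {"low_dim": ["robot0_eef_pos", "robot0_gripper_qpos"]}
value_spec = {"low_dim": ["robot0_eef_pos"]}

merged = merge_obs_specs([planner_spec, actor_spec, value_spec])
```

Rejecting a key that is claimed by two modalities is a design choice of this sketch: it surfaces a misconfigured spec at initialization time instead of during training.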
Theoretical Basis
The principle implements a modality-driven observation routing pattern:
    # Abstract algorithm (not real implementation)
    # Step 1: Parse config to extract which obs keys belong to which modality
    obs_specs = config.observation.modalities  # e.g., {"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]}

    # Step 2: Register mappings globally (forward and reverse)
    for modality, keys in obs_specs.items():
        OBS_MODALITIES_TO_KEYS[modality] = list(keys)
        for key in keys:
            OBS_KEYS_TO_MODALITIES[key] = modality

    # Step 3: Configure default encoders per modality
    for modality in obs_specs:
        DEFAULT_ENCODER_KWARGS[modality] = config.observation.encoder[modality]
This enables downstream components (dataset, model, rollout) to query observation types by key name, ensuring consistent handling everywhere.
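To make the routing concrete, here is a minimal, self-contained sketch of the three steps followed by a downstream query. Everything in it is an assumption for illustration: the `SimpleNamespace` config stands in for a real parsed config, and the registry names, `initialize_observation_registries`, `modality_of`, and `group_by_modality` are hypothetical helpers rather than the framework's API.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a parsed config; real configs are richer than this.
config = SimpleNamespace(
    observation=SimpleNamespace(
        modalities={"low_dim": ["robot0_eef_pos"], "rgb": ["agentview_image"]},
        encoder={"low_dim": {"network": "mlp"}, "rgb": {"network": "cnn"}},
    )
)

# Module-level registries (hypothetical names for this sketch).
KEY_TO_MODALITY = {}
MODALITY_TO_KEYS = {}
ENCODER_DEFAULTS = {}


def initialize_observation_registries(cfg):
    """Populate the three registries from cfg, replacing any earlier contents."""
    for registry in (KEY_TO_MODALITY, MODALITY_TO_KEYS, ENCODER_DEFAULTS):
        registry.clear()
    for modality, keys in cfg.observation.modalities.items():
        MODALITY_TO_KEYS[modality] = list(keys)
        for key in keys:
            KEY_TO_MODALITY[key] = modality
        ENCODER_DEFAULTS[modality] = dict(cfg.observation.encoder[modality])


def modality_of(key):
    """Look up a key's modality, failing loudly for keys that were never registered."""
    if key not in KEY_TO_MODALITY:
        raise KeyError(f"observation key {key!r} was not registered at initialization")
    return KEY_TO_MODALITY[key]


def group_by_modality(obs):
    """Split a flat observation dict into one sub-dict per modality."""
    grouped = {}
    for key, value in obs.items():
        grouped.setdefault(modality_of(key), {})[key] = value
    return grouped


initialize_observation_registries(config)
raw_obs = {"robot0_eef_pos": [0.1, 0.2, 0.3], "agentview_image": "<image array>"}
grouped = group_by_modality(raw_obs)
```

Because every component calls the same lookup, a key that was never registered fails immediately with a clear error instead of being silently routed to the wrong pipeline.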