Principle:Haosulab ManiSkill Robot Agent Definition

From Leeroopedia
Knowledge Sources
Domains: Robotics, Simulation, Agent_Architecture
Last Updated: 2026-02-15 08:00 GMT

Overview

A robot agent definition is a declarative specification that binds a physical robot model to its control interface, default poses, and sensor configuration so that a simulation environment can instantiate and operate the robot without knowledge of its mechanical details.

Description

In robotics simulation, the environment must interact with many different robots -- manipulators, mobile bases, humanoids, quadrupeds, dexterous hands -- through a uniform interface. The Robot Agent Definition principle addresses this by establishing a base class contract that every robot must fulfill. Each agent declares: (1) the path to its kinematic and dynamic model (URDF or MJCF), (2) the set of available controller configurations mapping human-readable control mode names to controller class/parameter pairs, (3) one or more keyframes representing canonical poses (e.g., "rest" or "home" configurations with joint positions and base placement), and (4) optional sensor configurations for cameras or tactile sensors mounted on the robot body.

This principle solves the problem of robot interchangeability. A task environment written for "a gripper robot" can accept any agent that provides the expected interface -- a Panda, a Fetch, an xArm, or a WidowX -- without modification. The agent definition layer also handles practical concerns such as automatic asset downloading when a model file is not found locally, gravity compensation on robot links, mimic joint controllers for coupled finger joints, and proprioceptive state extraction (joint positions, velocities, and TCP pose).
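The interchangeability described above can be sketched with a structural interface. This is a minimal illustration, not ManiSkill's actual API: `GripperAgent`, `ToyPanda`, `ToyWidowX`, and `describe_episode` are hypothetical names, and the joint counts and grasp rule are toy values chosen for the example.

```python
from typing import Protocol


class GripperAgent(Protocol):
    """Hypothetical interface a task expects from any gripper robot."""

    uid: str

    def get_proprioception(self) -> dict: ...

    def is_grasping(self) -> bool: ...


class ToyPanda:
    uid = "panda"

    def __init__(self) -> None:
        # Toy state: 7 arm joints followed by 2 finger joints.
        self.qpos = [0.0] * 9

    def get_proprioception(self) -> dict:
        return {"qpos": list(self.qpos)}

    def is_grasping(self) -> bool:
        # Toy rule: both fingers nearly closed counts as a grasp.
        return all(q < 0.01 for q in self.qpos[-2:])


class ToyWidowX:
    uid = "widowx"

    def __init__(self) -> None:
        # Toy state: the joint count here is illustrative only.
        self.qpos = [0.0] * 6 + [0.03, 0.03]

    def get_proprioception(self) -> dict:
        return {"qpos": list(self.qpos)}

    def is_grasping(self) -> bool:
        return all(q < 0.01 for q in self.qpos[-2:])


def describe_episode(agent: GripperAgent) -> str:
    """Task-side code: depends only on the interface, never on the robot type."""
    obs = agent.get_proprioception()
    return f"{agent.uid}: {len(obs['qpos'])} joints, grasping={agent.is_grasping()}"
```

The task function never branches on the robot type, so swapping `ToyPanda()` for `ToyWidowX()` requires no change on the task side.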

The design follows the Template Method pattern: the base class defines the lifecycle (load, reset, step, and observation through get_proprioception), while each concrete robot subclass fills in the declarative attributes and optionally overrides helper methods such as is_grasping or is_static.
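A minimal sketch of this Template Method arrangement follows. `BaseAgentSketch` and `TwoFingerArm` are hypothetical stand-ins written for this page; the attribute names, the model path, and the thresholds are assumptions, and a real base class would parse the model file rather than just sizing the state arrays.

```python
from abc import ABC


class BaseAgentSketch(ABC):
    """Hypothetical base class: owns the lifecycle, reads subclass declarations."""

    uid: str = ""
    model_path: str = ""       # declared by each subclass
    rest_qpos: tuple = ()      # declared by each subclass

    def __init__(self) -> None:
        self.qpos: list = []
        self.qvel: list = []
        self.load()
        self.reset()

    def load(self) -> None:
        # A real implementation would parse the URDF/MJCF file here.
        if not self.model_path:
            raise ValueError(f"{type(self).__name__} must declare model_path")
        self.qpos = [0.0] * len(self.rest_qpos)
        self.qvel = [0.0] * len(self.rest_qpos)

    def reset(self) -> None:
        self.qpos = list(self.rest_qpos)
        self.qvel = [0.0] * len(self.rest_qpos)

    def get_proprioception(self) -> dict:
        return {"qpos": list(self.qpos), "qvel": list(self.qvel)}

    # Optional helpers that subclasses may override.
    def is_static(self, threshold: float = 0.2) -> bool:
        return max((abs(v) for v in self.qvel), default=0.0) <= threshold

    def is_grasping(self) -> bool:
        raise NotImplementedError(f"{type(self).__name__} has no gripper logic")


class TwoFingerArm(BaseAgentSketch):
    """Concrete robot: only declarations plus one overridden helper."""

    uid = "two_finger_arm"
    model_path = "robots/two_finger_arm.urdf"  # hypothetical path
    rest_qpos = (0.0, 0.5, -0.5, 0.04, 0.04)   # 3 arm joints + 2 fingers

    def is_grasping(self) -> bool:
        # Toy rule: both finger joints nearly closed.
        return all(q < 0.01 for q in self.qpos[-2:])
```

The subclass contains no lifecycle code at all, which is the point of the pattern: the order of load and reset is fixed once in the base class.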

Usage

This principle applies whenever a new physical robot needs to be introduced into the simulation. It is the correct design choice when:

  • A new manipulator, hand, humanoid, or quadruped must be made available to existing task environments without modifying those environments.
  • Multiple control modes (joint position, joint velocity, end-effector pose, base velocity) need to be offered for the same robot through a single configuration dictionary.
  • Predefined keyframe poses are needed for deterministic episode resets or motion planning start configurations.
  • On-body sensors (wrist cameras, tactile arrays) must be declared as part of the robot specification rather than the environment specification.

Theoretical Basis

The Robot Agent Definition combines a declarative specification with a lifecycle protocol:

1. Model Declaration: Each agent specifies a file path to a URDF (Unified Robot Description Format) or MJCF (MuJoCo XML Format) file. This file encodes the kinematic tree (links, joints, joint limits), dynamic parameters (masses, inertias), collision geometry, and visual meshes. The simulation engine parses this file to instantiate a fully articulated rigid-body system.
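To make the contents of such a model file concrete, the sketch below reads a tiny inline URDF with Python's standard XML parser and lists its links, masses, and joint limits. The robot `toy_arm` and the function `summarize_urdf` are hypothetical; a simulation engine does far more (meshes, collision shapes, inertia tensors) than this summary.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-joint arm, written inline so the example is self-contained.
URDF = """
<robot name="toy_arm">
  <link name="base"><inertial><mass value="2.0"/></inertial></link>
  <link name="upper_arm"><inertial><mass value="1.0"/></inertial></link>
  <link name="forearm"><inertial><mass value="0.5"/></inertial></link>
  <joint name="shoulder" type="revolute">
    <parent link="base"/><child link="upper_arm"/>
    <limit lower="-1.57" upper="1.57" effort="50" velocity="2.0"/>
  </joint>
  <joint name="elbow" type="revolute">
    <parent link="upper_arm"/><child link="forearm"/>
    <limit lower="0.0" upper="2.5" effort="30" velocity="2.0"/>
  </joint>
</robot>
"""


def summarize_urdf(text: str) -> dict:
    """Extract the kinematic tree and a few dynamic parameters from URDF text."""
    root = ET.fromstring(text.strip())

    link_masses = {}
    for link in root.findall("link"):
        mass = link.find("inertial/mass")
        # Links without an <inertial> element are recorded with mass None.
        link_masses[link.get("name")] = float(mass.get("value")) if mass is not None else None

    joints = []
    for joint in root.findall("joint"):
        limit = joint.find("limit")
        limits = None
        if limit is not None:
            limits = (float(limit.get("lower")), float(limit.get("upper")))
        joints.append({
            "name": joint.get("name"),
            "type": joint.get("type"),
            "parent": joint.find("parent").get("link"),
            "child": joint.find("child").get("link"),
            "limits": limits,
        })

    return {"name": root.get("name"), "link_masses": link_masses, "joints": joints}
```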

2. Controller Configuration Map: The agent provides a dictionary mapping string names to controller configurations. Each configuration specifies the controller class (e.g., PD joint position, PD end-effector pose, PD base velocity), the joints it controls, PD gains (stiffness and damping), force limits, action scaling, and normalization bounds. Multiple configurations can be composed for different joint groups (arm, gripper, base) and combined into named control modes.
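One way to picture the configuration map is as a dictionary of named control modes, each composed from per-group controller configurations. The sketch below is an illustration under assumptions: `PDJointPosConfig`, the joint names, and all gain and limit values are hypothetical and do not reproduce any library's configuration classes.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PDJointPosConfig:
    """Hypothetical PD joint-position controller configuration."""

    joint_names: Tuple[str, ...]
    lower: float            # lower bound of the (delta) target range
    upper: float            # upper bound of the (delta) target range
    stiffness: float        # PD proportional gain
    damping: float          # PD derivative gain
    force_limit: float
    use_delta: bool = False # interpret actions as offsets from the current position

    @property
    def action_dim(self) -> int:
        return len(self.joint_names)

    def scale(self, a: float) -> float:
        """Map a normalized action in [-1, 1] onto [lower, upper]."""
        a = max(-1.0, min(1.0, a))
        return 0.5 * (self.upper + self.lower) + 0.5 * (self.upper - self.lower) * a


ARM_JOINTS = tuple(f"arm_joint_{i}" for i in range(1, 7))
GRIPPER_JOINTS = ("finger_left", "finger_right")

arm_pos = PDJointPosConfig(ARM_JOINTS, -3.14, 3.14, 1000.0, 100.0, 100.0)
arm_delta = PDJointPosConfig(ARM_JOINTS, -0.1, 0.1, 1000.0, 100.0, 100.0, use_delta=True)
gripper = PDJointPosConfig(GRIPPER_JOINTS, 0.0, 0.04, 1000.0, 100.0, 100.0)

# Named control modes, each composed from per-group controllers.
CONTROLLER_CONFIGS: Dict[str, Dict[str, PDJointPosConfig]] = {
    "pd_joint_pos": {"arm": arm_pos, "gripper": gripper},
    "pd_joint_delta_pos": {"arm": arm_delta, "gripper": gripper},
}


def action_dim(mode: str) -> int:
    """Total action size for a mode is the sum over its joint groups."""
    return sum(cfg.action_dim for cfg in CONTROLLER_CONFIGS[mode].values())
```

Reusing the same `gripper` configuration in both modes shows why composition by joint group is convenient: changing the arm controller does not require redefining the gripper.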

3. Keyframe Specification: A keyframe is a tuple of (base_pose, joint_positions, joint_velocities) representing a known-good robot state. Keyframes serve as reset targets and motion planning start states. They decouple the notion of a "home position" from the task environment.
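A keyframe can be sketched as a small immutable record plus a reset helper. The names `Pose`, `Keyframe`, `RobotState`, `KEYFRAMES`, and `reset_to_keyframe` are hypothetical, as are the joint values; treating a missing velocity as "at rest" is an assumption made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Pose:
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    quaternion: Tuple[float, float, float, float] = (1.0, 0.0, 0.0, 0.0)  # w, x, y, z


@dataclass(frozen=True)
class Keyframe:
    pose: Pose                                  # base placement
    qpos: Tuple[float, ...]                     # joint positions
    qvel: Optional[Tuple[float, ...]] = None    # None is treated as "at rest"


@dataclass
class RobotState:
    base_pose: Pose = field(default_factory=Pose)
    qpos: List[float] = field(default_factory=list)
    qvel: List[float] = field(default_factory=list)


# Keyframes live with the robot definition, not with any task environment.
KEYFRAMES: Dict[str, Keyframe] = {
    "rest": Keyframe(pose=Pose(), qpos=(0.0, 0.4, -0.4, 0.04, 0.04)),
    "raised": Keyframe(
        pose=Pose(position=(0.0, 0.0, 0.1)),
        qpos=(0.0, 1.0, -1.0, 0.0, 0.0),
    ),
}


def reset_to_keyframe(state: RobotState, name: str) -> RobotState:
    """Overwrite the robot state with the named keyframe."""
    kf = KEYFRAMES[name]
    state.base_pose = kf.pose
    state.qpos = list(kf.qpos)
    state.qvel = list(kf.qvel) if kf.qvel is not None else [0.0] * len(kf.qpos)
    return state
```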

4. Sensor Declaration: Agents can declare sensor configurations (camera intrinsics, mount links, depth parameters) that travel with the robot. This ensures that robot-mounted sensors are consistently configured regardless of which task environment the robot is placed in.
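As a rough sketch, a robot-mounted camera can be declared as data attached to the agent class and validated against the robot's links. `CameraDecl`, `ToyArm`, and `validate_sensors` are hypothetical names; the intrinsic matrix uses the standard pinhole relation between vertical field of view and focal length in pixels, and square pixels are assumed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CameraDecl:
    """Hypothetical declaration of a camera that travels with the robot."""

    uid: str
    mount_link: str      # name of the robot link the camera is attached to
    width: int
    height: int
    fov_y: float         # vertical field of view in radians
    near: float = 0.01   # depth clipping range
    far: float = 100.0

    def intrinsics(self) -> List[List[float]]:
        # Pinhole model: focal length in pixels from the vertical field of view.
        fy = self.height / (2.0 * math.tan(self.fov_y / 2.0))
        fx = fy  # assumes square pixels
        return [
            [fx, 0.0, self.width / 2.0],
            [0.0, fy, self.height / 2.0],
            [0.0, 0.0, 1.0],
        ]


class ToyArm:
    link_names = ("base", "upper_arm", "forearm", "wrist")
    sensor_configs = (
        CameraDecl(uid="wrist_camera", mount_link="wrist",
                   width=128, height=128, fov_y=math.pi / 2),
    )


def validate_sensors(agent_cls) -> List[str]:
    """Check every declared mount link exists on the robot; return sensor ids."""
    for cam in agent_cls.sensor_configs:
        if cam.mount_link not in agent_cls.link_names:
            raise ValueError(f"{cam.uid}: unknown mount link {cam.mount_link!r}")
    return [cam.uid for cam in agent_cls.sensor_configs]
```

Because the declaration is part of the agent class, any environment that loads `ToyArm` sees the same wrist camera with the same resolution and field of view.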

5. Lifecycle Protocol:

  • Load: Parse the model file, build the articulation, instantiate controllers, apply material properties.
  • Reset: Set joints to a keyframe configuration, zero velocities, reset controller state.
  • Step: Accept an action vector, delegate to the active controller, apply drive targets.
  • Observe: Return proprioceptive state (joint positions, velocities, end-effector pose).
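The four phases above can be traced on a deliberately tiny system: one joint with unit inertia driven by a PD controller. `OneJointAgent` is a hypothetical toy, the gains and time step are arbitrary, and semi-implicit Euler integration stands in for a physics engine.

```python
class OneJointAgent:
    """Toy agent: a single unit-inertia joint under PD position control."""

    def __init__(self, stiffness: float = 100.0, damping: float = 20.0,
                 force_limit: float = 50.0, dt: float = 0.01) -> None:
        self.stiffness = stiffness
        self.damping = damping
        self.force_limit = force_limit
        self.dt = dt
        self.loaded = False

    def load(self) -> None:
        # Stand-in for parsing the model and building controllers.
        self.q = 0.0
        self.qd = 0.0
        self.target = 0.0
        self.loaded = True

    def reset(self, rest_q: float = 0.0) -> None:
        # Set the joint to the keyframe value, zero velocity, clear the target.
        self.q = rest_q
        self.qd = 0.0
        self.target = rest_q

    def step(self, action: float) -> None:
        if not self.loaded:
            raise RuntimeError("call load() before step()")
        # The action is the drive target; the PD law turns it into a force.
        self.target = action
        force = self.stiffness * (self.target - self.q) - self.damping * self.qd
        force = max(-self.force_limit, min(self.force_limit, force))
        # Semi-implicit Euler with unit inertia: velocity first, then position.
        self.qd += force * self.dt
        self.q += self.qd * self.dt

    def observe(self) -> dict:
        return {"qpos": self.q, "qvel": self.qd}


agent = OneJointAgent()
agent.load()
agent.reset()
for _ in range(500):
    agent.step(0.5)
obs = agent.observe()
```

With these gains the joint settles at the commanded target well within the 500 simulated steps, so `obs["qpos"]` ends close to 0.5 and `obs["qvel"]` close to zero.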

This design keeps the cost of integrating a new robot independent of the number of existing environments: define the class attributes once, and the robot is immediately usable in any compatible environment without editing that environment.
