Implementation:Isaac sim IsaacGymEnvs Task Design Specification

From Leeroopedia
Knowledge Sources
Type: Pattern Doc
Domains: Design, Reinforcement_Learning
Last Updated: 2026-02-15 00:00 GMT

Overview

A concrete pattern for writing a task design specification: a document that records every MDP (Markov decision process) design decision before a custom IsaacGymEnvs RL environment is implemented.

Description

A Task Design Specification is the design artifact produced before writing task code. It enumerates every component of the MDP: the observation vector layout, action interpretation, reward formula, reset conditions, physics parameters, and required assets. This document serves as both a development guide and a reference for debugging and iteration.

The specification pattern is derived from reference implementations in the IsaacGymEnvs repository, particularly Cartpole (minimal example) and Ant (complex example).

Usage

Create a task design specification before implementing any new RL environment. Refer back to it during implementation to verify that all components are correctly coded.

Code Reference

Source Location

Design Specification Template

The following template captures all required design decisions for a new task:

Task Name: MyCustomTask
Physics Engine: PhysX | Flex

=== Dimensions ===
num_obs: <integer>         # Observation vector dimension
num_actions: <integer>     # Action vector dimension
num_envs: <integer>        # Default parallel environments

=== Observation Vector ===
Index Range | Component           | Dimension | Range        | Description
0..2        | object_position     | 3         | [-inf, inf]  | XYZ position of target object
3..5        | object_velocity     | 3         | [-inf, inf]  | Linear velocity of target
6..N-1      | joint_positions     | N-6       | [lo, hi]     | Robot joint angles (radians)
...

=== Action Vector ===
Index Range | Component           | Dimension | Range        | Interpretation
0..M-1      | joint_torques       | M         | [-1, 1]      | Normalized torques, scaled by max_effort

=== Reward Function ===
reward = w1 * reward_component_1
       + w2 * reward_component_2
       - w3 * penalty_component_1

Where:
  reward_component_1: description and formula
  reward_component_2: description and formula
  penalty_component_1: description and formula

=== Reset Conditions ===
reset = (condition_1) OR (condition_2) OR (progress_buf >= max_episode_length)

Where:
  condition_1: description (e.g., object falls below table)
  condition_2: description (e.g., robot joint limits exceeded)

=== Required Assets ===
- asset_1.urdf: description
- asset_2.urdf: description
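
The template can also be kept in a machine-checkable form, so that the declared dimensions and the component tables cannot drift apart. The following is a minimal sketch, not part of IsaacGymEnvs: the `Component` and `TaskSpec` classes are hypothetical helpers introduced here for illustration, and the example values are taken from the Cartpole reference on this page.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Component:
    """One named slice of an observation or action vector (hypothetical helper)."""
    name: str
    dim: int


@dataclass
class TaskSpec:
    """Hypothetical container mirroring the Dimensions section of the template."""
    task_name: str
    num_obs: int
    num_actions: int
    num_envs: int
    observations: List[Component] = field(default_factory=list)
    actions: List[Component] = field(default_factory=list)

    @staticmethod
    def index_ranges(components: List[Component]) -> List[Tuple[str, int, int]]:
        """Return (name, first_index, last_index) with inclusive bounds."""
        ranges, start = [], 0
        for c in components:
            ranges.append((c.name, start, start + c.dim - 1))
            start += c.dim
        return ranges

    def validate(self) -> None:
        """Raise ValueError if component dims disagree with the declared totals."""
        obs_total = sum(c.dim for c in self.observations)
        act_total = sum(c.dim for c in self.actions)
        if obs_total != self.num_obs:
            raise ValueError(f"observations sum to {obs_total}, expected {self.num_obs}")
        if act_total != self.num_actions:
            raise ValueError(f"actions sum to {act_total}, expected {self.num_actions}")


# Values from the Cartpole reference specification on this page.
cartpole_spec = TaskSpec(
    task_name="Cartpole",
    num_obs=4,
    num_actions=1,
    num_envs=512,
    observations=[
        Component("cart_position", 1),
        Component("cart_velocity", 1),
        Component("pole_angle", 1),
        Component("pole_angular_velocity", 1),
    ],
    actions=[Component("cart_force", 1)],
)
cartpole_spec.validate()
```

Calling `validate()` whenever the specification is edited catches the most common design slip, a `num_obs` value that no longer matches the observation table.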

Reference: Cartpole Design Specification

Field          | Value
Task Name      | Cartpole
Physics Engine | PhysX
num_obs        | 4
num_actions    | 1
num_envs       | 512 (default)

Observation Vector

Index | Component             | Range        | Description
0     | cart_position         | [-3.0, 3.0]  | Horizontal position of the cart on the rail
1     | cart_velocity         | [-inf, inf]  | Linear velocity of the cart
2     | pole_angle            | [-pi, pi]    | Angle of the pole from vertical
3     | pole_angular_velocity | [-inf, inf]  | Angular velocity of the pole

Action Vector

Index | Component  | Range        | Interpretation
0     | cart_force | [-1.0, 1.0]  | Horizontal force applied to the cart, scaled by max_push_effort (400.0)
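
The action interpretation above amounts to a single scaling step. The sketch below shows it in plain Python for one environment; the function name `action_to_force` is hypothetical, and clipping the action to [-1, 1] before scaling is an assumption of this sketch, not something the specification states.

```python
MAX_PUSH_EFFORT = 400.0  # value listed in the action table


def action_to_force(action: float, max_push_effort: float = MAX_PUSH_EFFORT) -> float:
    """Map a normalized policy action to a horizontal force on the cart."""
    # Assumption: out-of-range actions are clipped to the documented range.
    clipped = max(-1.0, min(1.0, action))
    return clipped * max_push_effort


print(action_to_force(0.5))   # half of the maximum effort
print(action_to_force(-1.0))  # full effort in the negative direction
```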

Reward Function

# Cartpole reward: keep the pole upright and cart centered
reward = 1.0  # alive bonus each timestep
reward -= cart_position * cart_position * 0.01  # penalize cart displacement
reward -= pole_angle * pole_angle * 0.1  # penalize pole angle from vertical
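
To sanity-check the weights before implementation, the formula above can be evaluated for a few hand-picked states. This is a minimal scalar sketch of the formula as written in this specification; the function name `cartpole_reward` is hypothetical, and a real task would compute the same expression over batched tensors.

```python
def cartpole_reward(cart_position: float, pole_angle: float) -> float:
    """Evaluate the specification's reward for a single environment."""
    reward = 1.0                                    # alive bonus each timestep
    reward -= 0.01 * cart_position * cart_position  # penalize cart displacement
    reward -= 0.1 * pole_angle * pole_angle         # penalize pole angle from vertical
    return reward


# Compare an ideal state with a displaced cart and a tilted pole.
for state in [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]:
    print(state, cartpole_reward(*state))
```

Evaluating corner cases like these makes it easy to see whether a penalty term can outweigh the alive bonus inside the allowed state range.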

Reset Conditions

reset = (
    (abs(cart_position) > reset_dist)       # cart too far from center
    | (abs(pole_angle) > max_pole_angle)    # pole fallen too far
    | (progress_buf >= max_episode_length)  # episode timeout
)
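
The reset expression can be checked in isolation with a scalar version. In the sketch below, the function name `cartpole_reset` is hypothetical and the default thresholds are assumptions: `reset_dist=3.0` matches the cart position range in the observation table, while `max_pole_angle` and `max_episode_length` are placeholder values that the specification does not fix.

```python
import math


def cartpole_reset(
    cart_position: float,
    pole_angle: float,
    progress: int,
    reset_dist: float = 3.0,              # assumption: matches the cart range above
    max_pole_angle: float = math.pi / 2,  # placeholder threshold
    max_episode_length: int = 500,        # placeholder episode length
) -> bool:
    """Return True when any of the three reset conditions holds."""
    return (
        abs(cart_position) > reset_dist
        or abs(pole_angle) > max_pole_angle
        or progress >= max_episode_length
    )


print(cartpole_reset(0.0, 0.0, 0))    # healthy state early in the episode
print(cartpole_reset(3.5, 0.0, 10))   # cart beyond reset_dist
print(cartpole_reset(0.0, 0.0, 500))  # episode timeout
```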

Required Assets

  • cartpole.urdf: Cart-pole system with 1 prismatic joint (cart) and 1 revolute joint (pole)

Reference: Ant Design Specification

Field          | Value
Task Name      | Ant
Physics Engine | PhysX
num_obs        | 60
num_actions    | 8
num_envs       | 2048 (default)

Observation Vector (60 dimensions)

Index Range | Component              | Dim | Description
0..12       | DOF positions          | 13  | Joint angles for all 8 DOFs plus torso orientation
13..25      | DOF velocities         | 13  | Joint angular velocities
26..28      | Torso velocity         | 3   | Linear velocity of the torso body
29..31      | Torso angular velocity | 3   | Angular velocity of the torso body
32..35      | Gravity projection     | 4   | Gravity vector projected into torso frame
36..59      | Foot contact forces    | 24  | Contact sensor readings for each foot (6 values per foot x 4 feet)
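
A layout with this many components is easy to get wrong by one index, so it helps to verify that the ranges are contiguous and cover exactly `num_obs` entries. The sketch below encodes the table as listed on this page and checks it; `ANT_OBS_LAYOUT` and `check_layout` are hypothetical names introduced for illustration.

```python
# (component name, first index, last index), inclusive bounds as in the table.
ANT_OBS_LAYOUT = [
    ("dof_positions", 0, 12),
    ("dof_velocities", 13, 25),
    ("torso_velocity", 26, 28),
    ("torso_angular_velocity", 29, 31),
    ("gravity_projection", 32, 35),
    ("foot_contact_forces", 36, 59),
]


def check_layout(layout, num_obs):
    """Verify the ranges are gap-free and cover num_obs; return Python slices."""
    expected_first = 0
    slices = {}
    for name, first, last in layout:
        if first != expected_first:
            raise ValueError(f"{name} starts at {first}, expected {expected_first}")
        if last < first:
            raise ValueError(f"{name} has an empty or reversed range")
        slices[name] = slice(first, last + 1)  # slices use an exclusive end
        expected_first = last + 1
    if expected_first != num_obs:
        raise ValueError(f"layout covers {expected_first} entries, expected {num_obs}")
    return slices


ant_slices = check_layout(ANT_OBS_LAYOUT, num_obs=60)
print(ant_slices["foot_contact_forces"])
```

The returned slices can index an observation buffer directly, which keeps the implementation tied to the documented layout.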

Reward Function

reward = (velocity_toward_target * progress_weight     # forward progress
        + alive_bonus                                    # survival reward
        - energy_cost * energy_weight                    # penalize high torques
        - joints_at_limit_cost * limit_weight)           # penalize joint limits
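
The weighted sum above can be prototyped as a plain function to see how the terms trade off. This is a sketch under stated assumptions: the function name `ant_reward` is hypothetical, and every default weight is a placeholder chosen for illustration, not a value taken from the IsaacGymEnvs configuration.

```python
def ant_reward(
    velocity_toward_target: float,
    energy_cost: float,
    joints_at_limit_cost: float,
    progress_weight: float = 1.0,  # placeholder weight
    alive_bonus: float = 0.5,      # placeholder bonus
    energy_weight: float = 0.05,   # placeholder weight
    limit_weight: float = 0.1,     # placeholder weight
) -> float:
    """Evaluate the specification's Ant reward for a single environment."""
    return (
        velocity_toward_target * progress_weight  # forward progress
        + alive_bonus                             # survival reward
        - energy_cost * energy_weight             # penalize high torques
        - joints_at_limit_cost * limit_weight     # penalize joint limits
    )


# Same forward speed, with and without energy and joint-limit penalties.
print(ant_reward(1.0, 0.0, 0.0))
print(ant_reward(1.0, 2.0, 1.0))
```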

I/O Contract

Inputs

Name         | Type      | Required | Description
Task concept | Text      | Yes      | High-level description of what the agent should learn
Robot type   | URDF/MJCF | Yes      | Robot model with defined joints, bodies, and sensors
Objects      | URDF/MJCF | No       | Additional objects in the scene (targets, obstacles)

Outputs

Name                 | Type                    | Description
Design specification | Document                | Complete MDP specification following the template above
num_obs              | Integer                 | Total observation vector dimension
num_actions          | Integer                 | Total action vector dimension
Reward formula       | Mathematical expression | Weighted combination of reward and penalty components
Reset conditions     | Boolean expression      | Conditions triggering episode termination

Related Pages

Principle:Isaac_sim_IsaacGymEnvs_Task_Requirements_Design
