Overview
Concrete pattern for writing a task design specification document that captures all MDP design decisions before implementing a custom IsaacGymEnvs RL environment.
Description
A Task Design Specification is the design artifact produced before writing task code. It enumerates every component of the MDP: the observation vector layout, action interpretation, reward formula, reset conditions, physics parameters, and required assets. This document serves as both a development guide and a reference for debugging and iteration.
The specification pattern is derived from reference implementations in the IsaacGymEnvs repository, particularly Cartpole (minimal example) and Ant (complex example).
Usage
Create a task design specification before implementing any new RL environment. Refer back to it during implementation to verify that all components are correctly coded.
Code Reference
Source Location
Design Specification Template
The following template captures all required design decisions for a new task:
Task Name: MyCustomTask
Physics Engine: PhysX | Flex
=== Dimensions ===
num_obs: <integer> # Observation vector dimension
num_actions: <integer> # Action vector dimension
num_envs: <integer> # Default parallel environments
=== Observation Vector ===
Index Range | Component | Dimension | Range | Description
0..2 | object_position | 3 | [-inf, inf] | XYZ position of target object
3..5 | object_velocity | 3 | [-inf, inf] | Linear velocity of target
6..N-1 | joint_positions | N-6 | [lo, hi] | Robot joint angles (radians)
...
=== Action Vector ===
Index Range | Component | Dimension | Range | Interpretation
0..M-1 | joint_torques | M | [-1, 1] | Normalized torques, scaled by max_effort
=== Reward Function ===
reward = w1 * reward_component_1
+ w2 * reward_component_2
- w3 * penalty_component_1
Where:
reward_component_1: description and formula
reward_component_2: description and formula
penalty_component_1: description and formula
=== Reset Conditions ===
reset = (condition_1) OR (condition_2) OR (progress_buf >= max_episode_length)
Where:
condition_1: description (e.g., object falls below table)
condition_2: description (e.g., robot joint limits exceeded)
=== Required Assets ===
- asset_1.urdf: description
- asset_2.urdf: description
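The template can also be kept in a machine-checkable form so that index ranges and `num_obs` cannot drift apart. The following is a minimal sketch under stated assumptions, not part of IsaacGymEnvs: `ObsComponent`, `TaskSpec`, and `validate_layout` are hypothetical names, and the example assumes a robot with 7 joints so that `num_obs` is 13.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ObsComponent:
    """One row of the observation table (hypothetical helper)."""
    name: str
    start: int  # first index, inclusive
    dim: int    # number of entries in this component

    @property
    def end(self) -> int:
        """Last index, inclusive."""
        return self.start + self.dim - 1


@dataclass
class TaskSpec:
    """Subset of the design-spec fields (hypothetical container)."""
    task_name: str
    num_obs: int
    num_actions: int
    observations: List[ObsComponent] = field(default_factory=list)

    def validate_layout(self) -> None:
        """Raise ValueError if rows overlap, leave gaps, or do not cover num_obs."""
        expected_start = 0
        for comp in sorted(self.observations, key=lambda c: c.start):
            if comp.start != expected_start:
                raise ValueError(
                    f"{comp.name}: starts at {comp.start}, expected {expected_start}"
                )
            expected_start = comp.end + 1
        if expected_start != self.num_obs:
            raise ValueError(
                f"layout covers {expected_start} entries, num_obs is {self.num_obs}"
            )


# Example: the template layout with an assumed 7-joint robot (N = 13).
spec = TaskSpec(
    task_name="MyCustomTask",
    num_obs=13,
    num_actions=7,
    observations=[
        ObsComponent("object_position", start=0, dim=3),   # indices 0..2
        ObsComponent("object_velocity", start=3, dim=3),   # indices 3..5
        ObsComponent("joint_positions", start=6, dim=7),   # indices 6..12
    ],
)
spec.validate_layout()  # raises ValueError on an inconsistent layout
```

Running such a check whenever the specification changes catches off-by-one errors in index ranges before they reach the task code.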
Reference: Cartpole Design Specification
| Field | Value |
| --- | --- |
| Task Name | Cartpole |
| Physics Engine | PhysX |
| num_obs | 4 |
| num_actions | 1 |
| num_envs | 512 (default) |
Observation Vector
| Index | Component | Range | Description |
| --- | --- | --- | --- |
| 0 | cart_position | [-3.0, 3.0] | Horizontal position of the cart on the rail |
| 1 | cart_velocity | [-inf, inf] | Linear velocity of the cart |
| 2 | pole_angle | [-pi, pi] | Angle of the pole from vertical |
| 3 | pole_angular_velocity | [-inf, inf] | Angular velocity of the pole |
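During debugging it helps to read observations by name rather than by raw index. The sketch below mirrors the index order of the Cartpole observation table; `CartpoleObs` and `unpack_obs` are hypothetical helpers for illustration, and a plain Python sequence stands in for one row of the observation buffer.

```python
from typing import NamedTuple, Sequence


class CartpoleObs(NamedTuple):
    """Named view of one Cartpole observation, in table index order."""
    cart_position: float          # index 0
    cart_velocity: float          # index 1
    pole_angle: float             # index 2
    pole_angular_velocity: float  # index 3


def unpack_obs(obs: Sequence[float]) -> CartpoleObs:
    """Convert a length-4 observation into named fields."""
    if len(obs) != 4:
        raise ValueError(f"expected 4 observations (num_obs), got {len(obs)}")
    return CartpoleObs(*obs)


sample = unpack_obs([0.5, -0.1, 0.02, 0.3])
print(sample.pole_angle)
```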
Action Vector
| Index | Component | Range | Interpretation |
| --- | --- | --- | --- |
| 0 | cart_force | [-1.0, 1.0] | Horizontal force applied to the cart, scaled by max_push_effort (400.0) |
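The action interpretation above amounts to one multiplication. The following sketch shows the mapping from a normalized action to a force, using the `max_push_effort` value of 400.0 from the table. The function name `action_to_force` is hypothetical, and clipping inside the function is an assumption made for illustration; in practice, clipping may happen elsewhere in the training pipeline.

```python
MAX_PUSH_EFFORT = 400.0  # value taken from the action table


def action_to_force(action: float, max_push_effort: float = MAX_PUSH_EFFORT) -> float:
    """Map a normalized action in [-1, 1] to a horizontal force on the cart."""
    # Assumption: out-of-range actions are clipped here before scaling.
    clipped = max(-1.0, min(1.0, action))
    return clipped * max_push_effort


print(action_to_force(0.5))   # half of the maximum effort
print(action_to_force(-2.0))  # clipped to the negative maximum
```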
Reward Function
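# Cartpole reward: keep the pole upright while limiting cart and pole motion
reward = 1.0 - pole_angle * pole_angle          # alive bonus minus squared pole angle
reward -= 0.01 * abs(cart_velocity)             # penalize cart velocity
reward -= 0.005 * abs(pole_angular_velocity)    # penalize pole angular velocity
# Environments that violate the cart-distance or pole-angle limit receive a reward of -2.0 instead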
Reset Conditions
reset = (
    (abs(cart_position) > reset_dist)        # cart too far from center
    | (abs(pole_angle) > max_pole_angle)     # pole fallen too far
    | (progress_buf >= max_episode_length)   # episode timeout
)
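The reset expression can be checked in isolation before it is vectorized over environments. The sketch below evaluates the three conditions for a single environment in plain Python. The function name `should_reset` is hypothetical, and the default values (`reset_dist=3.0`, `max_pole_angle=pi/2`, `max_episode_length=500`) are assumptions chosen for illustration; take the real values from the task configuration.

```python
import math


def should_reset(
    cart_position: float,
    pole_angle: float,
    progress: int,
    reset_dist: float = 3.0,                # assumed limit on cart displacement
    max_pole_angle: float = math.pi / 2,    # assumed limit on pole angle (radians)
    max_episode_length: int = 500,          # assumed episode timeout in steps
) -> bool:
    """Return True if any reset condition holds for one environment."""
    cart_out_of_range = abs(cart_position) > reset_dist
    pole_fallen = abs(pole_angle) > max_pole_angle
    timed_out = progress >= max_episode_length
    return cart_out_of_range or pole_fallen or timed_out


print(should_reset(cart_position=0.2, pole_angle=0.1, progress=10))
print(should_reset(cart_position=3.5, pole_angle=0.1, progress=10))
```

Keeping each condition in its own named variable makes it easy to log which one fired when episodes end unexpectedly early.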
Required Assets
- cartpole.urdf: Cart-pole system with 1 prismatic joint (cart) and 1 revolute joint (pole)
Reference: Ant Design Specification
| Field | Value |
| --- | --- |
| Task Name | Ant |
| Physics Engine | PhysX |
| num_obs | 60 |
| num_actions | 8 |
| num_envs | 4096 (default) |
Observation Vector (60 dimensions)
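| Index Range | Component | Dim | Description |
| --- | --- | --- | --- |
| 0 | Torso height | 1 | Torso position along the up axis |
| 1..3 | Torso velocity | 3 | Linear velocity of the torso in its local frame |
| 4..6 | Torso angular velocity | 3 | Angular velocity of the torso in its local frame |
| 7 | Yaw | 1 | Torso yaw angle |
| 8 | Roll | 1 | Torso roll angle |
| 9 | Angle to target | 1 | Heading error relative to the target direction |
| 10 | Up projection | 1 | Projection of the torso up vector onto the world up axis |
| 11 | Heading projection | 1 | Projection of the torso heading onto the direction to the target |
| 12..19 | DOF positions | 8 | Joint angles for all 8 DOFs, scaled by joint limits |
| 20..27 | DOF velocities | 8 | Joint angular velocities, scaled |
| 28..51 | Foot contact forces | 24 | Force sensor readings for each foot (6 values per foot x 4 feet) |
| 52..59 | Previous actions | 8 | Actions applied at the previous step |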
Reward Function
reward = (velocity_toward_target * progress_weight # forward progress
+ alive_bonus # survival reward
- energy_cost * energy_weight # penalize high torques
- joints_at_limit_cost * limit_weight) # penalize joint limits
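The weighted-sum structure of this reward can be prototyped for a single environment before it is written as a batched tensor function. The sketch below uses the term names from the formula above. The `RewardWeights` container, the `locomotion_reward` function, and all numeric weight values are hypothetical placeholders, not the values used by the reference Ant task.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Illustrative weights only; real values belong in the task config."""
    progress_weight: float = 1.0
    alive_bonus: float = 0.5
    energy_weight: float = 0.05
    limit_weight: float = 0.1


def locomotion_reward(
    velocity_toward_target: float,
    energy_cost: float,
    joints_at_limit_cost: float,
    w: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of progress and survival terms minus cost terms."""
    return (
        velocity_toward_target * w.progress_weight  # forward progress
        + w.alive_bonus                             # survival reward
        - energy_cost * w.energy_weight             # penalize high torques
        - joints_at_limit_cost * w.limit_weight     # penalize joint limits
    )


print(locomotion_reward(velocity_toward_target=2.0, energy_cost=4.0, joints_at_limit_cost=1.0))
```

Grouping the weights in one structure keeps the specification and the code aligned: each field corresponds to one `w` term in the design document.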
I/O Contract
Inputs
| Name | Type | Required | Description |
| --- | --- | --- | --- |
| Task concept | Text | Yes | High-level description of what the agent should learn |
| Robot type | URDF/MJCF | Yes | Robot model with defined joints, bodies, and sensors |
| Objects | URDF/MJCF | No | Additional objects in the scene (targets, obstacles) |
Outputs
| Name | Type | Description |
| --- | --- | --- |
| Design specification | Document | Complete MDP specification following the template above |
| num_obs | Integer | Total observation vector dimension |
| num_actions | Integer | Total action vector dimension |
| Reward formula | Mathematical expression | Weighted combination of reward and penalty components |
| Reset conditions | Boolean expression | Conditions triggering episode termination |
Related Pages
Principle:Isaac_sim_IsaacGymEnvs_Task_Requirements_Design