Principle:Isaac sim IsaacGymEnvs Task Requirements Design

Field	Value
Principle Name	Task Requirements Design
Overview	Design methodology for specifying the observation space, action space, reward function, and reset conditions of a new GPU-accelerated RL environment.
Domains	Design, Reinforcement_Learning
Related Implementation	Isaac_sim_IsaacGymEnvs_Task_Design_Specification
Last Updated	2026-02-15 00:00 GMT

Knowledge Sources	IsaacGymEnvs Isaac Gym Docs

Description

Before implementing a custom task in IsaacGymEnvs, the designer must settle the key decisions that define the Markov Decision Process (MDP) the reinforcement learning agent will solve. These decisions include:

Observation space: What information will the agent receive at each timestep? This determines num_obs (the observation vector dimension).
Action space: What actions can the agent take? This determines num_actions (the action vector dimension).
Reward function: How will rewards be computed to shape the desired behavior? Reward components must reward progress on the target task while penalizing undesired outcomes.
Reset conditions: When should episodes terminate and environments reset? Typical conditions include task success, task failure, and maximum episode length.
Asset requirements: What URDF/MJCF assets are needed to represent the robot and objects in the scene?
Physics engine choice: Whether to use PhysX (the default, GPU-accelerated rigid-body engine) or Flex (a GPU-based, particle-based engine that also supports soft bodies).

These decisions collectively determine the task's interface with the training algorithm and the physics simulation engine.
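The decisions above can be captured as a single record that is filled in before any simulation code is written. The sketch below is a hypothetical Python dataclass (`TaskDesign` and `missing_decisions` are illustrative names, not part of IsaacGymEnvs); it only flags decisions that are still unspecified or invalid.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TaskDesign:
    """Hypothetical container for one task's design decisions (not an IsaacGymEnvs class)."""
    name: str
    num_obs: int                   # observation vector dimension
    num_actions: int               # action vector dimension
    reward_terms: List[str]        # named reward components
    reset_conditions: List[str]    # human-readable termination rules
    assets: List[str]              # URDF/MJCF files the scene needs
    physics_engine: str = "physx"  # "physx" or "flex"

    def missing_decisions(self) -> List[str]:
        """Return the names of decisions that are still unspecified or invalid."""
        missing = []
        if self.num_obs <= 0:
            missing.append("num_obs")
        if self.num_actions <= 0:
            missing.append("num_actions")
        if not self.reward_terms:
            missing.append("reward_terms")
        if not self.reset_conditions:
            missing.append("reset_conditions")
        if not self.assets:
            missing.append("assets")
        if self.physics_engine not in ("physx", "flex"):
            missing.append("physics_engine")
        return missing
```

An empty list from `missing_decisions()` means every decision has at least been made; it says nothing about whether the decisions are good ones.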

Theoretical Basis

The design process maps directly to the formal MDP specification:

S (state space) maps to the observation vector. While the full physics state may be much larger, the observation vector contains the subset of state information the agent receives. Good observation design includes both proprioceptive information (joint positions, velocities) and task-relevant exteroceptive information (target positions, contact forces).
A (action space) maps to the action vector. Actions are typically continuous and interpreted as joint torques, joint position targets, or joint velocity targets depending on the control mode.
R (reward function) defines the reward signal at each timestep. Well-designed rewards combine multiple weighted components: a primary task reward (e.g., distance to goal), shaping rewards (e.g., velocity toward goal), and penalty terms (e.g., energy consumption, joint limit violations).
P (transition dynamics) are defined by the physics engine (PhysX or Flex). The designer chooses simulation parameters (timestep, substeps, solver iterations) that trade off accuracy for speed.
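gamma (discount factor) is set in the training configuration, not in the task design, but the designer should check that the effective horizon it implies (roughly 1/(1 - gamma) control steps) is consistent with the maximum episode length and with how long the task takes to accomplish.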
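The interaction between simulation timestep, control frequency, episode length, and discount factor is easiest to see with a little arithmetic. The sketch below assumes each policy action is held for a fixed number of physics steps, that episode length is counted in control steps, and uses the common rule of thumb that the effective horizon is about 1/(1 - gamma) control steps. Function names and example values are illustrative assumptions.

```python
from typing import Dict


def control_dt(sim_dt: float, control_freq_inv: int) -> float:
    """Seconds between policy actions if each action is held for control_freq_inv physics steps."""
    return sim_dt * control_freq_inv


def effective_horizon(gamma: float) -> float:
    """Rule-of-thumb horizon, in control steps, for discount factor gamma."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def horizon_report(sim_dt: float, control_freq_inv: int,
                   max_episode_length: int, gamma: float) -> Dict[str, float]:
    """Compare the episode length with the horizon implied by gamma, in steps and seconds."""
    dt = control_dt(sim_dt, control_freq_inv)
    horizon = effective_horizon(gamma)
    return {
        "control_dt_s": dt,
        "episode_seconds": max_episode_length * dt,
        "effective_horizon_steps": horizon,
        "effective_horizon_s": horizon * dt,
    }
```

With a 1/60 s physics step, two physics steps per action, 500-step episodes, and gamma = 0.99, the policy acts every 1/30 s, an episode lasts about 16.7 s, and the effective horizon is about 100 steps (about 3.3 s). Substeps subdivide each physics step inside the solver and do not change these numbers.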

A key design principle is the separation of observation engineering from reward engineering. The observation space defines what information the agent can perceive, while the reward function defines what behavior is desired. These are orthogonal concerns: changing the reward should not require changing observations, and adding observations should not require changing the reward.
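One way to keep observation engineering and reward engineering independent is to have both read from the underlying simulator state rather than from each other. In the sketch below the state keys and reward weights are hypothetical; changing the weights changes the reward while leaving the observation untouched.

```python
from typing import Dict, List

# Hypothetical per-environment simulator state. Observation and reward each
# read only the entries they need from it, never from each other.
State = Dict[str, float]


def compute_observation(s: State) -> List[float]:
    """What the agent perceives: fixed, independent of the reward formulation."""
    return [s["cart_pos"], s["cart_vel"], s["pole_angle"], s["pole_vel"]]


def compute_reward(s: State, weights: Dict[str, float]) -> float:
    """What behavior is desired: reads the state directly, not the observation vector."""
    return (weights["alive"]
            - weights["angle"] * s["pole_angle"] ** 2
            - weights["cart_vel"] * abs(s["cart_vel"]))
```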

When to Use

Use this principle when:

Planning a new RL environment before writing any code.
Evaluating whether an existing task design is complete and well-specified.
Debugging a task where the agent fails to learn (the root cause is often in the task design, not the code).
Comparing design alternatives for observation or reward formulations.

Structure

A complete task design specification includes:

Observation vector components: List each component, its dimension, range, and meaning. For example, Cartpole has 4 components: cart position, cart velocity, pole angle, pole angular velocity.
Action vector components: List each action, its dimension, range, and physical interpretation. For example, Cartpole has 1 action: horizontal force on the cart.
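Reward function: Mathematical formula with named components and weights. For example, the Cartpole reward is an alive bonus minus a pole-angle penalty and small cart- and pole-velocity penalties, replaced by a fixed negative reward when the pole falls or the cart leaves its bounds.
Reset conditions: Boolean conditions that trigger episode termination. For example, Cartpole resets when the pole angle exceeds a threshold, the cart position exceeds its bounds, or the episode reaches its maximum length.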
Environment parameters: num_envs, physics engine, simulation timestep, control frequency, maximum episode length.
Asset list: URDF/MJCF files needed, with joint and body specifications.
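A Cartpole specification along these lines can be written as data plus a reward/reset function. This is a minimal pure-Python sketch for a single environment; the bounds, penalty coefficients, and failure reward are assumptions chosen to resemble the reference Cartpole task and should be checked against its source and YAML configuration. Because Cartpole's observation is its full state, the reward here reads the observation vector directly. A real IsaacGymEnvs task would compute the same quantities as batched tensors over all environments.

```python
import math
from typing import Dict, List, Tuple

# Observation layout: name -> (start index, length); lengths sum to num_obs.
OBS_LAYOUT: Dict[str, Tuple[int, int]] = {
    "cart_pos": (0, 1),
    "cart_vel": (1, 1),
    "pole_angle": (2, 1),
    "pole_vel": (3, 1),
}
NUM_OBS = sum(length for _, length in OBS_LAYOUT.values())
NUM_ACTIONS = 1  # horizontal force applied to the cart

# Assumed constants, chosen to resemble the reference task.
RESET_DIST = 3.0              # |cart position| above this counts as failure
MAX_POLE_ANGLE = math.pi / 2  # |pole angle| in radians above this counts as failure
MAX_EPISODE_LENGTH = 500      # control steps before a timeout reset
FAIL_REWARD = -2.0            # reward assigned on the failing step


def obs_term(obs: List[float], name: str) -> float:
    """Read a scalar observation term by name using the layout."""
    start, _ = OBS_LAYOUT[name]
    return obs[start]


def reward_and_reset(obs: List[float], progress: int) -> Tuple[float, bool]:
    """Return (reward, reset) for one environment at episode step `progress`."""
    cart_pos = obs_term(obs, "cart_pos")
    cart_vel = obs_term(obs, "cart_vel")
    pole_angle = obs_term(obs, "pole_angle")
    pole_vel = obs_term(obs, "pole_vel")

    # Alive bonus minus angle penalty minus small velocity penalties.
    reward = 1.0 - pole_angle ** 2 - 0.01 * abs(cart_vel) - 0.005 * abs(pole_vel)

    failed = abs(cart_pos) > RESET_DIST or abs(pole_angle) > MAX_POLE_ANGLE
    if failed:
        reward = FAIL_REWARD
    timed_out = progress >= MAX_EPISODE_LENGTH - 1
    return reward, failed or timed_out
```

Deriving `NUM_OBS` from the layout means adding an observation term cannot silently leave the declared dimension out of date.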

Reference Task	num_obs	num_actions	Physics Engine	Key Design Decisions
Cartpole	4	1	PhysX	Minimal observation (position + velocity), single force action, dense alive reward
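Ant	60	8	PhysX	Proprioception plus contact sensing (torso height and velocities, orientation and heading projections, DOF positions and velocities, foot force/torque sensors, previous actions), per-joint torques, multi-component reward (progress + alive + upright/heading - action and energy costs - joints_at_limit)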
ShadowHand	211	20	PhysX or Flex	Rich observations including fingertip positions/forces, asymmetric actor-critic (different obs for policy vs. value function)
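To see where a number like Ant's num_obs = 60 comes from, it helps to write the observation layout down as data. The component list below is our reading of the IsaacGymEnvs Ant observation and should be treated as an assumption to verify against the task source; `ANT_OBS_LAYOUT` and `offsets` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Assumed Ant observation layout: (component name, dimension), in order.
ANT_OBS_LAYOUT: List[Tuple[str, int]] = [
    ("torso_height", 1),
    ("torso_lin_vel_local", 3),
    ("torso_ang_vel_local", 3),
    ("yaw", 1),
    ("roll", 1),
    ("angle_to_target", 1),
    ("up_projection", 1),
    ("heading_projection", 1),
    ("dof_pos_scaled", 8),
    ("dof_vel_scaled", 8),
    ("foot_force_torque", 24),  # 4 feet x 6-axis force/torque
    ("previous_actions", 8),
]


def offsets(layout: List[Tuple[str, int]]) -> Dict[str, Tuple[int, int]]:
    """Return {name: (start, end)} slices into the flat observation vector."""
    result, start = {}, 0
    for name, dim in layout:
        result[name] = (start, start + dim)
        start += dim
    return result


NUM_OBS = sum(dim for _, dim in ANT_OBS_LAYOUT)
```

Keeping the layout as data turns a mismatch between the listed components and num_obs into an assertion failure rather than a silent tensor-shape bug.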

Related Pages

Isaac_sim_IsaacGymEnvs_Task_Design_Specification - implements - Concrete pattern for writing the design specification document with all required fields.
Isaac_sim_IsaacGymEnvs_VecTask_Subclass_Creation - next step - After designing the task, create the VecTask subclass that implements it.
Isaac_sim_IsaacGymEnvs_Task_Configuration_Files - next step - Task design parameters are encoded in Hydra YAML configuration files.
