
Principle:Isaac sim IsaacGymEnvs Core Simulation Methods

From Leeroopedia
Field Value
Principle Name Core Simulation Methods
Overview Abstract interface defining the simulation loop methods that all GPU-accelerated RL environments must implement.
Domains Simulation, Architecture
Related Implementation Isaac_sim_IsaacGymEnvs_VecTask_Simulation_Loop
Last Updated 2026-02-15 00:00 GMT

Description

The core simulation loop in IsaacGymEnvs follows a fixed three-phase pattern within the step() method:

  1. pre_physics_step(actions): Translate RL actions into physics commands. The agent's output (a tensor of continuous values) is converted into forces, torques, or joint position/velocity targets that the physics engine can apply.
  2. simulate(): The physics engine (PhysX or Flex) advances the simulation by one or more substeps. This is handled by VecTask and does not need to be overridden.
  3. post_physics_step(): Read the new physics state and compute everything the RL algorithm needs: observations (what the agent perceives), rewards (how well it performed), and reset flags (whether episodes have ended).
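The three-phase pattern can be sketched in miniature. This is a hypothetical toy example, not VecTask itself: the method names mirror the real interface, but the "physics" is a one-dimensional Euler integrator and plain Python lists stand in for GPU tensors.

```python
# Toy sketch of the fixed three-phase step() pattern (illustrative only).
class ToyEnvLoop:
    def __init__(self, num_envs, dt=0.01):
        self.num_envs = num_envs
        self.dt = dt
        self.forces = [0.0] * num_envs   # commands produced in phase 1
        self.pos = [0.0] * num_envs      # toy physics state
        self.vel = [0.0] * num_envs
        self.obs_buf = [0.0] * num_envs

    def pre_physics_step(self, actions):
        # Phase 1: translate RL actions into physics commands (here, forces).
        self.forces = [a * 10.0 for a in actions]

    def simulate(self):
        # Phase 2: advance the toy physics one step (stand-in for PhysX/Flex).
        for i in range(self.num_envs):
            self.vel[i] += self.forces[i] * self.dt
            self.pos[i] += self.vel[i] * self.dt

    def post_physics_step(self):
        # Phase 3: read the new state into observations and rewards.
        self.obs_buf = list(self.pos)
        rewards = [-abs(p) for p in self.pos]  # e.g. reward staying near 0
        return self.obs_buf, rewards

    def step(self, actions):
        self.pre_physics_step(actions)
        self.simulate()
        return self.post_physics_step()
```

The point of the sketch is the fixed ordering: actions are applied, then physics runs, then state is read, and observations always reflect the post-integration state.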

Additional methods support the loop:

  • create_sim(): One-time initialization of the physics world, ground plane, assets, and environment instances. Called during __init__.
  • reset_idx(env_ids): Selectively reset environments whose episodes have terminated, randomizing their initial states for the next episode.
  • compute_observations(): Fill the observation buffer with state information derived from the physics simulation.
  • compute_reward(): Compute scalar reward values and determine which environments should reset.
  • allocate_buffers(): Allocate the GPU tensor buffers (obs_buf, rew_buf, reset_buf, progress_buf) used throughout the loop.
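A minimal sketch of allocate_buffers() follows. Note the assumptions: the real VecTask allocates torch tensors on the GPU device, and (in the actual implementation) initializes reset_buf to ones so every environment is reset on the first step; plain Python lists stand in here so the shapes are easy to see.

```python
# Hypothetical sketch of buffer allocation; real code uses torch tensors
# on self.device rather than Python lists.
class BufferOwner:
    def __init__(self, num_envs, num_obs):
        self.num_envs = num_envs
        self.num_obs = num_obs
        self.allocate_buffers()

    def allocate_buffers(self):
        # obs_buf: [num_envs, num_obs] observations
        self.obs_buf = [[0.0] * self.num_obs for _ in range(self.num_envs)]
        # rew_buf: [num_envs] scalar rewards
        self.rew_buf = [0.0] * self.num_envs
        # reset_buf: [num_envs] termination flags; starting at 1 forces a
        # reset of every environment on the first post-physics step
        self.reset_buf = [1] * self.num_envs
        # progress_buf: [num_envs] per-environment step counters
        self.progress_buf = [0] * self.num_envs
```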

Theoretical Basis

The simulation loop implements a contract between the base class (VecTask) and its subclasses:

Simulation Loop Contract:

1. pre_physics_step(actions):
   Input:  actions tensor [num_envs, num_actions] from RL policy
   Effect: Apply forces/targets to simulation actors
   Timing: Before physics integration

2. simulate():
   Effect: Physics engine integrates equations of motion for dt * num_substeps
   Timing: After actions applied, before state read
   Note:   Runs entirely on GPU, not overridden by subclasses

3. post_physics_step():
   Effect: Read new state, compute obs_buf, rew_buf, reset_buf
   Timing: After physics integration
   Calls:  compute_observations(), compute_reward(), reset_idx()
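The contract above can be expressed as an abstract base class. This is an assumed simplification for illustration, not the real VecTask (which carries many more responsibilities around these hooks); it shows how the base class fixes the ordering while subclasses fill in only the abstract methods.

```python
# Sketch of the simulation-loop contract as an abstract base class.
from abc import ABC, abstractmethod

class SimulationLoopContract(ABC):
    @abstractmethod
    def pre_physics_step(self, actions):
        """Apply forces/targets derived from `actions` before integration."""

    def simulate(self):
        """Advance physics; provided by the base class, not overridden."""

    @abstractmethod
    def post_physics_step(self):
        """Read new state; compute observations, rewards, reset flags."""

    def step(self, actions):
        # The base class fixes the sequence; subclasses cannot reorder it.
        self.pre_physics_step(actions)
        self.simulate()
        return self.post_physics_step()
```

A subclass that omits either abstract hook cannot be instantiated, which is how the contract is enforced at construction time rather than at first use.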

This separation serves three critical purposes:

  • GPU pipeline efficiency: Actions are batched and applied to all environments simultaneously. Physics runs as a single GPU kernel. Observations and rewards are computed in parallel across all environments.
  • Deterministic ordering: The fixed sequence ensures that observations always reflect the state after actions have been applied and physics has been simulated.
  • Clear responsibility boundaries: Each method has a single, well-defined role, making debugging straightforward. If rewards are wrong, check compute_reward(). If the robot does not move, check pre_physics_step().

When to Use

Use this principle when:

  • Implementing the core loop methods of any new IsaacGymEnvs task.
  • Debugging issues with simulation behavior (identify which phase of the loop is responsible).
  • Understanding the execution flow of an existing task.
  • Deciding where to place custom logic (action processing vs. state computation vs. reward calculation).

Structure

The full execution flow within a single step() call:

step(actions):
  |
  +-- pre_physics_step(actions)
  |     +-- Scale/transform actions
  |     +-- Apply to simulation (set_dof_actuation_force_tensor, etc.)
  |
  +-- for substep in range(control_freq_inv):
  |     +-- gym.simulate(sim)            # physics integration
  |     +-- gym.fetch_results(sim, True) # sync results
  |
  +-- post_physics_step()
  |     +-- progress_buf += 1
  |     +-- gym.refresh_*_tensor(sim)    # refresh GPU state tensors
  |     +-- compute_observations()       # fill obs_buf
  |     +-- compute_reward()             # fill rew_buf, reset_buf
  |     +-- reset_idx(reset_env_ids)     # reset terminated episodes
  |
  +-- return obs_buf, rew_buf, reset_buf, extras
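The substep loop in the flow above can be sketched with a stub in place of the isaacgym API. The call names gym.simulate and gym.fetch_results mirror the diagram; StubGym is a hypothetical counter object that makes the control_freq_inv loop visible without a physics engine.

```python
# Sketch of the step() flow with a stub standing in for the isaacgym API.
class StubGym:
    def __init__(self):
        self.simulate_calls = 0
        self.fetch_calls = 0

    def simulate(self, sim):
        self.simulate_calls += 1

    def fetch_results(self, sim, wait):
        self.fetch_calls += 1

def step(gym, sim, control_freq_inv, pre_physics, post_physics, actions):
    pre_physics(actions)               # apply actions once per step()
    for _ in range(control_freq_inv):  # physics runs control_freq_inv times
        gym.simulate(sim)
        gym.fetch_results(sim, True)
    return post_physics()              # obs/reward/reset computation
```

This makes the relationship explicit: one policy action is held fixed across control_freq_inv physics substeps before the state is read back.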

Key Design Decisions

  • Action interpretation — options: forces, position targets, velocity targets. Forces give direct control but are harder to learn; position targets are easier for articulated robots.
  • Observation refresh — options: before or after reset. Typically refresh tensors, compute observations, then reset; reset environments get fresh observations in the next step.
  • Reward timing — options: before or after the reset check. Compute the reward before checking resets so that terminal rewards are included.
  • Substep count — 1 to 4 is typical. More substeps increase physics accuracy but reduce training throughput.
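The action-interpretation decision can be illustrated with two hypothetical helpers. Both map a policy output in [-1, 1] onto physics commands; the function names and limits are illustrative, not part of the IsaacGymEnvs API.

```python
# Illustrative action-interpretation helpers (names are hypothetical).
def actions_to_forces(actions, max_force):
    # Direct force control: scale the policy output to the force limit.
    return [a * max_force for a in actions]

def actions_to_position_targets(actions, lower, upper):
    # Position-target control: map [-1, 1] onto the joint limit range.
    # Often easier to learn for articulated robots.
    return [lower + (a + 1.0) * 0.5 * (upper - lower) for a in actions]
```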

Related Pages

Implementation:Isaac_sim_IsaacGymEnvs_VecTask_Simulation_Loop
