Overview
Concrete entry points and commands for training, testing, and debugging IsaacGymEnvs tasks through both the CLI and the programmatic API.
Description
IsaacGymEnvs provides two entry points for running tasks: the CLI via train.py (isaacgymenvs/train.py:L71-215) and the programmatic API via isaacgymenvs.make() (isaacgymenvs/__init__.py:L14-55). Both entry points compose Hydra configurations, look up the task in isaacgym_task_map, instantiate the environment, and either launch rl_games training or return the environment for custom use.
Usage
Use the CLI for standard training and evaluation. Use the programmatic API for custom training loops, integration with other frameworks, or automated experimentation.
Code Reference
Entry Point 1: CLI via train.py
Signature
```shell
python train.py task=<TaskName> [key=value overrides...]
```
Core Parameters
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| task | string | Required | Task name (must match isaacgym_task_map key and cfg/task/{name}.yaml) |
| num_envs | int | From task YAML | Number of parallel environments |
| headless | bool | False | Disable rendering (True for server training) |
| seed | int | 42 | Random seed for reproducibility |
| max_iterations | int | From train YAML | Maximum training epochs |
| sim_device | string | "cuda:0" | Device for physics simulation |
| rl_device | string | "cuda:0" | Device for RL computations |
| test | bool | False | Run in evaluation mode (no training) |
| checkpoint | string | "" | Path to checkpoint for resuming or testing |
| experiment | string | "" | Experiment name for logging |
| multi_gpu | bool | False | Enable multi-GPU training |
Entry Point 2: Programmatic API
Signature
```python
import isaacgymenvs

env = isaacgymenvs.make(
    seed=0,
    task="MyTask",
    num_envs=64,
    sim_device="cuda:0",
    rl_device="cuda:0",
    graphics_device_id=0,
    headless=False,
    multi_gpu=False,
    virtual_screen_capture=False,
    force_render=False,
)
```
API Usage Example
```python
import isaacgymenvs
import torch

# Create environment
env = isaacgymenvs.make(
    seed=42,
    task="Cartpole",
    num_envs=64,
    sim_device="cuda:0",
    rl_device="cuda:0",
)

# Reset and run
obs = env.reset()
for step in range(1000):
    # Random actions for testing
    actions = torch.randn(env.num_envs, env.num_actions, device=env.rl_device)
    obs, rewards, dones, info = env.step(actions)
    print(f"Step {step}: mean_reward={rewards.mean().item():.3f}, "
          f"num_resets={dones.sum().item()}")
```
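When driving the environment manually like this, finished environments are reset automatically inside step(), so the caller only needs to track statistics across resets. A minimal, torch-free sketch of that bookkeeping (function name and list-based batches are illustrative; a real loop would operate on tensors):

```python
# Illustrative per-env episode-return accumulation under auto-reset
# semantics: when dones[i] is set, env i's episode return is recorded
# and its accumulator cleared (the env itself was already reset by step()).
def accumulate_returns(reward_batches, done_batches, num_envs):
    running = [0.0] * num_envs      # current-episode return per env
    finished = []                   # returns of completed episodes
    for rewards, dones in zip(reward_batches, done_batches):
        for i in range(num_envs):
            running[i] += rewards[i]
            if dones[i]:
                finished.append(running[i])
                running[i] = 0.0
    return finished, running
```

The same masking pattern (zero the accumulator wherever dones is set) is what vectorized implementations express with tensor operations.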
Debugging Commands
Stage 1: Visual Verification
```shell
# Run with few environments and rendering for visual inspection
python train.py task=MyTask num_envs=4 headless=False max_iterations=5

# What to look for:
# - Assets appear correctly in the viewer
# - Physics is stable (no explosions, no interpenetration)
# - Environments reset properly
# - Use keyboard: V to toggle viewer, R to reset, Esc to quit
```
Stage 2: Quick Training Sanity Check
```shell
# Short training run to verify reward signals
python train.py task=MyTask num_envs=256 max_iterations=50 headless=True

# What to look for in output:
# - rewards/step should not be constant
# - rewards/step should show some variation or improvement
# - No NaN or Inf errors
```
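The NaN/Inf check can also be automated when stepping the environment yourself. A minimal sketch using stdlib math.isfinite (the function name is illustrative; with torch tensors the equivalent one-liner is torch.isfinite(rewards).all()):

```python
import math

# Return the indices of non-finite rewards (NaN or Inf).
# An empty result means the batch is numerically healthy.
def nonfinite_indices(rewards):
    return [i for i, r in enumerate(rewards) if not math.isfinite(r)]
```

Calling this on each reward batch during a sanity run pinpoints which environments are producing bad values.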
Stage 3: Moderate Training Run
```shell
# Medium-scale training to verify learning
python train.py task=MyTask num_envs=1024 max_iterations=200 headless=True

# Monitor with TensorBoard:
# tensorboard --logdir runs/MyTask/summaries
```
Stage 4: Full-Scale Training
```shell
# Full training with default parameters from YAML
python train.py task=MyTask headless=True

# Outputs saved to:
# - runs/MyTask/nn/MyTask.pth (best checkpoint)
# - runs/MyTask/nn/last_MyTask.pth (last checkpoint)
# - runs/MyTask/summaries/ (TensorBoard logs)
```
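The checkpoint layout above can be resolved programmatically, e.g. when scripting evaluation runs. A small sketch (hypothetical helper, assuming the default runs/ directory and the best/last naming shown above):

```python
from pathlib import Path

# Resolve a checkpoint path under the output layout described above:
# runs/{TaskName}/nn/{TaskName}.pth       (best checkpoint)
# runs/{TaskName}/nn/last_{TaskName}.pth  (last checkpoint)
def checkpoint_path(task: str, last: bool = False, root: str = "runs") -> Path:
    name = f"last_{task}.pth" if last else f"{task}.pth"
    return Path(root) / task / "nn" / name
```

The returned path can be passed directly as the checkpoint= override.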
Stage 5: Evaluation
```shell
# Evaluate trained policy with rendering
python train.py task=MyTask test=True \
    checkpoint=runs/MyTask/nn/MyTask.pth \
    num_envs=16 headless=False

# What to look for:
# - Agent exhibits desired behavior
# - No reward exploitation or unexpected strategies
```
Key Debugging Parameters
| Parameter |
Purpose |
Recommended Value for Debugging
|
num_envs |
Control parallelism |
Start with 4-16 for visual debugging, scale to 256+ for training
|
headless=False |
Enable viewer |
Use for visual verification and policy evaluation
|
max_iterations |
Limit training duration |
Use 5-10 for physics testing, 50-100 for reward checking
|
seed |
Reproducibility |
Fix seed when comparing configurations
|
test=True |
Evaluation mode |
Use with checkpoint to evaluate trained policy
|
checkpoint |
Resume or evaluate |
Path to saved .pth checkpoint file
|
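These recommendations can be bundled into reusable override sets. A hypothetical helper that renders Hydra-style key=value overrides for each debugging stage (stage names and values mirror the stages described above; nothing here is part of the IsaacGymEnvs API):

```python
# Map each debugging stage to its suggested CLI overrides and render
# them as Hydra-style key=value strings for train.py.
STAGES = {
    "visual":   {"num_envs": 4,    "headless": False, "max_iterations": 5},
    "sanity":   {"num_envs": 256,  "headless": True,  "max_iterations": 50},
    "moderate": {"num_envs": 1024, "headless": True,  "max_iterations": 200},
}

def build_overrides(task, stage):
    opts = {"task": task, **STAGES[stage]}
    return [f"{key}={value}" for key, value in opts.items()]
```

For example, build_overrides("MyTask", "visual") yields the arguments used in Stage 1, ready to append to a python train.py invocation.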
What to Monitor During Training
| Metric | Location | Healthy Behavior | Warning Signs |
|--------|----------|------------------|---------------|
| Mean reward | TensorBoard: rewards/step | Increasing trend | Flat, decreasing, or NaN |
| Episode length | TensorBoard: episode_lengths/step | Increasing (survival tasks) or decreasing (goal-reaching tasks) | Constant at max_episode_length (reward too sparse) |
| Policy entropy | TensorBoard: entropy/step | Gradual decrease | Rapid collapse to 0 (premature convergence) |
| Value loss | TensorBoard: losses/value_loss | Decreasing | Increasing or oscillating wildly |
| Learning rate | TensorBoard: info/learning_rate | Stable or gradually decreasing (adaptive) | Rapid oscillation |
I/O Contract
Inputs
| Name | Type | Required | Description |
|------|------|----------|-------------|
| Registered task class | isaacgym_task_map entry | Yes | Task must be registered in the task map |
| Task YAML config | cfg/task/MyTask.yaml | Yes | Environment parameters |
| Train YAML config | cfg/train/MyTaskPPO.yaml | Yes | Training hyperparameters |
| CLI overrides | key=value pairs | No | Hydra overrides for any configuration parameter |
Outputs
| Name | Type | Description |
|------|------|-------------|
| Checkpoints | .pth files | Saved policy network weights in runs/{TaskName}/nn/ |
| TensorBoard logs | event files | Training metrics in runs/{TaskName}/summaries/ |
| Visual verification | Isaac Gym viewer | Real-time rendering when headless=False |
| Console output | text | Per-epoch reward, episode length, and timing statistics |
Related Pages
Principle:Isaac_sim_IsaacGymEnvs_Task_Testing_Iteration