

Workflow:Isaac Sim IsaacGymEnvs Policy Inference and Evaluation

From Leeroopedia
Knowledge Sources
Domains Reinforcement_Learning, Robotics, Evaluation
Last Updated 2026-02-15 09:00 GMT

Overview

End-to-end process for loading a trained RL policy checkpoint and running inference to evaluate or visualize the learned behavior in Isaac Gym environments.

Description

This workflow covers how to take a previously trained policy checkpoint and run it in test mode across Isaac Gym environments. The same train.py entry point is used with the test=True flag, which causes the rl_games Runner to create a Player instead of a training Agent. The Player executes the policy deterministically (or with configurable noise) and collects evaluation metrics. The workflow supports rendering the simulation for visual inspection, recording videos, and running headless for batch evaluation.

Key capabilities:

  • Load any .pth checkpoint and run inference on the corresponding task
  • Render the simulation with a viewer or capture videos programmatically
  • Run with reduced environment count for efficient visualization
  • Evaluate success rates and episode returns
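
A typical inference invocation through the standard train.py entry point might look like the following; the task name, run directory, and checkpoint filename are illustrative, not taken from a specific run:

```shell
# Load a checkpoint and run the policy in test mode with the viewer.
# Task name and paths are illustrative; adjust to your own run.
python train.py task=Ant test=True \
    checkpoint=runs/Ant/nn/Ant.pth \
    num_envs=64
```

The overrides use Hydra's key=value syntax, the same mechanism used for training runs.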

Usage

Execute this workflow when you have a trained policy checkpoint (.pth file) and want to evaluate its performance, visually inspect learned behaviors, or capture demonstration videos. This is typically done after the RL Policy Training workflow has produced checkpoints. It is also used to test checkpoints produced by Population-Based Training.

Execution Steps

Step 1: Locate Trained Checkpoint

Identify the checkpoint file from a previous training run. Checkpoints are stored in runs/EXPERIMENT_NAME/nn/ and include the training iteration and reward value in the filename. For PBT experiments, the best checkpoint is in the workspace best directory.

Key considerations:

  • Checkpoint filenames may contain special characters (brackets, equals signs) that need shell escaping
  • The checkpoint must match the task and network architecture used during training
  • PBT best checkpoints are stored in pbt_workspace/best0/
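
Brackets and `=` signs in checkpoint names are interpreted by most shells (globbing, word splitting), so the path should be quoted. A minimal sketch with a hypothetical filename:

```shell
# Hypothetical checkpoint name containing '=' and brackets; single
# quotes prevent the shell from globbing or splitting the path.
CKPT='runs/Ant/nn/last_Ant_ep=1000_rew=[5031.25].pth'
mkdir -p "$(dirname "$CKPT")"
touch "$CKPT"
ls "$CKPT"
```

Without the quotes, the bracketed segment would be treated as a glob pattern and could fail to match the file at all.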

Step 2: Configure Inference Run

Set the test=True flag and provide the checkpoint path. Optionally reduce num_envs for faster rendering. Choose whether to run with the viewer (default) or headlessly. Match any non-default training hyperparameters (such as network architecture overrides) that were used during training.

Key considerations:

  • All CLI overrides from training must be replicated for architecture-sensitive parameters
  • Use num_envs=64 or similar for manageable visualization
  • Set headless=True for batch evaluation without rendering overhead
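
A headless batch-evaluation variant, again with illustrative values; the network-size override shown is a hypothetical example of mirroring a training-time setting:

```shell
# Headless evaluation: no viewer, larger env count, and any
# architecture-sensitive overrides repeated from training
# (the MLP units override here is a hypothetical example).
python train.py task=Ant test=True headless=True \
    checkpoint=runs/Ant/nn/Ant.pth \
    num_envs=1024 \
    train.params.network.mlp.units=[512,256,128]
```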

Step 3: Environment and Player Initialization

The Runner creates the environment and instantiates a Player (inference-only agent) instead of a training Agent. The Player loads the policy network weights from the checkpoint, sets the policy to evaluation mode, and prepares to collect rollouts.

What happens:

  • The environment is created identically to training mode
  • The rl_games Runner detects test=True and creates a Player
  • Policy weights are loaded from the checkpoint file
  • The network is set to eval mode (fixed BatchNorm statistics, dropout disabled), and inference runs without gradient tracking
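
The Player's test-time action choice (distribution mean versus sampled noise, as noted in the description above) can be illustrated with a toy stand-in; this is a sketch, not the rl_games API:

```python
import random

def select_action(mu, sigma, deterministic=True, rng=None):
    """Toy version of test-time action selection: return the action
    distribution mean, or sample Gaussian noise around it."""
    if deterministic:
        return list(mu)
    rng = rng or random.Random(0)
    return [m + rng.gauss(0.0, sigma) for m in mu]
```

With `deterministic=True` (the usual test-mode setting), `select_action([0.1, -0.2], 0.5)` simply returns the mean `[0.1, -0.2]`.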

Step 4: Policy Rollout and Visualization

The Player runs the policy in a loop, stepping through the environment and optionally rendering frames. For each timestep, the policy network produces actions from observations, the environment steps forward, and results are displayed or recorded.

What happens:

  • At each step: observation → policy forward pass → action → environment step
  • The viewer displays the simulation in real-time if not headless
  • Video frames are optionally captured and saved to the videos/ directory
  • Episode statistics (returns, success rates) are accumulated
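
The loop above can be sketched with a toy environment and policy standing in for the real Isaac Gym objects; all names here are illustrative, not the rl_games API:

```python
def run_rollouts(env, policy, n_episodes=3, max_steps=100):
    """Minimal evaluation loop mirroring the Player's structure:
    obs -> policy -> action -> env.step, accumulating per-episode
    returns and a success flag from the step info."""
    episodes = []
    for _ in range(n_episodes):
        obs = env.reset()
        ep_return = 0.0
        for _ in range(max_steps):
            action = policy(obs)
            obs, reward, done, info = env.step(action)
            ep_return += reward
            if done:
                break
        episodes.append({"return": ep_return,
                         "success": info.get("success", False)})
    return episodes

class ToyEnv:
    """Toy environment: reward 1.0 per step, done after 5 steps."""
    def reset(self):
        self.t = 0
        return 0.0
    def step(self, action):
        self.t += 1
        done = self.t >= 5
        return 0.0, 1.0, done, {"success": done}
```

Running `run_rollouts(ToyEnv(), lambda obs: 0.0)` yields three episodes, each with a return of 5.0 and a success flag set.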

Step 5: Results Collection

After the evaluation run completes, aggregate metrics are reported. Success counts, average returns, and other task-specific metrics are printed. For tasks with true_objective metrics (like manipulation success rate), these are reported separately from the shaped reward.

Key considerations:

  • Some tasks report specific metrics like consecutiveSuccesses
  • TensorBoard eval_summaries may be generated for ADR checkpoints
  • Results can be compared across different checkpoints or training seeds
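
Aggregating the collected per-episode statistics into headline metrics can be sketched as follows; the record format is illustrative and not tied to any rl_games reporting structure:

```python
def summarize(episodes):
    """Reduce per-episode records to the headline evaluation
    metrics: episode count, average return, and success rate."""
    n = len(episodes)
    return {
        "episodes": n,
        "avg_return": sum(e["return"] for e in episodes) / n,
        "success_rate": sum(1 for e in episodes if e["success"]) / n,
    }
```

For example, two episodes with returns 1.0 (success) and 3.0 (failure) summarize to an average return of 2.0 and a success rate of 0.5.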
