Implementation: IsaacGymEnvs CommonPlayer Run
| Sources | Domains | Last Updated |
|---|---|---|
| IsaacGymEnvs, common_player.py | Inference, Evaluation | 2026-02-15 00:00 GMT |
Overview
The CommonPlayer class provides the concrete inference rollout loop for evaluating trained policies in IsaacGymEnvs environments.
Description
CommonPlayer extends rl_games' PpoPlayerContinuous with IsaacGymEnvs-specific environment handling. Its run() method implements the main rollout loop: resetting environments, querying the policy network for actions, stepping the simulation, and collecting per-episode statistics.
Usage
Instantiated automatically by the Runner's player factory when play=True is set. Users interact with it indirectly through the test=True CLI flag.
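The factory dispatch works like a standard builder registry. The sketch below is a minimal, self-contained illustration of that pattern; the class names and the `register_builder`/`create` signatures here are simplified stand-ins, not the actual rl_games Runner API.

```python
# Illustrative builder/factory dispatch, as used by rl_games-style runners.
# All names below are hypothetical stand-ins for the real factory.
class PlayerFactory:
    def __init__(self):
        self._builders = {}

    def register_builder(self, name, builder):
        # builder: callable that constructs a player from config params
        self._builders[name] = builder

    def create(self, name, **kwargs):
        return self._builders[name](**kwargs)


class DummyPlayer:
    """Stand-in for CommonPlayer in this sketch."""
    def __init__(self, params):
        self.params = params


factory = PlayerFactory()
factory.register_builder("common", lambda **kw: DummyPlayer(kw["params"]))
player = factory.create("common", params={"deterministic": True})
```

When `test=True` is passed on the CLI, the runner resolves the registered player builder instead of the trainer builder and hands it the merged config.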
Code Reference
Source Location: Repository: NVIDIA-Omniverse/IsaacGymEnvs, File: isaacgymenvs/learning/common_player.py (L37-196)
Import:
from isaacgymenvs.learning.common_player import CommonPlayer
Signature:
```python
class CommonPlayer(players.PpoPlayerContinuous):
    def __init__(self, params):
        """Initialize player with config, build network, load checkpoint."""

    def run(self):  # L54-152, main rollout loop
        """Execute n_games episodes, collecting rewards and step counts."""

    def get_action(self, obs_dict, is_determenistic=False):  # L161-163
        """Query the policy network for an action given observations."""

    def obs_to_torch(self, obs):
        """Convert observation to torch tensor on the correct device."""

    def _build_net(self, config):
        """Build the neural network in eval mode."""

    def _env_reset_done(self):
        """Reset environments that have completed episodes."""

    def _build_net_config(self):
        """Construct network configuration from player config."""

    def _setup_action_space(self):
        """Configure action space bounds and dimensions."""
```
I/O Contract
Key Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_games | int | (from config) | Number of complete episodes to run before stopping |
| max_steps | int | (from config) | Maximum steps per episode (safety termination bound) |
| is_determenistic | bool | False | If True, use greedy actions; if False, sample from distribution |
Inputs:
| Input | Type | Description |
|---|---|---|
| Loaded policy model | nn.Module | Trained network with weights loaded from checkpoint, in eval mode |
| Environment instance | VecTask | Vectorized Isaac Gym environment with GPU tensor observations |
Outputs:
| Output | Type | Description |
|---|---|---|
| Per-episode rewards | list[float] | Total accumulated reward for each completed episode |
| Per-episode step counts | list[int] | Number of steps in each completed episode |
| Rendered visualization | (optional) | Real-time Isaac Gym viewer rendering if enabled |
Rollout Loop Detail
The run() method (L54-152) implements the following sequence:
```python
def run(self):
    n_games = self.games_num
    sum_rewards = 0
    sum_steps = 0
    games_played = 0
    obs = self.env_reset()
    for n in range(self.max_steps):
        # Query policy for actions
        action = self.get_action(obs, is_determenistic=self.is_determenistic)
        # Step environment
        obs, reward, done, info = self.env_step(action)
        # Accumulate per-environment reward and step counts
        self.cr += reward
        self.steps += 1
        # Process completed episodes
        done_indices = done.nonzero(as_tuple=False).squeeze(-1)
        if len(done_indices) > 0:
            for idx in done_indices:
                games_played += 1
                sum_rewards += self.cr[idx].item()
                sum_steps += self.steps[idx].item()
            # Clear accumulators for the finished environments so their
            # next episode starts counting from zero
            self.cr[done_indices] = 0
            self.steps[done_indices] = 0
            self._env_reset_done()
            if games_played >= n_games:
                break
    mean_reward = sum_rewards / games_played
    mean_steps = sum_steps / games_played
    # Print summary statistics
```
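The episode-accounting logic above can be exercised without the simulator. The sketch below replays the same accumulate/record/reset sequence over synthetic per-step reward and done arrays (all values invented for illustration); `rollout_stats` is a hypothetical helper, not part of CommonPlayer.

```python
import numpy as np


def rollout_stats(rewards_per_step, dones_per_step, n_games):
    """Replay CommonPlayer-style episode accounting on synthetic data.

    rewards_per_step / dones_per_step: arrays of shape (T, num_envs).
    Returns (mean episode reward, mean episode length).
    """
    num_envs = rewards_per_step.shape[1]
    cr = np.zeros(num_envs)     # per-env accumulated reward
    steps = np.zeros(num_envs)  # per-env step counter
    sum_rewards = sum_steps = games_played = 0
    for reward, done in zip(rewards_per_step, dones_per_step):
        cr += reward
        steps += 1
        for idx in np.nonzero(done)[0]:
            games_played += 1
            sum_rewards += cr[idx]
            sum_steps += steps[idx]
            cr[idx] = 0.0       # reset finished env's accumulators
            steps[idx] = 0.0
        if games_played >= n_games:
            break
    return sum_rewards / games_played, sum_steps / games_played


# 2 envs over 4 steps; env 0 finishes at t=1, env 1 at t=3
rewards = np.array([[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
dones = np.array([[0, 0], [1, 0], [0, 0], [0, 1]], dtype=bool)
mean_r, mean_s = rollout_stats(rewards, dones, n_games=2)
# env 0 return: 1.0 + 1.0 = 2.0 in 2 steps; env 1: 4 * 0.5 = 2.0 in 4 steps
```

Because episodes finish asynchronously across vectorized environments, the outer loop is bounded by max_steps while the break condition counts completed games, matching the structure of run().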