Implementation: IsaacGymEnvs CommonPlayer Run
| Sources | Domains | Last Updated |
|---|---|---|
| IsaacGymEnvs, common_player.py | Inference, Evaluation | 2026-02-15 00:00 GMT |
Overview
The CommonPlayer class provides the concrete inference rollout loop for evaluating trained policies in IsaacGymEnvs environments.
Description
CommonPlayer extends rl_games' PpoPlayerContinuous with IsaacGymEnvs-specific environment handling. Its run() method implements the main rollout loop: resetting environments, querying the policy network for actions, stepping the simulation, and collecting per-episode statistics.
Usage
Instantiated automatically by the Runner's player factory when play=True is set. Users interact with it indirectly through the test=True CLI flag.
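The factory dispatch works like a standard builder registry. The sketch below is a minimal, self-contained illustration of that pattern; the class names and the `register_builder`/`create` signatures here are simplified stand-ins, not the actual rl_games Runner API.

```python
# Illustrative builder/factory dispatch, as used by rl_games-style runners.
# All names below are hypothetical stand-ins for the real factory.
class PlayerFactory:
    def __init__(self):
        self._builders = {}

    def register_builder(self, name, builder):
        # builder: callable that constructs a player from config params
        self._builders[name] = builder

    def create(self, name, **kwargs):
        return self._builders[name](**kwargs)


class DummyPlayer:
    """Stand-in for CommonPlayer in this sketch."""
    def __init__(self, params):
        self.params = params


factory = PlayerFactory()
factory.register_builder("common", lambda **kw: DummyPlayer(kw["params"]))
player = factory.create("common", params={"deterministic": True})
```

When `test=True` is passed on the CLI, the runner resolves the registered player builder instead of the trainer builder and hands it the merged config.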
Code Reference
Source Location: Repository: NVIDIA-Omniverse/IsaacGymEnvs, File: isaacgymenvs/learning/common_player.py (L37-196)
Import:
from isaacgymenvs.learning.common_player import CommonPlayer
Signature:
```python
class CommonPlayer(players.PpoPlayerContinuous):
    def __init__(self, params):
        """Initialize player with config, build network, load checkpoint."""

    def run(self):  # L54-152, main rollout loop
        """Execute n_games episodes, collecting rewards and step counts."""

    def get_action(self, obs_dict, is_determenistic=False):  # L161-163
        """Query the policy network for an action given observations."""

    def obs_to_torch(self, obs):
        """Convert observation to torch tensor on the correct device."""

    def _build_net(self, config):
        """Build the neural network in eval mode."""

    def _env_reset_done(self):
        """Reset environments that have completed episodes."""

    def _build_net_config(self):
        """Construct network configuration from player config."""

    def _setup_action_space(self):
        """Configure action space bounds and dimensions."""
```
I/O Contract
Key Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| n_games | int | (from config) | Number of complete episodes to run before stopping |
| max_steps | int | (from config) | Maximum steps per episode (safety termination bound) |
| is_determenistic | bool | False | If True, use greedy actions; if False, sample from distribution |
Inputs:
| Input | Type | Description |
|---|---|---|
| Loaded policy model | nn.Module | Trained network with weights loaded from checkpoint, in eval mode |
| Environment instance | VecTask | Vectorized Isaac Gym environment with GPU tensor observations |
Outputs:
| Output | Type | Description |
|---|---|---|
| Per-episode rewards | list[float] | Total accumulated reward for each completed episode |
| Per-episode step counts | list[int] | Number of steps in each completed episode |
| Rendered visualization | (optional) | Real-time Isaac Gym viewer rendering if enabled |
Rollout Loop Detail
The run() method (L54-152) implements the following sequence:
```python
def run(self):
    n_games = self.games_num
    sum_rewards = 0
    sum_steps = 0
    games_played = 0
    obs = self.env_reset()
    for n in range(self.max_steps):
        # Query policy for actions
        action = self.get_action(obs, is_determenistic=self.is_determenistic)
        # Step environment
        obs, reward, done, info = self.env_step(action)
        # Accumulate per-environment reward and step counts
        self.cr += reward
        self.steps += 1
        # Process completed episodes
        done_indices = done.nonzero(as_tuple=False).squeeze(-1)
        if len(done_indices) > 0:
            for idx in done_indices:
                games_played += 1
                sum_rewards += self.cr[idx].item()
                sum_steps += self.steps[idx].item()
            # Clear accumulators for the finished environments so their
            # next episode starts counting from zero
            self.cr[done_indices] = 0
            self.steps[done_indices] = 0
            self._env_reset_done()
            if games_played >= n_games:
                break
    mean_reward = sum_rewards / games_played
    mean_steps = sum_steps / games_played
    # Print summary statistics
```
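The episode-accounting logic above can be exercised without the simulator. The sketch below replays the same accumulate/record/reset sequence over synthetic per-step reward and done arrays (all values invented for illustration); `rollout_stats` is a hypothetical helper, not part of CommonPlayer.

```python
import numpy as np


def rollout_stats(rewards_per_step, dones_per_step, n_games):
    """Replay CommonPlayer-style episode accounting on synthetic data.

    rewards_per_step / dones_per_step: arrays of shape (T, num_envs).
    Returns (mean episode reward, mean episode length).
    """
    num_envs = rewards_per_step.shape[1]
    cr = np.zeros(num_envs)     # per-env accumulated reward
    steps = np.zeros(num_envs)  # per-env step counter
    sum_rewards = sum_steps = games_played = 0
    for reward, done in zip(rewards_per_step, dones_per_step):
        cr += reward
        steps += 1
        for idx in np.nonzero(done)[0]:
            games_played += 1
            sum_rewards += cr[idx]
            sum_steps += steps[idx]
            cr[idx] = 0.0       # reset finished env's accumulators
            steps[idx] = 0.0
        if games_played >= n_games:
            break
    return sum_rewards / games_played, sum_steps / games_played


# 2 envs over 4 steps; env 0 finishes at t=1, env 1 at t=3
rewards = np.array([[1.0, 0.5], [1.0, 0.5], [0.0, 0.5], [0.0, 0.5]])
dones = np.array([[0, 0], [1, 0], [0, 0], [0, 1]], dtype=bool)
mean_r, mean_s = rollout_stats(rewards, dones, n_games=2)
# env 0 return: 1.0 + 1.0 = 2.0 in 2 steps; env 1: 4 * 0.5 = 2.0 in 4 steps
```

Because episodes finish asynchronously across vectorized environments, the outer loop is bounded by max_steps while the break condition counts completed games, matching the structure of run().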