Principle:Isaac sim IsaacGymEnvs Checkpoint Export and Logging

From Leeroopedia
Principle Name: Checkpoint Export and Logging
Overview: Pattern for persisting trained model weights and tracking experiment metrics during reinforcement learning training.
Domains: Logging, Training
Related Implementation: Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging
Last Updated: 2026-02-15 00:00 GMT

Description

During training, the system periodically saves model checkpoints (.pth files containing state_dict) and logs metrics (rewards, losses, FPS) to TensorBoard and optionally to Weights & Biases. Observers attached to the rl_games Runner intercept training events to collect and aggregate per-episode statistics.

The checkpointing and logging system operates at two levels:

  1. Agent-level checkpointing: The training agent saves model weights at configurable intervals (save_frequency) to the runs/<experiment>/nn/ directory. Each checkpoint contains the full model state_dict, optimizer state, and epoch number, enabling training resumption.
  2. Observer-level logging: AlgoObserver instances attached to the Runner intercept training events and forward metrics to various backends. RLGPUAlgoObserver collects per-episode statistics from the GPU environment's infos dict, while WandbAlgoObserver initializes a Weights & Biases run and logs all metrics to the W&B dashboard.
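The agent-level checkpoint described in point 1 can be sketched as a plain dictionary. The key names below are illustrative; the exact keys rl_games uses in its .pth files may differ:

```python
# Illustrative checkpoint layout for resumable training. The key names are
# hypothetical; rl_games may use different ones in its saved state.
def make_checkpoint(model_state, optimizer_state, epoch):
    return {
        "model": model_state,          # network weights (the state_dict)
        "optimizer": optimizer_state,  # optimizer buffers (e.g. Adam moments)
        "epoch": epoch,                # where to resume the epoch counter
    }

def restore(checkpoint):
    # Resumption reverses the mapping: weights, optimizer state, epoch counter.
    return checkpoint["model"], checkpoint["optimizer"], checkpoint["epoch"]
```

In practice the dictionary is written with torch.save to a .pth file under the runs/<experiment>/nn/ directory, and torch.load recovers it before restore-style unpacking.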

Theoretical Basis

This system follows the Observer pattern, where AlgoObserver callbacks are invoked at key training events without modifying the training loop itself:

  • after_init(algo): Called once after the agent is fully initialized. Observers capture references to the agent's writer and other state.
  • process_infos(infos, done_indices): Called after each environment step. The observer extracts per-episode statistics (e.g., consecutive successes, goal distance) from the infos dictionary for completed episodes.
  • after_print_stats(frame, epoch_num, total_time): Called after each epoch's statistics are printed. The observer writes aggregated metrics to TensorBoard or W&B.

This decoupled design means the training loop does not need to know which logging backends are active. New backends can be added by implementing the AlgoObserver interface and registering them in the MultiObserver.
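A minimal sketch of such a backend, using a stub in place of the real rl_games AlgoObserver base class. The hook names follow the description above; the SuccessRateObserver class, its "successes" metric, and the stub agent are hypothetical:

```python
class AlgoObserver:
    """Stub standing in for the rl_games AlgoObserver base class."""
    def after_init(self, algo): pass
    def process_infos(self, infos, done_indices): pass
    def after_print_stats(self, frame, epoch_num, total_time): pass

class SuccessRateObserver(AlgoObserver):
    """Hypothetical observer that averages a per-episode 'successes' metric."""
    def __init__(self):
        self.successes = []
        self.writer = None

    def after_init(self, algo):
        # Capture the agent's summary writer (a TensorBoard SummaryWriter in practice).
        self.writer = getattr(algo, "writer", None)

    def process_infos(self, infos, done_indices):
        # Collect the metric only for environments whose episode just ended.
        if "successes" in infos:
            for i in done_indices:
                self.successes.append(infos["successes"][i])

    def after_print_stats(self, frame, epoch_num, total_time):
        # Aggregate what was collected this epoch, emit it, then reset.
        if not self.successes:
            return None
        mean_success = sum(self.successes) / len(self.successes)
        if self.writer is not None:
            self.writer.add_scalar("episode/successes", mean_success, frame)
        self.successes.clear()
        return mean_success
```

Because the observer only implements the three hooks, it can be registered alongside any other backend without the training loop knowing it exists.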

Key properties of the checkpointing strategy:

  • Periodic snapshots: Checkpoints are saved every save_frequency epochs to balance storage usage against recovery granularity.
  • Best-model tracking: The agent tracks the best mean reward seen so far and saves a dedicated best.pth checkpoint when a new high is achieved.
  • Resumable training: Checkpoints store optimizer state and epoch counter alongside model weights, enabling exact training resumption from any checkpoint.
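The first two properties reduce to a small per-epoch decision. The save_frequency name comes from the configuration described above; the helper function itself is an illustrative sketch, not the agent's actual code:

```python
def checkpoint_actions(epoch_num, mean_reward, best_reward, save_frequency):
    """Decide which checkpoints to write this epoch (illustrative only).

    Returns (save_periodic, save_best, new_best_reward): whether to write a
    periodic snapshot, whether to overwrite the best-model checkpoint, and
    the updated best mean reward.
    """
    save_periodic = save_frequency > 0 and epoch_num % save_frequency == 0
    save_best = mean_reward > best_reward
    return save_periodic, save_best, max(mean_reward, best_reward)
```

The real agent interleaves this logic with its training loop, writing the periodic snapshot to a numbered .pth file and the best-model snapshot to a dedicated best checkpoint.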

When to Use

Use this principle when you need to monitor training progress and persist trained policies for later inference:

  • When running long training experiments that may be interrupted and need to be resumed.
  • When comparing training curves across different hyperparameter configurations.
  • When deploying trained policies to real robots or downstream evaluation pipelines.
  • When collaborating with a team that needs shared visibility into experiment results via W&B dashboards.

Structure

The logging and checkpointing pipeline consists of:

  1. Observer registration: At startup, observers are instantiated and wrapped in a MultiObserver before being passed to the Runner.
  2. Initialization hook: After the agent is built, after_init() is called on each observer to capture references to the TensorBoard writer and agent state.
  3. Per-step collection: During rollout, process_infos() extracts episode-level metrics from the environment's infos dictionary for episodes that have terminated.
  4. Per-epoch aggregation: After each training epoch, after_print_stats() aggregates the collected per-episode metrics and writes them to TensorBoard and/or W&B.
  5. Periodic checkpointing: The agent checks epoch_num % save_frequency == 0 and saves the model state_dict, optimizer state, and epoch number to a .pth file.
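Step 1's fan-out can be sketched as a wrapper that forwards each hook to every registered observer, mirroring the MultiObserver described above (this is an illustrative reimplementation, not the IsaacGymEnvs source):

```python
class MultiObserver:
    """Fans each observer callback out to every registered observer (sketch)."""
    def __init__(self, observers):
        self.observers = list(observers)

    def after_init(self, algo):
        for o in self.observers:
            o.after_init(algo)

    def process_infos(self, infos, done_indices):
        for o in self.observers:
            o.process_infos(infos, done_indices)

    def after_print_stats(self, frame, epoch_num, total_time):
        for o in self.observers:
            o.after_print_stats(frame, epoch_num, total_time)
```

Passing a single MultiObserver to the Runner lets, for example, an RLGPUAlgoObserver and a WandbAlgoObserver receive the same events without either knowing about the other.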

Related Pages

Implementation:Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging
