Principle:Isaac sim IsaacGymEnvs Checkpoint Export and Logging
| Field | Value |
|---|---|
| Principle Name | Checkpoint Export and Logging |
| Overview | Pattern for persisting trained model weights and tracking experiment metrics during reinforcement learning training. |
| Domains | Logging, Training |
| Related Implementation | Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging |
| Last Updated | 2026-02-15 00:00 GMT |
Description
During training, the system periodically saves model checkpoints (.pth files containing state_dict) and logs metrics (rewards, losses, FPS) to TensorBoard and optionally to Weights & Biases. Observers attached to the rl_games Runner intercept training events to collect and aggregate per-episode statistics.
The checkpointing and logging system operates at two levels:
- Agent-level checkpointing: The training agent saves model weights at configurable intervals (`save_frequency`) to the `runs/<experiment>/nn/` directory. Each checkpoint contains the full model `state_dict`, optimizer state, and epoch number, enabling training resumption.
- Observer-level logging: `AlgoObserver` instances attached to the Runner intercept training events and forward metrics to various backends. `RLGPUAlgoObserver` collects per-episode statistics from the GPU environment's `infos` dict, while `WandbAlgoObserver` initializes a Weights & Biases run and logs all metrics to the W&B dashboard.
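A checkpoint's layout and the resume path can be sketched as follows. In practice the payload is written with `torch.save` and contains tensor `state_dict`s; plain dicts and `pickle` stand in here so the sketch is self-contained, and the helper names and field keys are illustrative, not the library's API.

```python
import os
import pickle

def save_checkpoint(path, model_state, optimizer_state, epoch):
    """Persist everything needed to resume training exactly.

    Sketch only: the real agent calls torch.save on tensor state_dicts;
    pickle and plain dicts stand in here.
    """
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump({"model": model_state,
                     "optimizer": optimizer_state,
                     "epoch": epoch}, f)

def load_checkpoint(path):
    """Restore model weights, optimizer state, and the epoch counter."""
    with open(path, "rb") as f:
        ckpt = pickle.load(f)
    # Training resumes from the stored epoch rather than from zero.
    return ckpt["model"], ckpt["optimizer"], ckpt["epoch"]
```

Because the optimizer state and epoch counter travel with the weights, a resumed run continues exactly where the interrupted one stopped.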
Theoretical Basis
This system follows the Observer pattern, where AlgoObserver callbacks are invoked at key training events without modifying the training loop itself:
- `after_init(algo)`: Called once after the agent is fully initialized. Observers capture references to the agent's writer and other state.
- `process_infos(infos, done_indices)`: Called after each environment step. The observer extracts per-episode statistics (e.g., consecutive successes, goal distance) from the `infos` dictionary for completed episodes.
- `after_print_stats(frame, epoch_num, total_time)`: Called after each epoch's statistics are printed. The observer writes aggregated metrics to TensorBoard or W&B.
This decoupled design means the training loop does not need to know which logging backends are active. New backends can be added by implementing the AlgoObserver interface and registering them in the MultiObserver.
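A minimal sketch of this decoupling, assuming only the three callback names from the `AlgoObserver` interface; the classes below are simplified stand-ins for illustration, not the rl_games implementations:

```python
class RewardObserver:
    """Illustrative observer: collects per-episode rewards and reports
    a per-epoch mean. Method names mirror the AlgoObserver interface."""

    def __init__(self):
        self.episode_rewards = []

    def after_init(self, algo):
        # Capture a reference to the agent (and its writer) once.
        self.algo = algo

    def process_infos(self, infos, done_indices):
        # Collect stats only for environments whose episode just ended.
        for i in done_indices:
            self.episode_rewards.append(infos["episode_reward"][i])

    def after_print_stats(self, frame, epoch_num, total_time):
        # Aggregate once per epoch, then reset the buffer.
        if self.episode_rewards:
            mean_r = sum(self.episode_rewards) / len(self.episode_rewards)
            print(f"epoch {epoch_num}: mean episode reward {mean_r:.3f}")
            self.episode_rewards.clear()


class MultiObserver:
    """Fans every callback out to a list of observers, so the training
    loop only ever talks to a single object."""

    def __init__(self, observers):
        self.observers = observers

    def after_init(self, algo):
        for o in self.observers:
            o.after_init(algo)

    def process_infos(self, infos, done_indices):
        for o in self.observers:
            o.process_infos(infos, done_indices)

    def after_print_stats(self, frame, epoch_num, total_time):
        for o in self.observers:
            o.after_print_stats(frame, epoch_num, total_time)
```

Adding a new backend is then just another class with these three methods appended to the `MultiObserver` list; the training loop is untouched.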
Key properties of the checkpointing strategy:
- Periodic snapshots: Checkpoints are saved every `save_frequency` epochs to balance storage usage against recovery granularity.
- Best-model tracking: The agent tracks the best mean reward seen so far and saves a dedicated `best.pth` checkpoint when a new high is achieved.
- Resumable training: Checkpoints store optimizer state and the epoch counter alongside model weights, enabling exact training resumption from any checkpoint.
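The two save triggers can be sketched as a small decision helper. `maybe_checkpoint` and the file paths below are hypothetical, chosen only to illustrate the periodic and best-model rules described above:

```python
def maybe_checkpoint(epoch_num, mean_reward, best_reward, save_frequency,
                     run_dir="runs/experiment/nn"):
    """Return (paths_to_write, new_best_reward) for this epoch.

    Illustrative helper, not the agent's actual code. Two independent
    triggers: a periodic snapshot every save_frequency epochs, and a
    dedicated best checkpoint on a new high mean reward.
    """
    to_write = []
    if save_frequency > 0 and epoch_num % save_frequency == 0:
        to_write.append(f"{run_dir}/last_ep{epoch_num}.pth")
    if mean_reward > best_reward:
        best_reward = mean_reward  # new high-water mark
        to_write.append(f"{run_dir}/best.pth")
    return to_write, best_reward
```

Note the triggers are independent: an epoch can produce a periodic snapshot, a best checkpoint, both, or neither.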
When to Use
Use this principle when monitoring training progress and saving trained policies for later inference:
- When running long training experiments that may be interrupted and need to be resumed.
- When comparing training curves across different hyperparameter configurations.
- When deploying trained policies to real robots or downstream evaluation pipelines.
- When collaborating with a team that needs shared visibility into experiment results via W&B dashboards.
Structure
The logging and checkpointing pipeline consists of:
- Observer registration: At startup, observers are instantiated and wrapped in a `MultiObserver` before being passed to the Runner.
- Initialization hook: After the agent is built, `after_init()` is called on each observer to capture references to the TensorBoard writer and agent state.
- Per-step collection: During rollout, `process_infos()` extracts episode-level metrics from the environment's `infos` dictionary for episodes that have terminated.
- Per-epoch aggregation: After each training epoch, `after_print_stats()` aggregates the collected per-episode metrics and writes them to TensorBoard and/or W&B.
- Periodic checkpointing: The agent checks `epoch_num % save_frequency == 0` and saves the model `state_dict`, optimizer state, and epoch number to a `.pth` file.
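The ordering of these stages can be sketched as a minimal driver loop. This is an illustrative skeleton, not the rl_games training loop: `run_epochs`, `step_fn`, and `NullObserver` are names invented for the sketch.

```python
class NullObserver:
    """No-op observer standing in for whatever backends are configured."""
    def after_init(self, algo): pass
    def process_infos(self, infos, done_indices): pass
    def after_print_stats(self, frame, epoch_num, total_time): pass


def run_epochs(step_fn, num_epochs, save_frequency, observer=None):
    """Skeleton of the per-epoch pipeline (illustrative only).

    step_fn() simulates one epoch of rollouts and returns
    (infos, done_indices, frame). Returns the epochs at which a
    checkpoint would be written.
    """
    observer = observer or NullObserver()
    saved_at = []
    observer.after_init(algo=None)                        # initialization hook
    for epoch_num in range(1, num_epochs + 1):
        infos, done_indices, frame = step_fn()            # rollout
        observer.process_infos(infos, done_indices)       # per-step collection
        observer.after_print_stats(frame, epoch_num, 0.0) # per-epoch aggregation
        if save_frequency > 0 and epoch_num % save_frequency == 0:
            saved_at.append(epoch_num)                    # periodic checkpoint
    return saved_at
```

The loop never inspects which observers are attached; swapping backends changes only the object passed in.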
Related Pages
- Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging - implements - Concrete observer implementations for W&B and GPU metrics logging.
- Isaac_sim_IsaacGymEnvs_RL_Agent_Initialization - prerequisite - Observers are registered during Runner initialization.
- Isaac_sim_IsaacGymEnvs_Policy_Training_Loop - related - The training loop invokes observer callbacks and triggers checkpoint saves.
Implementation:Isaac_sim_IsaacGymEnvs_WandbAlgoObserver_Logging