

Principle:Norrrrrrr lyn WAInjectBench Checkpoint Export

From Leeroopedia
Knowledge Sources
Domains: Model_Management, Deep_Learning
Last Updated: 2026-02-14 16:00 GMT

Overview

A model persistence strategy that serializes the complete training state (model weights, optimizer state, training metadata) to a checkpoint file for resumable training and deployment.

Description

Checkpoint Export saves a comprehensive snapshot of the training state using torch.save. The saved dictionary includes:

  • model_state: The full model state dict (including LoRA weights)
  • optimizer_state: Optimizer momentum/variance buffers for training resumption
  • epoch: The epoch number when this checkpoint was saved
  • best_tpr: The validation true positive rate (TPR) that triggered the save
  • amp_enabled/amp_dtype: The automatic mixed precision (AMP) settings, recorded so inference can reproduce the training precision
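The dictionary described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WAInjectBench code. The helper name `build_checkpoint` is hypothetical. A toy `nn.Linear` model stands in for the LoRA-adapted model. Storing `amp_dtype` as a string such as `"bfloat16"` is also an assumption.

```python
import io

import torch
from torch import nn


def build_checkpoint(model, optimizer, epoch, best_tpr, amp_enabled, amp_dtype):
    """Hypothetical helper: bundle the fields listed above into one dict."""
    return {
        "model_state": model.state_dict(),          # every registered parameter/buffer, LoRA weights included
        "optimizer_state": optimizer.state_dict(),  # e.g. AdamW exp_avg / exp_avg_sq buffers
        "epoch": epoch,
        "best_tpr": best_tpr,
        "amp_enabled": amp_enabled,
        # Assumption: keep the dtype by name ("bfloat16") rather than as a torch.dtype object
        "amp_dtype": str(amp_dtype).replace("torch.", ""),
    }


model = nn.Linear(4, 2)  # toy stand-in for the real detector model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Take one optimizer step so the optimizer state actually contains buffers
model(torch.randn(3, 4)).sum().backward()
optimizer.step()

ckpt = build_checkpoint(model, optimizer, epoch=3, best_tpr=0.912,
                        amp_enabled=True, amp_dtype=torch.bfloat16)
buffer = io.BytesIO()  # a filesystem path works the same way
torch.save(ckpt, buffer)
```

Saving the optimizer state alongside the weights is what makes resumption exact. Without the AdamW moment buffers, a resumed run would restart its adaptive learning rates from zero.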

The checkpoint filename encodes the epoch and TPR, so the best model can be identified without opening the file.
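This page does not give the exact filename format. The scheme below (`epoch007_tpr0.9000.pt`) is a hypothetical example of how epoch and TPR could be encoded and later parsed back to locate the best checkpoint. The function names `checkpoint_name` and `best_checkpoint` are illustrative only.

```python
import re
from pathlib import Path

# Hypothetical naming scheme; the real WAInjectBench format may differ.
_PATTERN = re.compile(r"epoch(?P<epoch>\d+)_tpr(?P<tpr>\d+\.\d+)\.pt$")


def checkpoint_name(epoch: int, tpr: float) -> str:
    """Encode the epoch (zero-padded) and validation TPR (4 decimals) in the filename."""
    return f"epoch{epoch:03d}_tpr{tpr:.4f}.pt"


def best_checkpoint(paths):
    """Return the path with the highest encoded TPR (ties go to the later epoch)."""
    best, best_key = None, None
    for p in paths:
        m = _PATTERN.search(Path(p).name)
        if m is None:
            continue  # skip files that do not follow the scheme
        key = (float(m["tpr"]), int(m["epoch"]))
        if best_key is None or key > best_key:
            best, best_key = p, key
    return best
```

Parsing the TPR back out avoids relying on lexical order of filenames. Zero-padding the epoch still keeps a plain directory listing in training order.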

Usage

Use this whenever validation produces a new best TPR. The saved checkpoint can be loaded for inference by the detector_image/llava.py module or used to continue training.
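Within a training loop, saving on a new best TPR reduces to a comparison against the running best. The following is a sketch of that gating under assumptions, not the project's loop. `BestTPRCheckpointer` and `save_fn` are hypothetical names. In practice `save_fn` would call `torch.save` with the checkpoint dictionary.

```python
from typing import Callable, Optional


class BestTPRCheckpointer:
    """Hypothetical helper: call `save_fn` only when validation TPR improves."""

    def __init__(self, save_fn: Callable[[int, float], None]):
        self.save_fn = save_fn
        self.best_tpr: Optional[float] = None

    def update(self, epoch: int, tpr: float) -> bool:
        # Strict improvement only: an equal TPR does not trigger a new save
        if self.best_tpr is None or tpr > self.best_tpr:
            self.best_tpr = tpr
            self.save_fn(epoch, tpr)
            return True
        return False


# Usage: record which epochs would have produced a checkpoint
saved = []
tracker = BestTPRCheckpointer(lambda epoch, tpr: saved.append((epoch, tpr)))
for epoch, tpr in enumerate([0.70, 0.82, 0.80, 0.82, 0.91]):
    tracker.update(epoch, tpr)
```

Here only epochs 0, 1 and 4 are saved. Using a strict `>` keeps the earliest epoch that reached a given TPR, which is one reasonable tie-breaking choice.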

Theoretical Basis

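import torch

def save_checkpoint(path, model, optimizer, current_epoch, best_metric_value, config_dict):
    # Checkpoint save pattern: everything needed to resume training or run inference
    checkpoint = {
        "epoch": current_epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
        "best_metric": best_metric_value,
        "training_config": config_dict,
    }
    torch.save(checkpoint, path)

def load_checkpoint(path, model, optimizer=None):
    checkpoint = torch.load(path, map_location="cpu")  # lets a GPU-saved file load on CPU
    model.load_state_dict(checkpoint["model_state"])
    if optimizer is not None:  # optimizer state is only needed to resume training
        optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint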
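A full round trip through the pattern above, covering resumed training and inference, could look like the sketch below. It assumes the generic keys from the pattern (`model_state`, `optimizer_state`, `epoch`), a toy `nn.Linear` model, and a `training_config` entry holding `amp_enabled` and `amp_dtype` (as a string). That config layout is hypothetical and not taken from detector_image/llava.py.

```python
import os
import tempfile

import torch
from torch import nn

# --- Save (training side) ---
model = nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
model(torch.randn(8, 4)).pow(2).mean().backward()
optimizer.step()  # populates the momentum buffers

path = os.path.join(tempfile.mkdtemp(), "checkpoint.pt")
torch.save({
    "epoch": 5,
    "model_state": model.state_dict(),
    "optimizer_state": optimizer.state_dict(),
    "best_metric": 0.9,
    "training_config": {"amp_enabled": True, "amp_dtype": "bfloat16"},  # assumed layout
}, path)

# --- Resume training ---
resumed = nn.Linear(4, 2)  # freshly initialised, then overwritten from the file
resumed_opt = torch.optim.SGD(resumed.parameters(), lr=0.1, momentum=0.9)
ckpt = torch.load(path, map_location="cpu")
resumed.load_state_dict(ckpt["model_state"])
resumed_opt.load_state_dict(ckpt["optimizer_state"])
start_epoch = ckpt["epoch"] + 1  # continue after the saved epoch

# --- Inference with the saved AMP settings ---
cfg = ckpt["training_config"]
resumed.eval()
with torch.no_grad(), torch.autocast(device_type="cpu",
                                     dtype=getattr(torch, cfg["amp_dtype"]),
                                     enabled=cfg["amp_enabled"]):
    logits = resumed(torch.randn(2, 4))
```

For inference only, the optimizer lines can be skipped. The model weights and AMP settings are enough.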

