Principle:Norrrrrrr lyn WAInjectBench Checkpoint Export
| Knowledge Sources | |
|---|---|
| Domains | Model_Management, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
A model persistence strategy that serializes the complete training state (model weights, optimizer state, training metadata) to a checkpoint file for resumable training and deployment.
Description
Checkpoint Export saves a comprehensive snapshot of the training state using torch.save. The saved dictionary includes:
- model_state: The full model state dict (including LoRA weights)
- optimizer_state: Optimizer momentum/variance buffers for training resumption
- epoch: The epoch number when this checkpoint was saved
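- best_tpr: The validation true positive rate (TPR) that triggered the save
- amp_enabled/amp_dtype: Automatic mixed precision (AMP) settings, stored so that inference can run with the same precision configuration used in training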
The checkpoint filename encodes the epoch and TPR for easy identification of the best model.
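The dictionary layout and filename encoding described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper names `checkpoint_filename` and `build_checkpoint`, the `checkpoint` prefix, the zero-padded filename format, and storing `amp_dtype` as a string are all assumptions made for the example.

```python
from typing import Any, Dict


def checkpoint_filename(epoch: int, tpr: float, prefix: str = "checkpoint") -> str:
    """Hypothetical naming scheme that encodes the epoch and TPR in the filename."""
    return f"{prefix}_epoch{epoch:03d}_tpr{tpr:.4f}.pt"


def build_checkpoint(
    model_state: Dict[str, Any],
    optimizer_state: Dict[str, Any],
    epoch: int,
    best_tpr: float,
    amp_enabled: bool,
    amp_dtype: str,
) -> Dict[str, Any]:
    """Assemble a checkpoint dictionary with the keys listed in the description."""
    return {
        "model_state": model_state,          # in practice: model.state_dict()
        "optimizer_state": optimizer_state,  # in practice: optimizer.state_dict()
        "epoch": epoch,
        "best_tpr": best_tpr,
        "amp_enabled": amp_enabled,
        "amp_dtype": amp_dtype,              # assumed to be stored as a string
    }


# Usage with empty placeholder state dicts
ckpt = build_checkpoint({}, {}, epoch=7, best_tpr=0.9312,
                        amp_enabled=True, amp_dtype="bfloat16")
name = checkpoint_filename(ckpt["epoch"], ckpt["best_tpr"])
print(name)  # checkpoint_epoch007_tpr0.9312.pt
```

In a real training loop the resulting dictionary would be passed to `torch.save(ckpt, name)`. Sorting filenames produced this way groups checkpoints by epoch, and the TPR suffix makes the best model identifiable without loading any file.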
Usage
Save a checkpoint whenever validation achieves a new best TPR. The saved checkpoint can then be loaded by the detector_image/llava.py module for inference, or used to resume training.
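The "save on new best TPR" trigger can be sketched as below. The function name `run_validation_epochs` and the `save_fn` callback are hypothetical stand-ins for the real training loop and the `torch.save` call, and the strict `>` comparison (ties do not trigger a save) is an assumption rather than confirmed behavior.

```python
from typing import Callable, List


def run_validation_epochs(
    val_tprs: List[float],
    save_fn: Callable[[int, float], None],
) -> float:
    """Call save_fn(epoch, tpr) each time the validation TPR strictly improves."""
    best_tpr = float("-inf")
    for epoch, tpr in enumerate(val_tprs, start=1):
        if tpr > best_tpr:  # assumption: ties do not trigger a new checkpoint
            best_tpr = tpr
            save_fn(epoch, tpr)
    return best_tpr


# Simulated per-epoch validation TPRs; record which epochs would be saved.
saved_epochs: List[int] = []
best = run_validation_epochs(
    [0.80, 0.78, 0.85, 0.85],
    lambda epoch, tpr: saved_epochs.append(epoch),
)
print(saved_epochs, best)  # [1, 3] 0.85
```

Passing the save step in as a callback keeps the trigger logic independent of the serialization backend, so the comparison can be checked without constructing a model.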
Theoretical Basis
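# Checkpoint save pattern
import torch

checkpoint = {
    "epoch": current_epoch,                     # epoch at which the checkpoint was saved
    "model_state": model.state_dict(),          # full model weights (including LoRA weights)
    "optimizer_state": optimizer.state_dict(),  # momentum/variance buffers for resumption
    "best_metric": best_metric_value,           # e.g. the best validation TPR
    "training_config": config_dict,             # e.g. AMP settings
}
torch.save(checkpoint, path)

# Checkpoint load pattern
checkpoint = torch.load(path, map_location="cpu")
model.load_state_dict(checkpoint["model_state"])
optimizer.load_state_dict(checkpoint["optimizer_state"])  # only needed to resume training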