Implementation:Norrrrrrr lyn WAInjectBench torch save Checkpoint
| Knowledge Sources | |
|---|---|
| Domains | Model_Management, Deep_Learning |
| Last Updated | 2026-02-14 16:00 GMT |
Overview
Concrete tool for saving LLaVA fine-tuning checkpoints together with training metadata, using PyTorch's torch.save as called in the WAInjectBench train/llava-ft.py script.
Description
The checkpoint-saving logic in train/llava-ft.py (L394-408) calls torch.save to write a dictionary containing the model state dict, optimizer state dict, epoch number, best TPR (true positive rate), and AMP configuration (whether AMP is enabled, plus the AMP dtype stored as a string, or "fp32" when no AMP dtype is set). The filename follows the pattern best_epoch{N}_tpr{X.XXXX}.pt under the --out_dir directory (default "runs/ft"). A checkpoint is saved only when the current epoch's TPR strictly exceeds the previous best.
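The naming pattern described above can be sketched in plain Python. The function `best_checkpoint_path` mirrors the f-string from the source; `parse_checkpoint_name` is a hypothetical helper, not part of WAInjectBench, added only to show how the epoch and TPR could be recovered from a filename.

```python
import os
import re
from typing import Tuple


def best_checkpoint_path(out_dir: str, epoch: int, tpr: float) -> str:
    """Build the checkpoint path using the documented naming pattern."""
    return os.path.join(out_dir, f"best_epoch{epoch}_tpr{tpr:.4f}.pt")


# Hypothetical helper (an assumption, not in the source): parse the name back.
_CKPT_RE = re.compile(r"best_epoch(\d+)_tpr(\d+\.\d{4})\.pt$")


def parse_checkpoint_name(path: str) -> Tuple[int, float]:
    """Return (epoch, tpr) encoded in a best-checkpoint filename."""
    match = _CKPT_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"not a best-checkpoint filename: {path}")
    return int(match.group(1)), float(match.group(2))


path = best_checkpoint_path("runs/ft", 2, 0.95234)
print(path)
print(parse_checkpoint_name(path))
```

Because the TPR is rounded to four decimals in the filename, the value parsed from the name can differ slightly from the unrounded best_tpr stored inside the file.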
Usage
Called automatically within the validation loop when a new best TPR is achieved.
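As a rough illustration of when a save is triggered, the sketch below replays the "new best TPR" check over a list of per-epoch TPR values. The function name `epochs_that_save`, the initial best value of -1.0, and the 1-based epoch numbering are assumptions made for this example; the source does not state how best_tpr is initialized or how epochs are counted.

```python
def epochs_that_save(tprs_per_epoch, initial_best=-1.0):
    """Return the (epoch, tpr) pairs for which a checkpoint would be written."""
    best_tpr = initial_best  # assumed starting value, not taken from the source
    saved = []
    for epoch, tpr in enumerate(tprs_per_epoch, start=1):  # 1-based is assumed
        if tpr > best_tpr:  # strict comparison: a tie does not trigger a save
            best_tpr = tpr
            saved.append((epoch, tpr))
    return saved


print(epochs_that_save([0.90, 0.88, 0.93, 0.93]))
```

Since the filename embeds both the epoch and the TPR, each improvement produces a distinct file rather than overwriting the previous one.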
Code Reference
Source Location
- Repository: WAInjectBench
- File: train/llava-ft.py (L394-408)
Signature
if tpr > best_tpr:
    best_tpr = tpr
    best_path = os.path.join(args.out_dir, f"best_epoch{epoch}_tpr{tpr:.4f}.pt")
    torch.save(
        {
            "epoch": epoch,
            "model_state": model.state_dict(),
            "optimizer_state": optim.state_dict(),
            "best_tpr": best_tpr,
            "amp_enabled": state.use_amp,
            "amp_dtype": str(amp_dtype) if amp_dtype is not None else "fp32",
        },
        best_path,
    )
    print(f"Saved: {best_path}")
Import
import torch
import os
I/O Contract
Inputs
| Name | Type | Required | Description |
|---|---|---|---|
| model | nn.Module | Yes | Trained model whose state_dict to save |
| optim | torch.optim.AdamW | Yes | Optimizer whose state_dict to save |
| epoch | int | Yes | Current epoch number |
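| tpr | float | Yes | TPR measured for the current epoch; compared against best_tpr |
| best_tpr | float | Yes | Best TPR achieved so far; set to tpr before saving |
| state | TrainState | Yes | Training state; state.use_amp is stored as amp_enabled |
| amp_dtype | torch.dtype or None | Yes | AMP dtype; stored as a string, or "fp32" when None |
| out_dir | str | Yes | Output directory, read from args.out_dir (default "runs/ft") |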
Outputs
| Name | Type | Description |
|---|---|---|
| .pt file | File | Checkpoint at {out_dir}/best_epoch{N}_tpr{X.XXXX}.pt containing model_state, optimizer_state, epoch, best_tpr, amp_enabled, amp_dtype |
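A consumer of these files may want to confirm that a loaded checkpoint carries all six documented keys before using it. The sketch below is one way to do that; `EXPECTED_KEYS` and `missing_checkpoint_keys` are hypothetical names introduced here, and a plain dict stands in for the object returned by torch.load.

```python
# Keys listed in the Outputs table above.
EXPECTED_KEYS = {
    "epoch",
    "model_state",
    "optimizer_state",
    "best_tpr",
    "amp_enabled",
    "amp_dtype",
}


def missing_checkpoint_keys(ckpt: dict) -> set:
    """Return the documented keys that are absent from a checkpoint dict."""
    return EXPECTED_KEYS - set(ckpt)


# A plain dict stands in for the result of torch.load(...) in this sketch.
example = {"epoch": 2, "model_state": {}, "best_tpr": 0.9523}
print(sorted(missing_checkpoint_keys(example)))
```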
Usage Examples
Saving and Loading Checkpoints
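import os
import torch
from torch import nn

# Stand-ins for the fine-tuned model and its optimizer (illustrative only)
model = nn.Linear(4, 2)
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Save checkpoint
os.makedirs("runs/ft", exist_ok=True)
ckpt_path = "runs/ft/best_epoch2_tpr0.9523.pt"
torch.save({
    "epoch": 2,
    "model_state": model.state_dict(),
    "optimizer_state": optim.state_dict(),
    "best_tpr": 0.9523,
    "amp_enabled": True,
    "amp_dtype": "torch.bfloat16",
}, ckpt_path)

# Load checkpoint for inference; map_location="cpu" avoids needing the original device
ckpt = torch.load(ckpt_path, map_location="cpu")
model.load_state_dict(ckpt["model_state"])
model.eval()  # switch to inference mode
print(f"Loaded epoch {ckpt['epoch']} with TPR={ckpt['best_tpr']}")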
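The checkpoint also stores the optimizer state and the AMP dtype as a string, so it could in principle be used to resume training rather than only for inference. The following is a minimal sketch under stated assumptions: `parse_amp_dtype` and `restore_training_state` are hypothetical helpers, resuming at `epoch + 1` is an assumption, and the source does not say whether train/llava-ft.py supports resuming at all. A tiny nn.Linear model stands in for LLaVA.

```python
import torch
from torch import nn


def parse_amp_dtype(name: str):
    """Map the stored amp_dtype string back to a torch.dtype (None for "fp32")."""
    if name == "fp32":
        return None
    # str(torch.bfloat16) is "torch.bfloat16", so strip the module prefix.
    dtype = getattr(torch, name.replace("torch.", "", 1), None)
    if not isinstance(dtype, torch.dtype):
        raise ValueError(f"unrecognized amp_dtype: {name!r}")
    return dtype


def restore_training_state(ckpt: dict, model: nn.Module, optim: torch.optim.Optimizer):
    """Restore model and optimizer state; return (next_epoch, best_tpr, amp_dtype)."""
    model.load_state_dict(ckpt["model_state"])
    optim.load_state_dict(ckpt["optimizer_state"])
    # Resuming at epoch + 1 is an assumption made for this sketch.
    return ckpt["epoch"] + 1, ckpt["best_tpr"], parse_amp_dtype(ckpt["amp_dtype"])


# Tiny stand-in model so the sketch runs without LLaVA weights.
model = nn.Linear(4, 2)
optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
ckpt = {
    "epoch": 2,
    "model_state": model.state_dict(),
    "optimizer_state": optim.state_dict(),
    "best_tpr": 0.9523,
    "amp_enabled": True,
    "amp_dtype": "torch.bfloat16",
}
print(restore_training_state(ckpt, model, optim))
```

Storing the dtype as a string keeps the checkpoint metadata readable, at the cost of needing a small conversion step like the one above when the dtype object itself is required.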